Tim Davis
And so the challenge they had was that as you scale throughput, the latency explodes.
So the problem is, as more and more users come onto their system, those concurrent requests each end up with higher and higher latency.
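The scaling behavior Tim is describing can be sketched with a toy queueing model. This is my own illustration, not Modular's actual benchmark or numbers: it assumes a server with a fixed service rate and uses the standard M/M/1 approximation, where average latency is 1 / (service_rate - arrival_rate), so latency blows up as load approaches the throughput ceiling.

```python
def avg_latency(service_rate: float, arrival_rate: float) -> float:
    """Average request latency (seconds) under the M/M/1 queueing model.

    service_rate: requests/second the server can process (hypothetical)
    arrival_rate: requests/second arriving; must stay below service_rate,
                  otherwise the queue grows without bound.
    """
    if arrival_rate >= service_rate:
        raise ValueError("system is saturated; latency is unbounded")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    mu = 100.0  # assumed server capacity, requests/second
    # As concurrent load approaches capacity, latency explodes.
    for lam in (10.0, 50.0, 90.0, 99.0):
        ms = avg_latency(mu, lam) * 1000.0
        print(f"{lam:>5.0f} req/s -> {ms:7.1f} ms average latency")
```

Note how the last step, from 90 to 99 requests per second, multiplies latency tenfold even though throughput barely moved: that is the "latency explodes as you scale throughput" effect.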
And so, you know, it's really interesting.
There are a couple of other customers I can't name yet, but there was this really interesting user observation, right?
And the observation is: irrespective of how intelligent a model is, if the latency goes up, an everyday person interacting with the AI thinks the model is dumber.
Right?
And so what that means is, as a consumer of the technology, you're sitting there and, say it's a text-to-speech model, maybe a nursing agent or a healthcare agent, something like that, right?
And it rings you up, or whatever it does, and it says, hey, how are you feeling today?
And if you say, oh, well, you know, I'm not actually feeling that great.
And then 10 seconds later, the model goes...
Well, that's no good.
You're like, what is going on with the person I'm speaking to?
It sort of tries to stop, and then you're like, oh. Even in our conversation (and I apologize, by the way, if I interrupt you guys), if you go to talk and then I go to talk, I'm like, oh no, Grant, you go, or no, Corey, you go.
And so there's this weird back-and-forth interaction.
Yeah.
The internet's still a factor.
And so it's wonderful to hear those stories, because at the end of the day, infrastructure is a tool, and you want to understand how it's actually impacting what you're trying to build, and really understand what the application is and what the end state is.
And so for them, using our infrastructure, they were deploying on Blackwell machines, which is an incredible piece of silicon from NVIDIA.
We were actually able to 4X the performance over vLLM for them.