Kwasi Ankomah
Certain applications not only have a latency budget, they're almost latency-critical applications, right?
Voice is one of those applications where it simply doesn't work if the latency isn't there.
With other things you can say, oh, it works, it's just a bit slow.
Yeah.
Exactly that, Gauri.
You've got someone waiting on a response, and by the time you've gone and gotten your lunch, it finally arrives.
Now, that just doesn't work.
So you've got this new crop of applications that simply cannot tolerate high latency.
And those, for me, are the three key differences, if that makes sense.
Yeah, they really do.
And one of the things that we see a lot is that when people started AI projects, they tended to start on these huge models, right?
A Claude or a GPT-4 or something like that.
Now, as you say, if you run that for a proof of concept, it's completely fine; you know the cost.
But as you scale out to many users, it becomes a huge, huge cost.
And as you're serving tokens, what we're seeing, and this is the agentic stuff we'll get into, is that old chat applications used to serve maybe 1,000 to 2,000 tokens per request; that's all they needed.
With this new agentic era of applications, we're seeing something like 10x that.
So that's what I think has caused this real narrow focus of, hey, come on, do we actually need this model?
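The cost-scaling point above can be sketched with some back-of-envelope arithmetic. All numbers here are illustrative assumptions, not real pricing for any model or provider: a hypothetical blended token price, a small proof-of-concept user count, and the rough 1,000-2,000-token chat turn versus ~10x agentic token load mentioned in the conversation.

```python
# Back-of-envelope serving-cost sketch. Every number is a hypothetical
# assumption for illustration, not a quote of any provider's pricing.

PRICE_PER_1K_TOKENS = 0.01  # hypothetical blended $/1K tokens


def monthly_cost(users, requests_per_user_per_day, tokens_per_request, days=30):
    """Estimated monthly serving cost in dollars."""
    total_tokens = users * requests_per_user_per_day * tokens_per_request * days
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS


# Proof of concept: a handful of users, classic ~2,000-token chat turns.
poc = monthly_cost(users=10, requests_per_user_per_day=20, tokens_per_request=2_000)

# At scale, with agentic workloads serving ~10x the tokens per request.
scaled = monthly_cost(users=100_000, requests_per_user_per_day=20, tokens_per_request=20_000)

print(f"PoC:   ${poc:,.2f}/month")    # → PoC:   $120.00/month
print(f"Scale: ${scaled:,.2f}/month") # → Scale: $12,000,000.00/month
```

The point of the sketch is that the per-request token count and the user count multiply: a 10x token load at 10,000x the users turns a trivial PoC bill into one that forces the "do we need a model this big?" question.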