Kwasi Ankomah

536 total appearances

Podcast Appearances

The Neuron: AI Explained
AI Inference: Why Speed Matters More Than You Think (with SambaNova's Kwasi Ankomah)

certain applications, not only do they not have a latency budget, but they're like, they're almost like latency-critical applications, right?

So voice is one of those applications where it simply doesn't work.

You know, if the latency isn't there.

So I think other things you can say, oh, it works.

It's just a bit stuck.

Yeah.

Exactly that, Gauri.

You've got someone who's... and then you go get your lunch and you get a response.

Now, that just doesn't work.

So you've got this kind of new crop of applications that simply cannot have high latency.
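To make "latency-critical" concrete, here is a minimal sketch of checking one voice turn against an end-to-end latency budget. The three stage names, every millisecond figure, and the 800 ms budget are hypothetical placeholders chosen for illustration, not numbers from the episode or from SambaNova; the point is only that the stages add up, so a slow model response alone can push the whole turn over budget.

```python
"""Sketch: checking a voice pipeline against an end-to-end latency budget.

All stage names and millisecond figures are hypothetical placeholders,
not measurements from SambaNova or any other provider.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # assumed per-turn latency of this stage


def check_budget(stages: list[Stage], budget_ms: float) -> tuple[float, bool]:
    """Return the summed latency of all stages and whether it fits the budget."""
    total = sum(stage.latency_ms for stage in stages)
    return total, total <= budget_ms


# Hypothetical budget for a reply that still feels conversational.
BUDGET_MS = 800.0

# Hypothetical voice turn: speech-to-text, model response, text-to-speech.
fast_pipeline = [
    Stage("speech_to_text", 150),
    Stage("llm_response", 300),
    Stage("text_to_speech", 100),
]
# Same pipeline, but with a slow model response.
slow_pipeline = [
    Stage("speech_to_text", 150),
    Stage("llm_response", 2000),
    Stage("text_to_speech", 100),
]

if __name__ == "__main__":
    for label, pipeline in [("fast", fast_pipeline), ("slow", slow_pipeline)]:
        total, ok = check_budget(pipeline, BUDGET_MS)
        print(f"{label}: {total:.0f} ms, within budget: {ok}")
```

Only the model stage differs between the two pipelines, which mirrors the point above: in a voice application the model's response time is not a "bit stuck" inconvenience but the difference between a usable and an unusable turn.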

And that's what, for me, the three key kind of differences are, if that makes sense.

Yeah, they really do.

And one of the things that we see a lot is as you started AI projects, you tended to start them on these kind of huge models, right?

And they'd be like a Claude or like a GPT-4 or something like that.

Now, as you say,

If you run it for like a proof of concept, then this is completely fine; you know the cost. But as you scale that out to many users, it becomes a huge, huge cost.

And as you're serving tokens, what we're seeing again, this is the kind of the agentic stuff we'll get into is that

Old chat applications used to serve maybe 1,000 to 2,000 tokens, or that's what they needed.

Now with this new agentic era of applications, we're seeing like 10x that.

So that's what I think has caused this real focus, like this narrow focus on like, hey, come on, do we need this model?
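The cost argument above is simple multiplication, and a rough sketch makes the scaling visible. The 1,500-token chat figure is the midpoint of the 1,000 to 2,000 range mentioned in the episode, and the 10x agentic multiplier is also from the episode. The user counts, requests per user, and the $10 per million tokens price are hypothetical placeholders, not real pricing for any model or provider.

```python
"""Sketch: how per-request token counts drive serving cost at scale.

CHAT_TOKENS (midpoint of 1,000-2,000) and the 10x agentic multiplier come
from the episode; user counts, request rates, and PRICE are hypothetical.
"""


def daily_cost_usd(users: int, requests_per_user: int,
                   tokens_per_request: int, usd_per_million_tokens: float) -> float:
    """Total tokens served in a day, priced per million tokens."""
    total_tokens = users * requests_per_user * tokens_per_request
    return total_tokens * usd_per_million_tokens / 1_000_000


CHAT_TOKENS = 1_500                 # midpoint of the quoted chat range
AGENTIC_TOKENS = CHAT_TOKENS * 10   # "like 10x" for agentic applications
PRICE = 10.0                        # hypothetical USD per million tokens

# (users, requests per user per day, tokens per request); all hypothetical
scenarios = {
    "proof of concept, chat": (10, 20, CHAT_TOKENS),
    "production, chat": (100_000, 20, CHAT_TOKENS),
    "production, agentic": (100_000, 20, AGENTIC_TOKENS),
}

if __name__ == "__main__":
    for name, (users, reqs, tokens) in scenarios.items():
        cost = daily_cost_usd(users, reqs, tokens, PRICE)
        print(f"{name}: ${cost:,.2f} per day")
```

Under these assumed numbers the proof of concept costs a few dollars a day, while the same workload at production scale with agentic token counts lands in the hundreds of thousands. Because cost is linear in the per-token price, a smaller model with a lower price shrinks the bill by the same factor in this sketch, which is the kind of trade-off behind asking whether a project really needs the largest model.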