Kwasi Ankomah
Certain applications not only have a latency budget, they're almost latency-critical applications, right?
Voice is one of those applications where it simply doesn't work if the latency isn't there.
With other things you can say, oh, it works, it's just a bit slow.
Yeah.
Exactly that, Gauri.
You've got someone waiting on a response, and by the time you've gone and gotten your lunch, it finally arrives.
Now, that just doesn't work.
So you've got this new crop of applications that simply cannot tolerate high latency.
And those, for me, are the three key differences, if that makes sense.
Yeah, they really do.
And one of the things that we see a lot is that when people started AI projects, they tended to start on these huge models, right?
A Claude or a GPT-4 or something like that.
Now, as you say, if you run that for a proof of concept, it's completely fine; you know the cost.
But as you scale out to many users, it becomes a huge, huge cost.
And as you're serving tokens, what we're seeing, and this is the agentic stuff we'll get into, is that old chat applications used to serve maybe 1,000 to 2,000 tokens per request; that's all they needed.
With this new agentic era of applications, we're seeing something like 10x that.
So that's what I think has caused this real narrow focus of, hey, come on, do we actually need this model?
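The cost-scaling point above can be sketched with some back-of-envelope arithmetic. All numbers here are illustrative assumptions, not real pricing for any model or provider: a hypothetical blended token price, a small proof-of-concept user count, and the rough 1,000-2,000-token chat turn versus ~10x agentic token load mentioned in the conversation.

```python
# Back-of-envelope serving-cost sketch. Every number is a hypothetical
# assumption for illustration, not a quote of any provider's pricing.

PRICE_PER_1K_TOKENS = 0.01  # hypothetical blended $/1K tokens


def monthly_cost(users, requests_per_user_per_day, tokens_per_request, days=30):
    """Estimated monthly serving cost in dollars."""
    total_tokens = users * requests_per_user_per_day * tokens_per_request * days
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS


# Proof of concept: a handful of users, classic ~2,000-token chat turns.
poc = monthly_cost(users=10, requests_per_user_per_day=20, tokens_per_request=2_000)

# At scale, with agentic workloads serving ~10x the tokens per request.
scaled = monthly_cost(users=100_000, requests_per_user_per_day=20, tokens_per_request=20_000)

print(f"PoC:   ${poc:,.2f}/month")    # → PoC:   $120.00/month
print(f"Scale: ${scaled:,.2f}/month") # → Scale: $12,000,000.00/month
```

The point of the sketch is that the per-request token count and the user count multiply: a 10x token load at 10,000x the users turns a trivial PoC bill into one that forces the "do we need a model this big?" question.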