Kwasi Ankomah

The Neuron: AI Explained

AI Inference: Why Speed Matters More Than You Think (with SambaNova's Kwasi Ankomah)

And again, at a small scale, maybe not a huge problem.

At a big scale, with thousands of compute units, that does begin to add up.

So that...

That innovation is big.

And the reason that's big is it allows us to store bigger models.

So to give you an example of this, the DeepSeek models, so that would be DeepSeek R1 and DeepSeek V3, they're 670 billion parameters, right?

Absolutely huge.
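
To put that parameter count in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need. The 670 billion figure is the one quoted above; the numeric precisions are assumptions chosen for illustration, since the conversation does not say which precision is actually served, and the calculation ignores the KV cache and activations.

```python
# Rough weight-only memory estimate. Precisions are illustrative
# assumptions, not a statement about how any provider serves the model.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Return memory needed for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 670e9  # parameter count quoted in the conversation
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_footprint_gb(params, dtype):,.0f} GB")
```

Even at one byte per parameter, the weights come to hundreds of gigabytes, which is why memory capacity decides who can serve a model of this size.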

Now, a lot of providers don't serve that model.

They can't physically.

And we can, basically because of the way that we've architected our chip.

And again, the folks that started the company, they had all of this in mind when they designed the chip.

So that's the big thing.

It allows us to run very large models.

Now, the second thing that that allows us to do is it allows us to run many models.

So that DDR bit allows us to store.

So if you imagine that you, you know, a GPU or alternative architecture could only store like this one model.

And in order for you to get another model, you need another unit of computing, another kind of, you know, let's call it a node.

Now, because of our kind of large DDR, it allows us to kind of store these other models so you can switch.

And this becomes super important for agentic applications because you might have an application that maybe uses the GPT-OSS model that we're running at the moment, or it might use a Llama 8B, but we can have those on the same node.

So your inference and hardware cost stays flat because we can go and get the model.
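
The idea of keeping several models on one node and switching between them can be sketched roughly as follows. This is a toy model under stated assumptions, not SambaNova's software or API: the `Node` class, its methods, the capacity figure, and the model sizes are all hypothetical and exist only to illustrate the capacity argument.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical node whose large memory tier can hold several models."""
    capacity_gb: float
    resident: Dict[str, float] = field(default_factory=dict)  # name -> size in GB
    active: Optional[str] = None

    def used_gb(self) -> float:
        return sum(self.resident.values())

    def store(self, name: str, size_gb: float) -> None:
        """Keep a model's weights on the node if they fit in the capacity tier."""
        if name in self.resident:
            return
        if self.used_gb() + size_gb > self.capacity_gb:
            raise MemoryError(f"{name} does not fit; another node would be needed")
        self.resident[name] = size_gb

    def switch_to(self, name: str) -> None:
        """Make an already-resident model the active one, with no extra hardware."""
        if name not in self.resident:
            raise KeyError(f"{name} is not stored on this node")
        self.active = name


if __name__ == "__main__":
    node = Node(capacity_gb=1000.0)   # made-up capacity
    node.store("gpt-oss", 60.0)       # made-up size
    node.store("llama-8b", 16.0)      # 8e9 params at 2 bytes each
    node.switch_to("llama-8b")
    print(node.active, node.used_gb())
```

In this sketch an agentic application that alternates between the two models only calls `switch_to`, so the node count, and with it the hardware cost, does not grow with the number of models as long as they fit.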
