Kwasi Ankomah
Already there are like six things that have happened there, and each of those calls is inference. So once that adds up, our inference speed starts to make a big difference.

Yeah, that's right. That's probably where we've heard the biggest praise from our clients: something that was running at, say, 150 tokens per second on NVIDIA, we're running at like 700, 800 tokens per second.

As a guy who's running local models on a laptop, I just can't...
Yeah.
And you can go on SambaNova Cloud and see our token speeds, and that makes a huge difference.
We just did one of our partnerships, and they actually showed a side-by-side video of us versus another setup. They just couldn't believe the speed, because in those real-time applications it makes a big difference.
So that's one place where I think we're able to outperform GPUs.
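To make the compounding effect concrete, here is a back-of-the-envelope sketch. The six-call count and the token rates come from the conversation above; the 500-output-tokens-per-call figure is purely an assumption for illustration.

```python
# Why per-call token speed compounds across an agent pipeline.

CALLS = 6              # sequential inference calls in one agent task
TOKENS_PER_CALL = 500  # assumed average output tokens per call

for label, tokens_per_sec in [("GPU at ~150 tok/s", 150),
                              ("SambaNova at ~700 tok/s", 700)]:
    seconds = CALLS * TOKENS_PER_CALL / tokens_per_sec
    print(f"{label}: {seconds:.1f} s of pure generation time")

# GPU at ~150 tok/s: 20.0 s of pure generation time
# SambaNova at ~700 tok/s: 4.3 s of pure generation time
```

The point is that a per-call speedup multiplies across every step of the chain, which is why it dominates perceived latency in real-time agent workloads.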
The second is around model coordination and model bundling. And what I mean by that is: you don't always need the same model, or a huge model, for every task.
To give you an example, let's stay with the coding agent. On a GPU or other architectures, you might use the frontier model, which is super expensive and huge, for all of those tasks.
Now, that isn't super efficient, right? And if you wanted to swap to a different model, you'd need another piece of infrastructure, because you can't really do model swapping due to the memory limitations of the chip.
Now, because we let you swap out models on the fly on the same hardware, the efficiency is a lot better, and the total cost of ownership, especially when you have the rack, is a lot cheaper.
So what I mean by that is: let's say you're using that coding agent, and we want our top-level agent to use the frontier model because it's doing all the planning. But the model that actually just goes and reads the code and does some note-taking can be a much smaller model.
So to give you an example, we have clients who have done this.
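As a rough illustration of that routing idea (this is not SambaNova's actual API; the model names and task kinds here are made up), a coordinator might look something like this:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model names, purely for illustration.
PLANNER_MODEL = "frontier-large"  # big, expensive: planning only
WORKER_MODEL = "small-fast"       # much smaller: reading code, note-taking

@dataclass
class Task:
    kind: str    # "plan", "read_code", "take_notes", ...
    prompt: str

def route(task: Task) -> str:
    """Send only planning to the frontier model; everything else
    goes to the smaller, cheaper model."""
    return PLANNER_MODEL if task.kind == "plan" else WORKER_MODEL

def run(task: Task, call_model: Callable[[str, str], str]) -> str:
    # call_model(model_name, prompt) stands in for your inference client.
    # On hardware that can swap models on the fly, both models share the
    # same rack instead of each needing its own deployment.
    return call_model(route(task), task.prompt)

# Example with a stub client:
if __name__ == "__main__":
    stub = lambda model, prompt: f"[{model}] handled: {prompt}"
    print(run(Task("plan", "design the refactor"), stub))
    print(run(Task("read_code", "summarize utils.py"), stub))
```

The routing logic itself is trivial; the economics come from the fact that, on hardware with fast model swapping, both models can live behind the same deployment rather than each requiring dedicated infrastructure.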