Kwasi Ankomah

536 total appearances

Podcast Appearances

The Neuron: AI Explained
AI Inference: Why Speed Matters More Than You Think (with SambaNova's Kwasi Ankomah)

Already there's, like, six things that have happened there, and each of those calls is inference. So once that adds up, inference speed starts to make a big difference.

Yeah, that's right. That's probably where we've heard the biggest praise from our clients: something that was running at, let's say, 150 tokens per second on Nvidia, we're running at like 700 or 800 tokens per second.

As a guy who's running local models on a laptop, I just can't...

Yeah.

And you can go on SambaNova Cloud and see our token speeds, and that makes a huge difference.

That's the big thing. We just did one of our partnerships, and they actually showed a video of us versus, like, naked.

They just couldn't believe the speed, because in those real-time applications it makes a big difference.

So that's one place where I think we're able to outperform GPUs.
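
To make the compounding concrete, here is a back-of-the-envelope sketch. It assumes a hypothetical agent that makes six sequential model calls, each generating 500 output tokens, and it counts only generation time (no prefill, network, or tool latency). The 150 tokens/s figure is the one quoted in the conversation; 750 tokens/s is an assumed midpoint of the quoted 700 to 800 range.

```python
def agent_generation_time(num_calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Seconds spent generating output when calls run one after another.

    Deliberately simplified: ignores prefill (prompt processing), network
    round trips, and tool execution time.
    """
    return num_calls * tokens_per_call / tokens_per_second


# Hypothetical workload: six sequential calls, 500 output tokens each.
for tps in (150, 750):
    seconds = agent_generation_time(num_calls=6, tokens_per_call=500, tokens_per_second=tps)
    print(f"{tps} tokens/s -> {seconds:.1f} s of generation")
```

Under these assumptions the same six-step task drops from 20 seconds to 4 seconds of generation time. Because the calls are sequential, every per-call saving is multiplied by the number of steps.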

The second is around model coordination and model bundling. What I mean by that is that you don't always need the same model, or a huge model, for certain tasks.

To give you an example, let's stay with that coding agent.

On GPUs or other architectures, you might use the frontier model, which is huge and super expensive, for all of those tasks.

Now, that isn't super efficient, right?

Because if you wanted to swap to a different model, you would need another piece of infrastructure: the chip's memory limitations rule out this kind of model swapping.

Now, because we allow you to swap out models on the fly on the same amount of hardware, the efficiency is a lot better, and the total cost of ownership, especially when you have the rack, is a lot cheaper.
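
The conversation doesn't describe how this swapping works on SambaNova hardware, so the following is only a generic sketch of the idea: several models share one fixed memory budget, a model is loaded on demand, and the least recently used model is evicted when space runs out. The class name, the gigabyte sizes, and the LRU policy are all illustrative assumptions, not SambaNova's implementation.

```python
from collections import OrderedDict


class ModelPool:
    """Hypothetical pool that keeps at most `capacity_gb` of model weights resident."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.resident = OrderedDict()  # model name -> size in GB, oldest use first
        self.loads = 0  # count of (simulated) weight loads

    def used_gb(self) -> float:
        return sum(self.resident.values())

    def acquire(self, name: str, size_gb: float) -> None:
        if name in self.resident:
            self.resident.move_to_end(name)  # already loaded: mark most recently used
            return
        if size_gb > self.capacity_gb:
            raise ValueError(f"{name} ({size_gb} GB) exceeds pool capacity")
        while self.used_gb() + size_gb > self.capacity_gb:
            self.resident.popitem(last=False)  # evict the least recently used model
        self.resident[name] = size_gb
        self.loads += 1


pool = ModelPool(capacity_gb=200)
pool.acquire("planner", 140)  # hypothetical large planning model
pool.acquire("worker", 16)    # hypothetical small worker model
pool.acquire("planner", 140)  # already resident, so no reload
print(pool.loads, list(pool.resident))  # 2 ['worker', 'planner']
```

LRU is simply the easiest eviction policy to show; a real scheduler would also weigh load time and which models an agent is about to call.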

So what I mean by that is: let's say you're using that coding agent, and we want our top-level agent to use the frontier model because it's doing all the planning. But the model that just goes and reads the code and takes some notes can be a much smaller model.
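
A minimal sketch of that split, assuming a simple lookup table that routes each agent subtask to a model by task type. The model identifiers and task kinds below are hypothetical placeholders, not real SambaNova model names or APIs.

```python
from dataclasses import dataclass

# Hypothetical model identifiers, used only for illustration.
PLANNER_MODEL = "large-planning-model"
WORKER_MODEL = "small-worker-model"

# Route by task type: planning goes to the large model,
# routine code reading and note-taking go to the small one.
ROUTES = {
    "plan": PLANNER_MODEL,
    "read_code": WORKER_MODEL,
    "take_notes": WORKER_MODEL,
}


@dataclass
class Task:
    kind: str
    prompt: str


def pick_model(task: Task) -> str:
    # Unknown task kinds fall back to the large model: costlier, but the safe default.
    return ROUTES.get(task.kind, PLANNER_MODEL)


tasks = [
    Task("plan", "Break the feature request into steps"),
    Task("read_code", "Summarize utils.py"),
    Task("take_notes", "Record which functions touch the database"),
]
for t in tasks:
    print(t.kind, "->", pick_model(t))
```

Only the planning call pays for the large model; the reading and note-taking calls run on the smaller one.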

So to give you an example, we have clients who have done this.