Kwasi Ankomah

👤 Person
536 total appearances

Podcast Appearances

The Neuron: AI Explained
AI Inference: Why Speed Matters More Than You Think (with SambaNova's Kwasi Ankomah)

And that's a really interesting area as well.

So, yeah, that's where we're trying to focus on at the moment.

Yeah, so inference comes from the word "to infer."

So it's the model going along and making a prediction of some sort. It takes your input and then does the thing that large language models do, which is predict the next token.

And that is the actual process of inference. The input goes in, it runs through the model, and we get an output, and that output keeps going. Essentially, all that's happening is: we have a model that has already been put on some sort of architecture, and we're giving you the answer, the next token. And of course, as you see that stream, the next token then goes back in, and we get a prediction based on that token as well. So in a nutshell, we have a model that's already been trained, and we're just giving you the output of that model.
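The loop described here can be sketched in a few lines. This is a minimal illustration with a toy stand-in model (a fixed vocabulary walk, not any real LLM or SambaNova's stack); the structure is the point: one pass produces one token, and each new token is fed back in for the next pass.

```python
# Toy sketch of the autoregressive inference loop described above.
# toy_model is a stand-in "forward pass"; a real LLM would run
# billions of parameters to score the whole vocabulary instead.

def toy_model(tokens):
    """Pretend forward pass: predict the next token from the current sequence."""
    vocab = ["the", "model", "predicts", "the", "next", "token", "<eos>"]
    # Deterministic stand-in: the sequence length picks the next word.
    return vocab[min(len(tokens), len(vocab) - 1)]

def generate(prompt_tokens, max_new_tokens=10):
    """Run inference passes until <eos> or the token budget runs out."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_tok = toy_model(tokens)   # one inference pass -> one new token
        if next_tok == "<eos>":
            break
        tokens.append(next_tok)        # the new token goes back in as input
    return tokens

print(generate(["the"]))
```

Each iteration of the loop is one "pass through the model"; streaming output is just this loop emitting tokens as they are produced.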

Yeah, so I think inference speed directly dictates how the user interacts with the application, right?

So we've all been there on your favorite chat application, be that what it may, and you press that button to run inference, right?

As I said about inference, you're making a pass through the model and getting output at the end. Now that pass can take a long time depending on the size of the model, the number of parameters, and the hardware it's running on.

Now, if there is high latency in a real-time application, that does become an issue.

And we've seen that: if you try to run certain models on certain architectures, the time to first token, that is, the time until the first token arrives on the user's screen, can be somewhere between 20 and 30 seconds.
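The two latency metrics mentioned here, time to first token (TTFT) and sustained throughput, can be measured against any streamed output. Below is a small, hypothetical timing harness; `slow_stream` is an invented generator that simulates a model streaming tokens (50 ms to the first token, 5 ms per token after), not a real inference API.

```python
import time

def measure_ttft_and_throughput(token_stream):
    """Return (time-to-first-token, tokens/sec) for an iterable of streamed tokens."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in token_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token reached the client
        count += 1
    end = time.perf_counter()
    ttft = (first_token_at - start) if first_token_at is not None else None
    tps = count / (end - start) if count else 0.0
    return ttft, tps

def slow_stream(n=20):
    """Simulated model stream: 50 ms to the first token, then 5 ms per token."""
    time.sleep(0.05)
    for i in range(n):
        yield f"tok{i}"
        time.sleep(0.005)

ttft, tps = measure_ttft_and_throughput(slow_stream())
print(f"time to first token: {ttft * 1000:.0f} ms, throughput: {tps:.0f} tok/s")
```

TTFT dominates how responsive the app *feels*; throughput dominates how long the full answer takes, which is why both numbers matter for real-time use.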

You know, if you're in a real-world production system,