Kwasi Ankomah
Podcast Appearances
And that's a really interesting area as well.
So, yeah, that's what we're trying to focus on at the moment.
Yeah.
Yeah, so inference is coming from the word to infer.
So it's the model going along and then making a prediction of some sort.
So it's taking your input and then it's basically doing the thing that large language models do, which is predicting the next token.
And that is the actual process of inference.
It goes in, it runs through the model, and we get an output.
And that output keeps going. Essentially, all that you're doing is: we have a model that has already been trained and put on some sort of architecture, and we are basically giving you the answer, the next token. And then, of course, as you see that stream, the next token goes back in and we get the prediction based on that token as well. So that, in a nutshell, is it: we have a model that's already been trained, and we are just giving you the output of that model. Yeah.
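To make that concrete, here is a minimal sketch of the autoregressive loop being described, using Hugging Face transformers with GPT-2 purely as a small illustrative stand-in; the speaker does not name a specific model or framework.

```python
# Minimal sketch of autoregressive inference: one forward pass per new token,
# with each predicted token appended to the input and fed back in.
# GPT-2 and the prompt are assumptions; the transcript names no model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                               # generate ten tokens
        logits = model(input_ids).logits              # forward pass through the trained model
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # most likely next token
        input_ids = torch.cat([input_ids, next_token], dim=-1)      # feed it back in

print(tokenizer.decode(input_ids[0]))
```

In practice you would call something like model.generate with a sampling strategy, but the loop above is the shape of what happens under the hood each time a token streams back.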
Yeah, so I think inference speed kind of directly dictates how the user interacts with the application, right?
So we've all been there on, you know, your favorite chat application, be that what it may.
And when you press that button to run inference, right?
So as I've talked about with inference, you're making a pass through the model and you're getting output at the end.
Now that pass can take a long time depending on the size of the model, the number of parameters, and the hardware that it's running on.
Now, if there is a big latency with a real-time application, that does become an issue.
And we've seen that: if you try to run certain models on certain architectures, you can have a time to first token (the time it takes for the first token to arrive on the user's screen) of between 20 and 30 seconds.
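As a rough illustration of how you might measure that time to first token, here is a sketch using the transformers streaming interface; the model and prompt are placeholders, not anything mentioned in the conversation.

```python
# Rough sketch: time how long the first streamed token takes to arrive.
# GPT-2 and the prompt are illustrative placeholders only.
import threading
import time

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Explain inference in one sentence:", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

start = time.perf_counter()
generation = threading.Thread(
    target=model.generate,
    kwargs=dict(**inputs, max_new_tokens=50, streamer=streamer),
)
generation.start()

first_chunk = next(iter(streamer))       # blocks until the first decoded token arrives
ttft = time.perf_counter() - start
print(f"time to first token: {ttft:.2f}s (first chunk: {first_chunk!r})")

for _ in streamer:                       # drain the rest of the stream
    pass
generation.join()
```

The same timing idea applies to any serving stack: start the clock when the request goes in, stop it when the first token reaches the user.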
You know, if you're in a real-world production system,