Stefano Ermon

Speaker
359 total appearances

Podcast Appearances

The Neuron: AI Explained
Diffusion for Text: Why Mercury Could Make LLMs 10x Faster

You know, there are benchmarks.

So we use benchmarks to see, you know, how well it does: does it have general knowledge, does it follow instructions, does it make things up?

And we're seeing that it does really well.

But it's hard to actually precisely quantify or even qualitatively figure out, okay, there are certain things that it doesn't do well or there are certain things where it does better.

That's actually a pretty hard problem.

Like, even from an academic, theoretical perspective, kind of like understanding how generalization works, how these models are actually able to combine all of the knowledge that they see in the training data in ways that make sense and ways that don't make sense.

It's wide open.

Nobody really understands how these models work.

And so unfortunately, that also means that it's very hard to compare two kinds of models.

Yeah, I think fundamentally it is a training problem in the sense that it's, you know, you're fitting a statistical model.

And unfortunately, this is a very, you know, it's a very high-dimensional space, as we would say.

Like there is an extremely large number of possible combinations that you could come up with.

Like if you think about all the different sentences that you can generate, it's just an incredibly large space.

And no matter how big your training set is, it's only ever going to be like a tiny little fraction of all the possible sentences that you could generate.
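
[Editor's illustration] A rough back-of-the-envelope sketch of the point above. The vocabulary size, sentence length, and training-set size are hypothetical round numbers chosen for illustration, not figures from the episode, and the count includes every possible token sequence, not only grammatical sentences, so it is an upper bound on the space rather than a precise estimate.

```python
from math import log10


def log10_coverage(vocab_size: int, seq_len: int, num_training_sequences: int) -> float:
    """Return log10 of the largest fraction of all length-seq_len token
    sequences that a training set of the given size could possibly cover."""
    # Number of possible sequences is vocab_size ** seq_len; work in log
    # space so the numbers stay readable.
    log10_total = seq_len * log10(vocab_size)
    return log10(num_training_sequences) - log10_total


# Hypothetical numbers (assumptions for illustration only).
vocab_size = 50_000                # tokens in the vocabulary
seq_len = 20                       # tokens per sentence
num_training_sequences = 10**12    # distinct sentences seen in training

exponent = log10_coverage(vocab_size, seq_len, num_training_sequences)
print(f"training set covers at most about 10^{exponent:.1f} of the space")
```

Even with a trillion distinct training sentences, the covered fraction under these assumptions is vanishingly small, which is why the model has to generalize rather than memorize.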

And so what this means is that the model has to essentially, you know, the training data itself will not tell you everything and you have to interpolate and extrapolate.

You have to generalize, and nobody actually knows how these models generalize.

Even in simpler settings, like even if you take, you know, supervised learning, just training a neural network to classify images.