Stefano Ermon
You know, there are benchmarks. We use benchmarks to see how well the model does: does it have general knowledge, does it follow instructions, does it make things up?
And we're seeing that it does really well.
But it's hard to precisely quantify, or even qualitatively figure out, which things it does well and which things it doesn't.
That's actually a pretty hard problem.
Even from an academic, theoretical perspective, understanding how generalization works, how these models are able to combine all the knowledge they see in the training data in ways that make sense and ways that don't, is a wide-open question.
Nobody really understands how these models work.
And so unfortunately, that also means it's very hard to compare two kinds of models.
Yeah, I think fundamentally it is a training problem, in the sense that you're fitting a statistical model.
And this is, as we would say, a very high-dimensional space: there is an extremely large number of possible combinations you could come up with.
If you think about all the different sentences you can generate, it's just a vast space.
And no matter how big your training set is, it's only ever going to be a tiny fraction of all the possible sentences you could generate.
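The point about the training set being a tiny fraction of all possible sentences can be made concrete with a rough back-of-the-envelope calculation. The figures below (vocabulary size, sentence length, training-corpus size) are illustrative assumptions, not numbers from the conversation:

```python
# Back-of-the-envelope: distinct token sequences vs. what training can cover.
# All figures are assumed, order-of-magnitude placeholders.
vocab_size = 50_000            # a typical subword vocabulary (assumed)
seq_len = 20                   # a short sentence, in tokens (assumed)
possible = vocab_size ** seq_len          # all length-20 token sequences
train_tokens = 10 ** 13                   # ~10T training tokens (assumed)
train_seqs = train_tokens // seq_len      # non-overlapping 20-token chunks
coverage = train_seqs / possible          # fraction of the space ever seen

print(f"possible sequences: roughly 10^{len(str(possible)) - 1}")
print(f"fraction seen in training: about {coverage:.0e}")
```

Even under these generous assumptions, the training data covers a vanishingly small sliver of the space, which is why the model must interpolate and extrapolate rather than memorize.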
And so what this means is that the training data itself will not tell you everything: the model has to interpolate and extrapolate.
It has to generalize, and nobody actually knows how these models generalize.
Even in simpler settings, like supervised learning, just training a neural network to classify images.