Stefano Ermon

Speaker
359 total appearances

Podcast Appearances

The Neuron: AI Explained
Diffusion for Text: Why Mercury Could Make LLMs 10x Faster

Nobody understands how these models generalize.

Here we're talking about something even more complicated where you're not just putting a binary label or a thousand classes in ImageNet, but you have like an extremely large, combinatorially large space of outputs.

And nobody understands how generalization works in that space.

I mean, I started working on this more than 10 years ago, and I thought it was going to keep me busy for my whole career because it's such a hard problem.

And, you know, it almost feels impossible for this to work.

There's just so many combinations.

Even if you think about fitting, let's say, a model on images,

right, there are so many different kinds of images and different kinds of combinations of features, right?

And so let's say you train a model on a data set where there are red cars and blue buses and red buses, then should the model generate a blue car, right?

And it's not clear, right?

Yeah.

But fundamentally, that's what these models do.

And it's, of course, much more complicated because there's many more different things and many different shapes and colors and some combinations make sense, some don't.

And they're able to do it the right way, but it's just a very, very hard problem.

Not too much. So if you think about the quality metrics, they are often very similar, so we can use the same benchmarks, SWE-bench, LCB, or, you know, IFEval, like a lot of the existing benchmarks that people use to test, you know, how good the models are at doing various things that we care about: following instructions, writing code for us, software engineering, like...

So we can still essentially test end-to-end.

You know, we just feed the same prompt to our model.

We look at the answer and then we see how well it does at the task.

And that's a very useful way of measuring, you know, how useful the models are.
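
[Editor's sketch] The end-to-end testing described here (feed the same prompt to a model, look at the answer, score it on the task) can be illustrated with a minimal, model-agnostic harness. Everything in it is an assumption for illustration: `Task`, `evaluate`, the toy models, and the two toy tasks are hypothetical and are not taken from any real benchmark or from the speaker's own tooling.

```python
from typing import Callable, Iterable, NamedTuple


class Task(NamedTuple):
    prompt: str
    check: Callable[[str], bool]  # hypothetical task-specific grader


def evaluate(generate: Callable[[str], str], tasks: Iterable[Task]) -> float:
    """Return the fraction of tasks whose answer passes its checker."""
    # The harness only sees prompt -> answer, so it does not matter
    # how the model produces its text internally.
    results = [task.check(generate(task.prompt)) for task in tasks]
    return sum(results) / len(results) if results else 0.0


# Toy stand-ins for two different models (assumptions, not real systems).
def model_a(prompt: str) -> str:
    return "4" if prompt == "2 + 2 = ?" else "unknown"


def model_b(prompt: str) -> str:
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


TASKS = [
    Task("2 + 2 = ?", lambda answer: answer.strip() == "4"),
    Task("Capital of France?", lambda answer: "paris" in answer.lower()),
]

score_a = evaluate(model_a, TASKS)
score_b = evaluate(model_b, TASKS)
print(f"model_a: {score_a:.2f}, model_b: {score_b:.2f}")
```

Because the harness treats the model as a black box, the same tasks and checkers can in principle score models that generate text in very different ways, which is the sense in which existing benchmarks carry over.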

There are other metrics that are specific to diffusion models, like how many...