Stefano Ermon
Nobody understands how these models generalize.
Here we're talking about something even more complicated where you're not just putting a binary label or a thousand classes in ImageNet, but you have like an extremely large, combinatorially large space of outputs.
And nobody understands how generalization works in that space.
I mean, I started working on this more than 10 years ago, and I thought it was going to keep me busy for my whole career because it's such a hard problem.
And, you know, it almost feels impossible for this to work.
There's just so many combinations.
Even if you think about fitting, let's say, a model on images,
right, there are so many different kinds of images and different kinds of combinations of features, right?
And so let's say you train a model on a data set where there are red cars and blue buses and red buses. Then should the model generate a blue car, right?
And it's not clear, right?
Yeah.
But fundamentally, that's what these models do.
And it's, of course, much more complicated because there are many more different things, many different shapes and colors, and some combinations make sense, some don't.
And they're able to do it the right way, but it's just a very, very hard problem.
Not too much. So if you think about the quality metrics, they are often very similar, so we can use the same benchmarks, like SWE-bench, LCB, or, you know, IFEval. Like, a lot of the existing benchmarks that people use to test, you know, how good the models are at doing the various things that we care about: following instructions, writing code for us, software engineering.
So we can still essentially test end-to-end.
You know, we just feed the same prompt to our model.
We look at the answer and then we see how well it does at the task.
And that's a very useful way of measuring, you know, how useful the models are.
There are other metrics that are specific to diffusion models, like how many...