Stefano Ermon
Yeah, I think it changes the design space a little bit, in terms of how many flops you have access to and how memory-bound you are, but not fundamentally.
You still need to process, and you still need to be able to look at all the context, all the past information, to generate good-quality answers, whether you do them one token at a time or in parallel.
You kind of have to look at the past.
And so there is something pretty fundamental there.
Yeah, so essentially there is an element of error correction: the models are trained to fix mistakes, and they constantly revise the answer.
And so initially the answers are not coherent, and then they get increasingly better as you throw more compute at them.
You can also think of it as another dimension over which you can scale compute at test time.
So test-time inference, test-time compute, to trade off quality for speed or cost.
And so that's kind of like the fundamental trade-off that is exposed by a diffusion language model.
It just provides you an axis to control quality versus speed.
And there is a fundamental trade-off.
Like if you don't want to do too many denoising steps, too many passes over the output, the quality is not going to be as good as what you would get if you refined it many, many times.
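The refinement loop described above can be sketched in a few lines. This is a toy illustration only, not an actual diffusion language model: the "model" here is a stand-in that corrects a couple of positions per denoising step, and the `target`, `fix_per_step`, and `quality` names are hypothetical. It just shows the knob being discussed: more denoising steps, higher quality, higher cost.

```python
import random

def toy_denoise(target, steps, fix_per_step=2, seed=0):
    """Toy sketch of diffusion-style iterative refinement.

    `target` stands in for the ideal answer. We start from pure noise
    and, at each denoising step, the (hypothetical) model revises a few
    positions. More steps -> a draft closer to the target, at the cost
    of more compute. Real models predict tokens; this sketch just copies
    from `target` to make the steps/quality trade-off visible.
    """
    rng = random.Random(seed)
    vocab = sorted(set(target))
    draft = [rng.choice(vocab) for _ in target]  # fully noisy initial draft
    for _ in range(steps):
        wrong = [i for i, (d, t) in enumerate(zip(draft, target)) if d != t]
        for i in rng.sample(wrong, min(fix_per_step, len(wrong))):
            draft[i] = target[i]  # "model" revises this position
    return draft

def quality(draft, target):
    """Fraction of positions that match the target."""
    return sum(d == t for d, t in zip(draft, target)) / len(target)
```

With a fixed seed, quality is non-decreasing in the number of steps, which is exactly the quality-versus-speed axis being described: stopping early is cheap but rough, running many passes converges on a clean answer.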
That's exactly the same.
Exactly, exactly.
So, you know, even in the context of image and video generation, you can usually control the number of denoising steps.
The more denoising steps you take, the higher the quality, but of course the more expensive it becomes and the more time it takes.
So it's related in the sense that it needs to output something.
And then if there is a tool call involved, then you need to essentially wait until the result of the tool call comes back.