
Stefano Ermon

👤 Speaker
359 total appearances


Podcast Appearances

The Neuron: AI Explained
Diffusion for Text: Why Mercury Could Make LLMs 10x Faster

Yeah, I think it changes the design space a little bit, in terms of how many flops you have access to and how memory-bound you are, but not fundamentally.

You still need to process, and you still need to be able to look at, all the context, all the past information, to be able to generate good-quality answers, whether you do them one token at a time or in parallel. You kind of have to look at the past.
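The point above can be sketched as a toy cost comparison. This is only a back-of-the-envelope illustration under my own assumptions (it is not Mercury's actual cost model): both generation styles must attend to the full context on every model call, and they differ mainly in how those calls are batched.

```python
# Toy sketch: count "token reads" (attention lookups) per generation style.
# Assumption: each model call attends to everything visible at that call.

def autoregressive_reads(context_len: int, gen_len: int) -> int:
    """One model call per new token; each call attends to prompt + draft so far."""
    total = 0
    for i in range(gen_len):
        total += context_len + i
    return total

def diffusion_reads(context_len: int, gen_len: int, steps: int) -> int:
    """One model call per denoising step; each call attends to prompt + full draft."""
    return steps * (context_len + gen_len)

ctx, out = 1000, 100
print(autoregressive_reads(ctx, out))        # 100 sequential calls
print(diffusion_reads(ctx, out, steps=10))   # 10 parallel-refinement calls
```

Either way, every call looks at the whole past; what changes is the number of calls and how parallel each one is, which shifts the flops-versus-memory-bandwidth balance rather than the fundamentals.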

And so there is something pretty fundamental there.

Yeah, so essentially there is an element of error correction: the models are trained to fix mistakes, and then they constantly revise the answer.
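The revise-as-you-go behavior can be mimicked with a toy loop. This is purely illustrative (the `TARGET` string and the stand-in "denoiser" are my inventions, not anything from Mercury): start from a fully noised draft and let each pass correct a few positions, so the answer becomes coherent gradually.

```python
import random

# Toy sketch of iterative revision: a stand-in "denoiser" fixes up to k
# wrong positions per pass, standing in for a trained error-correcting model.

TARGET = list("diffusion models revise their answers")  # hypothetical final output

def denoise_step(draft, k=5, rng=random):
    """Pretend model pass: correct up to k wrong positions in the draft."""
    wrong = [i for i, c in enumerate(draft) if c != TARGET[i]]
    for i in rng.sample(wrong, min(k, len(wrong))):
        draft[i] = TARGET[i]
    return draft

draft = ["#"] * len(TARGET)      # fully masked / noised starting point
for step in range(12):
    draft = denoise_step(draft)
print("".join(draft))
```

Early drafts are gibberish, and each extra pass spends one more model call to repair more of the sequence, which is the error-correction dynamic described above.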

And so initially the answers are not coherent, and then they get increasingly better as you throw more compute at them.

You can also think of it as another dimension over which you can scale compute at test time.

So it's test-time inference, test-time compute, to trade off quality for speed or cost.

And so that's kind of like the fundamental trade-off that is exposed by a diffusion language model.

It just provides you with an axis to control quality versus speed.

And there is a fundamental trade-off.

If you don't want to do too many denoising steps, too many passes over the output, the quality is not going to be as good as what you would get if you refined it many, many times.

That's exactly the same.

Exactly, exactly.

So, you know, even in the context of image and video generation, you can usually control the number of denoising steps.

The more denoising steps you take, the higher the quality, but of course the more expensive it becomes and the more time it takes.
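The step-count knob can be made concrete with a toy model. This assumes a simple geometric error decay of my own choosing (each pass removes a fixed fraction of the remaining error); real diffusion samplers behave differently, but the quality-versus-cost shape is the same.

```python
# Toy analogy for the denoising-step trade-off: more steps -> less
# remaining error, but cost (model calls) grows linearly with steps.

def refine(noise: float, steps: int, shrink: float = 0.5) -> tuple[float, int]:
    """Return (remaining_error, model_calls) after `steps` denoising passes."""
    err = noise
    for _ in range(steps):
        err *= shrink          # assumption: one pass halves the remaining error
    return err, steps

for steps in (2, 8, 32):
    err, cost = refine(1.0, steps)
    print(f"{steps:2d} steps -> error {err:.6f}, cost {cost} calls")
```

A few steps are cheap but coarse; many steps approach full quality at proportionally higher latency and cost, exactly the knob image and video diffusion models already expose.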

So it's related in the sense that it needs to output something.

And then if there is a tool call involved, then you need to essentially wait until the result of the tool call comes back.