Stefano Ermon
It's a very different way of using the models at the inference time.
It's very fast.
It's very fast because the key thing is, you know, in an autoregressive model, you know, you get one token at a time.
You have to process a massive neural network with hundreds of billions or trillions of parameters, and at the end you only get a single token.
Very inefficient if you think of it that way.
You still need big neural networks, and each forward pass still has to evaluate the whole thing, but at the end you're able to update more than one token at a time.
And so as long as you don't need too many denoising steps, too many diffusion steps, this can be really, really fast.
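The trade-off he describes can be sketched as a toy cost comparison. This is a hypothetical illustration, not code from the speaker: it just counts full forward passes, assuming one pass per autoregressive token versus one pass per diffusion denoising step (each step refining all positions in parallel).

```python
# Toy comparison of inference cost, measured in full forward passes
# through the network. Assumed setup, for illustration only.

def autoregressive_passes(seq_len: int) -> int:
    # One full forward pass per generated token.
    return seq_len

def diffusion_passes(num_denoising_steps: int) -> int:
    # One full forward pass per denoising step, regardless of
    # sequence length: every step refines all positions in parallel.
    return num_denoising_steps

seq_len = 1024
steps = 16  # assumed small number of denoising steps

print(autoregressive_passes(seq_len))  # 1024 passes
print(diffusion_passes(steps))         # 16 passes
```

Under these assumptions, the speedup is simply `seq_len / steps`, which is why keeping the number of denoising steps small is the key to making this fast.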
Yeah, so it started, I mean, I've always been passionate about diffusion models.
The kind of like original idea came out from my lab at Stanford back in 2019.
Back then, pretty much all the image generative models were based on GANs, generative adversarial networks, which were very difficult to train, very unstable.
There is this kind of game between a generator and a discriminator.
It's a pretty complex kind of model to train, and there is all kinds of issues in scaling that approach up.
And we came up with this alternative approach of training the model to remove noise, and then generating in a coarse-to-fine way.
And, you know, we showed that it was working much better, and eventually that took off and everybody kind of switched to diffusion models for image and video generation: Midjourney, Sora, Stable Diffusion.
They were all based on those original ideas from my lab at Stanford.
And since back then, I kind of tried to see, can we get this to work also in text and code generation?
And it took a few years to figure out how to do it properly.
But then in 2024, we kind of had a breakthrough.