Stefano Ermon
It's a very different way of using the models at the inference time.
It's very fast.
It's very fast because the key thing is, you know, in an autoregressive model, you know, you get one token at a time.
You have to process a massive neural network with hundreds of billions or trillions of parameters, and at the end you only get a single token.
Very inefficient if you think of it that way.
You still need big neural networks, and each forward pass still has to evaluate the whole thing, but at the end you're able to update more than one token at a time.
And so as long as you don't need too many denoising steps, too many diffusion steps, this can be really, really fast.
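The trade-off he describes can be sketched as a toy cost comparison. This is a hypothetical illustration, not code from the speaker: it just counts full forward passes, assuming one pass per autoregressive token versus one pass per diffusion denoising step (each step refining all positions in parallel).

```python
# Toy comparison of inference cost, measured in full forward passes
# through the network. Assumed setup, for illustration only.

def autoregressive_passes(seq_len: int) -> int:
    # One full forward pass per generated token.
    return seq_len

def diffusion_passes(num_denoising_steps: int) -> int:
    # One full forward pass per denoising step, regardless of
    # sequence length: every step refines all positions in parallel.
    return num_denoising_steps

seq_len = 1024
steps = 16  # assumed small number of denoising steps

print(autoregressive_passes(seq_len))  # 1024 passes
print(diffusion_passes(steps))         # 16 passes
```

Under these assumptions, the speedup is simply `seq_len / steps`, which is why keeping the number of denoising steps small is the key to making this fast.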
Yeah, so it started, I mean, I've always been passionate about diffusion models.
The kind of like original idea came out from my lab at Stanford back in 2019.
Back then, pretty much all the image generative models were based on GANs, generative adversarial networks, which were very difficult to train, very unstable.
There is this kind of game between a generator and a discriminator.
It's a pretty complex kind of model to train, and there is all kinds of issues in scaling that approach up.
And we came up with this alternative approach of training the model to remove noise, and then generating in a coarse-to-fine way.
And, you know, we showed that it was working much better, and eventually that took off and everybody kind of switched to diffusion models for image and video generation: Midjourney, Sora, Stable Diffusion.
They were all based on those original ideas from my lab at Stanford.
And since back then, I kind of tried to see, can we get this to work also in text and code generation?
And it took a few years to figure out how to do it properly.
But then in 2024, we kind of had a breakthrough.