Stefano Ermon
Podcast Appearances
So diffusion is a type of generative AI model.
It's the kind of model that is commonly used to generate images, video, music.
And you're probably familiar with the, you know, ChatGPTs or Geminis or Claude, where you kind of like see the models generate text, kind of like left to right, one token at a time.
A diffusion model works very differently in the sense that it generates the full object from the beginning, and then it refines it by kind of like fixing mistakes, making it sharper, making it look better and better.
And it's a very different kind of solution, one that's more parallel in the sense that the neural network is able to modify many components of the image or the text at the same time.
And that's why diffusion models tend to be a lot faster than traditional autoregressive models that kind of like work left to right one token at a time.
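To make the contrast concrete, here is a minimal Python sketch that only counts sequential model calls. Everything in it is a hypothetical stand-in: `TARGET`, `MASK`, `autoregressive_generate`, and `diffusion_generate` are illustrative names, and the "model" simply peeks at the target instead of running a neural network. It shows the call pattern, not a real speed measurement.

```python
import random

TARGET = "the cat sat on the mat".split()
MASK = "<mask>"  # placeholder for a position that is still wrong


def autoregressive_generate(target):
    """Left to right: one (toy) model call per generated token."""
    output, calls = [], 0
    for position in range(len(target)):
        # Stand-in for a network predicting the next token from the prefix.
        output.append(target[position])
        calls += 1
    return output, calls


def diffusion_generate(target, steps=3, seed=0):
    """Start from a full-length draft and fix several positions per call."""
    rng = random.Random(seed)
    draft = [MASK] * len(target)
    calls = 0
    for step in range(steps):
        wrong = [i for i, tok in enumerate(draft) if tok != target[i]]
        if not wrong:
            break
        # Fix an even share of the remaining mistakes in this one call,
        # so the last allowed step fixes whatever is left.
        remaining_steps = steps - step
        share = -(-len(wrong) // remaining_steps)  # ceiling division
        for i in rng.sample(wrong, share):
            draft[i] = target[i]
        calls += 1
    return draft, calls


ar_output, ar_calls = autoregressive_generate(TARGET)
diff_output, diff_calls = diffusion_generate(TARGET, steps=3)
print(ar_calls, diff_calls)
```

With a six-token target, the left-to-right loop makes six sequential calls while the refinement loop makes three, because each refinement call touches several positions at once. Whether that translates into wall-clock speedups depends on the real model and hardware.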
Yeah.
Yeah, that's a great question.
And it really stems from the way the models are trained.
A traditional autoregressive model, like a GPT model, is a neural network that's trained to predict the next token, the next word.
And that's how you use it at inference time.
You give it a question and then it will try to predict the answer left to right, one token at a time.
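As a rough illustration of "trained to predict the next token, then used the same way at inference", here is a toy sketch that replaces the neural network with simple bigram counts. The names `train_next_token_model` and `generate`, the `<s>` and `</s>` markers, and the tiny corpus are all assumptions made for this example, not anything from a real GPT implementation.

```python
from collections import Counter, defaultdict


def train_next_token_model(corpus):
    """Count, for each token, which token tends to follow it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_tokens=10):
    """Continue the prompt left to right, one token at a time."""
    token = prompt[-1] if prompt else "<s>"
    output = []
    for _ in range(max_tokens):
        followers = counts.get(token)
        if not followers:
            break  # the toy model has never seen this token
        token = followers.most_common(1)[0][0]  # greedy next-token choice
        if token == "</s>":
            break
        output.append(token)
    return output


corpus = [
    "diffusion models refine text",
    "diffusion models refine images",
    "diffusion models refine text",
]
model = train_next_token_model(corpus)
print(generate(model, ["diffusion"]))
```

Each loop iteration depends on the token produced by the previous one, which is the sequential bottleneck being described.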
A diffusion language model, it's trained to remove mistakes, fix mistakes.
So you kind of like start with clean text or clean code.
You artificially add mistakes and then you train the model to fix those mistakes.
And that's how the model is also used at inference time.
You start with kind of like a full answer and then you refine it.
And so it's a very different way of training the models.
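The "start with clean text, artificially add mistakes, train the model to fix them" recipe can be sketched as building (noisy, clean) training pairs. This is only an assumption-laden illustration: the talk does not say which corruption process is used, so random token replacement is swapped in here, and `corrupt`, `make_training_pairs`, `vocab`, and `noise_rates` are hypothetical names.

```python
import random


def corrupt(tokens, vocab, noise_rate, rng):
    """Replace each token with a random vocabulary token with probability noise_rate."""
    noisy = []
    for tok in tokens:
        if rng.random() < noise_rate:
            noisy.append(rng.choice(vocab))  # an artificial "mistake"
        else:
            noisy.append(tok)
    return noisy


def make_training_pairs(clean_examples, vocab, noise_rates, seed=0):
    """Build (noisy input, clean target) pairs at several corruption levels."""
    rng = random.Random(seed)
    pairs = []
    for text in clean_examples:
        tokens = text.split()
        for rate in noise_rates:
            pairs.append((corrupt(tokens, vocab, rate, rng), tokens))
    return pairs


clean = ["return a + b", "print ( x )"]
vocab = ["return", "print", "a", "b", "x", "+", "(", ")"]
for noisy, target in make_training_pairs(clean, vocab, noise_rates=[0.0, 0.5, 1.0]):
    print(noisy, "->", target)
```

A model trained on pairs like these would learn to map the noisy sequence back to the clean one, and at inference time it could be applied repeatedly to a full draft, which matches the refine-the-whole-answer usage described above.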