Stefano Ermon
Nobody understands how these models generalize.
Here we're talking about something even more complicated where you're not just putting a binary label or a thousand classes in ImageNet, but you have like an extremely large, combinatorially large space of outputs.
And nobody understands how generalization works in that space.
I mean, I started working on this more than 10 years ago, and I thought it was going to keep me busy for my whole career because it's such a hard problem.
And, you know, it almost feels impossible for this to work.
There's just so many combinations.
Even if you think about fitting, let's say, a model on images,
right, there are so many different kinds of images and different kinds of combinations of features, right?
And so let's say you train a model on a data set where there are red cars and blue buses and red buses. Then should the model generate a blue car, right?
And it's not clear, right?
Yeah.
But fundamentally, that's what these models do.
And it's, of course, much more complicated because there are many more different things, many different shapes and colors, and some combinations make sense, some don't.
And they're able to do it the right way, but it's just a very, very hard problem.
Not too much. So if you think about the quality metrics, they are often very similar, so we can use the same benchmarks, like SWE-bench, LCB, or, you know, IFEval. Like, a lot of the existing benchmarks that people use to test, you know, how good the models are at doing the various things that we care about: following instructions, writing code for us, software engineering.
So we can still essentially test end-to-end.
You know, we just feed the same prompt to our model.
We look at the answer and then we see how well it does at the task.
And that's a very useful way of measuring, you know, how useful the models are.
There are other metrics that are specific to diffusion models, like how many...