Stefano Ermon
You know, there are benchmarks. We use benchmarks to see how well the model does: does it have general knowledge, does it follow instructions, does it make things up?
And we're seeing that it does really well.
But it's hard to precisely quantify, or even qualitatively figure out, which things it does well and which things it doesn't.
That's actually a pretty hard problem.
Even from an academic, theoretical perspective, understanding how generalization works, how these models are able to combine all the knowledge they see in the training data in ways that make sense and ways that don't, is a wide-open question.
Nobody really understands how these models work.
And so unfortunately, that also means it's very hard to compare two kinds of models.
Yeah, I think fundamentally it is a training problem, in the sense that you're fitting a statistical model.
And this is, as we would say, a very high-dimensional space: there is an extremely large number of possible combinations you could come up with.
If you think about all the different sentences you can generate, it's just a vast space.
And no matter how big your training set is, it's only ever going to be a tiny fraction of all the possible sentences you could generate.
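The point about the training set being a tiny fraction of all possible sentences can be made concrete with a rough back-of-the-envelope calculation. The figures below (vocabulary size, sentence length, training-corpus size) are illustrative assumptions, not numbers from the conversation:

```python
# Back-of-the-envelope: distinct token sequences vs. what training can cover.
# All figures are assumed, order-of-magnitude placeholders.
vocab_size = 50_000            # a typical subword vocabulary (assumed)
seq_len = 20                   # a short sentence, in tokens (assumed)
possible = vocab_size ** seq_len          # all length-20 token sequences
train_tokens = 10 ** 13                   # ~10T training tokens (assumed)
train_seqs = train_tokens // seq_len      # non-overlapping 20-token chunks
coverage = train_seqs / possible          # fraction of the space ever seen

print(f"possible sequences: roughly 10^{len(str(possible)) - 1}")
print(f"fraction seen in training: about {coverage:.0e}")
```

Even under these generous assumptions, the training data covers a vanishingly small sliver of the space, which is why the model must interpolate and extrapolate rather than memorize.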
And so what this means is that the training data itself will not tell you everything: the model has to interpolate and extrapolate.
It has to generalize, and nobody actually knows how these models generalize.
Even in simpler settings, like supervised learning, just training a neural network to classify images.