Francois Chollet
And it doesn't mean that you're doing anything other than memorization, but you're doing memorization plus regularization.
Right.
AKA generalization.
Yeah.
And that leads absolutely, that leads to generalization.
That's correct.
That's correct.
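[A minimal sketch of the "memorization plus regularization" point, added for illustration and not taken from the conversation: the same noisy samples are fit with a high-degree polynomial, once with no penalty and once with an L2 (ridge) penalty. The unregularized fit memorizes the training points; the penalized fit is pushed toward a simpler curve and does better on held-out points.]

```python
# Illustrative sketch (not from the conversation): memorization vs.
# memorization + regularization on a toy curve-fitting problem.
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying function.
x_train = np.linspace(-1, 1, 15)
y_train = np.sin(np.pi * x_train) + 0.2 * rng.normal(size=x_train.shape)
x_test = np.linspace(-1, 1, 200)
y_test = np.sin(np.pi * x_test)

def design(x, degree=12):
    # Polynomial feature matrix: [1, x, x^2, ..., x^degree].
    return np.vander(x, degree + 1, increasing=True)

def fit(x, y, lam):
    # Ridge regression: solve (X^T X + lam * I) w = X^T y.
    X = design(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, 1e-2):
    w = fit(x_train, y_train, lam)
    train_mse = np.mean((design(x_train) @ w - y_train) ** 2)
    test_mse = np.mean((design(x_test) @ w - y_test) ** 2)
    print(f"lambda={lam:g}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```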
And, you know, LLMs, they're not infinitely large.
They have only a fixed number of parameters, and so they have to compress their knowledge as much as possible.
And in practice, LLMs are mostly storing reusable bits of programs, like vector programs.
And because they have this need for compression, it means that every time they're learning a new program, they're going to try to express it in terms of existing bits and pieces of programs that they've already learned before.
Absolutely.
Oh, wait, so... this is why, you know, LLMs clearly have some degree of generalization.
Yeah.
And this is precisely why.
It's because they have to compress.
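[An analogy for the compression argument, my gloss rather than Chollet's code: a capacity-limited store is forced to reuse shared structure. Text built by recombining a few repeated building blocks compresses far more than random bytes of the same length, because the compressor expresses new chunks in terms of pieces it has already seen.]

```python
# Illustrative analogy: reuse of shared pieces is what makes compression possible.
import os
import zlib

# "Programs" built by recombining a small set of reusable pieces.
pieces = [b"sort", b"reverse", b"map", b"filter", b"compose", b"reduce"]
structured = b"|".join(pieces[i % len(pieces)] + pieces[(i * 3) % len(pieces)]
                       for i in range(2000))

# Same length, but no shared structure to reuse.
random_bytes = os.urandom(len(structured))

for name, blob in (("structured", structured), ("random", random_bytes)):
    ratio = len(zlib.compress(blob, 9)) / len(blob)
    print(f"{name:10s} length={len(blob):6d}  compressed/original={ratio:.3f}")
```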
It's intrinsically limited because the substrate of your model is a big parametric curve.
And all you can do with this is local generalization.
If you want to go beyond this towards broader or even extreme generalization, you have to move to a different type of model.
And my paradigm of choice is discrete program search, program synthesis.
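[A minimal sketch of the paradigm named here, discrete program search, as my own illustration rather than Chollet's system: breadth-first enumeration over compositions of primitives from a toy DSL, keeping the shortest program consistent with all input/output examples. Because the result is a discrete, exact program rather than a fitted curve, it extrapolates to inputs far outside the examples.]

```python
# Toy enumerative program synthesis over a tiny DSL of unary integer functions.
from itertools import product

PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

def run(program, x):
    # A program is a tuple of primitive names applied left to right.
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def synthesize(examples, max_depth=4):
    # Enumerate programs by increasing length; return the shortest one
    # consistent with every (input, output) example.
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# Examples consistent with f(x) = (2x)^2.
examples = [(1, 4), (2, 16), (3, 36)]
prog = synthesize(examples)
print("found:", prog)               # ('double', 'square')
print("f(100) =", run(prog, 100))   # exact extrapolation: 40000
```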