Andrej Karpathy
And that's almost a feature, not a bug, because it forces you to learn only the generalizable components.
Whereas LLMs are distracted by all the memory they have of the pre-training documents.
And that's probably quite distracting to them in a certain sense.
So that's why when I talk about the cognitive core, I actually want to remove the memory, which is what we talked about.
I'd love to have less memory so that they have to look things up.
And they only maintain the algorithms for thought, the idea of an experiment, and all this cognitive glue for acting.
I'm not sure.
I think it's almost like a separate axis.
It's almost like the models are way too good at memorization and somehow we should remove that.
And I think people are much worse at memorization, but that's a good thing.
Yeah, I think that's a great question.
I mean, you can imagine having a regularization for entropy and things like that.
I guess they just don't work as well empirically, because right now the models are collapsed.
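To make the entropy-regularization idea concrete, here is a minimal NumPy sketch of what such a term could look like: a standard cross-entropy loss with an entropy bonus subtracted, so the model is rewarded for keeping its output distribution spread out rather than collapsing onto one mode. The function name and the `beta` weight are illustrative assumptions, not anything from the conversation.

```python
import numpy as np

def entropy_regularized_loss(logits, target, beta=0.01):
    """Cross-entropy on `target` minus `beta` times the entropy
    of the predicted distribution (both hypothetical names).

    Subtracting the entropy term rewards spread-out predictions,
    pushing back against mode collapse.
    """
    # Numerically stable softmax over the logits.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    # Standard cross-entropy on the target index.
    ce = -np.log(probs[target])
    # Shannon entropy of the predicted distribution.
    entropy = -(probs * np.log(probs)).sum()
    return ce - beta * entropy
```

With `beta > 0`, a peaked (collapsed) distribution pays a higher effective loss than a more diverse one with the same cross-entropy, which is the intended pressure; in practice, as noted above, such penalties reportedly have not worked as well empirically.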
But I will say...
Most of the tasks that we want from them don't actually demand that diversity.
That's probably the answer to what's going on.
And so it's just that the frontier labs are trying to make the models useful.
And I kind of just feel like the diversity of the outputs, number one, is much harder to work with and evaluate, and all this kind of stuff.
But maybe it's also not what's actually capturing most of the value.