Andrej Karpathy
Podcast Appearances
They will recite passages from all these training sources.
You can give them completely nonsensical data; for example, you can hash some amount of text or something like that.
You get a completely random sequence.
If you train on it, even for just a single iteration or two, I think it can suddenly regurgitate the entire thing.
It will memorize it.
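The claim can be illustrated with a toy experiment: hash some text to get a random-looking sequence, overfit a small model on it, and watch it recite the sequence back verbatim. This is a minimal sketch assuming PyTorch; the tiny model, the SHA-256 hashing, and the step count are illustrative choices, not anything from the conversation, and a large pretrained LLM would memorize in far fewer passes than this toy needs.

```python
# Toy memorization sketch (assumes PyTorch; model and hyperparameters are illustrative).
import hashlib
import torch
import torch.nn as nn

# "Hash some amount of text" -> a completely random-looking byte sequence.
text = ("some ordinary training text " * 8).encode()
digest = b"".join(hashlib.sha256(text + bytes([i])).digest() for i in range(8))
tokens = torch.tensor(list(digest), dtype=torch.long)  # 256 byte-tokens in [0, 255]

class TinyLM(nn.Module):
    """Deliberately small next-token predictor conditioned on token and position."""
    def __init__(self, length, vocab=256, dim=256):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(length, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, x):
        positions = torch.arange(x.size(0))
        return self.head(self.tok(x) + self.pos(positions))

model = TinyLM(length=len(tokens) - 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
inputs, targets = tokens[:-1], tokens[1:]

# Overfit on the single random sequence (a toy model needs many more steps
# than the "iteration or two" a large LLM would).
for step in range(1000):
    loss = nn.functional.cross_entropy(model(inputs), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Greedy autoregressive decoding from the first byte regurgitates the whole thing.
seq = [tokens[0].item()]
with torch.no_grad():
    for _ in range(len(tokens) - 1):
        logits = model(torch.tensor(seq, dtype=torch.long))
        seq.append(int(logits[-1].argmax()))
print("regurgitated exactly:", seq == tokens.tolist())
```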
There's no way a person can read a sequence of random numbers once and recite it back to you.
And that's almost a feature, not a bug, because it forces you to only learn the generalizable components.
Whereas LLMs carry all this memory of the pre-training documents, and it's probably very distracting to them in a certain sense.
So that's why when I talk about the cognitive core, I actually want to remove the memory, which is what we talked about.
I'd love to have less memory so that they have to look things up.
And they only maintain the algorithms for thought, the idea of an experiment, and all this cognitive glue for acting.
I'm not sure.
I think it's almost like a separate axis.
It's almost like the models are way too good at memorization and somehow we should remove that.
And I think people are much worse at memorization, but that's a good thing.
Yeah, I think that's a great question.
I mean, you can imagine having a regularization for entropy and things like that.
I guess they just don't work as well empirically, because right now the models are collapsed.
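For the entropy-regularization idea mentioned here, a generic sketch is to subtract a small entropy bonus from the usual cross-entropy loss so the output distribution is discouraged from collapsing onto a few answers. This assumes PyTorch; the coefficient `beta` and the shapes are illustrative, not anything specified in the conversation.

```python
# Generic entropy-bonus sketch (assumes PyTorch; beta and shapes are illustrative).
import torch
import torch.nn.functional as F

def loss_with_entropy_bonus(logits, targets, beta=0.01):
    """Cross-entropy minus a small reward for keeping the output distribution spread out."""
    ce = F.cross_entropy(logits, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy  # higher entropy -> lower loss

# Illustrative usage with random logits (batch of 8, vocab of 50257).
logits = torch.randn(8, 50257, requires_grad=True)
targets = torch.randint(0, 50257, (8,))
loss_with_entropy_bonus(logits, targets).backward()
```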
But I will say...