Andrej Karpathy
actually even come down.
State-of-the-art models are smaller.
And even then, I actually think they memorize way too much.
So I had a prediction a while back: I almost feel like we can get cognitive cores that are very good at even a billion parameters.
If you talk to a billion-parameter model in 20 years, I think you can actually have a very productive conversation. It thinks, and it's a lot more like a human.
But if you ask it some factual question, it might have to look it up. It knows that it doesn't know, it knows when it has to look something up, and it will just do all the reasonable things.
No, because I basically think the issue is the training data. The training data is the internet, which is really terrible. So there's a huge amount of gains to be made precisely because the internet is so terrible.
And even the internet itself: when you and I think of the internet, you're thinking of something like a Wall Street Journal article, but that's not what this is. When you actually look at a pre-training dataset at a frontier lab and pull up a random internet document, it's total garbage.
Like I don't even know how this works at all.
It's some stock ticker symbols, a huge amount of slop and garbage from all the corners of the internet. It's not your Wall Street Journal article; that's extremely rare.
So I almost feel like, because the internet is so terrible, we actually have to build really big models just to compress all of that. And most of that compression is memory work, not cognitive work. But what we really want is the cognitive part, and we actually want to delete the memory.