Andrej Karpathy
It's going to start using words that are extremely rare.
So it's going to drift too much from the distribution.
So I think controlling the distribution is just tricky. Someone has to actually work it out; it's probably not trivial in that sense.
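To make that concrete, here is a minimal sketch of one way to quantify the kind of drift being described: compare a model's word-usage distribution against a reference corpus with KL divergence, which grows as the model leans on words that are rare in the reference. The corpora and smoothing scheme below are illustrative assumptions, not anything from the conversation.

```python
# Sketch: measure word-distribution drift between generated text and a
# reference corpus via KL divergence. Toy data; real pipelines would use
# tokenizers and large corpora.
import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q); grows as p puts mass on words that are rare under q."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

reference = "the cat sat on the mat and the dog sat down too".split()
generated = "the perspicacious feline reposed upon the aforementioned mat".split()

vocab = set(reference) | set(generated)
p = unigram_dist(generated, vocab)
q = unigram_dist(reference, vocab)
print(f"drift (KL): {kl_divergence(p, q):.3f}")  # larger = more rare-word drift
```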
So it's really interesting in the history of the field, because at one point everything was very scaling-pilled: oh, we're going to make much bigger models, trillions of parameters.
And actually, what the models have done in size is they've gone up, and now they've actually come back down.
State-of-the-art models are smaller.
And even then, I actually think they memorized way too much.
So I had a prediction a while back that we can get cognitive cores that are very good at even, like, a billion parameters.
Like, if you talk to a billion-parameter model in 20 years, I think you can actually have a very productive conversation. It thinks, and it's a lot more like a human.
But if you ask it some factual question, it might have to look it up. It knows that it doesn't know, so it will look it up and just do all the reasonable things.
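As a toy sketch of that "knows it doesn't know" behavior: a small core that answers from its own memorized knowledge when it can, and falls back to an external lookup otherwise. The lookup function and fact tables here are hypothetical stand-ins, not a real system.

```python
# Sketch: answer from memorized knowledge when possible, otherwise
# defer to an external lookup (web search, database, etc.).
KNOWN_FACTS = {"capital of France": "Paris"}  # the core's memorized knowledge

def lookup(query):
    """Stand-in for an external tool call."""
    external = {"boiling point of tungsten": "5555 C"}
    return external.get(query, "no result found")

def answer(query):
    if query in KNOWN_FACTS:            # confident: answer directly
        return KNOWN_FACTS[query]
    # not memorized: admit it and look it up
    return f"I don't know offhand; looking it up: {lookup(query)}"

print(answer("capital of France"))
print(answer("boiling point of tungsten"))
```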
No, because here's the issue: the training data is the internet, which is really terrible. So there are huge gains to be made, precisely because the internet is terrible.
When you and I think of the internet, we're thinking of something like a Wall Street Journal article, but that's not what this is. When you're actually looking at a pre-training data set at a frontier lab and you pull up a random internet document, it's total garbage.
Like I don't even know how this works at all.
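For a sense of what inspecting that data mechanically looks like, here is a minimal sketch of heuristic document filtering in the spirit of published pipelines (e.g., the Gopher rules); the specific checks and thresholds are illustrative assumptions, not any lab's actual filter.

```python
# Sketch: crude heuristics that flag garbage-looking web documents
# before pre-training. Thresholds are illustrative, not from any lab.
def looks_like_garbage(doc: str) -> bool:
    words = doc.split()
    if len(words) < 50:                              # too short to carry content
        return True
    alpha = sum(w.strip(".,;:!?").isalpha() for w in words)
    if alpha / len(words) < 0.8:                     # mostly symbols/markup debris
        return True
    if len(set(words)) / len(words) < 0.3:           # heavy repetition (boilerplate)
        return True
    return False

spam = "buy now!!! $$$ click here http://spam-link 404 404 404 " * 8
prose = ("The committee reviewed the proposal in detail and recommended "
         "several changes to the draft, citing concerns about scope and "
         "budget. Members asked the authors to clarify their timeline, "
         "identify the main risks, and describe how the results would be "
         "evaluated. A revised version is expected before the next meeting, "
         "along with supporting data and a short summary for the board.")

print(looks_like_garbage(spam))   # True
print(looks_like_garbage(prose))  # False
```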