Andrej Karpathy
Yeah, maybe it can be smaller.
I do think that, practically speaking, you want the model to have some knowledge.
You don't want it to be looking up everything.
Because then you can't think in your head.
You're looking up way too much stuff all the time.
So I do think some basic curriculum of knowledge needs to be there.
But it doesn't need the esoteric knowledge, you know?
Yeah, I don't know that I have a super strong prediction.
I do think that the labs are just being practical.
They have a flops budget and a cost budget.
And it just turns out that pre-training is not where you want to put most of your flops or your cost.
So that's why the models have gotten smaller: the pre-training stage is a bit smaller, but they make it up in reinforcement learning, mid-training, and all the stages that follow.
So they're just being practical in terms of all the stages and how you get the most bang for the buck.
So forecasting that trend, I think, is quite hard.
I do still expect that there's a lot of room for it.
That's my basic expectation.
Yeah.
And so I have a very wide distribution here.
For the most part, yeah.
I expect the data sets to get much, much better, because when you look at the average data sets today, they're in terrible shape.