Andrej Karpathy
But it doesn't have esoteric knowledge, you know?
Yeah, I don't know that I have a super strong prediction.
I do think that the labs are just being practical.
They have a flops budget and a cost budget.
And it just turns out that pre-training is not where you want to put most of your flops or your cost.
So that's why the models have gotten smaller: the pre-training stage is a bit smaller, but they make it up in reinforcement learning, mid-training, and all the stages that follow.
So they're just being practical in terms of all the stages and how you get the most bang for the buck.
So I guess like forecasting that trend, I think, is quite hard.
I do still expect that there's so much low-hanging fruit.
That's my basic expectation.
Yeah.
And so I have a very wide distribution here.
Probably most part, yeah.
I expect the data sets to get much, much better because when you look at the average data sets, they're extremely terrible.
Like so bad that I don't even know how anything works, to be honest.
Like look at the average example in the training set.
Like factual mistakes, errors, nonsensical things.
Somehow when you do it at scale, the noise washes away and you're left with some of the signal.
So data sets will improve a ton. It's just everything gets better. So our hardware, all the kernels for running the hardware and maximizing what you get with the hardware. Nvidia is slowly tuning the actual hardware itself, Tensor Cores and so on. All that needs to happen and will continue to happen. All the kernels will get better and utilize the chip to the max extent. All the algorithms will probably improve over optimization, architecture, and just all the modeling components of how everything is done and what the algorithms are that we're even training with.
So I do kind of expect like a just very just everything.