Andrej Karpathy
So I guess what I'm saying is that we need intelligent models to help us refine even the pre-training set, narrowing it down to just the cognitive components.
Then I think you can get away with a much smaller model, because the dataset is much better and you could train on it.
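The transcript doesn't say how a model would "refine" the pre-training set. One common pattern is to score every document with a model and keep only the top fraction. The sketch below assumes that pattern; `filter_corpus`, `keep_fraction`, and `toy_reasoning_score` are hypothetical names, and the toy scorer (counting reasoning cue words) stands in for what would realistically be a learned quality model.

```python
from typing import Callable


def filter_corpus(docs: list[str], score: Callable[[str], float],
                  keep_fraction: float = 0.1) -> list[str]:
    """Keep the highest-scoring fraction of documents (at least one if any exist)."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not docs:
        return []
    k = max(1, int(len(docs) * keep_fraction))
    # Rank by score, best first; ties keep their original order (sorted is stable).
    ranked = sorted(docs, key=score, reverse=True)
    return ranked[:k]


def toy_reasoning_score(doc: str) -> float:
    # Hypothetical stand-in for a learned quality model: the fraction of words
    # that are reasoning cues. A real filter would call a trained classifier here.
    cues = ("because", "therefore", "so", "if", "then")
    words = doc.lower().split()
    return sum(w.strip(".,") in cues for w in words) / max(1, len(words))


corpus = [
    "If x is even then x + 1 is odd, because parity alternates.",
    "BUY NOW cheap watches limited offer",
    "The capital of France is Paris.",
]
print(filter_corpus(corpus, toy_reasoning_score, keep_fraction=0.5))
```

Keeping a fixed fraction rather than applying a score threshold makes the size of the filtered set predictable, which is convenient when the goal is a smaller, denser training set.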
But it probably isn't trained directly on that data.
It's probably distilled from an even better model.
I just feel like distillation works extremely well.
So if you have a small model, it's almost certainly distilled.
I mean, come on, right?
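For readers unfamiliar with distillation: a small "student" model is trained to match the output distribution of a larger "teacher" rather than only the hard labels. The sketch below shows the common soft-target loss, assuming access to both models' logits for the same input; the function names and the default `temperature=2.0` are illustrative choices, not anything stated in the conversation.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.5]
print(distillation_loss(student, teacher))
```

A higher temperature flattens the teacher's distribution, so the student also learns how the teacher ranks the wrong answers, not just which answer is right. In practice this term is usually mixed with an ordinary cross-entropy loss on the true labels.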
I don't know.
At some point, it should take at least a billion knobs to do something interesting.
You're thinking it should be even smaller?
I mean, I almost feel like I'm already contrarian by talking about a billion-parameter cognitive core, and you're outdoing me.
I think, yeah, maybe we could get a little bit smaller.
I mean, I still think that there should be enough.
Yeah, maybe it can be smaller.
I do think that, practically speaking, you want the model to have some knowledge.
You don't want it to be looking up everything.
Because then you can't think in your head.
You're looking up way too much stuff all the time.
So I do think some basic curriculum of knowledge needs to be there.