Andrej Karpathy
It's things like stock ticker symbols.
It's a huge amount of slop and garbage from like all the corners of the internet.
It's not like your Wall Street Journal article; that's extremely rare.
So I almost feel like because the internet is so terrible, we actually have to sort of like build really big models to compress all that.
Most of that compression is memory work instead of like cognitive work.
But what we really want is the cognitive part, and to actually delete the memory.
And so I guess what I'm saying is we need intelligent models to help us refine even the pre-training set, to narrow it down to the cognitive components.
And then I think you can get away with a much smaller model, because it's a much better data set and you could train on it.
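As a rough illustration of that idea of using a stronger model to refine the pre-training set, here is a minimal, hypothetical sketch; the scoring function, the 0-1 rating scale, and the threshold are assumptions for illustration, not a pipeline described in this conversation.

```python
# Hypothetical sketch: use a strong "judge" model to keep only the
# reasoning-heavy documents from a raw web corpus. score_fn, the 0-1
# rating scale, and the threshold are illustrative assumptions.
def filter_corpus(documents, score_fn, threshold=0.7):
    """Keep documents that a scoring model rates as cognitively rich."""
    kept = []
    for doc in documents:
        score = score_fn(doc)  # assumed to query a strong existing model
        if score >= threshold:
            kept.append(doc)
    return kept

# Example usage with a stub scorer (a real pipeline would call an LLM here):
if __name__ == "__main__":
    docs = [
        "AAPL 182.31 +0.4% ...",
        "A step-by-step proof that sqrt(2) is irrational ...",
    ]
    dummy_score = lambda d: 0.9 if "proof" in d else 0.1
    print(filter_corpus(docs, dummy_score))
```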
But the small model probably isn't trained directly on that data set. It's probably distilled from a much bigger model still.
I just feel like distillation works extremely well.
So if you have a small model, it's almost certainly distilled.
I mean, come on, right?
I don't know.
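For context on what "distilled" means here: a small student model is trained to match a larger teacher model's output distribution rather than the raw data alone. A minimal sketch, assuming a PyTorch setup; the temperature and the exact loss form are standard choices, not something specified in this conversation.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student token distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t*t rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Typical use: the teacher runs without gradients; only the student updates.
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits)
# loss.backward()
```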
At some point, it should take at least a billion knobs to do something interesting.
You're thinking it should be even smaller?
I mean, I almost feel like I'm already contrarian by talking about a billion-parameter cognitive core, and you're outdoing me.
I think, yeah, maybe we could get a little bit smaller.
I mean, I still think that should be enough.