Dwarkesh Patel
That's actually surprising, that you think it will take a billion, because we already have billion-parameter models, or couple-billion-parameter models, that are very intelligent.
Well, some of our models are like a trillion parameters, right?
But they remember so much stuff.
Yeah, but I'm surprised, given the pace. Okay, we have GPT-OSS-20B, which is way better than the original GPT-4, and that was a trillion-plus parameters.
Yeah.
So given that trend, I'm actually surprised you think in 10 years, the cognitive core is still a billion parameters.
Yeah, I'm surprised you're not like, oh, it's going to be like tens of millions or millions.
But why is the distilled version still a billion? That's, I guess, the thing I'm curious about.
Why would you train on... Right, no, no, but why is the distillation in 10 years not getting below 1 billion?
Oh, you think it should be smaller than a billion?
Yeah, I mean, if you look at the trend over the last few years: just by finding low-hanging fruit, we've gone from trillion-plus models to models that are literally two orders of magnitude smaller in a matter of two years, with better performance.
It makes me think the sort of core of intelligence might be even way, way smaller.
Like plenty of room at the bottom, to paraphrase Feynman.
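A rough back-of-the-envelope version of the extrapolation being gestured at here. The only figures taken from the conversation are "a trillion plus" parameters for the original GPT-4 and 20B for GPT-OSS-20B, roughly two years apart; the exact values and the constant-shrink-rate assumption are illustrative, not claims from either speaker.

```python
import math

# Illustrative extrapolation of the "frontier capability in ever-smaller models" trend.
# Assumed figures: ~1e12 params ("a trillion plus") for the original GPT-4,
# 20e9 params for GPT-OSS-20B, about two years apart.

gpt4_params = 1.0e12      # "a trillion plus" (taken as a lower bound)
gpt_oss_params = 20e9     # GPT-OSS-20B
years_elapsed = 2         # "in a matter of two years"

shrink_factor = gpt4_params / gpt_oss_params              # ~50x
oom_per_year = math.log10(shrink_factor) / years_elapsed  # ~0.85 orders of magnitude per year

# If that rate somehow held for 10 more years, a model matching today's
# trillion-parameter frontier would need only a few thousand parameters,
# which is why a 1B-parameter cognitive core can look conservative under
# naive extrapolation (and also why the naive trend presumably can't continue).
naive_10yr = gpt4_params / 10 ** (oom_per_year * 10)

print(f"shrink factor over {years_elapsed} years: {shrink_factor:.0f}x")
print(f"orders of magnitude per year: {oom_per_year:.2f}")
print(f"naive 10-year extrapolation: {naive_10yr:,.0f} parameters")
```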
Yeah.
So we're discussing what, like, plausibly could be the cognitive core.
There's a separate question, which is: what will actually be the size of frontier models over time?
And I'm curious to have a prediction.
So we had increasing scale up to maybe GPT-4.5, and now we're seeing decreasing or plateauing scale.
There are many reasons that could be going on.