Dwarkesh Patel
But why is the distilled version still a billion parameters? That's the thing I'm curious about, I guess.
Why would you train on... Right, no, but why is the distillation in 10 years not getting below 1 billion?
Oh, you think it should be smaller than a billion?
Yeah. I mean, if you look at the trend over the last few years, just from finding low-hanging fruit we've gone from trillion-plus-parameter models to models literally two orders of magnitude smaller, in a matter of two years, with better performance.
It makes me think the core of intelligence might be even way, way smaller.
Like plenty of room at the bottom, to paraphrase Feynman.
So we're discussing what plausibly could be the cognitive core.
There's a separate question, which is: what will the size of frontier models actually be over time?
And I'm curious whether you have a prediction.
So we had increasing scale up to maybe GPT-4.5, and now we're seeing decreasing or plateauing scale.
There are many reasons that could be going on.
But do you have a prediction about going forward?
Will the biggest models be bigger?
Will they be smaller?
Will they be the same?
Do you expect it to be similar in kind to the kinds of things that have been happening over the last two to five years?
Just in terms of, if I look at nanochat versus nanoGPT and the architectural tweaks you made, is that basically the flavor of things you expect to keep happening?
Or are you not expecting any giant paradigm shift?