Dwarkesh Patel
What is a solution to model collapse?
I mean, there are very naive things you could attempt, like pushing the distribution over logits to be wider or something. There are many naive things you could try.
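The transcript doesn't spell out what "the distribution over logits should be wider" means concretely, but two of the most naive knobs would be temperature scaling at sampling time and an entropy bonus on the policy's output distribution during RL. Here is a minimal PyTorch sketch of both, purely as an illustration of the kind of naive intervention being gestured at; the function names and the 0.01 entropy coefficient are hypothetical, not anything the speakers propose:

```python
import torch
import torch.nn.functional as F

def sample_with_temperature(logits: torch.Tensor, temperature: float = 1.5) -> torch.Tensor:
    """Sample tokens from temperature-flattened logits; temperature > 1 widens the distribution."""
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

def entropy_bonus(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-token entropy; adding beta * entropy to the objective rewards wider output distributions."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1).mean()

# Toy usage on a (batch, vocab) tensor of logits
logits = torch.randn(4, 32_000)
tokens = sample_with_temperature(logits, temperature=1.5)

task_loss = torch.tensor(1.0)  # placeholder for whatever RL / fine-tuning loss is in use
loss = task_loss - 0.01 * entropy_bonus(logits)  # hypothetical beta = 0.01
```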
What ends up being the problem with the naive approaches?
In fact, it's actively penalized, right? If you're super creative in RL, that's not good.
And then I think you hinted that it's a very fundamental problem.
It won't be easy to solve.
What's your intuition for that?
How many bits should the optimal core of intelligence end up being if you just had to make a guess?
The thing we put on the von Neumann probes, how big does it have to be?
It's actually surprising that you think it will take a billion, because we already have billion-parameter models, or couple-billion-parameter models, that are very intelligent.
Well, some of our models are a trillion parameters, right?
But they remember so much stuff.
Yeah, but I'm surprised, given the pace. Okay, we have GPT-OSS-20b, and that's way better than the original GPT-4, which was a trillion-plus parameters. So given that trend, I'm actually surprised you think that in 10 years the cognitive core is still a billion parameters.
Yeah, I'm surprised you're not saying it's going to be tens of millions, or millions.