becoming superhuman at coding competitions doesn't make you a more tasteful programmer more generally.
Maybe the thing to do is not to keep stacking up the number and diversity of environments, but to figure out an approach that lets you learn from one environment and improve your performance on something else.
But then what is the analogy for what the second student is doing before they do the 100 hours of fine-tuning?
I think it's like they have it.
I think it's interesting to distinguish it from whatever pre-training does.
So one way to understand what you just said, that we don't have to choose the data in pre-training, is to say that, actually, it's not dissimilar to the 10,000 hours of practice.
It's just that you get that 10,000 hours of practice for free because it's already somewhere in the pre-training distribution.
But maybe you're suggesting there's actually not that much generalization from pre-training.
There's just so much data in pre-training.
It's not necessarily generalizing better than RL.
Here are some analogies people have proposed for the human equivalent of pre-training.
And I'm curious to get your thoughts on why they're potentially wrong.
One is to think about the first 18 or 15 or 13 years of a person's life when they aren't necessarily economically productive, but they are doing something that is making them understand the world better and so forth.
And the other is to think about evolution as doing some kind of search over 3 billion years, which then results in an instance of a human lifetime.
And then I'm curious whether you think either of these is actually analogous to pre-training, or, if not, how you would think about what lifetime human learning is like.
What is that?
Clearly it's not just emotion directly.
It seems like some almost value-function-like thing that is telling you which decision to make, like what the end reward for any decision should be.
And you think that doesn't sort of implicitly come from...
I think it could.
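Since the exchange leans on the RL sense of "value function," here is a minimal toy sketch to pin the term down: a value function is a learned estimate of how much future reward each situation leads to, and decisions are made by preferring higher-valued situations. Everything in this example (the chain environment, the state count, the learning constants) is hypothetical and only illustrates the concept; it is not anything described in the conversation.

```python
# Toy sketch of a value function: learn V(s), the expected future reward from
# each state, then use it to pick decisions before any end reward is seen.
import random

N_STATES = 6            # chain of states 0..5; reaching state 5 gives reward 1
GAMMA = 0.9             # discount factor on future reward
ALPHA = 0.1             # learning rate

values = [0.0] * N_STATES  # V(s): current estimate of each state's long-run value

def step(state, action):
    """Move left (-1) or right (+1) along the chain; reward only at the far end."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

# TD(0) learning: after each move, nudge V(s) toward reward + gamma * V(s').
for _ in range(2000):
    state = 0
    done = False
    while not done:
        action = random.choice([-1, +1])          # explore randomly
        next_state, reward, done = step(state, action)
        target = reward + (0.0 if done else GAMMA * values[next_state])
        values[state] += ALPHA * (target - values[state])
        state = next_state

# The learned values now "tell you which decision to make": from any state,
# prefer the neighbor with the higher estimated value.
print([round(v, 2) for v in values])
print("from state 2, go right" if values[3] > values[1] else "from state 2, go left")
```

The point of the sketch is just that the numbers in `values` play the role being gestured at above: an internal signal, learned from experience, that ranks options by their eventual payoff rather than by any immediate feeling or reward.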