Dwarkesh
What is the kind of relationship we should be seeking and how should we think about what this new recipe might look like?
So if we're back in the era of research, stepping back, what is the part of the recipe that we need to think most about?
When you say value function, people are already trying the current recipe but then adding an LLM as a judge and so forth.
You could say that's a value function, but it sounds like you have something much more fundamental in mind.
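For readers less familiar with the distinction being drawn here, the following is a minimal illustrative sketch, not anyone's actual training code. The names `call_judge_model` and `value_estimate` are hypothetical placeholders: an LLM-as-a-judge grades only the finished output, whereas a value function estimates expected future reward at every intermediate step of a rollout.

```python
from typing import List


def call_judge_model(final_answer: str) -> float:
    """Hypothetical stand-in for an LLM-as-a-judge call: scores only the finished output."""
    return 1.0 if "42" in final_answer else 0.0  # toy placeholder, not a real API


def judge_style_reward(trajectory: List[str]) -> float:
    """LLM-as-a-judge style: a single reward arrives once, at the end of the rollout."""
    return call_judge_model(trajectory[-1])


def value_estimate(partial_trajectory: List[str]) -> float:
    """A value function instead estimates expected future reward at every intermediate
    step, so credit can be assigned mid-rollout rather than only at the final answer.
    A toy heuristic stands in here for what would be a learned critic network."""
    return min(1.0, sum(len(step) for step in partial_trajectory) / 100.0)


rollout = ["Let me think step by step...", "So the answer is 42."]
print(judge_style_reward(rollout))                                     # one end-of-episode score
print([value_estimate(rollout[:i + 1]) for i in range(len(rollout))])  # a per-step estimate
```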
Do we need to go back and rethink pre-training itself, rather than just adding more steps to the end of that process?
Okay, so this is the crux, generalization.
And there's two sub-questions.
There's one about sample efficiency: why should it take so much more data for these models to learn than it does for humans?
There's a second question which, even separate from the amount of data it takes, is about why it's so much harder to teach a model the thing we want than to teach a human. With a human, you don't necessarily need a verifiable reward.
You're probably mentoring a bunch of researchers right now, and you're, you know, talking with them, you're showing them your code, and you're showing them how you think.
And from that, they're picking up your way of thinking and how they should do research.
You don't have to set like a verifiable reward for them.
That's like, okay, this is the next part of your curriculum.
And now this is the next part of your curriculum.
And oh, this training run was unstable and we've got to fix it. There's not this schleppy, bespoke process.
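As an illustration of what a "verifiable reward" means in this context, here is a minimal sketch under the assumption that the task is code generation graded by a hand-written unit test; the `add` task and the `verifiable_reward` helper are hypothetical, not from any particular training pipeline.

```python
import subprocess
import sys
import tempfile


def verifiable_reward(candidate_code: str) -> float:
    """Reward is 1.0 only if the model's code passes a hand-written check.
    This is the kind of programmatic grading a human mentee never needs."""
    tests = "\nassert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + tests)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0


print(verifiable_reward("def add(a, b):\n    return a + b"))  # 1.0, tests pass
print(verifiable_reward("def add(a, b):\n    return a - b"))  # 0.0, tests fail
```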
So perhaps these two issues are actually related in some way, but I'd be curious to explore this second thing, which feels more like continual learning, and the first thing, which feels more like sample efficiency.
Humans still seem better than models at this.
I mean, obviously models are better than the average human at language and math and coding, but are they better than the average human at learning?