Dwarkesh Patel
Podcast Appearances
It'll say, okay, I'm going to approach this problem using this approach at first, and it'll write this out and be like, oh, wait, I just realized this is the wrong conceptual way to approach the problem.
I'm going to restart with another approach.
And that flexibility does exist in context, right?
Do you have something else in mind, or do you just think that you need to extend this capability across longer horizons?
Isn't that literally what next token prediction is?
Prediction of what was next and then updating on the surprise?
Next token is what they should say, what the action should be.
Oh, yeah.
It's not a goal about the external world.
I guess maybe the bigger question I want to understand is why you don't think doing RL on top of LLMs is a productive direction.
Because we seem to be able to give these models the goal of solving difficult math problems.
And they're in many ways at the very peaks of human level in the capacity to solve Math Olympiad-type problems, right?
They got gold at IMO.
So it seems like the model which got gold at the International Math Olympiad does have the goal of getting math problems right.
So why can't we extend this to different domains?
Right.
So, I mean, it's interesting because you wrote this essay in 2019 titled The Bitter Lesson, and this is the most influential essay perhaps in the history of AI, but people have used that as a justification
for scaling up LLMs, because in their view, this is the one scalable way we have found to pour ungodly amounts of compute into learning about the world.
And so it's interesting that your perspective is that the LLMs are actually not bitter-lesson-pilled.