Dwarkesh Patel
Now, these are ground truth examinations.
Can you solve this unseen Math Olympiad question?
Can you build this application to match a specific feature request?
But you couldn't have RL'd a model to accomplish these tasks from scratch, or at least we don't know how to do that yet.
You needed a reasonable prior over human data in order to kickstart this RL process.
Whether you want to call this prior a proper world model or just a model of humans, I don't think is that important.
It honestly seems like a semantic debate.
Because what you really care about is whether this model of humans helps you start learning from ground truth, aka become a true world model.
It's a bit like saying to somebody pasteurizing milk, hey, you should stop boiling that milk because eventually you want to serve it cold.
Of course, but this is an intermediate step to facilitate the final output.
By the way, LLMs are clearly developing a deep representation of the world because their training process is incentivizing them to develop one.
I use LLMs to teach me about everything from biology to AI to history, and they are able to do so with remarkable flexibility and coherence.
Now, are LLMs specifically trained to model how their actions will affect the world?
No, they are not.
But if we're not allowed to call their representations a world model,
then we're defining the term world model by the process that we think is necessary to build one, rather than the obvious capabilities that this concept implies.
Okay, continual learning.
I'm sorry to bring up my hobby horse again.
I'm like a comedian who has only come up with one good bit, but I'm going to milk it for all it's worth.