Dwarkesh Patel
With LLMs, we're going the opposite way.
We have first made this base model that does pure imitation learning, and then we're hoping that we do enough RL on it to make a coherent agent with goals and self-awareness.
Maybe this won't work.
But I don't think these super first-principles arguments about, for example, how these LLMs don't have a true world model are actually proving much.
And I also don't think they're strictly accurate for the models we have today, which are actually undergoing a lot of RL on ground truth.
Even if Sutton's platonic ideal doesn't end up being the path to the first AGI, his first-principles critique is identifying some genuine basic gaps that these models have.
And we don't even notice them because they're so pervasive in the current paradigm, but because he has this decades-long perspective, they're obvious to him.
It's the lack of continual learning.
It's the abysmal sample efficiency of these models.
It's their dependence on exhaustible human data.
If the LLMs do get to AGI first, which is what I expect to happen, the successor systems that they build will almost certainly be based on Richard's vision.
Today, I'm chatting with Richard Sutton, who is one of the founding fathers of reinforcement learning and inventor of many of the main techniques used there, like TD learning and policy gradient methods.
And for that, he received this year's Turing Award, which, if you don't know, is basically the Nobel Prize for Computer Science.
Richard, congratulations.
Thank you, Dwarkesh.
And thanks for coming on the podcast.
It's my pleasure.
Okay, so the first question is: my audience and I are familiar with the LLM way of thinking about AI.