Dwarkesh Patel
Podcast Appearances
It's the abysmal sample efficiency of these models.
It's their dependence on exhaustible human data.
If the LLMs do get to AGI first, which is what I expect to happen, the successor systems that they build will almost certainly be based on Richard's vision.
Today, I'm chatting with Richard Sutton, who is one of the founding fathers of reinforcement learning and inventor of many of the main techniques used there, like TD learning and policy gradient methods.
And for that, he received this year's Turing Award, which, if you don't know, is basically the Nobel Prize for Computer Science.
Richard, congratulations.
Thank you, Dwarkesh.
And thanks for coming on the podcast.
It's my pleasure.
Okay, so first question: my audience and I are familiar with the LLM way of thinking about AI.
Conceptually, what are we missing in terms of thinking about AI from the RL perspective?
Huh.
I guess you would think that to emulate the trillions of tokens in the corpus of internet text, you would have to build a world model.
In fact, these models do seem to have very robust world models, and they're the best world models we've made to date in AI, right?
So what do you think is missing?
Great.
Yeah.
Right.
I guess maybe the crux, and I'm curious if you disagree with this, is some people will say, okay, so...