Dwarkesh Podcast

Richard Sutton – Father of RL thinks LLMs are a dead-end

26 Sep 2025

Transcription

0.031 - 18.725 Dwarkesh Patel

Today, I'm chatting with Richard Sutton, who is one of the founding fathers of reinforcement learning and inventor of many of the main techniques used there, like TD learning and policy gradient methods. And for that, he received this year's Turing Award, which, if you don't know, is basically the Nobel Prize for Computer Science. Richard, congratulations. Thank you, Dwarkesh.

19.126 - 33.086 Dwarkesh Patel

And thanks for coming on the podcast. It's my pleasure. Okay, so first question is, my audience and I are familiar with the LLM way of thinking about AI. Conceptually, what are we missing in terms of thinking about AI from the RL perspective?

34.048 - 62.099 Richard Sutton

Well, yes, I think it's really quite a different point of view. And it can easily get separated and lose the ability to talk to each other. And yeah, large language models have become such a big thing. Generative AI in general, a big thing. And our field is subject to bandwagons and fashions. So we lose track of the basic, basic things. Because I consider reinforcement learning to be basic AI.

62.299 - 79.195 Richard Sutton

And what is intelligence? The problem is to understand your world. And reinforcement learning is about understanding your world. Whereas large language models are about mimicking people, doing what people say you should do. They're not about figuring out what to do.

79.394 - 97.862 Dwarkesh Patel

Huh. I guess you would think that to emulate the trillions of tokens in the corpus of internet text, you would have to build a world model. In fact, these models do seem to have very robust world models, and they're the best world models we've made to date in AI, right? So what do you think is missing?

98.664 - 100.887 Richard Sutton

I would disagree with most of the things you just said.

100.907 - 102.87 Dwarkesh Patel

Great.

102.85 - 125.038 Richard Sutton

Just to mimic what people say is not really to build a model of the world at all, I don't think. You know, you're mimicking things that have a model of the world, the people. But I don't want to approach the question in an adversarial way. But I would question the idea that they have a world model. So a world model would enable you to predict what would happen.

126.059 - 148.416 Richard Sutton

They have the ability to predict what a person would say. They don't have the ability to predict what will happen. What we want, I think, to quote Alan Turing, what we want is a machine that can learn from experience. Right. Where experience is the things that actually happen in your life. You do things, you see what happens, and that's what you learn from.
