Richard Sutton
the instances.
And there would be lots of possibilities for doing that.
Unlike today.
You can't have one child grow up and learn about the world and then every new child has to repeat that process.
Whereas with AIs, with the digital intelligence, you could hope to do it once and then copy it into the next one as a starting place.
So this would be a huge savings and I think actually it would be much more important than trying to learn from people.
So this is something we know very well, and the basis of it is temporal difference learning, where the same thing happens on a less grandiose scale, like when you learn to play chess.
The long-term goal is winning the game, and yet you want to be able to learn from shorter-term things, like taking your opponent's pieces.
And so you do that by having a value function, which predicts the long-term outcome.
And then if you take the guy's pieces, well, your prediction about the long-term outcome is changed.
It goes up.
You think you're going to win.
And then that increase in your belief immediately, quote, reinforces the move that led to taking the piece.
Okay, so we have this long-term, ten-year goal of making a startup and making a lot of money, and so when we make progress we say, oh, I'm more likely to achieve the long-term goal, and that rewards the steps along the way.
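The chess example above can be sketched as a single tabular TD(0) update. This is a minimal illustration under stated assumptions, not Sutton's own code: the state names `"midgame"` and `"up_a_piece"`, the value estimates 0.5 and 0.7 (read as predicted win probabilities), the step sizes, and the actor-critic-style preference update that "reinforces the move" are all hypothetical choices made for the example. Capturing a piece gives no immediate reward, but the value prediction goes up, so the TD error is positive and the move that led there is strengthened.

```python
from collections import defaultdict


def td_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) step: move V[s] toward r + gamma * V[s_next]; return the TD error."""
    target = r + (0.0 if terminal else gamma * V[s_next])
    delta = target - V[s]  # positive when the outlook just improved
    V[s] += alpha * delta
    return delta


def actor_update(prefs, s, a, delta, beta=0.1):
    """Hypothetical actor-critic step: a positive TD error raises the move's preference."""
    prefs[(s, a)] += beta * delta


# Hypothetical value estimates (predicted chance of eventually winning).
V = defaultdict(float)
V["midgame"] = 0.5
V["up_a_piece"] = 0.7
prefs = defaultdict(float)

# Taking a piece: no reward yet (r = 0), but the long-term prediction rises.
delta = td_update(V, "midgame", 0.0, "up_a_piece")
actor_update(prefs, "midgame", "capture", delta)
print(round(delta, 3), round(V["midgame"], 3), round(prefs[("midgame", "capture")], 3))
```

The only reward in chess arrives at the end of the game, so the design choice here is to let the change in prediction, not the reward, drive learning on every intermediate step.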
I think the crux of this, and I'm not sure, but...
The big world hypothesis seems very relevant, and the reason why humans become useful on the job is that they are encountering their particular part of the world, and it can't have been anticipated, and it can't all have been put in in advance.
The world is so huge that you can't... The dream, as I see it, the dream of large language models, is that you can teach the agent everything and it will know everything and it won't have to learn anything online, right, during its life.
And your examples are all: well, really you have to, because there's a lot you can teach it, but there are all the little idiosyncrasies of the particular life they're leading, the particular people they're working with, and what they like as opposed to what average people like. And so that's just saying the world is really big, and so you're going to have to learn it along the way.
So I would say you're just doing regular learning.
Maybe using context, because in large language models, all that information has to go into the context window.