Richard Sutton
Podcast Appearances
The content of the knowledge is statements about the stream.
And so because it's a statement about the stream, you can test it by comparing it to the stream and you can learn it continually.
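Editor's sketch of the idea above, not something from the conversation: a prediction about the stream is checked against each new observation and corrected on the spot. The function name `learn_from_stream`, the scalar stream, and the fixed `step_size` are all illustrative assumptions.

```python
def learn_from_stream(stream, step_size=0.1):
    """Keep one running prediction of the stream and correct it online.

    Hypothetical helper for illustration; the scalar prediction and the
    constant step size are assumptions, not the speaker's method.
    """
    prediction = 0.0
    errors = []
    for observation in stream:
        # Test the statement about the stream by comparing it to the stream.
        error = observation - prediction
        # Learn continually: move the prediction toward what was observed.
        prediction += step_size * error
        errors.append(abs(error))
    return prediction, errors


# A constant stream of 1.0: the prediction drifts toward 1.0 and the
# error shrinks with every step.
prediction, errors = learn_from_stream([1.0] * 50)
```

No labels or separate training phase are involved here; the stream itself supplies the check at every step.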
So when you're imagining this future continual learning agent.
They're not future.
Of course, they exist all the time.
This is what the reinforcement learning paradigm is: learning from experience.
The reward function is arbitrary.
And so if you're playing chess, it's to win the game of chess.
If you're a squirrel, maybe the reward has to do with getting nuts.
Right.
In general, for an animal, you would say the reward is to avoid pain and to acquire pleasure.
Right.
And I think there should also be a component having to do with your increasing understanding of your environment.
That would be sort of an intrinsic motivation.
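Editor's sketch, under stated assumptions: one way to picture a reward with both a task component and an intrinsic component is to add a bonus when the agent's prediction error drops. The name `total_reward`, the `curiosity_weight` parameter, and the use of prediction-error reduction as a stand-in for "increasing understanding" are all hypothetical choices, not the speaker's formulation.

```python
def total_reward(extrinsic, error_before, error_after, curiosity_weight=0.5):
    """Combine a task reward with a bonus for improved prediction.

    Hypothetical illustration: "understanding" is approximated by how much
    the agent's prediction error dropped, which is an assumption.
    """
    # Only reward improvement; getting worse earns no bonus and no penalty.
    learning_progress = max(0.0, error_before - error_after)
    return extrinsic + curiosity_weight * learning_progress


# Task reward of 1.0 (say, a nut found) plus a bonus for the prediction
# error falling from 0.8 to 0.2.
reward_with_progress = total_reward(1.0, 0.8, 0.2)

# No improvement in prediction: only the task reward remains.
reward_without_progress = total_reward(1.0, 0.2, 0.8)
```

The extrinsic term is whatever the task defines (winning at chess, getting nuts); the weight on the intrinsic term is a free design choice in this sketch.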
I don't like the word model when used the way you just did.
I think a better word would be the network.
So I think you mean the network.
Maybe there's many networks.
So anyway, things would be learned and then you'd have copies and many instances.
And sure, you'd want to share knowledge across all.