Marcus Hutter
Small children, you have reasonably well under control.
They become older.
The reward technique doesn't work so well anymore.
So then finally,
So this would be agents which are just, you could say, slaves to the humans. If you are more ambitious and just say we want to build a new species of intelligent beings, we put them on a new planet, and we want them to develop this planet or whatever, then we don't give them any reward.
So what could we do?
And you could try to come up with some reward functions like, you know, it should maintain itself, the robot.
It should maybe multiply, build more robots, right?
And, you know, maybe...
or all kinds of things that you find useful.
But that's pretty hard, right?
What does self-maintenance mean?
What does it mean to build a copy?
Should it be an exact copy, an approximate copy?
And so that's really hard.
But Laurent Orseau, also at DeepMind, developed a beautiful model.
So he just took the AIXI model and coupled the rewards to information gain.
So he said the reward is proportional to how much the agent had learned about the world.
And you can rigorously, formally, uniquely define that in terms of KL divergences.
So if you put that in, you get a completely autonomous agent.
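To make the idea of a reward tied to information gain concrete, here is a minimal sketch, not the model described in the conversation. It assumes a tiny Bayesian agent with two hypothetical coin models and measures what it learned from one observation as the KL divergence between its posterior and prior beliefs. The function names, the hypothesis class, and the numbers are all illustrative assumptions.

```python
import math

def posterior(prior, likelihoods):
    """Bayes update: weight each hypothesis by its likelihood, then renormalise."""
    unnormalised = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def information_gain(prior, post):
    """KL(post || prior) in nats: how far the observation moved the beliefs."""
    return sum(q * math.log(q / p) for q, p in zip(post, prior) if q > 0)

# Hypothetical hypothesis class: a fair coin and a coin biased towards heads.
prior = [0.5, 0.5]
heads_likelihoods = [0.5, 0.9]  # P(heads) under each hypothesis (assumed values)

# The agent observes heads; its reward is the information it gained.
post = posterior(prior, heads_likelihoods)
reward = information_gain(prior, post)
print(post, reward)
```

In this sketch an observation that leaves the beliefs unchanged earns zero reward, so the agent is only rewarded for seeing things that teach it something about which hypothesis is true.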