Marcus Hutter
So, for instance, usually you need an ergodicity assumption in the MDP framework in order to learn.
Ergodicity essentially means that you can recover from your mistakes and that there are no traps in the environment.
And if you make this assumption, then essentially you can go back to a previous state, visit it a couple of times, learn the statistics and what the state is like, and then in the long run perform well in this state.
But then there are no fundamental problems. But in real life, we know there can be one single action, one second of being inattentive while driving a car fast, that can ruin the rest of my life: I can become quadriplegic or whatever, and there is no recovery anymore. So the real world is not ergodic. I always say there are traps, situations you cannot recover from, and very little theory has been developed for this case.
What about, uh...
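The "no traps" idea above can be made concrete. A minimal sketch (a hypothetical illustration, not from the interview): model the environment as a Markov chain and call it ergodic if every state can reach every other state, so every mistake is recoverable; an absorbing "trap" state breaks this.

```python
# Sketch: ergodicity as "no traps" in a finite Markov chain.
# A chain is ergodic (irreducible) if every state can reach every other state.

def reachable(transitions, start):
    """Return the set of states reachable from `start` via positive-probability edges."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t, p in enumerate(transitions[s]):
            if p > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def is_ergodic(transitions):
    """True if every state can reach every other state (irreducible chain)."""
    n = len(transitions)
    return all(reachable(transitions, s) == set(range(n)) for s in range(n))

# Recoverable world: from either state you can always get back to the other.
safe = [[0.5, 0.5],
        [0.5, 0.5]]

# Trap: state 1 is absorbing (no recovery, like the driving example).
trap = [[0.5, 0.5],
        [0.0, 1.0]]

print(is_ergodic(safe))  # True
print(is_ergodic(trap))  # False
```

In the trap chain, one unlucky transition into state 1 ends exploration forever, which is exactly why the standard MDP learning guarantees no longer apply.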
Yeah, so the good thing is that there are no parameters to control.
In some other approaches there are knobs to control.
And you can do that.
I mean, you can modify AIXI so that you have some knobs to play with if you want to.
But the exploration is directly baked in.
And that comes from the Bayesian learning and the long-term planning.
So these together already imply exploration.
You can nicely and explicitly prove that for simple problems like so-called bandit problems. To give a real-world example: say you have two medical treatments, A and B, and you don't know their effectiveness. You try A a little bit, B a little bit, but you don't want to harm too many patients, so you have to trade off exploring and exploiting, and at some point you want to exploit, and you can do the mathematics and figure out the optimal strategy.
These so-called bandit problems can also be solved by non-Bayesian agents, but it shows that this Bayesian framework, taking a prior over possible worlds, doing the Bayesian mixture, and then the Bayes-optimal decision with long-term planning (that is important), automatically implies exploration, and to the proper extent: not too much exploration and not too little, in these very simple settings.
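The two-treatment example can be sketched in code. This is not the full Bayes-optimal planner Hutter describes (which requires long-horizon lookahead); it uses Thompson sampling, a standard Bayesian bandit strategy, to illustrate the same point: a prior over the unknown success rates plus posterior sampling yields exploration automatically, with no exploration knob. The success rates and round count are made up for illustration.

```python
import random

def thompson_two_arm(true_p, n_rounds, seed=0):
    """Two-armed Bernoulli bandit with Beta(1,1) priors on each arm's success rate.
    Each round: sample a plausible rate from each arm's posterior,
    pull the arm with the higher sample, observe success or failure."""
    rng = random.Random(seed)
    wins = [0, 0]    # successes per arm
    losses = [0, 0]  # failures per arm
    pulls = [0, 0]   # how often each arm was chosen
    for _ in range(n_rounds):
        # Posterior for arm a is Beta(wins[a]+1, losses[a]+1)
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in (0, 1)]
        a = 0 if samples[0] >= samples[1] else 1
        pulls[a] += 1
        if rng.random() < true_p[a]:
            wins[a] += 1
        else:
            losses[a] += 1
    return pulls

# Treatment A truly works 30% of the time, treatment B 60% (assumed values).
pulls = thompson_two_arm(true_p=[0.3, 0.6], n_rounds=2000)
print(pulls)  # the better treatment ends up chosen far more often
```

Early on, posterior uncertainty is wide, so both treatments get tried; as evidence accumulates, the posterior concentrates and the agent commits to the better one. Exploration falls out of the Bayesian machinery rather than being tuned by hand.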
In the AIXI model,
I was also able to prove self-optimizing theorems, or asymptotic optimality theorems, although they are only asymptotic, not finite-time bounds.
I think it's absolutely crucial.
The question is whether there's a way to deal with it in a more heuristic but still sufficiently good way.
So I have to come up with an example on the fly.