Marcus Hutter
Podcast Appearances
Yeah, so you have a planning problem up to horizon m, and that's exponential time in the horizon m, which is, I mean, it's computable, but intractable.
I mean, even for chess, it's already intractable to do that exactly, let alone for Go.
But it could also be a discounted kind of framework where... Yeah, so having a hard horizon at, say, 100 years is just for simplicity in discussing the model, and also sometimes the math is simpler.
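To make the exponential cost concrete, here is a minimal sketch of exact planning by brute-force expectimax over every action and percept sequence up to the horizon. The toy environment (two actions, two percepts, uniform percept probabilities, reward 1 when the action matches the percept) and all function names are assumptions made up for illustration; they are not from the talk.

```python
from typing import Callable, List, Sequence, Tuple

History = Tuple[Tuple[int, int], ...]  # sequence of (action, percept) pairs


def expectimax(
    history: History,
    steps_left: int,
    actions: Sequence[int],
    percepts: Sequence[int],
    prob: Callable[[History, int, int], float],
    reward: Callable[[History, int, int], float],
    counter: List[int],
) -> float:
    """Exact expected reward sum of the best plan over `steps_left` steps.

    `counter[0]` is incremented once per visited tree node so the cost
    of the search can be inspected afterwards.
    """
    counter[0] += 1
    if steps_left == 0:
        return 0.0
    best = float("-inf")
    for a in actions:  # maximise over actions
        value = 0.0
        for o in percepts:  # average over percepts
            future = expectimax(history + ((a, o),), steps_left - 1,
                                actions, percepts, prob, reward, counter)
            value += prob(history, a, o) * (reward(history, a, o) + future)
        best = max(best, value)
    return best


# Hypothetical toy environment, for illustration only.
ACTIONS = (0, 1)
PERCEPTS = (0, 1)


def toy_prob(history: History, a: int, o: int) -> float:
    return 0.5  # each percept is equally likely


def toy_reward(history: History, a: int, o: int) -> float:
    return 1.0 if a == o else 0.0


def plan(horizon: int) -> Tuple[float, int]:
    """Return (optimal value, number of tree nodes visited)."""
    counter = [0]
    value = expectimax((), horizon, ACTIONS, PERCEPTS,
                       toy_prob, toy_reward, counter)
    return value, counter[0]
```

With 2 actions and 2 percepts the tree has 4^k nodes at depth k, so each extra step of horizon multiplies the work by about 4, which is the exponential blow-up being described.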
But there are lots of variations.
It's actually a quite interesting parameter.
There's nothing really problematic about it, but it's very interesting.
So for instance, you think, no, let's let the parameter m tend to infinity, right?
You want an agent which lives forever, right?
If you do it naively, you have two problems.
First, the mathematics breaks down because you have an infinite reward sum, which may give infinity. Getting reward 0.1 every time step is infinity, and getting reward 1 every time step is infinity, so they look equally good.
Not really what we want.
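A small sketch of that first problem, assuming a constant reward at every time step and a plain undiscounted sum. The function name is hypothetical. For any finite horizon the two reward streams are ranked correctly, but in the infinite limit both sums are infinity and the comparison is lost.

```python
import math


def undiscounted_return(reward_per_step: float, horizon: float) -> float:
    """Sum of a constant per-step reward over `horizon` steps.

    `horizon` may be `math.inf` to model an agent that lives forever.
    """
    if reward_per_step == 0.0:
        return 0.0  # avoid the undefined product 0 * inf
    return reward_per_step * horizon


# Finite horizon: the better reward stream has the larger sum.
finite_low = undiscounted_return(0.1, 100)
finite_high = undiscounted_return(1.0, 100)

# Infinite horizon: both sums are infinite, so they compare as equal.
infinite_low = undiscounted_return(0.1, math.inf)
infinite_high = undiscounted_return(1.0, math.inf)
```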
The other problem is that if you have an infinite life, you can be lazy for as long as you want, say for 10 years, and then catch up with the same expected reward.
And, you know, think about yourself, or maybe some friends.
If they knew they lived forever, why work hard now?
Just enjoy your life, you know, and then catch up later.
So that's another problem with the infinite horizon.
And you mentioned, yes, we can go to discounting.
But then the standard discounting is so-called geometric discounting.
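For reference, geometric discounting weights the reward at step t by gamma^t with 0 < gamma < 1, so even an infinite reward sum stays finite. The sketch below is illustrative only: starting t at 0, the value gamma = 0.9, and the function names are assumptions, not something stated in the talk.

```python
import math
from typing import Iterable


def discounted_return(rewards: Iterable[float], gamma: float) -> float:
    """Geometrically discounted sum: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    total = 0.0
    weight = 1.0  # gamma ** t, starting at t = 0 (assumed convention)
    for r in rewards:
        total += weight * r
        weight *= gamma
    return total


def constant_reward_limit(reward_per_step: float, gamma: float) -> float:
    """Closed form of the infinite discounted sum for a constant reward."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return reward_per_step / (1.0 - gamma)


GAMMA = 0.9  # assumed discount factor, for illustration

# Reward 0.1 forever and reward 1 forever now have different finite values.
low_forever = constant_reward_limit(0.1, GAMMA)
high_forever = constant_reward_limit(1.0, GAMMA)

# A long finite sum approaches the closed-form limit.
approx_high = discounted_return([1.0] * 2000, GAMMA)
```

Under these assumptions the two constant reward streams from the infinite-horizon example are no longer tied: both sums are finite, and the stream with reward 1 comes out ten times larger than the stream with reward 0.1.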