Dr. Jeff Beck
That is a real possibility.
But whose fault was that?
The fault lay with the person who very, very naively specified their goals.
There are, in fact, relatively straightforward ways to specify the reward function that don't run that risk nearly as badly.
And the best one is, so are you familiar with like maximum entropy inverse reinforcement learning?
I like to call it active inference because it's really similar.
And so there what you're doing is you're basically observing someone's policy and then fitting a maximum entropy model.
You're fitting a maximum entropy model on the reward function itself.
At the end of the day, what ends up happening when you do this is why it's basically just like active inference.
You get a reward function.
So you have some organism or whatever, and you're trying to do this for it.
And it's got some stationary distribution over actions and outcomes.
Its inputs and outputs have a stationary distribution.
That becomes your reward function.
Not directly.
There's some math involved.
But basically, your reward function is a function of the steady state distributions over actions and outcomes.
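To make that last point concrete, here is a minimal sketch under a simplifying assumption that is not spelled out in the conversation: we model the agent's observed behavior as a small Markov chain over outcome states, compute its stationary distribution, and take the reward to be the log of that distribution (up to an additive constant), in the spirit of active inference, where preferred outcomes carry log-probability rewards. The transition matrix here is illustrative, not from the transcript.

```python
import numpy as np

# Toy transition matrix over 3 outcome states, standing in for an
# agent's observed behavior (illustrative numbers, not real data).
P = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
])

# The stationary distribution is the left eigenvector of P with
# eigenvalue 1 (the largest eigenvalue of a stochastic matrix).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()  # normalize (also fixes the eigenvector's sign)

# Assumed reward: log of the stationary distribution, up to an
# additive constant, as in active-inference-style formulations.
reward = np.log(pi)
```

The ordering of `reward` matches the ordering of `pi`: outcomes the agent spends more time in are assigned higher reward, which is the qualitative content of the claim that the reward function is a function of the steady-state distributions.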
So we could do this.
We could take the current manner in which humans are making decisions.
And we could write down, right, what is the current estimate of the stationary distribution over actions and outcomes?
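Estimating that stationary distribution could be as simple as counting observed (action, outcome) pairs and normalizing. A minimal sketch, with entirely hypothetical action and outcome names for illustration:

```python
from collections import Counter

# Hypothetical log of observed (action, outcome) pairs from human
# decisions; the labels are placeholders, not from the transcript.
observations = [
    ("save", "secure"), ("spend", "enjoy"), ("save", "secure"),
    ("save", "secure"), ("spend", "enjoy"), ("save", "insecure"),
]

counts = Counter(observations)
total = sum(counts.values())

# Empirical estimate of the stationary distribution over
# (action, outcome) pairs: normalized frequencies.
stationary = {pair: n / total for pair, n in counts.items()}
```

With enough samples, these normalized frequencies converge to the stationary distribution, which is the quantity the speaker proposes plugging into the reward function.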