Ilya Sutskever
That's how reinforcement learning is done naively.
That's how o1 and R1 are ostensibly done.
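As a toy illustration of the naive, outcome-only setup, here is a minimal sketch of Monte Carlo returns. It assumes that "naively" means the only reward arrives when the episode ends. It is not a description of how o1 or R1 are actually trained, and the function name `monte_carlo_returns` is hypothetical. Every step's learning signal is computed backward from that single final reward, so no step can be credited or blamed until the whole trajectory is finished.

```python
from typing import List


def monte_carlo_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1} for every step t.

    Computed backward from the end of the episode, so nothing is known
    about any step until the episode has terminated.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Sparse, outcome-only reward (hypothetical): nothing until the final step.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
print(monte_carlo_returns(rewards))  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

With `gamma = 1.0`, every step, good or bad, receives exactly the same credit as the final outcome, which is the limitation the value function discussion below addresses.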
The value function says something like, okay, look, maybe I could sometimes, not always, tell you if you are doing well or badly.
The notion of a value function is more useful in some domains than others.
For example, when you play chess and you lose a piece, you know: I messed up.
You don't need to play the whole game to know that what I just did was bad and therefore whatever preceded it was also bad.
So the value function lets you short-circuit the wait until the very end.
Like, let's suppose that you are doing some kind of a math thing or a programming thing, and you're trying to explore a particular solution direction.
And after, let's say, a thousand steps of thinking, you conclude that this direction is unpromising.
As soon as you conclude this, you could already get a reward signal a thousand time steps earlier, at the point when you decided to go down this path.
You say, oh, next time I shouldn't pursue this path in a similar situation, long before you actually came up with a proposed solution.
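One standard way to get the early signal described above is an n-step bootstrapped target: instead of waiting for the final outcome, the decision point is scored using the value function's current estimate at the state reached n steps later. The sketch below assumes a learned value estimate is available at both states; the function name `n_step_target` and all numbers are hypothetical, and this is a textbook RL technique offered for illustration, not the method being alluded to in the conversation.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float = 1.0) -> float:
    """n-step TD target for the state where the n-step window starts.

    G = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * V(s_n),
    where n = len(rewards) and bootstrap_value is the current estimate V(s_n).
    """
    target = 0.0
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r
    return target + (gamma ** len(rewards)) * bootstrap_value


# Hypothetical numbers: the value estimate at the decision point was 0.75.
# After 1000 unrewarded reasoning steps, the estimate at the current state
# has dropped to 0.25 ("this direction looks unpromising").
v_decision = 0.75
target = n_step_target([0.0] * 1000, bootstrap_value=0.25)
advantage = target - v_decision  # negative: discourage this choice next time
print(advantage)  # -0.5
```

The negative advantage is available as soon as the value estimate drops, without a final answer or an outcome reward, which is exactly the short-circuiting the value function buys.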
This sounds like such a lack of faith in deep learning.
Like, I mean, sure, it might be difficult, but there's nothing deep learning can't do.
So my expectation is that value functions should be useful and I fully expect that they will be used in the future if not already.
What I was alluding to with the person whose emotional center got damaged is more that maybe what it suggests is that the value function of humans is modulated by emotions in some important way that's hard-coded by evolution.
And maybe that is important for people to be effective in the world.
I do agree that, compared to the kind of things that we learn and the things that we are talking about, emotions are relatively simple.
They might even be so simple that maybe you could map them out in a human-understandable way.
I think it would be cool to do.