Noam Brown
And the person responded that it's just so much harder to make an AI that can talk with you and cooperate with you than it is to make an AI that can fight you.
And I think once this technology develops further and you reach a point where not every single line of dialogue has to be scripted, it unlocks a lot of potential for new kinds of games with much more positive interactions that are not so focused on fighting.
And I'm really looking forward to that.
All right.
So there's different ways to find a Nash Equilibrium.
So the way that we do it is with this process called self-play.
Basically, we have this algorithm that starts by playing totally randomly, and it learns how to play the game by playing against itself.
So it will start playing the game totally randomly.
And then if it's playing poker, it'll eventually get to the end of the game and make $50.
And then it will review all of the decisions that it made along the way and say, what would have happened if I had chosen this other action instead?
You know, if I had raised here instead of called, what would the other player have done?
And because it's playing against a copy of itself, it's able to do that counterfactual reasoning.
So it can say, okay, well, if I had taken this action and the other player had taken that action, and then I had taken this action, eventually I would have made $150 instead of $50.
And so it updates the regret value for that action.
Regret is basically a measure of how much it regrets not having played that action in the past.
And when it encounters that same situation again, it's going to pick actions that have higher regret with higher probability.
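To make that concrete, here's a minimal Python sketch of the regret-matching update being described. The action names and the $50/$150 payoffs are illustrative, borrowed from the example above; a real poker agent would keep a separate regret table for every situation it can find itself in.

```python
# A minimal sketch of the regret-matching update described above.
# The action names and dollar amounts are illustrative, not from a real agent.
ACTIONS = ["fold", "call", "raise"]
regrets = {a: 0.0 for a in ACTIONS}  # cumulative regret for one situation

def current_strategy():
    """Pick actions with probability proportional to their positive regret."""
    positive = {a: max(r, 0.0) for a, r in regrets.items()}
    total = sum(positive.values())
    if total == 0:
        # No positive regret yet: play uniformly at random, like the
        # totally random play the algorithm starts with.
        return {a: 1.0 / len(ACTIONS) for a in ACTIONS}
    return {a: p / total for a, p in positive.items()}

def update_regrets(counterfactual_values, actual_value):
    """After a simulated hand, add how much better (or worse) each
    action would have done compared to what was actually earned."""
    for a in ACTIONS:
        regrets[a] += counterfactual_values[a] - actual_value

# It called and made $50, but replaying against the copy of itself
# shows that raising would have made $150:
update_regrets({"fold": 0.0, "call": 50.0, "raise": 150.0}, actual_value=50.0)
print(current_strategy())  # raise now carries all the probability mass
```

After this single update, raising holds all the positive regret, so the strategy swings entirely toward it; over millions of simulated hands those swings average out into a balanced strategy.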
Now, it'll just keep simulating the games this way.
It'll keep accumulating regrets for different situations.
And in the long run, if you pick actions with probability proportional to their accumulated positive regret, the average strategy is proven to converge to a Nash equilibrium.
That's true for all two-player zero-sum games.
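And here's an end-to-end sketch of that self-play loop on rock-paper-scissors, a two-player zero-sum game small enough to run in a second. It simplifies what's described above (one shared regret table for a symmetric game, plain regret matching rather than full counterfactual regret minimization), but it shows the key property: the average of the strategies played converges to the Nash equilibrium of (1/3, 1/3, 1/3).

```python
import random

# Toy self-play via regret matching on rock-paper-scissors. CFR applies
# this same update at every decision point of a bigger game like poker.
ROCK, PAPER, SCISSORS = 0, 1, 2
ACTIONS = [ROCK, PAPER, SCISSORS]

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return [0, 1, -1][(a - b) % 3]

def strategy(regrets):
    """Probabilities proportional to positive regret, uniform if none."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1/3, 1/3, 1/3]

regrets = [0.0, 0.0, 0.0]
strategy_sum = [0.0, 0.0, 0.0]  # the *average* strategy is what converges

for _ in range(100_000):
    probs = strategy(regrets)
    strategy_sum = [s + p for s, p in zip(strategy_sum, probs)]
    me = random.choices(ACTIONS, weights=probs)[0]
    opponent = random.choices(ACTIONS, weights=probs)[0]  # a copy of itself
    earned = payoff(me, opponent)
    # Counterfactual reasoning: what would each action have earned
    # against the opponent's actual choice?
    for a in ACTIONS:
        regrets[a] += payoff(a, opponent) - earned

total = sum(strategy_sum)
print([round(s / total, 3) for s in strategy_sum])  # ≈ [0.333, 0.333, 0.333]
```

The current strategy keeps oscillating as regrets chase each other around the rock-paper-scissors cycle, which is why it's the accumulated average of the strategies, not the latest one, that approaches equilibrium.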