Noam Brown
It's true for chess.
It's true for poker.
It's particularly useful for poker.
This is counterfactual regret minimization.
Counterfactual regret minimization is a kind of self-play.
It's a principled kind of self-play that's proven to converge to Nash equilibria, even in imperfect-information games.
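To make that concrete, here's a minimal sketch of the regret-matching update at the heart of CFR, run in self-play on rock-paper-scissors as a toy stand-in (illustrative code, not Brown's actual implementation). Each player accumulates per-action regrets, plays in proportion to positive regret, and the average strategy converges to the Nash equilibrium of one third each:

```python
import numpy as np

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))  # no positive regret yet: uniform

# Rock-paper-scissors payoff for the row player (0=rock, 1=paper, 2=scissors).
PAYOFF = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

rng = np.random.default_rng(0)
regrets = [rng.random(3), rng.random(3)]    # random start, so we don't begin at the fixed point
strategy_sums = [np.zeros(3), np.zeros(3)]  # running sums for the average strategy

for _ in range(100_000):
    strats = [regret_matching(r) for r in regrets]
    for p in range(2):
        strategy_sums[p] += strats[p]
        # Value of each pure action against the opponent's current mix, minus
        # the value of the mix we actually played: that gap is the regret.
        action_values = PAYOFF @ strats[1 - p]
        regrets[p] += action_values - strats[p] @ action_values

avg = strategy_sums[0] / strategy_sums[0].sum()
print(avg)  # approaches the Nash equilibrium [1/3, 1/3, 1/3]
```

Full CFR runs this same update at every decision point of the game tree, weighting each update by the probability of reaching that point, which is what lets it handle sequential, imperfect-information games like poker.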
Now, you can have other forms of self-play, and people use other forms of self-play for perfect-information games, where you have more flexibility.
The algorithm doesn't have to be as theoretically sound in order to converge in that class of games, because it's a simpler setting.
Exactly.
Yeah.
Self-play is not tied specifically to neural nets.
It's basically a kind of reinforcement learning.
Okay.
And I would also say this process of reasoning, "Oh, what would the value have been if I had taken this other action instead?" is very similar to how humans learn to play a game like poker, right?
You've probably played poker with your friends, and you've probably asked, "Oh, would you have called me if I had raised there?"
And that's a person trying to learn from a counterfactual, the same kind of learning the AI is doing.
Yeah.
Now, here's where the neural nets come in. I said, okay, if it's in that situation again, then it will choose the action that has high regret.
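In the tabular version, that "if it's in that situation again" lookup is literally a table with one row of regrets per situation, i.e. per information set: your own cards plus the betting seen so far. A minimal sketch, with hypothetical names and a made-up three-action set:

```python
from collections import defaultdict
import numpy as np

ACTIONS = ["fold", "call", "raise"]  # hypothetical action set, for illustration

# Tabular CFR keeps one row of cumulative regrets per information set:
# the player's own cards plus the betting history observed so far.
cumulative_regret = defaultdict(lambda: np.zeros(len(ACTIONS)))

def strategy_at(infoset):
    """Regret matching: act in proportion to positive cumulative regret."""
    r = np.maximum(cumulative_regret[infoset], 0.0)
    return r / r.sum() if r.sum() > 0 else np.full(len(ACTIONS), 1 / len(ACTIONS))

def record_regret(infoset, action_values, realized_value):
    """Credit each action with how much better it would have done than
    the value actually realized: the counterfactual regret."""
    cumulative_regret[infoset] += np.asarray(action_values) - realized_value

# One table entry per distinct situation, e.g. hole cards plus betting history:
infoset = ("Kh9s", "check", "raise")  # hypothetical encoding
print(strategy_at(infoset))           # uniform until regrets accumulate
```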
Now the problem is that poker is such a huge game.