Jonathan Ross
So going back to my background, one of the fun things that I got to witness (I wasn't directly involved) was AlphaGo. Google beat the world champion, Lee Sedol, in Go. That model was trained on a bunch of existing games. But later on, they created a new one called AlphaGo Zero, which was trained on no existing games. It just played against itself. So how do you play against yourself and win?
Well, you train a model on some terrible moves. It does okay. And then you have it play against itself. And when it does better, you train on those better games. And then you keep leveling up like this, right? So you get better, better data. The better your model is when it outputs something, the better the result, the better the data.
So what you do is you train a model, you use it to generate data, and then you train a model and you use it to generate data and you keep getting better and better and better. So you can sort of beat the scaling law problem.
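The leveling-up loop described above can be sketched in a few lines. This is a toy illustration of the self-play bootstrapping idea, not DeepMind's actual AlphaGo Zero implementation; every name here (`Model`, `play_game`, the "skill" number) is a hypothetical stand-in.

```python
# Toy sketch of self-play bootstrapping: generate games, keep the ones
# better than the current model, train on those, repeat.
import random

random.seed(0)  # fixed seed so the toy run is reproducible

class Model:
    """Toy 'policy': its skill is one number; higher means better play."""
    def __init__(self, skill=0.0):
        self.skill = skill

    def train(self, games):
        # Training on better games pulls skill up toward their quality.
        if games:
            self.skill = max(self.skill, max(g["quality"] for g in games))

def play_game(model):
    # Self-play: game quality fluctuates around the model's current skill.
    return {"quality": model.skill + random.uniform(-0.5, 1.0)}

model = Model()
for generation in range(10):
    games = [play_game(model) for _ in range(32)]
    # Keep only the games that beat the model's current level...
    better = [g for g in games if g["quality"] > model.skill]
    model.train(better)  # ...and train on those, leveling up.

print(f"final skill: {model.skill:.2f}")
```

The point the speaker is making lives in the loop body: each generation's output becomes the next generation's training data, so quality ratchets upward without any external dataset.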
One quick hack to get past all of that stepping up is: if there's already a really good model right there, just have it generate the data, and you go right up to where it is. And that's what they did. It is true that they spent about six million dollars, or whatever it was, on the training. They spent a lot more distilling, or scraping, the OpenAI model.
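The shortcut being described is distillation: a strong "teacher" model labels prompts, and a weaker "student" trains on those outputs to jump straight to (near) the teacher's level. This is a minimal sketch of the general pattern, assuming placeholder functions throughout; it does not represent any lab's actual pipeline.

```python
# Hedged sketch of distillation: query a strong teacher, train a student
# on the (prompt, teacher_answer) pairs it produces.

def teacher(prompt):
    # Stand-in for calling a strong existing model (e.g. over an API).
    return f"high-quality answer to: {prompt}"

prompts = ["What is 2+2?", "Summarize the scaling laws."]

# Step 1: generate a synthetic dataset from the teacher's outputs.
dataset = [(p, teacher(p)) for p in prompts]

# Step 2: fine-tune the student on that dataset (placeholder for
# supervised fine-tuning; here the 'student' just memorizes the pairs).
def train_student(data):
    return {prompt: answer for prompt, answer in data}

student = train_student(dataset)
print(student["What is 2+2?"])
```

The contrast with the self-play loop is that here the training data starts at the teacher's quality level, so you skip the slow generation-by-generation climb.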
Correct. And all that said, they did a lot of really innovative things. That's what makes it so complicated, because on the one hand, they kind of just scraped the OpenAI model. On the other hand, they came up with some unique reinforcement learning techniques that are so simple. What did they do that was so impressive?
No, they came up with innovative stuff. But actually, the best way to describe it: have you ever taken a test where you got an answer right, and your professor marked it wrong? And then you go back to the professor and you have to argue with them and everything. And it's a pain, right?
Well, if there is only one answer, and it's a very simple answer, and you say, write that answer in this box, then there is no arguing. You either get it right or not, right? So what they did was, rather than having human beings check the output and say yes or no or whatever, what they did was they said, here's the box. There's literally some code to say here's a box.
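The "here's the box" checker really can be literal code. Below is a minimal sketch of a rule-based verifier: it extracts the answer from a `\boxed{...}` marker (a common convention in math reinforcement-learning setups; the exact format any particular lab used is an assumption here) and compares it exactly against the known answer, returning a 1-or-0 reward with no human grader in the loop.

```python
# Sketch of an automatic verifier: extract the boxed answer from the
# model's output and check it exactly -- right or wrong, no arguing.
import re

def extract_boxed(output):
    """Pull the answer out of a \\boxed{...} span, or None if absent."""
    match = re.search(r"\\boxed\{([^}]*)\}", output)
    return match.group(1).strip() if match else None

def reward(output, gold):
    """1 if the boxed answer matches the known answer exactly, else 0."""
    return 1 if extract_boxed(output) == gold else 0

print(reward(r"The sum is \boxed{4}", "4"))  # right answer, in the box
print(reward(r"The sum is \boxed{5}", "4"))  # wrong answer
print(reward("I think it's 4", "4"))         # right answer, but no box
```

Because the check is mechanical and binary, it can score millions of attempts cheaply, which is what makes it usable as a reinforcement-learning reward signal in place of human feedback.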