We don't yet have the equivalent of move 37, the famous move where DeepMind's AI playing Go stunned Lee Sedol completely. We don't have something that's that level of a focal point, but that doesn't mean the approach to the technology, or the impact of this kind of training, is any different. It's still incredibly new. What do you think that point would be?
What would be move 37 for chain of thought, for reasoning? Scientific discovery, when you use this sort of reasoning model on a problem and it comes up with something we fully don't expect.
All the math and code benchmarks were pretty much solved, except for FrontierMath, which is designed to be questions that are almost impractical for most people, because they're exam-level, open-math-problem-type things. So it's on the math problems that are somewhat reasonable, which is somewhat complicated word problems or coding problems. It's just what Dylan is saying.
The bank account can't lie. Exactly. There's surprising evidence that once you set up the ways of collecting verifiable rewards in a domain, this can work. There's been a lot of research before R1 on math problems, where people approached math with language models just by increasing the number of samples. So you can just try again and again and again, and you look at the number of times the language models get it right. And what we see is that even very bad models get it right sometimes. And the whole idea behind reinforcement learning is that you can learn from very sparse rewards.
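To make the try-again-and-again idea concrete, here is a minimal sketch of repeated sampling against a verifiable answer. Everything in it is a hypothetical stand-in: generate() fakes a model call, and is_correct() is a bare exact-match grader rather than a real math verifier.

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for sampling one completion from a model."""
    # A real implementation would call a language model with temperature > 0.
    return random.choice(["42", "41", "I don't know"])

def is_correct(completion: str, answer: str) -> bool:
    """Verifiable reward: exact match against a known final answer."""
    return completion.strip() == answer

def pass_at_k(prompt: str, answer: str, k: int = 64) -> bool:
    """Try again and again: the problem counts as solved if any of k samples is right."""
    return any(is_correct(generate(prompt), answer) for _ in range(k))

# Even a model that is right only, say, 2% of the time per sample
# solves most problems within 64 tries: 1 - 0.98**64 is roughly 0.73.
solved = sum(pass_at_k("What is 6 * 7?", "42") for _ in range(100))
print(f"solved {solved}/100 problems with repeated sampling")
```

That arithmetic is the point being made above: a model that is rarely right per sample still gets it right sometimes, and those occasional successes are exactly the sparse signal RL can learn from.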
So it doesn't... The space of language, the space of tokens, whether you're generating language or tasks for a robot, is so big. I mean, the tokenizer for a language model can have something like 200,000 entries, so at each step it can sample from that big of a space. So if it can generate a bit of a signal, it can climb onto it.
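For scale, here is what one decoding step looks like over a vocabulary that size. This is a minimal numpy sketch, with random logits standing in for what a real model would compute from the context.

```python
import numpy as np

VOCAB_SIZE = 200_000  # roughly the vocabulary size mentioned above

# Hypothetical logits for one decoding step (a real model produces these
# from the context; here they are random just to show the shapes involved).
rng = np.random.default_rng(0)
logits = rng.normal(size=VOCAB_SIZE)

# Softmax turns the logits into a probability distribution over every token.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# One decoding step is one draw from a 200,000-way categorical distribution.
next_token_id = rng.choice(VOCAB_SIZE, p=probs)
print(next_token_id)
```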
That's what the whole field of RL is built around: learning from sparse rewards. And the same thing has played out in math, where very weak models sometimes generate right answers, and you already see research showing that you can boost their math scores. You can do this sort of RL training.
For math, it might not be as effective, but if you take a 1-billion-parameter model, so something 600 times smaller than DeepSeek, you can boost its grade school math scores very directly with a small amount of this training. So it's not to say that this is coming soon.
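A deliberately tiny sketch of what learning from a sparse, verifiable reward looks like: a single categorical policy over a toy four-token answer vocabulary, trained with REINFORCE, where the reward is 1 only when the sampled answer matches the known solution. The toy vocabulary, learning rate, and one-step setup are all illustrative assumptions, not anyone's actual training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["41", "42", "43", "44"]
CORRECT = VOCAB.index("42")

logits = np.zeros(len(VOCAB))  # start from a uniform policy
lr = 0.5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(200):
    probs = softmax(logits)
    action = rng.choice(len(VOCAB), p=probs)
    reward = 1.0 if action == CORRECT else 0.0  # sparse, verifiable reward

    # REINFORCE: grad of log pi(action) w.r.t. logits = one_hot(action) - probs
    grad = -probs
    grad[action] += 1.0
    logits += lr * reward * grad  # only correct samples move the policy

print(softmax(logits))  # probability mass concentrates on "42"
```

Even though most samples earn zero reward and trigger no update at all, the occasional correct sample is enough to pull probability mass onto the right answer, which is the sparse-reward dynamic described above.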