Nathan Lambert
Now, it does have some skew, because more of the text is skewed a certain way: generally slightly left, but also somewhat intellectual. The general internet is just a certain way. And then, as Nathan's about to describe eloquently, you can elicit certain things out of it.
And then maybe OpenAI does less and Anthropic does less. And then on the other end of the spectrum is xAI. But they all have different forms of RLHF trying to make the model a certain way.
I think it's actually probably simpler than that. It's probably something related to computer use or robotics rather than scientific discovery. Because the important aspect here is that models take so much data to learn; they're not sample efficient. They take the entire web, over 10 trillion tokens, to train on. That would take a human thousands of years to read.
And humans don't know most of that stuff; a lot of what models know, they know better than us. Yet humans are way, way, way more sample efficient. That is because of self-play. How does a baby learn what its body is? It sticks its foot in its mouth and says, oh, this is my body.
It sticks its hand in its mouth and calibrates the touch on its fingers against the most sensitive touch sensor it has, its tongue. This is how babies learn. It's just self-play, over and over and over again. And now we have something similar to that with these verifiable proofs, whether it's a unit test in code or...
a mathematically verifiable task: generate many traces of reasoning and keep branching them out, keep branching them out. Then check at the end: hey, which ones actually have the right answer? Most of them are wrong. Great. These are the few that are right. Maybe we use some sort of reward model on top of this to select the best one for preference training as well.
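The loop described here, sample many reasoning traces, keep only the ones a verifier accepts, then let a reward model pick among the survivors, can be sketched roughly as follows. Everything in it is a stand-in: `generate_trace` fakes sampling from a model, and the `reward` field fakes a learned reward model's score.

```python
import random

def generate_trace(problem, rng):
    """Stand-in for sampling one reasoning trace from a model.

    Returns a candidate answer (often wrong on purpose) plus a fake
    reward-model score.
    """
    answer = rng.choice([problem["answer"], problem["answer"] + 1,
                         problem["answer"] - 2])
    return {"answer": answer, "reward": rng.random()}

def verify(problem, trace):
    """The verifiable check: did this trace reach the known-correct answer?

    In practice this would be a unit test for code, or an exact-match
    check for a math problem.
    """
    return trace["answer"] == problem["answer"]

def best_of_n(problem, n=16, seed=0):
    """Sample n traces, filter to verified-correct ones, pick the one the
    (fake) reward model scores highest. Returns None if all n are wrong."""
    rng = random.Random(seed)
    traces = [generate_trace(problem, rng) for _ in range(n)]
    correct = [t for t in traces if verify(problem, t)]  # most may be wrong
    if not correct:
        return None
    return max(correct, key=lambda t: t["reward"])

problem = {"question": "2 + 2", "answer": 4}
best = best_of_n(problem)
```

The verified-correct traces are what you would keep as training data; the reward-model ranking on top is the optional extra filter the speaker mentions.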
But now you've started to get better and better at these benchmarks. And so over the last six months you've seen skyrocketing scores on a lot of different benchmarks.