Nathan Lambert
trajectory to speed run, or what models can do. But at some point I truly think we'll spawn models, and initially all the training will be in sandboxes, but eventually the language model pre-training is going to be dwarfed by this reinforcement learning. You'll pre-train a multimodal model that can see, that can read, that can write, with vision, audio, and so on, but then you'll have it play in a sandbox, infinitely, and figure out math, figure out code, figure out navigating the web, figure out operating a robot arm, right? And then it'll learn so much. And the aha moment, I think, will be when this is available to then create something that's not good, right? Like, oh, cool, part of it was figuring out how to use the web; now all of a sudden it's figured out really well how to get hundreds of thousands of followers that are real, with real engagement, on Twitter, because all of a sudden this is one of the things that are verifiable.
And it's verifiable, right?
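As an aside, the "verifiable" property being discussed is what lets a reward be computed mechanically rather than judged by a human. A minimal sketch of such a reward function, purely illustrative and not from any real training stack:

```python
# Minimal sketch of a "verifiable reward" in the RL sense discussed above.
# A math answer (like a follower count) can be checked mechanically, so
# the reward signal is exact. All names here are illustrative.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known-correct
    answer after light normalization, else 0.0."""
    def normalize(s: str) -> str:
        return s.strip().lower().rstrip(".")
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0

# A sandboxed RL loop would sample many completions and reinforce the
# ones scoring 1.0, with no human judgment in the loop.
print(verifiable_reward("  42. ", "42"))
```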
I think one of the things people are ignoring is that Google's Gemini Flash Thinking is both cheaper than R1 and better. And they released it at the beginning of December. And nobody's talking about it. No one cares.
I get the same question from earlier, the one about human nature.
Oh, and it latched onto human, and then it went into organisms, and oh, wow. Yeah.
I think, to Nathan's point, when you look at the reasoning models, to me, even when I used R1 versus O1, there was that sort of rough-around-the-edges feeling, right? And Flash Thinking, earlier, I didn't use this version, but the one from December definitely had that rough-around-the-edges feeling, right?
Where it's just not fleshed out in as many ways, right? Sure, they added math and coding capabilities via these verifiers in RL, but it feels like they lost something in certain areas. And O1 performs worse than the chat models in many areas as well, to be clear. Not by a lot, though, right?