Nathan Lambert
Podcast Appearances
So the thing here is that these are only with verifiable tasks. We earlier showed an example of the really interesting thing, like what happens when chain of thought is applied to a non-verifiable thing. It's just like a human chatting, right? Thinking about what's novel for humans, right? A unique thought.
But this task and form of training only works when it's verifiable. And from here, the thought is, okay, we can continue to scale this current training method by increasing the number of verifiable tasks. In math and coding, coding probably has a lot more to go. Math has a lot less to go in terms of what are verifiable things.
Can I create a solver that I then generate trajectories or reasoning traces toward, and then prune the ones that don't work and keep the ones that do work? Well, those are going to be solved pretty quickly. But even if you've solved math, you have not actually created intelligence, right?
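The generate, verify, and prune loop described here can be sketched in a few lines. This is only an illustrative toy, not the speaker's actual training setup: `propose_answer` is a hypothetical stand-in for a model, the task (adding two integers) is chosen only because it is trivially checkable, and `sample_and_prune`, `error_rate`, and `samples_per_task` are names invented for this example.

```python
import random

def propose_answer(a, b, rng, error_rate):
    """Hypothetical stand-in for a model: returns a + b, sometimes off by one."""
    noise = rng.choice([-1, 1]) if rng.random() < error_rate else 0
    return a + b + noise

def verify(a, b, answer):
    """Exact check, which is only possible because the task is verifiable."""
    return answer == a + b

def sample_and_prune(tasks, samples_per_task=8, error_rate=0.5, seed=0):
    """Sample several attempts per task and keep only those the verifier accepts."""
    rng = random.Random(seed)
    kept = []
    for a, b in tasks:
        for _ in range(samples_per_task):
            answer = propose_answer(a, b, rng, error_rate)
            if verify(a, b, answer):
                kept.append({"task": (a, b), "answer": answer})
    return kept

tasks = [(2, 3), (10, 7), (41, 1)]
kept = sample_and_prune(tasks)
print(f"kept {len(kept)} of {len(tasks) * 8} attempts")
```

In a real pipeline the kept attempts would presumably feed back into training; the sketch stops at the pruning step because that is all the passage describes.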
And so this is where I think the aha moment of computer use or robotics will come in, because now you have a sandbox or a playground that is infinitely verifiable, right? Messing around on the internet, there are so many actions that you can do that are verifiable. It'll start off with, like, log into a website, create an account, click a button here, blah, blah, blah.
But it'll then get to the point where it's, hey, go do a task on Tasker or whatever, all these various task websites. Hey, go get hundreds of likes, right? And it's going to fail. It's going to spawn hundreds of accounts. It's going to fail on most of them. But this one got to a thousand. Great. Now you've reached the verifiable thing.
And you just keep iterating this loop over and over. And same with robotics, right? That's where you have an infinite playground of tasks, from, hey, did I put the ball in the bucket, all the way to, oh, did I build a car, right? There's a whole trajectory to speed-run, or, you know, what models can do. But at some point, I truly think that we'll spawn models, and initially all the training will be in sandboxes. But then at some point, the language model pre-training is going to be dwarfed by what is this reinforcement learning. You'll pre-train a multimodal model that can see, that can read, that can write, blah, blah, blah, whatever, vision, audio, etc. But then you'll have it play in a sandbox and