Nathan Lambert
Podcast Appearances
Where it's just not fleshed out in as many ways, right? Sure, they added math and coding capabilities via these verifiers in RL, but it feels like they lost something in certain areas. And o1 performs worse than chat in many areas as well, to be clear. Not by a lot, though, right?
And R1 definitely felt to me like it was worse than V3 in certain areas. Doing this RL, it expressed and learned a lot, but then it weakened in other areas. And so I think that's one of the big differences between these models,
and what o1 offers. And then OpenAI has o1 pro, and what they did with o3, which is also very unique, is that they stacked search on top of chain of thought, right? Chain of thought is one thing: it's one chain, it backtracks, it goes back and forth. But how they solved the ARC-AGI challenge was not just the chain of thought.
It was also sampling many times, i.e. running them in parallel and then selecting.
And then what?
Another form of search is just asking five different people and then taking the majority answer. Yes.
Right, there's a variety. It could be complicated, it could be simple. We don't know what it is, just that they are not just issuing one chain of thought in sequence. They're launching many in parallel. In ARC-AGI, for the one that really shocked everyone, the one that beat the benchmark, they would launch a thousand in parallel, and then they would get the right answer like 80% of the time, or 70% of the time, maybe even 90.
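[Editor's sketch] The "sample many in parallel, then select" idea described above can be illustrated with plain majority voting. This is only a minimal sketch of that simple form of selection, not a description of how OpenAI's system works, which the speakers say is unknown. The function `sample_answer` is a hypothetical stand-in for one chain-of-thought rollout; here it is a toy noisy solver, and the answer strings and the 40% per-sample accuracy are made-up assumptions for illustration.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sample_answer(seed: int) -> str:
    """Hypothetical stand-in for one independent chain-of-thought rollout.

    Toy assumption: each sample returns the correct answer "42" about 40%
    of the time, otherwise one of four wrong answers chosen uniformly.
    """
    rng = random.Random(seed)
    if rng.random() < 0.4:
        return "42"
    return rng.choice(["41", "43", "44", "45"])


def majority_vote(answers):
    """Return the most common answer and the share of samples that gave it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def parallel_sample_and_select(n_samples: int, max_workers: int = 8):
    """Launch n_samples rollouts concurrently, then select by majority vote."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        answers = list(pool.map(sample_answer, range(n_samples)))
    return majority_vote(answers)


if __name__ == "__main__":
    answer, share = parallel_sample_and_select(1000)
    print(f"selected answer: {answer} (given by {share:.0%} of samples)")
```

The point of the toy setup is that even when a single sample is right less than half the time, the correct answer can still be the most common one as long as the wrong answers are spread across several alternatives. Real systems may select with a verifier or a learned scorer rather than a vote.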