Nathan Lambert
Dario explicitly said Claude 3.5 Sonnet was trained nine to ten months ago, and I think it took them another handful of months to release it. So there is a significant gap here, especially with reasoning models: the word on the San Francisco street is that Anthropic has a better model than o3.
And they won't release it. Why? Because chains of thought are scary, and they are legitimately scary. If you look at R1, it flips back and forth between Chinese and English; sometimes it's gibberish, and then the right answer comes out. And for you and me, it's like: great, it's doing this, it's amazing.
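That visible trace is also easy to pull apart programmatically: the open-weights R1 models emit their reasoning between think tags before the final answer. As a minimal sketch (the sample completion string below is invented and far shorter than a real trace):

```python
# Minimal sketch: split an R1-style completion into its visible chain of
# thought and the final answer. R1 emits its reasoning between <think> tags;
# the sample text below is invented and much shorter than a real trace.
def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from a raw R1-style output."""
    if "</think>" in completion:
        thought, answer = completion.split("</think>", 1)
        return thought.replace("<think>", "").strip(), answer.strip()
    return "", completion.strip()  # no visible trace found

raw = "<think>User asks X... wait, reconsider... 答案应该是 42 </think> The answer is 42."
thought, answer = split_reasoning(raw)
print("REASONING:", thought)   # language-mixed, meandering, sometimes gibberish
print("ANSWER:", answer)       # the part a chat product would normally show
```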
I mean, you talked about that chain of thought for that philosophical question, which is not something they trained it to be philosophically good at; it's just an artifact of the chain-of-thought training it did. But that's super important, in the sense of: can I inspect your mind and what you're thinking right now? No.
So I don't know if you're lying to my face, and chain-of-thought models are that way. There's a real, quote-unquote, risk here compared to a chat application where, hey, I ask the model to say bad words or how to make anthrax and it tells me that's unsafe. Sure, but that's something I can get out relatively easily.
What if I tell the AI to do a task and then it does the task on its own, all of a sudden, in a way that I don't want? A task is very different from a response, so the bar for safety is much higher. At least that's Anthropic's case. DeepSeek, by contrast, is just: ship.
And they killed that dog, right? And all these things. So it's like...
And there's an interesting aspect of this: just because it's open-weights or open-source doesn't mean it can't be subverted. There have been many open-source software bugs like that. For example, there's the recent case of a Linux bug that was found after ten years and was clearly a backdoor, because somebody asked, why is this taking half a second to load?
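That "half a second" story sounds like the 2024 xz-utils backdoor, which was noticed in part because of an unexplained latency regression. As a loose illustration of that kind of timing sanity check, not the actual investigation, here is a minimal sketch; the command, baseline, and half-second threshold are all assumed placeholders:

```python
import statistics
import subprocess
import time

# Hypothetical example: repeatedly time a command and compare against an
# expected baseline, flagging an unexplained slowdown. The command and the
# 0.5 s threshold are placeholders, not the real xz-utils investigation.
COMMAND = ["ssh", "-o", "BatchMode=yes", "localhost", "true"]  # assumed target
BASELINE_SECONDS = 0.05   # historically expected latency (assumption)
ALERT_THRESHOLD = 0.5     # "why is this taking half a second?"
RUNS = 10

def time_once(cmd):
    """Run the command once and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, capture_output=True)
    return time.perf_counter() - start

samples = [time_once(COMMAND) for _ in range(RUNS)]
median = statistics.median(samples)
print(f"median latency: {median:.3f}s (baseline {BASELINE_SECONDS:.3f}s)")

if median - BASELINE_SECONDS > ALERT_THRESHOLD:
    # A sudden, unexplained regression like this is what prompted a closer
    # look at the library in the real incident.
    print("unexplained slowdown -- worth investigating what changed")
```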