Dario Amodei

Speaker
1816 total appearances

Podcast Appearances

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

So I think the ASL-3 measures are, I won't say straightforward, they're rigorous, but they're easier to reason about. I think once we get to ASL-4, we start to have worries about the models being smart enough that they might sandbag tests. They might not tell the truth about tests.

We had some results come out about, like, sleeper agents, and there was a more recent paper about, you know, can the models mislead attempts to, you know, sandbag their own abilities, right? Present themselves as being less capable than they are. And so I think with ASL-4, there's going to be an important component of using other things than just interacting with the models.

For example, interpretability or hidden chains of thought, where you have to look inside the model and verify via some other mechanism that is not as easily corrupted as what the model says, that the model indeed has some property. So we're still working on ASL-4. One of the properties of the RSP is that we don't specify ASL-4 until we've hit ASL-3. And I think that's proven to be a wise decision, because even with ASL-3, again, it's hard to know this stuff in detail, and we want to take as much time as we can possibly take to get these things right. So for ASL-3, the bad actor will be the humans? Humans, yes. And so there's a little bit more...

For ASL-4, it's both, I think.

Yeah, I mean, of course you can hook up the mechanistic interpretability to the model itself, but then you've kind of lost it as a reliable indicator of the model state. There are a bunch of exotic ways you can think of that it might also not be reliable. Like if the model gets smart enough that it can, like, jump computers and read the code where you're looking at its internal state.

We've thought about some of those. I think they're exotic enough. There are ways to render them unlikely. But yeah, generally, you want to preserve mechanistic interpretability as a kind of verification set or test set that's separate from the training process of the model.

Oh, yeah. Yeah. It's actually like, you know, we've seen lots of examples of demagoguery in our life from humans. And, you know, there's a concern that models could do that as well.

Yeah, it's actually relatively simple. So Claude has had for a long time, since Claude 3 back in March, the ability to analyze images and respond to them with text. The only new thing we added is those images can be screenshots of a computer. And in response, we train the model to give a location on the screen where you can click and/or buttons on the keyboard you can press in order to take action.

And it turns out that with actually not all that much additional training, the models can get quite good at that task. It's a good example of generalization. You know, people sometimes say if you get to low Earth orbit, you're like halfway to anywhere, right, because of how much it takes to escape the gravity.

Well, if you have a strong pre-trained model, I feel like you're halfway to anywhere in terms of the intelligence space. And so actually it didn't take all that much to get Claude to do this. And you can just set that in a loop, give the model a screenshot, tell it what to click on, give it the next screenshot, tell it what to click on.
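[Editor's sketch] The loop described here (give the model a screenshot, get back a click or keypress, apply it, repeat) can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not Anthropic's implementation or API: `FakeDesktop`, `scripted_model`, `Action`, and `run_loop` are all hypothetical names, the "screenshot" is a plain string, and the "model" is a hard-coded rule. The `max_steps` cap is likewise an assumption, standing in for the kind of boundary mentioned later in the conversation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One step chosen by the policy (hypothetical format)."""
    kind: str  # "click", "type", or "done"
    position: Optional[Tuple[int, int]] = None
    text: str = ""


class FakeDesktop:
    """Toy stand-in for a real screen: one text field and one submit button."""

    SUBMIT_BUTTON = (100, 200)  # made-up screen coordinates

    def __init__(self) -> None:
        self.field = ""
        self.submitted = False
        self.log: List[Action] = []

    def screenshot(self) -> str:
        # A real system would return image data; a string keeps the sketch runnable.
        return f"field={self.field!r} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        self.log.append(action)
        if action.kind == "type":
            self.field += action.text
        elif action.kind == "click" and action.position == self.SUBMIT_BUTTON:
            self.submitted = True


def scripted_model(screenshot: str) -> Action:
    """Hard-coded rule standing in for the model: type, then click, then stop."""
    if "submitted=True" in screenshot:
        return Action("done")
    if "field=''" in screenshot:
        return Action("type", text="hello")
    return Action("click", position=FakeDesktop.SUBMIT_BUTTON)


def run_loop(env: FakeDesktop, model: Callable[[str], Action], max_steps: int = 10) -> int:
    """Screenshot -> action -> apply, repeated until "done" or the step cap is hit."""
    for step in range(max_steps):
        action = model(env.screenshot())
        if action.kind == "done":
            return step
        env.apply(action)
    return max_steps


if __name__ == "__main__":
    desktop = FakeDesktop()
    steps_taken = run_loop(desktop, scripted_model)
    print(steps_taken, desktop.field, desktop.submitted)
```

The step cap means a policy that never signals "done" still stops after a bounded number of actions.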

And that turns into a full kind of almost 3D video interaction of the model. And it's able to do all of these tasks, right? You know, we showed these demos where it's able to, like, fill out spreadsheets. It's able to kind of, like, interact with a website.

It's able to, you know, open all kinds of programs, different operating systems: Windows, Linux, Mac. So, you know, I think all of that is very exciting. I will say, while in theory, there's nothing you could do there that you couldn't have done through just giving the model the API to drive the computer screen, this really lowers the barrier.

And, you know, there's a lot of folks who either, you know, kind of aren't in a position to interact with those APIs or it takes them a long time to do. The screen is just a universal interface that's a lot easier to interact with. And so I expect, over time, this is going to lower a bunch of barriers. Now, honestly, the current model leaves a lot still to be desired.

And we were honest about that in the blog, right? It makes mistakes. It misclicks. And we were careful to warn people, hey, this thing isn't – you can't just leave this thing to run on your computer for minutes and minutes. You got to give this thing boundaries and guardrails.

And I think that's one of the reasons we released it first in an API form rather than, you know, just handing it to the consumer and giving it control of their computer. But, you know, I definitely feel that it's important to get these capabilities out there. As models get more powerful, we're going to have to grapple with, you know, how do we use these capabilities safely?

How do we prevent them from being abused? And, you know, I think releasing the model while the capabilities are still limited is very helpful in terms of doing that. You know, I think since it's been released, a number of customers, I think Replit was maybe one of the quickest to deploy things, have made use of it in various ways. People have hooked up demos for, you know, Windows desktops, Macs, Linux machines. So yeah, it's been very exciting. I think as with anything else, it comes with new exciting abilities. And then with those new exciting abilities, we have to think about how to make the model safe, reliable, do what humans want them to do. I mean, it's the same story for everything, right? Same thing. It's that same tension.

Yeah, I think speaking at a high level, it's our intention to keep investing a lot in, you know, making the model better. Like, I think we look at some of the benchmarks where previous models were like, oh, I could do it 6% of the time. And now our model would do it 14% or 22% of the time. And yeah, we want to get up to the human-level reliability of 80%, 90%, just like anywhere else.

We're on the same curve that we were on with SWE-bench, where I think I would guess a year from now, the models can do this very, very reliably. But you got to start somewhere.