Dario Amodei

Speaker
1816 total appearances

Podcast Appearances

Lex Fridman Podcast
#452 – Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity

So we have this thing where it's surprisingly hard to address these risks because they're not here today. They don't exist. They're like ghosts, but they're coming at us so fast because the models are improving so fast. So how do you deal with something that's not here today, doesn't exist, but is coming at us very fast?

So the solution we came up with for that, in collaboration with people like the organization METR and Paul Christiano, is, okay, what you need for that is tests to tell you when the risk is getting close. You need an early warning system.

And so every time we have a new model, we test it for its capability to do these CBRN tasks, as well as testing it for, you know, how capable it is of doing tasks autonomously on its own. And in the latest version of our RSP, which we released in the last month or two, the way we test autonomy risks is the AI model's ability to do aspects of AI research itself. When the AI models can do AI research, they become kind of truly, truly autonomous. And, you know, that threshold is important in a bunch of other ways. And so what do we then do with these tasks?

The RSP basically develops what we've called an if-then structure, which is: if the models pass a certain capability, then we impose a certain set of safety and security requirements on them. So today's models are what's called ASL 2. ASL 1 is for systems that manifestly don't pose any risk of autonomy or misuse. So, for example, a chess-playing bot, Deep Blue, would be ASL 1.

It's just manifestly the case that you can't use Deep Blue for anything other than chess. It was just designed for chess. No one's going to use it to like, you know, to conduct a masterful cyber attack or to, you know, run wild and take over the world.

ASL 2 is today's AI systems, where we've measured them and we think these systems are simply not smart enough to autonomously self-replicate or conduct a bunch of tasks, and also not smart enough to provide meaningful information about CBRN risks and how to build CBRN weapons above and beyond what can be known from looking at Google.

In fact, sometimes they do provide information, but not above and beyond a search engine, not in a way that can be stitched together, not in a way that kind of end to end is dangerous enough. So ASL 3 is going to be the point at which the models are helpful enough to enhance the capabilities of non-state actors, right?

State actors can already do, unfortunately, to a high level of proficiency, a lot of these very dangerous and destructive things. The difference is that non-state actors are not capable of it. And so when we get to ASL 3, we'll take special security precautions designed to be sufficient to prevent theft of the model by non-state actors and misuse of the model as it's deployed. We'll have to have enhanced filters targeted at these particular areas. Cyber, bio, nuclear. Cyber, bio, nuclear, and model autonomy, which is less a misuse risk and more a risk of the model doing bad things itself.

So ASL 4 is getting to the point where these models could enhance the capability of an already knowledgeable state actor and/or become the main source of such a risk. Like, if you wanted to engage in such a risk, the main way you would do it is through a model. And then I think ASL 4 on the autonomy side is some amount of acceleration in AI research capabilities with an AI model.

And then ASL 5 is where we would get to the models that are kind of truly capable, that could exceed humanity in their ability to do any of these tasks. And so the point of the if-then structure commitment is basically to say, look, I don't know. I've been working with these models for many years and I've been worried about risk for many years. It's actually kind of dangerous to cry wolf.

It's actually kind of dangerous to say this model is risky, and people look at it and they say this is manifestly not dangerous. Again, it's the delicacy of: the risk isn't here today, but it's coming at us fast. How do you deal with that? It's really vexing to a risk planner to deal with it. And so this if-then structure basically says, look, we don't want to antagonize a bunch of people.

We don't want to harm our own, you know, our kind of own ability to have a place in the conversation by imposing these very onerous burdens on models that are not dangerous today. So the if-then, the trigger commitment, is basically a way to deal with this. It says you clamp down hard when you can show that the model is dangerous.

And of course, what has to come with that is enough of a buffer threshold that, you know, you're not at high risk of kind of missing the danger. It's not a perfect framework.

We've had to change it. You know, we came out with a new one just a few weeks ago, and probably going forward we might release new ones multiple times a year, because it's hard to get these policies right, like technically, organizationally, from a research perspective. But that is the proposal: if-then commitments and triggers in order to minimize burdens and false alarms now, but really react appropriately when the dangers are here.

Yeah, so that is hotly debated within the company. We are working actively to prepare ASL 3 security measures as well as ASL 3 deployment measures. I'm not going to go into detail, but we've made a lot of progress on both and we're prepared to be, I think, ready quite soon. I would not be surprised at all if we hit ASL 3 next year.

There was some concern that we might even hit it this year. That's still possible. That could still happen. It's very hard to say, but I would be very, very surprised if it was like 2030. I think it's much sooner than that.

Yeah, I think for ASL 3, it's primarily about security and about, you know, filters on the model relating to a very narrow set of areas when we deploy the model, because at ASL 3 the model isn't autonomous yet. And so you don't have to worry about, you know, kind of the model itself behaving in a bad way, even when it's deployed internally.