Dario Amodei

LessWrong (Curated & Popular)

"Dario Amodei – The Adolescence of Technology" by habryka

I do not think biological attacks will necessarily be carried out the instant it becomes widely possible to do so.

In fact, I would bet against that.

But added up across millions of people and a few years of time, I think there is a serious risk of a major attack, and the consequences would be so severe, with casualties potentially in the millions or more, that I believe we have no choice but to take serious measures to prevent it.

Subheading.

Defenses.

That brings us to how to defend against these risks.

Here I see three things we can do.

First, AI companies can put guardrails on their models to prevent them from helping to produce bioweapons.

Anthropic is very actively doing this.

Claude's constitution, which mostly focuses on high-level principles and values, has a small number of specific hard-line prohibitions, and one of them relates to helping with the production of biological, chemical, nuclear, or radiological weapons.

But all models can be jailbroken, so as a second line of defense we have, since mid-2025 (when our tests showed our models were starting to get close to the threshold where they might begin to pose a risk), implemented a classifier that specifically detects and blocks bioweapon-related outputs.

We regularly upgrade and improve these classifiers, and have generally found them highly robust even against sophisticated adversarial attacks.

These classifiers measurably increase the cost of serving our models.

In some models, they are close to 5% of total inference costs and thus cut into our margins, but we feel that using them is the right thing to do.

To their credit, some other AI companies have implemented classifiers as well.

But not every company has, and there is also nothing requiring companies to keep their classifiers.

I am concerned that over time there may be a prisoner's dilemma where companies can defect and lower their costs by removing classifiers.

This is once again a classic negative externalities problem that can't be solved by the voluntary actions of Anthropic or any other single company alone.

Voluntary industry standards may help, as may third-party evaluations and verification of the type done by AI security institutes and third-party evaluators.

But ultimately, defense may require government action, which is the second thing we can do.
