Dario Amodei
I do not think biological attacks will necessarily be carried out the instant it becomes widely possible to do so.
In fact, I would bet against that.
But added up across millions of people and a few years of time, I think there is a serious risk of a major attack, and the consequences would be so severe, with casualties potentially in the millions or more, that I believe we have no choice but to take serious measures to prevent it.
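The aggregation argument can be made concrete with a simple model, offered only as an illustration and not as an estimate: assume N people who could attempt an attack, each acting independently with a small probability p of doing so over the period considered. The chance that at least one attack occurs is then

```latex
P(\text{at least one attack}) = 1 - (1 - p)^{N} \approx 1 - e^{-Np} \quad \text{for small } p
```

With purely hypothetical values p = 10^{-6} and N = 10^{6}, Np = 1 and the probability comes out to roughly 1 - e^{-1}, or about 0.63, even though any single individual is overwhelmingly unlikely to act. Both the independence assumption and the numbers are assumptions made for the sake of the example.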
Defenses
That brings us to how to defend against these risks.
Here I see three things we can do.
First, AI companies can put guardrails on their models to prevent them from helping to produce bioweapons.
Anthropic is very actively doing this.
Claude's constitution, which mostly focuses on high-level principles and values, contains a small number of specific hard-line prohibitions, and one of them concerns helping with the production of biological, chemical, nuclear, or radiological weapons.
But all models can be jailbroken, so as a second line of defense we have implemented classifiers that specifically detect and block bioweapon-related outputs. We have done this since mid-2025, when our tests showed our models were starting to get close to the threshold where they might begin to pose a risk.
We regularly upgrade and improve these classifiers, and have generally found them highly robust even against sophisticated adversarial attacks.
These classifiers measurably increase the cost of serving our models. In some models they account for close to 5% of total inference costs and thus cut into our margins, but we feel that using them is the right thing to do.
To their credit, some other AI companies have implemented classifiers as well.
But not every company has, and there is also nothing requiring companies to keep their classifiers.
I am concerned that over time there may be a prisoner's dilemma where companies can defect and lower their costs by removing classifiers.
This is once again a classic negative externalities problem that can't be solved by the voluntary actions of Anthropic or any other single company alone.
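The prisoner's dilemma described here can be illustrated with a toy two-firm game. This is a sketch under stated assumptions, not a model of any real company: the payoff numbers are hypothetical, chosen only so the game has the dilemma's structure, and each payoff is assumed to net a firm's cost savings against its share of the harm from a riskier ecosystem. The names `PAYOFFS`, `best_response`, and `is_dominant` are illustrative.

```python
# Toy prisoner's dilemma over keeping ("K") or removing ("R") classifiers.
# All payoff values are hypothetical and exist only to show the structure.
# Key: (firm A action, firm B action) -> (payoff to A, payoff to B).
PAYOFFS = {
    ("K", "K"): (3, 3),  # both keep: shared safety, both pay the cost
    ("K", "R"): (1, 4),  # A keeps, B defects and undercuts A on cost
    ("R", "K"): (4, 1),  # A defects and undercuts B on cost
    ("R", "R"): (2, 2),  # both defect: cost savings, but a riskier world
}
ACTIONS = ("K", "R")


def best_response(opponent_action: str) -> str:
    """Return firm A's payoff-maximizing action given firm B's action."""
    return max(ACTIONS, key=lambda a: PAYOFFS[(a, opponent_action)][0])


def is_dominant(action: str) -> bool:
    """True if `action` is firm A's best response to every action of B."""
    return all(best_response(b) == action for b in ACTIONS)


if __name__ == "__main__":
    print("Removing is dominant:", is_dominant("R"))
    print("Mutual defection payoff:", PAYOFFS[("R", "R")])
    print("Mutual cooperation payoff:", PAYOFFS[("K", "K")])
```

Under these assumed numbers, removing classifiers is the best response whatever the other firm does, yet mutual removal leaves both firms worse off than mutual keeping. That gap is why a voluntary commitment by one firm does not settle the matter.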
Voluntary industry standards may help, as may third-party evaluation and verification of the kind performed by AI security institutes and independent evaluators.
But ultimately defense may require government action, which is the second thing we can do.