Dario Amodei
We are investing in a wide range of evaluations so that we can understand the behaviors of our models in the lab, as well as monitoring tools to observe behaviors in the wild, when allowed by customers.
This will be essential for giving us and others the empirical information necessary to make better determinations about how these systems operate and how they break.
We publicly disclose system cards with each model release that aim for completeness and a thorough exploration of possible risks.
Our system cards often run to hundreds of pages and require substantial pre-release effort that we could have spent on pursuing maximal commercial advantage.
We've also broadcast model behaviors more loudly when we see particularly concerning ones, as with the tendency to engage in blackmail.
The fourth thing we can do is encourage coordination to address autonomy risks at the level of industry and society.
While it is incredibly valuable for individual AI companies to engage in good practices or become good at steering AI models, and to share their findings publicly, the reality is that not all AI companies do this, and the worst ones can still be a danger to everyone even if the best ones have excellent practices.
For example, some AI companies have shown a disturbing negligence towards the sexualization of children in today's models, which makes me doubt that they'll show either the inclination or the ability to address autonomy risks in future models.
In addition, the commercial race between AI companies will only continue to heat up, and while the science of steering models can have some commercial benefits, overall the intensity of the race will make it increasingly hard to focus on addressing autonomy risks.
I believe the only solution is legislation: laws that directly affect the behavior of AI companies or otherwise incentivize R&D to solve these issues.
Here it is worth keeping in mind the warnings I gave at the beginning of this essay about uncertainty and surgical interventions.
We do not know for sure whether autonomy risks will be a serious problem.
As I said, I reject claims that the danger is inevitable or even that something will go wrong by default.
A credible risk of danger is enough for me and for Anthropic to pay quite significant costs to address it, but once we get into regulation, we are forcing a wide range of actors to bear economic costs, and many of these actors don't believe that autonomy risk is real or that AI will become powerful enough for it to be a threat.
I believe these actors are mistaken, but we should be pragmatic about the amount of opposition we expect to see and the dangers of overreach.
There is also a genuine risk that overly prescriptive legislation ends up imposing tests or rules that don't actually improve safety but that waste a lot of time, essentially amounting to safety theater.
This too would cause backlash and make safety legislation look silly.
Anthropic's view has been that the right place to start is with transparency legislation, which essentially tries to require that every frontier AI company engage in the transparency practices I've described earlier in this section.
California's SB 53 and New York's RAISE Act are examples of this kind of legislation, which Anthropic supported and which have successfully passed.
In supporting and helping to craft these laws, we've put a particular focus on trying to minimize collateral damage, for example by exempting smaller companies unlikely to produce frontier models from the law.