Dario Amodei
So we're using this document as kind of the control rod in a loop to train the model.
And so essentially, Claude is an AI model whose fundamental principle is to follow this Constitution.
And I think a really interesting lesson we've learned is that early versions of the Constitution were very prescriptive.
They were very much about rules.
So we would say, you know, Claude should not tell the user how to hotwire a car.
Claude should not discuss politically sensitive topics.
But...
As we've worked on this for several years, we've come to the conclusion that the most robust way to train these models is to train them at the level of principles and reasons.
So now we say, you know, Claude is a model.
It's under a contract.
You know, its goal is to serve the interests of the user, but it has to protect third parties.
Claude aims to be, you know, helpful, honest, and harmless.
Claude aims to consider a wide variety of interests.
We tell the model about how the model was trained.
We tell it about how it's situated in the world, the job it's trying to do for Anthropic, what Anthropic is aiming to achieve in the world, that it has a duty to be ethical and respect human life.
And we let it derive its rules from that.
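The loop described here, where a constitution of principles steers the model rather than a list of hard rules, can be sketched as a simple critique-and-revise cycle. This is a toy illustration, not Anthropic's actual training code: the `stub_model` function, the prompt wording, and the sample principles are all placeholders standing in for a real language model and the real Constitution.

```python
# Toy sketch of a constitution-guided critique-and-revise loop.
# All names and prompts here are illustrative assumptions.

CONSTITUTION = [
    "Serve the interests of the user, but protect third parties.",
    "Be helpful, honest, and harmless.",
    "Consider a wide variety of interests.",
]

def stub_model(prompt: str) -> str:
    """Stand-in for a real language model: returns canned replies."""
    if "Critique" in prompt:
        return "The draft ignores third-party harm."
    if "Revise" in prompt:
        return "Revised answer that weighs third-party interests."
    return "Initial draft answer."

def critique_and_revise(question: str, model=stub_model) -> str:
    """One round of the loop: draft -> critique against principles -> revise."""
    draft = model(question)
    principles = "\n".join(CONSTITUTION)
    critique = model(
        f"Critique this draft against the principles:\n{principles}\n{draft}"
    )
    revision = model(
        f"Revise the draft to address the critique:\n{critique}\n{draft}"
    )
    return revision

print(critique_and_revise("How should I respond to this request?"))
```

The point of the sketch is that the principles appear in the prompt as reasons to weigh, and the model derives its own behavior from them, rather than the trainer enumerating every forbidden action up front.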
Now, there are still some hard rules.
For