Dario Amodei

Speaker
1816 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Dario Amodei – The Adolescence of Technology" by habryka

It has the vibe of a letter from a deceased parent sealed until adulthood.

We've approached Claude's constitution in this way because we believe that training Claude at the level of identity, character, values, and personality, rather than giving it specific instructions or priorities without explaining the reasons behind them, is more likely to lead to a coherent, wholesome, and balanced psychology and less likely to fall prey to the kinds of traps I discussed above.

Millions of people talk to Claude about an astonishingly diverse range of topics, which makes it impossible to write out a completely comprehensive list of safeguards ahead of time.

Claude's values help it generalize to new situations whenever it is in doubt.

Above, I discuss the idea that models draw upon data from their training process to adopt a persona.

Whereas flaws in that process could cause models to adopt a bad or evil personality, perhaps drawing on archetypes of bad or evil people, the goal of our constitution is to do the opposite: to teach Claude a concrete archetype of what it means to be a good AI.

Claude's constitution presents a vision for what a robustly good Claude is like.

The rest of our training process aims to reinforce the message that Claude lives up to this vision.

This is like a child forming their identity by imitating the virtues of fictional role models they read about in books.

We believe that a feasible goal for 2026 is to train Claude in such a way that it almost never goes against the spirit of its constitution.

Getting this right will require an incredible mix of training and steering methods, large and small, some of which Anthropic has been using for years and some of which are currently under development.

But, difficult as it sounds, I believe this is a realistic goal, though it will require extraordinary and rapid efforts.

The second thing we can do is develop the science of looking inside AI models to diagnose their behavior so that we can identify problems and fix them.

This is the science of interpretability, and I've talked about its importance in previous essays.

Even if we do a great job of developing Claude's constitution and apparently training Claude to essentially always adhere to it, legitimate concerns remain.

As I've noted above, AI models can behave very differently under different circumstances, and as Claude gets more powerful and more capable of acting in the world on a larger scale, it's possible this could bring it into novel situations where previously unobserved problems with its constitutional training emerge.

I am actually fairly optimistic that Claude's constitutional training will be more robust to novel situations than people might think, because we are increasingly finding that high-level training at the level of character and identity is surprisingly powerful and generalizes well.

But there's no way to know that for sure, and when we're talking about risks to humanity, it's important to be paranoid and to try to obtain safety and reliability in several different, independent ways.

One of those ways is to look inside the model itself.