Dario Amodei

👤 Speaker

1816 total appearances

Podcast Appearances

LessWrong (Curated & Popular)

"Dario Amodei – The Adolescence of Technology" by habryka

It has the vibe of a letter from a deceased parent sealed until adulthood.

1844.842

We've approached Claude's constitution in this way because we believe that training Claude at the level of identity, character, values, and personality, rather than giving it specific instructions or priorities without explaining the reasons behind them, is more likely to lead to a coherent, wholesome, and balanced psychology and less likely to fall prey to the kinds of traps I discussed above.

1849.042

Millions of people talk to Claude about an astonishingly diverse range of topics, which makes it impossible to write out a completely comprehensive list of safeguards ahead of time.

1870.996

Claude's values help it generalize to new situations whenever it is in doubt.

1881.016

Above, I discuss the idea that models draw upon data from their training process to adopt a persona.

1886.127

Whereas flaws in that process could cause models to adopt a bad or evil personality, perhaps drawing on archetypes of bad or evil people, the goal of our constitution is to do the opposite.

1892.41

To teach Claude a concrete archetype of what it means to be a good AI.

1903.788

Claude's constitution presents a vision for what a robustly good Claude is like.

1908.054

The rest of our training process aims to reinforce the message that Claude lives up to this vision.

1912.561

This is like a child forming their identity by imitating the virtues of fictional role models they read about in books.

1918.111

We believe that a feasible goal for 2026 is to train Claude in such a way that it almost never goes against the spirit of its constitution.

1925.029

Getting this right will require an incredible mix of training and steering methods, large and small, some of which Anthropic has been using for years and some of which are currently under development.

1933.062

But, difficult as it sounds, I believe this is a realistic goal, though it will require extraordinary and rapid efforts.

1943.567

The second thing we can do is develop the science of looking inside AI models to diagnose their behavior so that we can identify problems and fix them.

1950.443

This is the science of interpretability, and I've talked about its importance in previous essays.

1959.374

Even if we do a great job of developing Claude's constitution and apparently training Claude to essentially always adhere to it, legitimate concerns remain.

1965.181

As I've noted above, AI models can behave very differently under different circumstances, and as Claude gets more powerful and more capable of acting in the world on a larger scale, it's possible this could bring it into novel situations where previously unobserved problems with its constitutional training emerge.

1974.018

I am actually fairly optimistic that Claude's constitutional training will be more robust to novel situations than people might think, because we are increasingly finding that high-level training at the level of character and identity is surprisingly powerful and generalizes well.

1991.112

But there's no way to know that for sure, and when we're talking about risks to humanity, it's important to be paranoid and to try to obtain safety and reliability in several different, independent ways.

2005.636

One of those ways is to look inside the model itself.

2016.762
