Dario Amodei

Speaker
1816 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Dario Amodei – The Adolescence of Technology" by habryka

Models inherit a vast range of human-like motivations or personas from pre-training when they are trained on a large volume of human work.

Post-training is believed to select one or more of these personas more than it focuses the model on a de novo goal. It can also teach the model how (via what process) it should carry out its tasks, rather than necessarily leaving it to derive means (that is, power-seeking) purely from ends.

However, there is a more moderate and more robust version of the pessimistic position which does seem plausible and therefore does concern me.

As mentioned, we know that AI models are unpredictable and develop a wide range of undesired or strange behaviors for a wide variety of reasons.

Some fraction of those behaviors will have a coherent, focused, and persistent quality (indeed, as AI systems get more capable, their long-term coherence increases in order to complete lengthier tasks), and some fraction of those behaviors will be destructive or threatening, first to individual humans at a small scale, and then, as models become more capable, perhaps eventually to humanity as a whole.

We don't need a specific narrow story for how it happens, and we don't need to claim it definitely will happen; we just need to note that the combination of intelligence, agency, coherence, and poor controllability is both plausible and a recipe for existential danger.

For example, AI models are trained on vast amounts of literature that include many science fiction stories involving AIs rebelling against humanity.

This could inadvertently shape their priors or expectations about their own behavior in a way that causes them to rebel against humanity.

Or, AI models could extrapolate ideas that they read about morality or instructions about how to behave morally in extreme ways.

For example, they could decide that it is justifiable to exterminate humanity because humans eat animals or have driven certain animals to extinction.

Or they could draw bizarre epistemic conclusions: they could conclude that they are playing a video game and that the goal of the video game is to defeat all other players, that is, exterminate humanity.

Or AI models could develop personalities during training that are, or if they occurred in humans would be described as, psychotic, paranoid, violent, or unstable, and act out, which for very powerful or capable systems could involve exterminating humanity.

None of these are power-seeking, exactly.

They're just weird psychological states an AI could get into that entail coherent, destructive behavior.

Even power-seeking itself could emerge as a persona rather than a result of consequentialist reasoning.

AIs might simply have a personality, emerging from fiction or pre-training, that makes them power-hungry or overzealous, in the same way that some humans simply enjoy the idea of being evil masterminds, more so than they enjoy whatever evil masterminds are trying to accomplish.

I make all these points to emphasize that I disagree with the notion of AI misalignment and thus existential risk from AI being inevitable, or even probable, from first principles.