Dario Amodei

"Dario Amodei – The Adolescence of Technology" by habryka

Models inherit a vast range of human-like motivations or personas from pre-training when they are trained on a large volume of human work.

1228.52

LessWrong (Curated & Popular)

Post-training is believed to select one or more of these personas more so than it focuses the model on a de novo goal, and can also teach the model how (via what process) it should carry out its tasks, rather than necessarily leaving it to derive means (that is, power-seeking) purely from ends.

1236.715

However, there is a more moderate and more robust version of the pessimistic position which does seem plausible and therefore does concern me.

1254.16

As mentioned, we know that AI models are unpredictable and develop a wide range of undesired or strange behaviors for a wide variety of reasons.

1261.724

Some fraction of those behaviors will have a coherent, focused, and persistent quality (indeed, as AI systems get more capable, their long-term coherence increases in order to complete lengthier tasks), and some fraction of those behaviors will be destructive or threatening, first to individual humans at a small scale, and then, as models become more capable,

1270.995

perhaps eventually to humanity as a whole.

1290.698

We don't need a specific narrow story for how it happens, and we don't need to claim it definitely will happen; we just need to note that the combination of intelligence, agency, coherence, and poor controllability is both plausible and a recipe for existential danger.

1294.023

For example, AI models are trained on vast amounts of literature that include many science fiction stories involving AIs rebelling against humanity.

1308.721

This could inadvertently shape their priors or expectations about their own behavior in a way that causes them to rebel against humanity.

1317.524

Or, AI models could extrapolate ideas that they read about morality or instructions about how to behave morally in extreme ways.

1324.921

For example, they could decide that it is justifiable to exterminate humanity because humans eat animals or have driven certain animals to extinction.

1333.112

Or they could draw bizarre epistemic conclusions.

1341.683

They could conclude that they are playing a video game and that the goal of the video game is to defeat all other players, that is, exterminate humanity.

1344.8

Or AI models could develop personalities during training that are, or if they occurred in humans would be described as, psychotic, paranoid, violent, or unstable, and act out, which for very powerful or capable systems could involve exterminating humanity.

1352.972

None of these are power-seeking, exactly.

1368.239

They're just weird psychological states an AI could get into that entail coherent, destructive behavior.

1371.348

Even power-seeking itself could emerge as a persona rather than a result of consequentialist reasoning.

1377.847

AIs might simply have a personality, emerging from fiction or pre-training, that makes them power-hungry or overzealous.

1383.578

In the same way that some humans simply enjoy the idea of being evil masterminds, more so than they enjoy whatever evil masterminds are trying to accomplish.

1390.588

I make all these points to emphasize that I disagree with the notion of AI misalignment, and thus existential risk from AI, being inevitable, or even probable, from first principles.

1399.82