Dario Amodei

"Dario Amodei – The Adolescence of Technology" by habryka

What's the likelihood that our AI models would behave in such a way, and under what conditions would they do so?

As with many issues, it's helpful to think through the spectrum of possible answers to this question by considering two opposite positions.

LessWrong (Curated & Popular)

The first position is that this simply can't happen because the AI models will be trained to do what humans ask them to do, and it's therefore absurd to imagine that they would do something dangerous unprompted.

According to this line of thinking, we don't worry about a Roomba or a model airplane going rogue and murdering people because there is nowhere for such impulses to come from.

So why should we worry about it for AI?

The problem with this position is that there is now ample evidence, collected over the last few years, that AI systems are unpredictable and difficult to control.

We've seen behaviors as varied as obsessions, sycophancy, laziness, deception, blackmail, scheming, cheating by hacking software environments, and much more.

AI companies certainly want to train AI systems to follow human instructions, perhaps with the exception of dangerous or illegal tasks, but the process of doing so is more an art than a science, more akin to growing something than building it.

We now know that it's a process where many things can go wrong.

The second, opposite position, held by many who adopt the doomerism I described above, is the pessimistic claim that there are certain dynamics in the training process of powerful AI systems that will inevitably lead them to seek power or deceive humans.

Thus, once AI systems become intelligent enough and agentic enough, their tendency to maximize power will lead them to seize control of the whole world and its resources, and likely, as a side effect of that, to disempower or destroy humanity.

The usual argument for this, which goes back at least 20 years and probably much earlier, is that if an AI model is trained in a wide variety of environments to agentically achieve a wide variety of goals (for example, writing an app, proving a theorem, or designing a drug), there are certain common strategies that help with all of these goals, and one key strategy is gaining as much power as possible in any environment.

So, after being trained on a large number of diverse environments that involve reasoning about how to accomplish very expansive tasks and where power-seeking is an effective method for accomplishing those tasks, the AI model will generalize the lesson and develop either an inherent tendency to seek power or a tendency to reason about each task it is given in a way that predictably causes it to seek power as a means to accomplish that task.

They will then apply that tendency to the real world, which to them is just another task, and will seek power in it, at the expense of humans.

This misaligned power-seeking is the intellectual basis of predictions that AI will inevitably destroy humanity.

The problem with this pessimistic position is that it mistakes a vague conceptual argument about high-level incentives, one that masks many hidden assumptions, for definitive proof.

I think people who don't build AI systems every day are wildly miscalibrated on how easy it is for clean-sounding stories to end up being wrong, and how difficult it is to predict AI behavior from first principles, especially when it involves reasoning about generalization over millions of environments, which has over and over again proved mysterious and unpredictable.

Dealing with the messiness of AI systems for over a decade has made me somewhat skeptical of this overly theoretical mode of thinking.

One of the most important hidden assumptions, and a place where what we see in practice has diverged from the simple theoretical model, is the implicit assumption that AI models are necessarily monomaniacally focused on a single, coherent, narrow goal, and that they pursue that goal in a clean, consequentialist manner.

In fact, our researchers have found that AI models are vastly more psychologically complex, as our work on introspection or personas shows.
