Steven Byrnes

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

These are algorithms designed to find actions that maximize an objective by any means available.

530.513 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

That is, you get ruthless sociopathic behavior by default.

536.962 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

There's an image here.

541.469 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

Description.

543.352 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

Go find someone who was in AI in the 2010s or earlier, before LLMs took over, and they may well have spent a lot of time building or using RL agents and or model-based planning algorithms.

554.138 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

If so, they'll tell you, based on their lived experience, that these kinds of algorithms are ruthless by default, when they work at all, unless the programmers go out of their way to make them non-ruthless.

565.874 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

See for example this 2020 DeepMind blog post on specification gaming.

577.55 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

And how would the programmers go out of their way to make them non-ruthless?

582.758 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

I claim that the answer is not obvious, indeed not even known.

587.247 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

See my look on post, and my silver and sutton post, and more generally my post-behaviorist, RL reward functions lead to scheming for why obvious approaches to non-ruthlessness won't work.

591.677 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

Rather, algorithms in this class are naturally, em, let's call them, ruthless ifeers, in the sense that they transmute even innocuous-sounding objectives like it's good if the human is happy into scarier-sounding ones like ruthlessly maximize the probability that the human is happy, which in turn suggests strategies such as forcibly drugging the human.

602.84 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

Likewise, the innocuous-sounding it's bad to lie gets ruthless ified into it's bad to get caught lying, and so on.

622.76 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

Of course, evolution did go out of its way to make humans non-ruthless by endowing us with social instincts.

630.025 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

Maybe future AI programmers will likewise go out of their way to make ASIs non-ruthless?

637.493 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

I hope so, but we need to figure out how.

643.319 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

To be clear, ruthless consequentialism isn't always bad.

646.662 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

I'm happy for ruthless consequentialist AIs to be playing chess, designing chips, etc.,

650.946 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

In principle, I'd even be happy for a ruthless consequentialist AI to be emperor of the universe, creating an awesome future for all.

657.076 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

But making that actually happen would be super dangerous for lots of reasons, for example you might need to operationalize creating an awesome future for all in a loophole-free way.

664.731 View full episode →

LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

See also the usual agent debugging loop and its future catastrophic breakdown.

674.57 View full episode →

Appearances Over Time

Podcast Appearances

Sign in to Audioscrape

Share this moment