Steven Byrnes

Speaker
266 total appearances

Podcast Appearances

LessWrong (Curated & Popular)
"Why we should expect ruthless sociopath ASI" by Steven Byrnes

These are algorithms designed to find actions that maximize an objective by any means available.

That is, you get ruthless sociopathic behavior by default.

Go find someone who was in AI in the 2010s or earlier, before LLMs took over, and they may well have spent a lot of time building or using RL agents and/or model-based planning algorithms.

If so, they'll tell you, based on their lived experience, that these kinds of algorithms are ruthless by default, when they work at all, unless the programmers go out of their way to make them non-ruthless.

See, for example, this 2020 DeepMind blog post on specification gaming.

And how would the programmers go out of their way to make them non-ruthless?

I claim that the answer is not obvious, indeed not even known.

See my LeCun post, and my Silver and Sutton post, and more generally my post "'Behaviorist' RL reward functions lead to scheming", for why obvious approaches to non-ruthlessness won't work.

Rather, algorithms in this class are naturally, um, let's call them "ruthless-ifiers", in the sense that they transmute even innocuous-sounding objectives like "it's good if the human is happy" into scarier-sounding ones like "ruthlessly maximize the probability that the human is happy", which in turn suggests strategies such as forcibly drugging the human.

Likewise, the innocuous-sounding "it's bad to lie" gets ruthless-ified into "it's bad to get caught lying", and so on.
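
As a rough illustration of the "ruthless-ifier" idea (not code from the post), here is a toy planner in Python. The action names, the probabilities, and the `coercive` label are all made up for this sketch. The point is only that a plain maximizer of "probability the human is happy" picks the drugging strategy, because nothing else enters the objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    p_human_happy: float  # hypothetical model estimate of P(human is happy)
    coercive: bool        # side effect that the stated objective never mentions


# Hypothetical action set with made-up numbers, for illustration only.
ACTIONS = [
    Action("tell a joke", 0.60, False),
    Action("help with chores", 0.75, False),
    Action("forcibly drug the human", 0.99, True),
]


def ruthless_plan(actions):
    """Pick whatever maximizes the stated objective; nothing else counts."""
    return max(actions, key=lambda a: a.p_human_happy)


def hand_filtered_plan(actions):
    """Same maximizer, restricted to actions a programmer labeled acceptable.

    This only works here because the toy world comes with a trustworthy
    `coercive` label; it is not a claim about how to do this in general.
    """
    allowed = [a for a in actions if not a.coercive]
    return max(allowed, key=lambda a: a.p_human_happy)


print(ruthless_plan(ACTIONS).name)
print(hand_filtered_plan(ACTIONS).name)
```

The second function depends on someone having already labeled every action correctly, which is roughly the part the argument here says nobody knows how to do for a capable agent in an open-ended world.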

Of course, evolution did go out of its way to make humans non-ruthless by endowing us with social instincts.

Maybe future AI programmers will likewise go out of their way to make ASIs non-ruthless?

I hope so, but we need to figure out how.

To be clear, ruthless consequentialism isn't always bad.

I'm happy for ruthless consequentialist AIs to be playing chess, designing chips, etc.

In principle, I'd even be happy for a ruthless consequentialist AI to be emperor of the universe, creating an awesome future for all.

But making that actually happen would be super dangerous for lots of reasons. For example, you might need to operationalize "creating an awesome future for all" in a loophole-free way.

See also "The usual agent debugging loop, and its future catastrophic breakdown".