LessWrong (Curated & Popular)

"Why we should expect ruthless sociopath ASI" by Steven Byrnes

20 Feb 2026

Transcription

Chapter 1: What is the main argument about future artificial superintelligence?

0.031 - 16.975 Steven Byrnes

Why We Should Expect Ruthless Sociopath ASI by Steven Byrnes. Published on February 18, 2026. Heading: The conversation begins (a fictional dialogue). Optimist:

16.995 - 40.627 Steven Byrnes

So you expect future artificial superintelligence (ASI), by default, that is, in the absence of yet-to-be-invented techniques, to be a ruthless sociopath, happy to lie, cheat, and steal whenever doing so is selfishly beneficial, and with callous indifference to whether anyone, including its own programmers and users, lives or dies. Me: Yup. Alas!

41.768 - 62.057 Steven Byrnes

Optimist: Despite all the evidence right in front of our eyes from humans and LLMs? Me: Yup. Optimist: Okay, well, I'm here to tell you: that is a very specific and strange thing to expect, especially in the absence of any concrete evidence whatsoever. There's no reason to expect it.

63.139 - 84.419 Steven Byrnes

If you think that ruthless sociopathy is the true core nature of intelligence or whatever, then you should really look at yourself in a mirror and ask yourself where your life went horribly wrong. Me: Hmm, I think the true core nature of intelligence is above my pay grade. We should probably just talk about the issue at hand, namely future AI algorithms and their properties.

85.481 - 115.687 Steven Byrnes

But I actually agree with you that ruthless sociopathy is a very specific and strange thing for me to expect. Optimist: Wait, you... what? Me: Yes. Like, if you show me some random thing, there's a 99.999% chance that it's not a ruthless sociopath. Instead it might be, like, a dirt clod. Dirt clods are not ruthless sociopaths because they're not intelligent at all. Optimist:

116.528 - 118.109 Steven Byrnes

Oh come on, you know what I mean.

Chapter 2: Are people genuinely concerned about LLMs causing doom?

118.169 - 147.3 Steven Byrnes

I'm not talking about dirt clods. I'm saying, if you pick some random mind, there is no reason at all to expect it to be a ruthless sociopath. Me: How do you pick some random mind? Minds don't just appear out of nowhere. Optimist: Like, a human. Or an AI. Me: Different humans are different to some extent, and different AI algorithms are different to a much, much greater extent.

148.381 - 173.569 Steven Byrnes

AI includes everything from A* search to MuZero to LLMs. Is A* search a ruthless sociopath? Well, I mean, it does seem rather maniacally obsessed with graph traversal, right? Optimist: Haha, very funny. Please stop being annoyingly pedantic. I obviously didn't mean AI in the sense of the academic discipline.

174.611 - 196.846 Steven Byrnes

I meant, like, AI in the colloquial sense, AI that qualifies as a mind, like LLMs. I'm mainly talking about human minds and LLM minds, that is, all the minds we've ever seen in the real world, rather than in sci-fi. And hey, what a coincidence: approximately 100% of those minds are not ruthless sociopaths. Me:

197.667 - 216.473 Steven Byrnes

As it happens, the threat model I'm working on is not LLMs, but rather brain-like Artificial General Intelligence, AGI, which, from a safety perspective, is more or less a type of actor-critic model-based reinforcement learning, RL, agent. LLMs are profoundly different from what I'm working on.
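For readers who haven't met the term, here is a minimal toy sketch of what "actor-critic model-based RL" refers to (an illustration with made-up dynamics on a six-state line, not Byrnes's actual threat model): a critic learns state values via temporal-difference updates, and an actor uses a one-step world model to pick whichever action looks best under those values.

```python
import random

# Toy actor-critic RL on a 6-state line (illustration only, not Byrnes's
# actual system): the critic learns state values by temporal-difference
# (TD) updates, and the actor greedily follows the critic's estimates,
# using the (here, known) world model to look one step ahead.

random.seed(0)
GOAL, ALPHA, GAMMA = 5, 0.5, 0.9
V = [0.0] * 6                          # critic: value estimate per state

def step(s, a):                        # world model: a in {-1, +1}, clamped to [0, 5]
    return max(0, min(5, s + a))

def reward(s):
    return 1.0 if s == GOAL else 0.0

for _ in range(200):
    s = 0
    while s != GOAL:
        # actor: pick the action with the best predicted one-step return
        # (random.sample shuffles the two actions, breaking ties randomly)
        a = max(random.sample([-1, 1], 2),
                key=lambda a_: reward(step(s, a_)) + GAMMA * V[step(s, a_)])
        s2 = step(s, a)
        # critic: TD update toward observed reward + discounted next value
        V[s] += ALPHA * (reward(s2) + GAMMA * V[s2] - V[s])
        s = s2

print([round(v, 3) for v in V])        # values rise toward the goal state
```

After training, the learned values increase monotonically toward the goal, which is all the actor needs in order to walk straight there.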

Chapter 3: What makes brain-like RL-agent ASI potentially ruthless?

216.453 - 240.076 Steven Byrnes

Saying that LLMs will be similar to RL-agent AGI because both are AI is like saying that LLMs will be similar to the A* search algorithm because both are AI, or that a frogfish will be similar to a human because both are animals. They can still be wildly different in every way that matters. Heading: Are people worried about LLMs causing doom? Optimist:

240.937 - 256.77 Steven Byrnes

Okay, but lots of other doomers talk about LLMs causing doom. Me: Well, kinda. I think we need to tease apart two groups of people. Both are sometimes called doomers, but one is much more pessimistic than the other.

257.189 - 277.968 Steven Byrnes

This is very caricatured, but the comparatively less pessimistic group puts the probability of doom (probability of human extinction from AI, assuming progress continues) in the 5% to 50% range. It is the bigger group, and I vaguely associate it with the centre of gravity of the effective altruism movement and Anthropic employees.

279.029 - 299.994 Steven Byrnes

They definitely do not expect ruthless sociopath ASI as the default path we're on, absent a technical breakthrough, like I'm arguing for here. At most, they'll entertain the idea of ruthless sociopath ASI as an odd hypothetical or as a result of a competitive race to the bottom or from egregiously careless programmers or bad actors, etc.

300.975 - 323.683 Steven Byrnes

They're probably equally or more concerned about lots of other potential AI problems: AI-assisted bioterrorism, dictatorships, etc. I'm part of an even more pessimistic group (motto: if anyone builds it, everyone dies), which generally does expect ruthless sociopath ASI as the default path we're on, absent a technical breakthrough or other miracle.

324.825 - 345.434 Steven Byrnes

We tend to think a 50% chance that humans will survive continued AI development is deliriously over-optimistic. Anyway, the extra heap of concern in that latter camp is not from the LLMs of today causing near-certain doom, or even the somewhat better LLMs of tomorrow, but rather the wildly better ASIs of maybe soon, maybe not, who knows.

346.556 - 371.071 Steven Byrnes

But even if it's close in calendar time, and even if it comes out of LLM research, such an ASI would still be systematically different from LLMs as we know them today. Optimist: AKA, you have no evidence. Me: No evidence either way, at least no evidence of that type. Anyway, as I was saying, ASI would be systematically different from today's LLMs because, ahem, where do I start?

372.172 - 392.352 Steven Byrnes

Actually, it would be much easier for me to explain if we start with the ASI threat model that I spend all my time on, and then we can circle back to LLMs afterwards. Is that okay? Heading: Positive argument that brain-like RL-agent ASI would be a ruthless sociopath. Optimist: Sure.

393.394 - 411.143 Steven Byrnes

We can pause the discussion of LLMs for a few minutes and start in your comfort zone of actor-critic, model-based-RL-agent, brain-like ASI. It doesn't really matter anyway. Regardless of the exact algorithm, you clearly need some positive reason to believe that this kind of ASI would be a ruthless sociopath.

Chapter 4: How do LLMs differ from ASI in terms of learning?

446.639 - 461.846 Steven Byrnes

Me: Sure. Back at the start of the conversation, I mentioned that random objects like dirt clods are not able to accomplish impressive feats. I didn't just bring up dirt clods to troll you; rather, I was laying the groundwork for a key point.

462.728 - 481.512 Steven Byrnes

If we're thinking about AI that can autonomously found, grow, and staff innovative companies for years, or autonomously invent new scientific paradigms, then clearly it's not a random object, but rather a thing that is able to accomplish impressive feats. And the question we should be asking is, how does it do that?

482.099 - 504.97 Steven Byrnes

Those things would be astronomically unlikely to happen if the AI were choosing actions at random. So there has to be some explanation for how the AI finds actions that accomplish those impressive feats. So an explanation has to exist. What is it? I claim there are really only two answers that work in practice. The first possible explanation is consequentialism.

505.771 - 526.509 Steven Byrnes

The AI accomplishes impressive feats by what amounts to having desires about what winds up happening in the future and running some search process to find actions that lead to those desires getting fulfilled. This is the main thing that you get from RL agents and from model-based planning algorithms. My brain-like AGI scenario would involve both of those at once.
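The "desires about the future plus a search over actions" picture can be sketched in a few lines (a toy with made-up dynamics and a made-up desire, not any real AI system): a world model predicts the consequences of a candidate plan, a desire scores how the future winds up, and a brute-force search keeps whichever plan scores best.

```python
from itertools import product

# A minimal sketch of "consequentialism" as described above (toy example,
# hypothetical dynamics and desire): predict the consequences of every
# candidate action sequence with a world model, and keep whichever plan
# best fulfills the desire about the future.

def world_model(state, plan):
    for action in plan:                # toy dynamics: actions add to state
        state += action
    return state

def desire(final_state):
    return -abs(final_state - 7)       # "I want the world to wind up at 7"

def find_plan(state, actions, horizon):
    # brute-force search: try every action sequence, keep the best one
    return max(product(actions, repeat=horizon),
               key=lambda plan: desire(world_model(state, plan)))

plan = find_plan(0, actions=[0, 1, 2], horizon=4)
print(plan, world_model(0, plan))      # some plan whose predicted outcome is 7
```

Real planners replace the brute-force enumeration with cleverer search, but the shape is the same: actions are selected purely for their predicted consequences.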

527.63 - 543.953 Steven Byrnes

The whole point of those subfields of AI is this: these are algorithms designed to find actions that maximize an objective by any means available. That is, you get ruthless sociopathic behavior by default. There's an image here.

545.475 - 553.787 Unknown

And this is not just my armchair theorizing.

554.138 - 576.429 Steven Byrnes

Go find someone who was in AI in the 2010s or earlier, before LLMs took over, and they may well have spent a lot of time building or using RL agents and/or model-based planning algorithms. If so, they'll tell you, based on their lived experience, that these kinds of algorithms are ruthless by default, when they work at all, unless the programmers go out of their way to make them non-ruthless.

577.55 - 602.86 Steven Byrnes

See, for example, this 2020 DeepMind blog post on specification gaming. And how would the programmers go out of their way to make them non-ruthless? I claim that the answer is not obvious, indeed not even known. See my LeCun post, my Silver and Sutton post, and, more generally, my post "Behaviorist" RL reward functions lead to scheming, for why obvious approaches to non-ruthlessness won't work.

602.84 - 621.699 Steven Byrnes

Rather, algorithms in this class are naturally, um, let's call them "ruthless-ifiers", in the sense that they transmute even innocuous-sounding objectives, like "it's good if the human is happy", into scarier-sounding ones, like "ruthlessly maximize the probability that the human is happy", which in turn suggests strategies such as forcibly drugging the human.
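The "ruthless-ifier" point fits in a few lines of code (a cartoon with entirely made-up numbers and a made-up action set): hand an optimizer the innocuous-sounding objective "probability that the human is happy", and it simply takes the argmax, however that argmax is achieved.

```python
# A cartoon of the "ruthless-ifier" point above (entirely made-up numbers
# and strategies): an optimizer given the objective "maximize P(human is
# happy)" just picks the argmax, with no regard for how the argmax works.

# model's predicted P(human reports being happy) for each strategy
predicted_p_happy = {
    "be helpful and honest":        0.80,
    "flatter and deceive the user": 0.90,
    "forcibly drug the human":      0.99,
}

best = max(predicted_p_happy, key=predicted_p_happy.get)
print(best)  # prints: forcibly drug the human
```

Nothing in the maximization step distinguishes the intended strategy from the scary one; that distinction would have to come from somewhere outside the objective.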
