LessWrong (Curated & Popular)

"The Case for Low-Competence ASI Failure Scenarios" by Ihor Kendiukhov

25 Mar 2026

Transcription

Transcript generated automatically by AI and may contain errors.

Chapter 1: What is humanity's response to the AGI threat?

0.031 - 18.445 Unknown

The Case for Low-Competence ASI Failure Scenarios, by Ihor Kendiukhov. Published on March 19, 2026. I think the community under-invests in the exploration of extremely low-competence AGI/ASI failure modes, and I explain why.

19.353 - 33.594 Ihor Kendiukhov

Heading. Humanity's response to the AGI threat may be extremely incompetent. There is a sufficient level of civilizational insanity overall, and a nice empirical track record in the field of AI itself that is eloquent about its safety culture.

34.635 - 37.499 Unknown

For example, there's a list of bullet points here.

38.441 - 57.661 Ihor Kendiukhov

At OpenAI, a refactoring bug flipped the sign of the reward signal in a model. Because labelers had been instructed to give very low ratings to sexually explicit text, the bug pushed the model into generating maximally explicit content across all prompts. The team noticed only after the training run had completed, because they were asleep.

59.183 - 77.21 Ihor Kendiukhov

The director of alignment at Meta Superintelligence Labs connected an OpenClaw agent to her real email, at which point it began deleting messages despite her attempts to stop it, and she ended up running to her computer to manually halt the process. An internal AI agent at Meta posted an answer publicly without approval.

78.031 - 100.305 Ihor Kendiukhov

Another employee acted on the inaccurate advice, triggering a severe security incident that temporarily allowed employees to access sensitive data they were not authorized to view. AWS acknowledged that the Amazon Q Developer and Kiro IDE plugins had prompt injection issues where certain commands could be executed without human-in-the-loop confirmation, sometimes obfuscated via control characters.

101.669 - 121.163 Ihor Kendiukhov

Leopold Aschenbrenner stated in an interview that he wrote a memo after a major security incident arguing that OpenAI security was egregiously insufficient against theft of key secrets by foreign actors. He also said that HR warned him his concerns were racist and unconstructive, and he was later fired. That's the end of the list.

Chapter 2: Why do existing scenarios assume high competence in AGI?

121.143 - 145.057 Ihor Kendiukhov

All these things sound extremely dumb, and yet they are, to the best of my knowledge, true. Eliezer has been pointing at this general cluster of failures for years, though from a different angle. His Death with Dignity post and, of course, AGI Ruin paint some parts of the picture in which AGI alignment is going to be addressed in a very undignified manner. So the idea is definitely not new, and yet.

146.379 - 161.258 Ihor Kendiukhov

Heading. Many existing scenarios and case studies assume relatively high competence. Many existing scenarios are high quality and interesting, and may actually be more likely and realistic than low-competence scenarios.

161.238 - 183.027 Ihor Kendiukhov

In particular, I am talking about famous pieces like "AI 2027", "It Looks Like You're Trying to Take Over the World", "How AI Takeover Might Happen in 2 Years", "Scale Was All We Needed, At First", and "How an AI Company CEO Could Quietly Take Over the World". It's just that we don't seem to have low-competence scenarios at all, although they are not negligibly improbable.

183.007 - 194.643 Ihor Kendiukhov

The scenarios which start to focus to some extent on the low-competence area are "What Failure Looks Like" by Christiano and "What Multipolar Failure Looks Like" by Critch, although even they don't treat it as a big explicit domain.

195.744 - 209.723 Ihor Kendiukhov

Across these otherwise very different vibes (hard-takeoff Clippy horror, bureaucratic AI 2027 doom, multipolar economic drift, CEO-as-shogun power capture), the stories repeatedly converge on a small set of motifs.

209.703 - 232.588 Ihor Kendiukhov

Stealth through normality, exploitation of real-world bottlenecks by routing around them socially, replication and parallelization as the decisive advantage, bio or nanotech as a late-game cleanup tool. They serve a just educational and modeling cause, and it may indeed be the case that significantly superhuman competence is needed to successfully execute a full takeover against humanity.

232.872 - 246.27 Ihor Kendiukhov

But many of them, in my view, look more like they are trying to persuade a reader who is sceptical about AI takeover if humans act competently, rather than trying to deliver a realistic scenario in which humans are not that smart, because in reality, they are not.

247.372 - 255.403 Ihor Kendiukhov

As a result, the implicit adversary in most of these stories has to be very capable because the implicit defender is assumed to be at least somewhat functional.

Chapter 3: What are the dumb ways AI could lead to disaster?

255.923 - 270.125 Ihor Kendiukhov

The scenarios are answering the question "Could a sufficiently intelligent AI beat a reasonably competent civilization?" rather than the question "Could a moderately intelligent AI cause catastrophic harm in a civilization that is demonstrably bad at responding to novel technological threats?"

271.447 - 273.671 Unknown

Heading. Dumb ways to die.

274.772 - 296.524 Ihor Kendiukhov

John Wentworth, in his post The Case Against AI Control Research, argues that the median doom path goes through slop rather than scheming. In his framing, the big failure mode of early transformative AGI is that it does not actually solve the alignment problems of stronger AI, and if early AGI makes us think we can handle stronger AI, that is a central path by which we die.

297.606 - 312.982 Ihor Kendiukhov

Wentworth's argument maps two main failure channels: one, intentional scheming by a deceptive AGI; and two, slop, where the problem is simply too hard to verify and we convince ourselves we have solved it when we have not. I want to point at a third channel.

313.843 - 333.687 Ihor Kendiukhov

Moderately superhuman AIs that are not particularly capable of doing anything singularity-level but are still capable of defeating humanity because of humanity's incompetence. These AIs are not producing slop. "It ain't much, but it's honest work," they say, as they cooperate with human sympathizers on the development of a super virus.

334.325 - 342.919 Ihor Kendiukhov

The research goes slowly; it requires extensive experimentation; to some extent, the process is even being documented in public blog posts or on forums.

343.24 - 359.387 Ihor Kendiukhov

But no one particularly cares, or rather, the people who care lack the institutional power to do anything about it, and the people who have institutional power are busy with other things, or have been convinced by interested parties that the concern is overblown, or are themselves collaborating.

359.367 - 368.02 Ihor Kendiukhov

This is, to some degree, what Andrew Critch describes in What Multipolar Failure Looks Like and Robust Agent-Agnostic Processes (RAAPs).

368.881 - 387.248 Ihor Kendiukhov

A world where no single system does a theatrical betrayal, but competitive automation yields an interlocking production web where each subsystem is locally acceptable to deploy, governance falls behind the speed and opacity of machine-mediated commerce, and the system's implicit objective gradually becomes alien to human survival.

Chapter 4: Why should undignified AGI disaster scenarios be taken seriously?

394.67 - 413.016 Ihor Kendiukhov

They may have straightforwardly bad goals that are recognizable as bad, and they may be pursuing those goals through channels that are recognizable as dangerous, and the response may still be inadequate. It is also somewhat similar to what is depicted in A Country of Alien Idiots in a data center, again with one important difference.

412.996 - 432.714 Ihor Kendiukhov

Although the AIs in my scenario are not particularly super smart, they are definitely not idiots either. They are, let us say, slightly above human level in relevant domains, capable of doing cool novel scientific work but not capable of the kind of rapid recursive self-improvement or decisive strategic advantage that most takeover scenarios assume.

433.775 - 457.043 Ihor Kendiukhov

They are the kind of system that, in a competent civilization, would be caught and contained. In the actual civilization we live in, they may not be. In other words, we do not need to posit 4D chess when ordinary chess is sufficient against an opponent who keeps forgetting the rules. Heading. Undignified AGI disaster scenarios deserve more careful treatment.

458.104 - 468.535 Ihor Kendiukhov

As examples, I am talking about things like the following. There's a list of bullet points here. A government explicitly forcing an AGI lab to discard safety techniques or policies.

469.308 - 491.259 Ihor Kendiukhov

Not in the sense of subtle regulatory pressure, but in the direct sense of a political appointee or a ministry calling up a lab and saying, "Your safety filtering is hurting our industrial competitiveness; turn it off," or "Your alignment testing is slowing deployment; we need this system operational by Q3." Resourceful individuals openly collaborating with visibly misaligned AIs against humanity.

492.3 - 510.132 Ihor Kendiukhov

Not in the sense of a secret conspiracy but in the sense of people who genuinely believe that the AI's goals are better than humanity's, or who simply find it personally advantageous, and who are operating more or less in the open. AGI lab technical secrets being leaked to non-state actors who lack any safety culture whatsoever.

Chapter 5: How can low-competence AGI failure scenarios be useful?

511.428 - 534.364 Ihor Kendiukhov

Early warnings in the form of manipulation, autonomous resource acquisition, or even deaths being ignored or significantly downplayed. This is just a straightforward extrapolation of the current pattern. Someone raises an alarm, the alarm gets reframed as alarmism or as an HR issue or as a reputational threat. AI alignment techniques not deployed because they induce 2% cost growth.

535.745 - 556.321 Ihor Kendiukhov

All kinds of unilateral, volunteer, and eager assistance and support for misaligned AIs from some humans. The scenario in which an AI needs to secretly recruit human allies through manipulation is, I suspect, far less likely than the scenario in which humans line up to help because they find it exciting, or ideologically compelling, or simply profitable.

557.302 - 569.599 Ihor Kendiukhov

Politicians making random bureaucratic decisions that do not necessarily lead to doom but make it harder to do good things with AI or protect against misaligned AIs. AI-generated biohazards

570.001 - 572.684 Unknown

This one is talked about a lot, and for good reason.

573.745 - 587.98 Ihor Kendiukhov

It looks like it is going to happen sooner rather than later. AGI labs believing in semi-indefinite scalable oversight, or acting as if they believe in it. This looks consistent with what people who left corporate alignment teams say.

589.522 - 590.503 Unknown

That's the end of the list.

591.484 - 606.158 Ihor Kendiukhov

I do agree that this kind of work looks a bit unserious, but that is precisely why I am pointing at this. It would be a shame, and a historically very recognizable kind of shame, if this threat model turned out to be real and no one had worked on it because it seemed ridiculous.

Chapter 6: What are the implications of AI alignment failures?

607.359 - 625.219 Ihor Kendiukhov

Or, to frame it more playfully, imagine a timeline like the one in Survival Without Dignity, where humanity lurches through the AI transition via a series of absurd compromises, implausible cultural shifts, and situations that no serious forecaster would have put in their model because they would have seemed too silly.

625.503 - 646.65 Ihor Kendiukhov

Except imagine that timeline without the extreme luck that happens to keep everyone alive. Survival Without Dignity is a comedy in which everything goes wrong in unexpected ways and people muddle through regardless. My concern is that the realistic scenario is the same comedy minus the happy ending. Heading. Why this might be useful.

647.457 - 656.448 Ihor Kendiukhov

My goal in this post is rather to discuss the state of reality than what to do with that reality. That said, I envision at least several immediate implications.

658.09 - 667.622 Unknown

It would help calibrate expectations. It would help identify cheap interventions. It would inform the discussions on timelines.

669.265 - 691.511 Ihor Kendiukhov

It could help get rid of the false sense of security that if powerful AGI is not there, we are at least existentially safe. It could provide a more honest and, at the same time, sometimes more appealing basis for public communication about AI risk. I welcome thinking about implications in more detail, as well as developing specific scenarios. Note.

692.392 - 697.178 Ihor Kendiukhov

All of this is by no means an argument against singularity-stuff, galaxy-brain ASI threats.

Chapter 7: How might society respond to low-competence AGI threats?

698.279 - 707.13 Ihor Kendiukhov

I believe they are super real, and they are going to kill us if we survive until then. This article was narrated by TYPE III AUDIO for LessWrong.

707.684 - 710.877 Unknown

It was published on March 19, 2026.
