LessWrong (Curated & Popular)

"Product Alignment is not Superintelligence Alignment (and we need the latter to survive)" by plex

01 Apr 2026

Transcription

Chapter 1: What is the difference between product alignment and superintelligence alignment?

0.031 - 4.498 Unknown

Product alignment is not superintelligence alignment, and we need the latter to survive.

5.6 - 19.984 Plex

By Plex. Published on March 31, 2026. TL;DR: Progress on making Claude friendly is not the same as progress on making it safe to build godlike superintelligence.

21.106 - 35.185 Unknown

Solving the former does not imply we get a good future. Please track the difference. The term "alignment" was coined to point to the technical problem of understanding how to build minds such that, if they were to become strongly and generally superhuman, things would go well.

36.286 - 48.263 Unknown

It has been increasingly adopted by frontier AI labs and much of the rest of the AI safety community to mean a much easier challenge, something like having AIs that are empirically doing approximately what you ask them to do.

48.243 - 67.484 Unknown

If it's possible to use an intent-aligned product to build a research system which discovers a new paradigm and breaks your guardrails, then it is not aligned in the original sense. If you can use your intent-aligned system to write code which jailbreaks other LLMs and enables them to do dangerous ML research, it is also not aligned in the original sense.

68.565 - 78.216 Unknown

Conflating progress on product alignment with progress on superintelligence alignment seems to be lulling much of the AI safety community into a false sense of security.

78.196 - 84.527 Plex

Heading: Why is superintelligence alignment less prominent? Because product alignment is:

85.874 - 99.713 Unknown

Much closer to the scaling labs' core expertise, ML, than to theory, technical philosophy, and math, so it is easier for them to hire for and evaluate. It has easier-to-use feedback loops: run an experiment, observe the results.

100.834 - 110.347 Unknown

Superintelligence alignment requires building enough theoretical understanding before running certain kinds of experiments, because you might not be alive to see the results if your theory is wrong.

Chapter 2: Why is superintelligence alignment less prominent in AI discussions?

111.448 - 131.147 Unknown

More profitable: progress on product alignment makes AI more useful right away. Easier for funders to fund: it's harder to evaluate who will make progress, or what even counts as progress, on superintelligence alignment theory than in a domain where you reliably get publishable results from running an experiment. This is inconvenient.

132.208 - 137.996 Unknown

It would be awesome if we could ride easy-to-evaluate, profitable, empirical feedback loops all the way to a great future.

138.997 - 145.623 Plex

But this seems far from certain. Heading: Why do we need superintelligence alignment to survive?

146.806 - 168.017 Unknown

Reality is allowed to be inconvenient. There's strong reason to expect that superhuman, situationally aware agents inside your experiment break some of the foundations the scientific process relies upon, such as: you can run roughly any experiment as often as you want to gather data, and the world won't end because the theory you were testing was wrong and you ran a too-strong agent.

169.079 - 180.563 Unknown

You won't have an intelligent adversary inside your experiment which is aware of you and faking data. Your experiment won't produce data which is superhumanly optimized to persuade you. In short,

Chapter 3: What challenges do we face with superintelligence alignment?

180.695 - 202.129 Unknown

Your experimental subject is not a neutral substrate, but a strategic actor more capable than you. If we don't have guarantees of maintaining safety properties each time a model builds the next rung on the capability ladder, we're rolling a die for irreversible guardrail decay. And we're going to be very rapidly rolling huge numbers of those dice as the feedback loop spins up.
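
To make that dice-rolling arithmetic concrete, here is a minimal sketch (mine, not the author's; the 1% per-step failure chance and the 500 steps are purely hypothetical numbers) of how per-step risk compounds:

# Hypothetical illustration: if each capability step carries an independent
# probability p of irreversible guardrail decay, the chance of surviving
# all n steps is (1 - p) ** n, which shrinks fast even for small p.
def survival_probability(p: float, n: int) -> float:
    """Probability that none of n independent per-step failures occurs."""
    return (1.0 - p) ** n

# Example: a 1% per-step failure chance over 500 automated steps
# leaves under a 1% chance of never failing.
print(survival_probability(0.01, 500))  # ~0.0066

Under these assumed numbers, nearly all of the probability mass lands on at least one irreversible failure, which is the force of the "huge numbers of dice" framing.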

202.295 - 212.018 Unknown

As we head up the exponential, we're going to need techniques which generalize to strongly superhuman agents: ones which correctly believe they could defeat all of humanity.

Chapter 4: How do product-aligned AIs contribute to AI safety research?

212.2 - 224.121 Unknown

Product-aligned AIs might help with that work, but the type of research they would need to automate looks more like technical philosophy and reliably avoiding slop, not just avoiding scheming and passing product alignment benchmarks.

225.283 - 245.717 Unknown

Only a tiny fraction of the field of AI safety is focused on these big picture bottlenecks, due to a mix of funding incentives and it being more rewarding for most people to do empirical science. When you see people enthusiastically talking about how much progress we have on alignment, please track and ask whether they're talking about aligning products or aligning superintelligence.

247.3 - 254.331 Unknown

This article was narrated by TYPE III AUDIO for LessWrong. It was published on March 31, 2026.

Chapter 5: What are the implications of conflating product alignment with superintelligence alignment?

256.093 - 260.5 Unknown

The original text contained 10 footnotes which were omitted from the narration.
