LessWrong (30+ Karma)
Episodes
[Linkpost] “You can only build safe ASI if ASI is globally banned” by Connor Leahy
17 Apr 2026
Contributed by Lukas
This is a link post. Sometimes people make various suggestions that we should simply build "safe" artificial Superintelligence (ASI), rather than the ...
“Beware of Well-Written Posts” by alseph
17 Apr 2026
Contributed by Lukas
Beware of when a post is so well-written that you can't put it down. Be wary of posts that are more visually attractive than average. Beware of posts...
“You Aren’t in Charge of the Overton Window; Politics Is Not Interior Design” by Davidmanheim
16 Apr 2026
Contributed by Lukas
Sometimes, people don't say what they actually think, not because saying it would be rude or costly, but because they believe saying it now would be ...
“Carpathia Day” by Drake Morrison
16 Apr 2026
Contributed by Lukas
(The better telling is here. Seriously you should go read it. I've heard this story told in rationalist circles, but there wasn't a post on LessWrong...
“Claude Code, Codex and Agentic Coding #7: Auto Mode” by Zvi
16 Apr 2026
Contributed by Lukas
As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrade...
“Do not conquer what you cannot defend” by habryka
16 Apr 2026
Contributed by Lukas
Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post. So today we are re-inventing federalism. Once upon a t...
“What is the Iliad Intensive?” by Leon Lang, Alexander Gietelink Oldenziel, David Udell
16 Apr 2026
Contributed by Lukas
Almost two months ago, Iliad announced the Iliad Intensive and Iliad Fellowship. Fellowships are a well-understood unit, but what is an intensive? Th...
“The Mirror Test Is Complicated” by J Bostock
15 Apr 2026
Contributed by Lukas
The Mirror Test is kind of like Hitler. In any discussion of animal cognition, somebody is going to bring it up. The conversation usually goes like t...
“Contra Leicht on AI Pauses” by David Scott Krueger (formerly: capybaralet)
15 Apr 2026
Contributed by Lukas
This is going to be a nerdier article than usual. It's a response to Anton Leicht's blog post “Press Play To Continue”. I disagree with much of i...
“Nectome: All That I Know” by Raelifin
15 Apr 2026
Contributed by Lukas
TLDR: I flew to Oregon to investigate Nectome, a brain preservation startup, and talk to their entire team. They’re an ambitious company, looking t...
“Effective Altruism, Seen From Slytherin” by Xylix
15 Apr 2026
Contributed by Lukas
Epistemic status: Left as an exercise for the reader. I was thinking of EA outreach and its optics this week. And was inspired to glance at a specif...
“Majority Report” by peralice
15 Apr 2026
Contributed by Lukas
[Attention conservation notice: description of a social phenomenon which may be obvious to some people.] This post is partially inspired by Alexander...
“Current AIs seem pretty misaligned to me” by ryan_greenblatt
15 Apr 2026
Contributed by Lukas
Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're ...
“Contra Byrnes on UV & Cancer” by HedonicEscalator
15 Apr 2026
Contributed by Lukas
In his recent LessWrong post, Some takes on UV & cancer, Steve Byrnes comes out against the "Public Health Orthodoxy" on UV. Among other topics I...
“Everyone Has a Plan Until They Get Social Pressure To the Face” by Czynski
15 Apr 2026
Contributed by Lukas
or: Invisible Social Consensus is Real And Can Hurt You Related: Annoyingly Principled People, and what befalls them, both in terms of the claim bein...
“Mechanisms of Introspective Awareness” by Uzay Macar
14 Apr 2026
Contributed by Lukas
Uzay Macar and Li Yang are co-first authors. This work was advised by Jack Lindsey and Emmanuel Ameisen, with contributions from Atticus Wang and Pet...
“Claude Mythos #3: Capabilities and Additions” by Zvi
14 Apr 2026
Contributed by Lukas
To round out coverage of Mythos, today covers capabilities other than cyber, and anything else additional not covered by the first two posts, includi...
“Load-Bearing Sincerity: On the Motive Reinforcement Thesis” by Fiora Starlight
14 Apr 2026
Contributed by Lukas
This post was written almost entirely before David Africa and Jacob Pfau published this post earlier today, which takes a different approach to the s...
“Diary of a “Doomer”: 12+ years arguing about AI risk (part 1)” by David Scott Krueger (formerly: capybaralet)
14 Apr 2026
Contributed by Lukas
How I learned about Deep Learning. As far as I know, I’m the second person ever to get into the field of AI largely because I was worried about the...
“A Retrospective of Richard Ngo’s 2022 List of Conceptual Alignment Projects” by LawrenceC
14 Apr 2026
Contributed by Lukas
Written very quickly for the InkHaven Residency. In 2022, Richard Ngo wrote a list of 26 Conceptual Alignment Research Projects. Now that it's 2026, ...
“From personas to intentions: towards a science of motivations for AI models” by David Africa, Jacob Pfau
14 Apr 2026
Contributed by Lukas
TLDR: Behavior-only descriptions are useful, but insufficient for aligning advanced models with high assurance. Two models can look equally aligned on...
“The Shapley Share of Responsibility?” by Raemon
14 Apr 2026
Contributed by Lukas
Deepfates on twitter wrote: If you're in a theater and you shout "Fire!", and the audience reacts predictably and in the process trample someone to d...
“Who Killed Common Law?” by Benquo
14 Apr 2026
Contributed by Lukas
The classical undergraduate humanities curriculum in America was destroyed and replaced over the course of the twentieth century. The destruction is ...
“Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” by Alex Mallen, ryan_greenblatt
14 Apr 2026
Contributed by Lukas
It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at ...
“Meaningful Questions Have Return Types” by Drake Morrison
14 Apr 2026
Contributed by Lukas
One way intellectual progress stalls is when you are asking the Wrong Questions. Your question is nonsensical, or cuts against the way reality works....
“Political Violence Is Never Acceptable” by Zvi
13 Apr 2026
Contributed by Lukas
Nor is the threat or implication of violence. Period. Ever. No exceptions. It is completely unacceptable. I condemn it in the strongest possible ter...
“Only Law Can Prevent Extinction” by Eliezer Yudkowsky
13 Apr 2026
Contributed by Lukas
There's a quote I read as a kid that stuck with me my whole life: "Remember that all tax revenue is the result of holding a gun to somebody's head. N...
“AI Safety’s Biggest Talent Gap Isn’t Researchers. It’s Generalists.” by Topaz, agucova, Alexandra Bates, Parv Mahajan
13 Apr 2026
Contributed by Lukas
This post was cross posted to the EA Forum TL;DR: One of the largest talent gaps in AI safety is competent generalists: program managers, fieldbuilde...
“Tomas Bjartur: The Last Prodigy” by Linch
13 Apr 2026
Contributed by Lukas
In 2026, every budding prodigy in writing is in some sense a tragedy. Anybody with experience prompting the large language models to write fiction kno...
“Annoyingly Principled People, and what befalls them” by Raemon
13 Apr 2026
Contributed by Lukas
Here are two beliefs that are sort of haunting me right now: Folk who try to push people to uphold principles (whether established ones or novel ones...
“TAPs or it didn’t happen” by Raemon
13 Apr 2026
Contributed by Lukas
Once, I went to talk about "curiosity" with @LoganStrohl. They noted "it seems like you have a good handle on 'active curiosity', but you don't reall...
“Returns to intelligence” by RobertM
13 Apr 2026
Contributed by Lukas
I'm going to tell you a story. For that story to make sense, I need to give you some background context. I have some pretty smart friends. One of the...
“Daycare illnesses” by Nina Panickssery
13 Apr 2026
Contributed by Lukas
Before I had a baby I was pretty agnostic about the idea of daycare. I could imagine various pros and cons but I didn’t have a strong overall opini...
“The policy surrounding Mythos marks an irreversible power shift” by sil
13 Apr 2026
Contributed by Lukas
This post assumes Anthropic isn't lying: Mythos is the current SOTA; Mythos is potent[1]; Anthropic will not make it publicly available un-nerfed[2]; Anthr...
“Talk English, Think Something Else” by J Bostock
13 Apr 2026
Contributed by Lukas
There's an adage from programming in C++ which goes something like "Yes, you write C, but you imagine the machine code as you do." I assumed this was...
“Sparse Autoencoders for Single-Cell Models” by Ihor Kendiukhov
13 Apr 2026
Contributed by Lukas
People are rushing to build bigger and bigger single cell foundation models (trained on RNA sequencing data), but in my view we have not extracted ev...
“Eggs, rooms, puzzles, and talking about AI” by KatjaGrace
13 Apr 2026
Contributed by Lukas
I live with five friends in a big house, and two things I’ve done in it on this particular Sunday are hide 156 easter eggs all around, and reach a ...
“Morale” by J Bostock
12 Apr 2026
Contributed by Lukas
One particularly pernicious condition is low morale. Morale is, roughly, "the belief that if you work hard, your conditions will improve." If your mo...
“Your Mom is a Chimera” by michaelwaves
12 Apr 2026
Contributed by Lukas
And so are you! When you were a fetus, you were sending millions of your cells through the placenta into your mom. And she was sending her cells into...
“The Blast Radius Principle” by Martin Sustrik
12 Apr 2026
Contributed by Lukas
In April 2024, a salvo of cruise missiles destroyed the Trypilska thermal power plant, the largest in the Kyiv region, in under an hour. In June 2023...
“How to make good tea” by RobertM
12 Apr 2026
Contributed by Lukas
If you're starting from a baseline of drinking relatively cheap mass-market teabags, the easiest way to marginally improve your tea quality is by mak...
“Catching illicit distributed training operations during an AI pause” by Robi Rahman
12 Apr 2026
Contributed by Lukas
Last year, my colleagues on MIRI's Technical Governance Team proposed an international agreement to halt risky development of superhuman artificial i...
[Linkpost] “Scott Alexander gentrified my meetup” by dominicq
11 Apr 2026
Contributed by Lukas
This is a link post. On 10 April 2026, I organized an ACX/LW/rationality meetup. Initially, I intended for it to be just our regular meetup, and “re...
“Pausing AI Is the Best Answer to Post-Alignment Problems” by MichaelDickens
11 Apr 2026
Contributed by Lukas
Even if we solve the AI alignment problem, we still face post-alignment problems, which are all the other existential problems [1] that AI may bring...
“Some thoughts on Nectome’s risk and resilience” by Aurelia
11 Apr 2026
Contributed by Lukas
One of the best ways to reduce Nectome's long-term risk is to show that preservation is a thing people want by buying one yourself; this is a critica...
“Chocolate Sloths, Tinder, and Moral Backstops” by J Bostock
11 Apr 2026
Contributed by Lukas
My grandma has a poor understanding of moral hazard, when it comes to buying me 155g chocolate sloths. Moral hazard is a concept in political economy...
“Dario probably doesn’t believe in superintelligence” by RobertM
11 Apr 2026
Contributed by Lukas
Epistemic status: I think this is true but don't think this post is a very strong argument for the case, or particularly interesting to read. But I h...
“The Unintelligibility is Ours: Notes on Chain-of-Thought” by 1a3orn
11 Apr 2026
Contributed by Lukas
Many people seem to think that the chains-of-thought in RL-trained LLMs are under a great deal of "pressure" to cease being English. The idea is that...
“If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by ryan_greenblatt
11 Apr 2026
Contributed by Lukas
Anthropic's system card for Mythos Preview says: It's unclear how we should interpret this. What do they mean by productivity uplift? To what exten...
“Claude Mythos #2: Cybersecurity and Project Glasswing” by Zvi
10 Apr 2026
Contributed by Lukas
Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous to...
“Why Control Creates Conflict, and When to Open Instead” by plex
10 Apr 2026
Contributed by Lukas
tl;dr: with multiple agents, control attempts tend to create conflict, because control attempts shut down communications channels, which leads to fee...
“Reproducing steering against evaluation awareness in a large open-weight model” by Thomas Read, Bronson Schoen, Joseph Bloom
10 Apr 2026
Contributed by Lukas
Produced as part of the UK AISI Model Transparency Team. Our team works on ensuring models don't subvert safety assessments, e.g. through evaluation ...
“Have we already lost? Part 2: Reasons for Doom” by LawrenceC
10 Apr 2026
Contributed by Lukas
Written very quickly for the Inkhaven Residency. As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidabl...
“Model organisms researchers should check whether high LRs defeat their model organisms” by dx26, Sebastian Prasanna, Alek Westover, Vivek Hebbar, Julian Stastny
10 Apr 2026
Contributed by Lukas
Thanks to Buck Shlegeris for feedback on a draft of this post. The goal-guarding hypothesis states that schemers will be able to preserve their goals...
“Anthropic did not publish a “risk discussion” of Mythos when required by their RSP” by RobertM
10 Apr 2026
Contributed by Lukas
I and some other people noticed a potential discrepancy in Anthropic's announcement of Claude Mythos. The version of the RSP that was operative over ...
“Claude Mythos: The System Card” by Zvi
10 Apr 2026
Contributed by Lukas
Claude Mythos is different. This is the first model other than GPT-2 that is at first not being released for public use at all. With GPT-2 the dela...
“Some takes on UV & cancer” by Steven Byrnes
10 Apr 2026
Contributed by Lukas
Table of contents: Part 1: In which I use my optical physics background to share some hopefully-uncontroversial observations; Part 2: In which I boldly...
“AI #163: Mythos Quest” by Zvi
09 Apr 2026
Contributed by Lukas
There exists an AI model, Claude Mythos, that has discovered critical safety vulnerabilities in every major operating system and browser. If released...
“Help me launch Obsolete: a book aimed at building a new movement for AI reform” by garrison
09 Apr 2026
Contributed by Lukas
I wrote a book! It's called Obsolete: The AI Industry's Trillion-Dollar Race to Replace You—and How to Stop It, and it’ll be available in May if ...
“Slightly-Super Persuasion Will Do” by Tomás B.
09 Apr 2026
Contributed by Lukas
In SF this week, I met an online friend in person for the first time yesterday. We talked about super-persuasion. His take was: there is mostly an ef...
“Have we already lost? Part 1: The Plan in 2024” by LawrenceC
09 Apr 2026
Contributed by Lukas
Written very quickly for the Inkhaven Residency. As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidab...
“Do not be surprised if LessWrong gets hacked” by RobertM
09 Apr 2026
Contributed by Lukas
Or, for that matter, anything else. This post is meant to be two things: a PSA about LessWrong's current security posture, from a LessWrong admin[1]; a...
“One Week in the Rat Farm” by Philip Harker
09 Apr 2026
Contributed by Lukas
Hello, LessWrong. This is a personal introduction diary-ish post and it does not have a thesis. I apologise if this isn't a good fit for the website;...
“101 Humans of New York on the Risks of AI” by Corm
09 Apr 2026
Contributed by Lukas
Nobody has ever done an in person door to door survey about AI risks[1]. What do people really think about AI? Like really? There have been some surv...
“Baking tips” by RobertM
08 Apr 2026
Contributed by Lukas
These are things I've learned from experience that others might find helpful. Some of them are easy to miss for a while. (Also an exercise in "realit...
“An easy coordination problem?” by KatjaGrace
08 Apr 2026
Contributed by Lukas
Common wisdom says that it is incredibly hard to coordinate to not build more dangerous AI. This sounds believable in the abstract: international geo...
“Excerpts and Notes on Mythos Model Card” by williawa
08 Apr 2026
Contributed by Lukas
List of Excerpts from Mythos model card. Tried to include interesting things, but also included some boring to be expected things. I omitted some t...
“The effects of caffeine consumption do not decay with a ~5 hour half-life” by kman
08 Apr 2026
Contributed by Lukas
epistemic status: confident in the overall picture, substantial quantitative uncertainty about the relative potency of caffeine and paraxanthine tldr...
“You don’t know what you are made of till you’ve been stalked across three countries” by Shoshannah Tekofsky
08 Apr 2026
Contributed by Lukas
When I was 19 I made some decisions. One of them was to stop eating meat. One of them was to stop using cutlery. One of them was to stop using chairs...
“Why is Flesh So Weak?” by J Bostock
08 Apr 2026
Contributed by Lukas
Ok, I got nerd sniped on the specific argument "Animals would be better off being made of stronger material than protein, but they don't because evol...
“The hard part isn’t noticing when papers are bad, it’s deciding what to do afterwards” by LawrenceC
08 Apr 2026
Contributed by Lukas
Written (very) quickly for the Inkhaven Residency. I used to hate the classic management adage of “bring me solutions, not problems”. After all, ...
“We can prevent progress! Conceptual clarity, and inspiration from the FDA” by KatjaGrace
08 Apr 2026
Contributed by Lukas
“We can’t prevent progress” say the people for some reason enthusiastically advocating that we just risk dying by AI rather than even consider ...
“AI as a Trojan horse race” by KatjaGrace
08 Apr 2026
Contributed by Lukas
I’ve argued that the AI situation is not clearly an ‘arms race’. By which I mean, going fast is not clearly good, even selfishly. I think this...
“My unsupervised elicitation challenge” by DanielFilan
08 Apr 2026
Contributed by Lukas
Note: you are ineligible to complete this challenge if you’ve studied Ancient or Modern Greek, or if you natively speak Modern Greek, or if for oth...
“Role-playing vs Self-modelling” by Jan_Kulveit
08 Apr 2026
Contributed by Lukas
In a recent debate on Twitter – which I recommend reading in full – David Chalmers argues: "Claude doesn't role-play the assistant, it realizes t...
“Elementary Condensation” by Jan
08 Apr 2026
Contributed by Lukas
Previously in this series: Elementary Infra-Bayesianism 1. There's this paper Earlier last week I got nerd-sniped by a paper called Condensation: a t...
“Hedging and Survival-Weighted Planning” by Vaniver
08 Apr 2026
Contributed by Lukas
This wasn't intended to be a topical post, but Claude Mythos's system card is out, and... well. I wrote years ago about decision analysis, which ofte...
“Opus’s Schelling Steganography Has Amplifiable Secrecy Against Weaker Eavesdroppers” by Elle Najt
08 Apr 2026
Contributed by Lukas
Code: github.com/ElleNajt/Steganography_Wiretapping | Data: huggingface.co/datasets/lnajt/steganography-wiretapping Play the decoding game: can you ...
“An Alignment Journal: Features and policies” by JessRiedel, Dan MacKinlay, Luca, Daniel Murfet, david reinstein
08 Apr 2026
Contributed by Lukas
We previously announced a forthcoming research journal for AI alignment. This cross-post from our blog describes our tentative plans for the features...
“Fantasy ideology” by Ninety-Three
07 Apr 2026
Contributed by Lukas
The following is a long excerpt from a longer article published in 2002 by Lee Harris, Al Qaeda's Fantasy Ideology. The full article is about what it...
[Linkpost] “Questions raised about OpenAI leaders’ trustworthiness by the New Yorker” by Remmelt
07 Apr 2026
Contributed by Lukas
This is a link post. One excerpt stuck out for me – on Brockman's idea to play China, Russia, and other world powers against each other: In 2017, Am...
“Claude Mythos System Card Preview” by anaguma
07 Apr 2026
Contributed by Lukas
Anthropic has released a preview of the Claude Mythos System Card preview here. It is too long to present in full, but a section I found particularly...
“My picture of the present in AI” by ryan_greenblatt
07 Apr 2026
Contributed by Lukas
In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scena...
[Linkpost] “[Paper] Stringological sequence prediction I” by Vanessa Kosoy
07 Apr 2026
Contributed by Lukas
This is a link post. TLDR: The first in a planned series of three or more papers, which constitute the first major in-road in the compositional learni...
“We’re actually running out of benchmarks to upper bound AI capabilities” by LawrenceC
07 Apr 2026
Contributed by Lukas
Written quickly as part of the Inkhaven Residency. Opinions are my own and do not represent METR's official opinion. In early 2025, the situation fo...
“Don’t write for LLMs, just record everything” by RobertM
07 Apr 2026
Contributed by Lukas
Some people have argued the advent of LLMs has dramatically increased the value of having a public writing footprint. The first reason given is that ...
“Contra Nina Panickssery on advice for children” by Sean Herrington
07 Apr 2026
Contributed by Lukas
I recently read this post by Nina Panickssery on advice for children. I felt that several of the recommendations are actively harmful the children th...
“By Strong Default, ASI Will End Liberal Democracy” by MichaelDickens
07 Apr 2026
Contributed by Lukas
Cross-posted from my website. The existence of liberal democracy—with rule of law, constraints on government power, and enfranchised citizens—re...
“AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines” by ryan_greenblatt
06 Apr 2026
Contributed by Lukas
I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) a...
“Paper close reading: “Why Language Models Hallucinate”” by LawrenceC
06 Apr 2026
Contributed by Lukas
People often talk about paper reading as a skill, but there aren’t that many examples of people walking through how they do it. Part of this is a p...
“Ten different ways of thinking about Gradual Disempowerment” by David Scott Krueger (formerly: capybaralet)
05 Apr 2026
Contributed by Lukas
About a year ago, we wrote a paper that coined the term “Gradual Disempowerment.” It proved to be a great success, which is terrific. A friend an...
“11 pieces of advice for children” by Nina Panickssery
05 Apr 2026
Contributed by Lukas
I came up with these principles when I was a child myself. Don’t be a sheep 🐑. Avoid mindlessly copying others. Resist the urge towards conformi...
“Steering Might Stop Working Soon” by J Bostock
05 Apr 2026
Contributed by Lukas
Steering LLMs with single-vector methods might break down soon, and by soon I mean soon enough that if you're working on steering, you should start p...
“Am I the baddie?” by Ustice
05 Apr 2026
Contributed by Lukas
I am a software engineer. I work for a company that makes software for road construction. Monday last week we were under a bad crunch and we were tol...
“Academic Proof-of-Work in the Age of LLMs” by LawrenceC
05 Apr 2026
Contributed by Lukas
Written quickly as part of the Inkhaven Residency. Related: Bureaucracy as active ingredient, pain as active ingredient A widely known secret in acad...
“Positive sum does not mean “win-win”” by loops
05 Apr 2026
Contributed by Lukas
A lot of people and documents online say that positive-sum games are "win-wins", where all of the participants are better off. But this isn't true! I...
“Considerations for growing the pie” by Zach Stein-Perlman
05 Apr 2026
Contributed by Lukas
Recently some friends and I were comparing growing the pie interventions to an increasing our friends' share of the pie intervention, and at first we...
““Following the incentives”” by David Scott Krueger (formerly: capybaralet)
04 Apr 2026
Contributed by Lukas
A few years ago I listened to a fascinating podcast interview featuring former Democratic presidential candidates Andrew Yang and Marianne Williamson...
“Chicken-Free Egg Whites” by jefftk
04 Apr 2026
Contributed by Lukas
Baking has traditionally made extensive use of egg whites, especially the way they can be beaten into a foam and then set with heat. While I eat eg...
“dark ilan” by ozymandias
04 Apr 2026
Contributed by Lukas
The second time Vellam uncovers the conspiracy underlying all of society, he approaches a Keeper. Some of the difference is convenience. Since Vellam...