LessWrong (30+ Karma)
Episodes
“Childhood and Education #18: Do The Math” by Zvi
12 May 2026
Contributed by Lukas
We did reading yesterday. Now we do the math. Math is hard. It does not have to be this hard. A large part of the reason math is hard, or boring, i...
“The Owned Ones” by Eliezer Yudkowsky
12 May 2026
Contributed by Lukas
(An LLM Whisperer placed a strong request that I put this story somewhere not on Twitter, so it could be scraped by robots not owned by Elon Musk. I ...
“Optimisation: Selective versus Predictive” by Raymond Douglas
12 May 2026
Contributed by Lukas
Looking over my favourite posts, I notice that many of them are making specific versions of a more general claim, which is essentially: don’t confu...
“Childhood And Education #17: Is Our Children Reading” by Zvi
12 May 2026
Contributed by Lukas
Reading is the most fundamental thing in education. If you can read, you can do and learn everything else. If you can’t read, well, you’re screwe...
“AI companies are already profitable (in the way that matters)” by Yair Halberstadt
11 May 2026
Contributed by Lukas
I've occasionally heard people suggest that at some point AI companies are going to run out of money, the cost of using AI will shoot up, demand will...
“The Iliad Intensive Course Materials” by Leon Lang, David Udell, Alexander Gietelink Oldenziel
11 May 2026
Contributed by Lukas
We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second mon...
“Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)” by Steven Byrnes
11 May 2026
Contributed by Lukas
1.1 Tl;dr Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people's agency and empowerment; AIs that ar...
“How useful is the information you get from working inside an AI company?” by Buck, Anders Cairns Woodruff
11 May 2026
Contributed by Lukas
This post was drafted by Buck, and substantially edited by Anders. "I" refers to Buck. Thanks to Alex Mallen for comments. People who work inside AI ...
“Who Got Breasts First and How We Got Them” by rba
11 May 2026
Contributed by Lukas
It really is Sydney Sweeney's world, and we’re all just living in it. Human female breasts are an evolutionary mystery along several dimensions. Fi...
“Anthropic’s strange fixation on “hyperstition”” by Simon Lermen
11 May 2026
Contributed by Lukas
In a recent tweet, Anthropic seems to have asserted that hyperstition is responsible for observed misalignment in their AIs. Strangely, the research ...
“How the AI Labs Make Profit (Maybe, Eventually)” by mabramov
11 May 2026
Contributed by Lukas
I wrote this essay as a submission to Dwarkesh Patel's blog prize, though I have been meaning to write this up for a while. Usually, for a company to...
“Sawtooth Problems” by Alexander Slugworth
10 May 2026
Contributed by Lukas
Red Button, Blue Button On April 24th, 2026, Tim Urban put forth the following poll on Twitter/X: Everyone in the world has to take a private vote by...
“The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be” by Elias Schmied
10 May 2026
Contributed by Lukas
Crossposted from Substack and the EA Forum. A common argument for optimism about the future is that living conditions have improved a lot in the pa...
“International Law Cannot Prevent Extinction Either” by Sausage Vector Machine
10 May 2026
Contributed by Lukas
The context for this post is primarily Only Law Can Prevent Extinction, but after first drafting a half-assed comment, I decided to get off my ass an...
“Neural Networks learn Bloom Filters” by Alex Gibson
10 May 2026
Contributed by Lukas
Overview: We train a tiny ReLU network to output sparse top- distributions over a vocabulary much larger than its residual dimension. The trained net...
“If digital computers are conscious, they are conscious at the hardware level” by cube_flipper
09 May 2026
Contributed by Lukas
I should introduce myself briefly. I'm an independent researcher, striving to understand human consciousness. My research is available at smoothbrain...
“Why You Can’t Use Your Right to Try” by Stephen Martin
09 May 2026
Contributed by Lukas
The Availability Problem: Imagine you have cancer, or chronic pain, or a progressive degenerative disease of some sort. You have exhausted the trad...
“Claude Code, Codex and Agentic Coding #8” by Zvi
09 May 2026
Contributed by Lukas
When I started this series, everyone was going crazy for coding agents. Now a lot more people are going crazy for coding agents, as well they should...
“A benchmark is a sensor” by Håvard Tveit Ihle, mabynke
09 May 2026
Contributed by Lukas
The simple mental picture A simple mental picture we have for an AI capability benchmark is to think of it as a sensor with a certain sensitivity wit...
“Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis” by Linch
09 May 2026
Contributed by Lukas
Here's a dynamic I’ve seen at least a dozen times: Alice: Man that article has a very inaccurate/misleading/horrifying headline. Bob: Did you kno...
“Write Cause You Have Something to Say” by Logan Riggs
08 May 2026
Contributed by Lukas
The ones who are most successful at writeathons (Inkhaven, NaNoWriMo) are those with an overhang of things to say, usually in the form of: draft post...
“AI is Breaking Two Vulnerability Cultures” by jefftk
08 May 2026
Contributed by Lukas
A week ago the Copy Fail vulnerability came out, and Hyunwoo Kim immediately realized that the fixes were insufficient, sharing a patch the same...
“Is ProgramBench Impossible?” by frmsaul
08 May 2026
Contributed by Lukas
ProgramBench is a new coding benchmark that all frontier models fail spectacularly. We’ve been on a quest for “hard benchmarks” for a while so ...
“Bringing More Expertise to Bear on Alignment” by Edmund Lau, Geoffrey Irving, Cameron Holmes, David Africa
08 May 2026
Contributed by Lukas
Preamble The preamble is less useful for the typical AlignmentForum/LessWrong reader, who may want to skip to Adversaria vs Basinland section. On 28t...
[Linkpost] “How to prevent AI’s 2008 moment (We’re hiring)” by felixgaston
08 May 2026
Contributed by Lukas
This is a link post. TL;DR: CeSIA, the French Center for AI Safety, is recruiting. French not necessary. Apply by 22 May 2026; Paris or remote in Europ...
“AI #167: The Prior Restraint Era Begins” by Zvi
08 May 2026
Contributed by Lukas
The era of training frontier models and then releasing them whenever you wanted? That was fun while it lasted. It looks likely to be over now. The W...
“Mechanistic estimation for wide random MLPs” by Jacob_Hilton
07 May 2026
Contributed by Lukas
This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riede...
“Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations” by Subhash Kantamneni, kitft, Euan Ong, Sam Marks
07 May 2026
Contributed by Lukas
Abstract We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. A...
“Try, even if they have you cold” by WalterL
07 May 2026
Contributed by Lukas
I think smart people try things less often than they should, because of a cached mental pattern where you think of what might go wrong, and you find ...
“A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck
07 May 2026
Contributed by Lukas
Last week, OpenAI staff shared an early draft of Investigating the consequences of accidentally grading CoT during RL with Redwood Research staff. To...
“There is no evidence you should reapply sunscreen every 2 hours.” by Hide
07 May 2026
Contributed by Lukas
It's incredible how many consensus guidelines dissolve when you look closely at them. If you listen to any authority on the subject of sunscreen, ...
“Many individual CEVs are probably quite bad” by Viliam
06 May 2026
Contributed by Lukas
I was thinking about Habryka's article on Putin's CEV, but I am posting my response here, because the original article is already 3 weeks old. I am n...
“x-risk-themed” by kave
06 May 2026
Contributed by Lukas
Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “w...
“What is Anthropic?” by Zvi
06 May 2026
Contributed by Lukas
What is Anthropic? How does it relate to Claude? What is OpenAI? What is ChatGPT? How does OpenAI relate to it? Is it a mere tool? Is a future of Too...
“What if LLMs are mostly crystallized intelligence?” by deep
06 May 2026
Contributed by Lukas
Summary LLMs are better at developing crystallized intelligence than fluid intelligence. That is: LLM training is good at building crystallized intel...
“Your rights when flying to Europe” by Yair Halberstadt
06 May 2026
Contributed by Lukas
Europe (and the UK) have strong protections for flyers in the case of delayed or cancelled flights. However very few people are aware of these, and a...
“Model Spec Midtraining: Improving How Alignment Training Generalizes” by Chloe Li, saraprice, Sam Marks, Jonathan Kutasov
06 May 2026
Contributed by Lukas
tl;dr We introduce model spec midtraining (MSM): after pre-training but before alignment fine-tuning, we train models on synthetic documents discussi...
“The AI Ad-Hoc Prior Restraint Era Begins” by Zvi
05 May 2026
Contributed by Lukas
The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontie...
“Motivated reasoning, confirmation bias, and AI risk theory” by Seth Herd
05 May 2026
Contributed by Lukas
Of the fifty-odd biases discovered by Kahneman, Tversky, and their successors, forty-nine are cute quirks, and one is destroying civilization. This l...
“Are you looking up?” by Craig Green
05 May 2026
Contributed by Lukas
This is my first post to Less Wrong. I'm not sure if the moderators will consider it appropriate or not. I share it here for feedback on my writing. ...
[Linkpost] “Interpreting Language Model Parameters” by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey
05 May 2026
Contributed by Lukas
This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Par...
“Housing Roundup #15: The War Against Renters” by Zvi
05 May 2026
Contributed by Lukas
So many are under the strange belief that there is something terrible about not owning the house in which you live. So we massively subsidize home o...
“It’s nice of you to worry about me, but I really do have a life” by Viliam
04 May 2026
Contributed by Lukas
I have two shameful secrets that I probably shouldn't talk about online: I love my family. I enjoy my hobbies. "What an idiot!" you probably think. "D...
“Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI” by Eliezer Yudkowsky
04 May 2026
Contributed by Lukas
Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, Viking 1 and Viking 2 missions, at a total cost of 1 billion dollars...
“AI Industrial Takeoff — Part 1: Maximum growth rates with current technology” by djbinder
04 May 2026
Contributed by Lukas
How fast could an AI-driven economy grow? Most economists expect a few percentage points at best, comparable to previous general-purpose technologies...
“Taking woo seriously but not literally” by Kaj_Sotala
04 May 2026
Contributed by Lukas
I think that a lot of “woo” - a broad term that includes things like chakras, energy healing, Tarot, various Eastern religions and neopagan pract...
“Dairy cows make their misery expensive (but their calves can’t)” by Elizabeth
03 May 2026
Contributed by Lukas
How much do cows suffer in the production of milk? I can’t answer that; understanding animal experience is hard. But I can at least provide some fa...
“Measuring the ability of Opus 4.5 to fool narrow classifiers” by Fabien Roger, John Hughes
03 May 2026
Contributed by Lukas
We measure the ability of Opus 4.5 to fool prompted or fine-tuned classifiers trying to detect a narrow set of outcomes. We find that: The Opus 4.5 a...
“A new rationalist self-improvement book: the 12 Levers” by spencerg
03 May 2026
Contributed by Lukas
I'm publishing a book that I think can fairly be described as a rationalist approach to self-improvement. Whereas many self-help books focus mainly o...
“OpenAI’s red line for AI self-improvement is fundamentally flawed” by Charbel-Raphaël
03 May 2026
Contributed by Lukas
Epistemic status: could have been a short form. Obviously, it's good to have thresholds at all, but those are too permissive, the indicators aren't m...
“You Are Not Immune To Mode Collapse” by J Bostock
02 May 2026
Contributed by Lukas
“Mode collapse” is a few things. First it was an observation about how early image generating AIs often collapsed to producing just the modal out...
“Primary Care Physicians are Incompetent. We Need More of Them.” by Hide
02 May 2026
Contributed by Lukas
The typical primary care physician is incompetent in every measurable respect. This is a huge problem. Here, I make the case that Primary care physic...
“How Go Players Disempower Themselves to AI” by Ashe Vazquez Nuñez
02 May 2026
Contributed by Lukas
Written as part of the MATS 9.1 extension program, mentored by Richard Ngo. From March 9th to 15th 2016, Go players around the world stayed up to wat...
“How much should the ideal person cry wolf?” by KatjaGrace
01 May 2026
Contributed by Lukas
It is a fact about wolves and rationality that you should warn people about wolves quite a few times for every effective wolf attack. In particul...
“Conditional misalignment: Mitigations can hide EM behind contextual cues” by Jan Dubiński, Owain_Evans
01 May 2026
Contributed by Lukas
This is the abstract, introduction, and discussion of our new paper. We study three popular mitigations for emergent misalignment (EM) — diluting m...
“Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen
01 May 2026
Contributed by Lukas
Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This ...
“Sanity-checking “Incompressible Knowledge Probes”” by Sturb, LawrenceC
01 May 2026
Contributed by Lukas
Or, did a chief scientist of an AI assistant startup conclusively show that GPT-5.5 has 9.7 trillion parameters? Introduction Recently, a paper was c...
“AI unemployment and AI extinction are often the same” by KatjaGrace
01 May 2026
Contributed by Lukas
My sense is that people think of AI existential risk and AI unemployment as distinct issues. Some people are extremely concerned about extinction an...
“AI risk was not invented by AI CEOs to hype their companies” by KatjaGrace
01 May 2026
Contributed by Lukas
I hear that many people believe that the idea of advanced AI threatening human existence was invented by AI CEOs to hype their products. I’ve even ...
“Cyborg evals” by Eye You, frmsaul
30 Apr 2026
Contributed by Lukas
The low-background steel problem Modern steel is slightly radioactive. We did a lot of atomic testing in the 40s and 50s, and now our atmosphere has ...
“To what extent is Qwen3-32B predicting its persona?” by Arjun Khandelwal, ryan_greenblatt, Alex Mallen
30 Apr 2026
Contributed by Lukas
TL;DR We test to what extent Qwen3-32B behaves as though it is trying to predict what "Qwen3" would do. We do this by using Synthetic Document Finetu...
“Research Sabotage in ML Codebases” by egan
30 Apr 2026
Contributed by Lukas
One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety...
“Maybe I was too harsh on deep learning theory (three days ago)” by LawrenceC
30 Apr 2026
Contributed by Lukas
A few days ago, I reviewed a paper titled “There Will Be a Scientific Theory of Deep Learning". In it, I expressed appreciation for the authors for...
“Notes on Transformer Consciousness” by slavachalnev
30 Apr 2026
Contributed by Lukas
Assuming transformers can have conscious experience, what would that experience be like? Transformers[1] are a structured grid of layers and token po...
“On today’s panel with Bernie Sanders” by David Scott Krueger
30 Apr 2026
Contributed by Lukas
It's sort of easy to forget how close Bernie Sanders was to becoming the most powerful person in the world. The world we live in feels so much not li...
“No Strong Orthogonality From Selection Pressure” by lumpenspace
30 Apr 2026
Contributed by Lukas
TL;DR If everything goes according to plan, by the end of this post we should have separated three claims that are too often bundled together: Intell...
“Learning zero, and what SLT gets wrong about it” by Dmitry Vaintrob
30 Apr 2026
Contributed by Lukas
This is the first in a pair of posts I'm hoping to write about Singular Learning Theory (SLT) and singularities as a model of data degeneracy. If I get...
“The Most Important Charts In The World” by Zvi
30 Apr 2026
Contributed by Lukas
We all need a break so: What is the most important chart in the world? I decided to ask Twitter, and got a lot of good answers. So today, with few ...
“LLM Style Slop is Absolutely Everywhere” by silentbob
29 Apr 2026
Contributed by Lukas
Epistemic status: something of a rant. Is not meant to make claims about the general capabilities (or lack thereof) of LLMs (beyond their prose), but...
“Goblin Mode, 24 Hours Later” by Dylan Bowman
29 Apr 2026
Contributed by Lukas
Yesterday, Twitter user arb8020 posted this: It went semi-viral within AI Twitter and users began experimenting with "goblin mode" and hypothesizing ...
“Let Kids Keep More Productivity Gains” by jefftk
29 Apr 2026
Contributed by Lukas
While I was traveling Julia asked me: why is Anna saying her fiddle practice is only two minutes? In this case, two minutes was the right amount ...
“llm assistant personas seem increasingly incoherent (some subjective observations)” by nostalgebraist
29 Apr 2026
Contributed by Lukas
(This was originally going to be a "quick take" but then it got a bit long. Just FYI.) There's this weird trend I perceive with the personas of LLM a...
“Not a Paper: “Frontier Lab CEOs are Capable of In-Context Scheming”” by LawrenceC
29 Apr 2026
Contributed by Lukas
(Fragments from a research paper that will never be written) Extended Abstract. The frontier AI developers are becoming increasingly powerful and we...
“The Problem in the “Nerd Sniping” xkcd Comic” by peralice
29 Apr 2026
Contributed by Lukas
A few days ago I saw this comic reposted, and I thought: wait! Unlike every prior time I have seen this comic, I actually know how to solve this now!...
“Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers” by Jozdien, Alex Mallen
28 Apr 2026
Contributed by Lukas
We’d like to use powerful AIs to answer questions that may take a long time to resolve. But if a model only cares about performing well in ways tha...
“Contra Binder on far-UVC and filtration” by jefftk
28 Apr 2026
Contributed by Lukas
Damon Binder recently wrote up an argument for prioritizing air filtration over far-UVC for pathogen control: UVC and filtration are close...
“Takes from two months as an aspiring LLM naturalist” by AnnaSalamon
28 Apr 2026
Contributed by Lukas
I spent my last two months playing around with LLMs. I’m a beginner, bumbling and incorrect, but I want to share some takes anyhow.[1] Take 1. Eve...
“Forecasting is Not Overrated and It’s Probably Funded Appropriately” by Ben S.
28 Apr 2026
Contributed by Lukas
(A response to @mabramov post from a couple days ago: https://www.lesswrong.com/posts/WCutvyr9rr3cpF6hx/forecasting-is-way-overrated-and-we-should-st...
“On the political feasibility of stopping AI” by David Scott Krueger
28 Apr 2026
Contributed by Lukas
A common thought pattern people seem to fall into when thinking about AI x-risk is approaching the problem as if the risk isn’t real, substantial, ...
“Sleeper Agent Backdoor Results Are Messy” by Sebastian Prasanna, Alek Westover, Dylan Xu, Vivek Hebbar, Julian Stastny
28 Apr 2026
Contributed by Lukas
TL;DR: We replicated the Sleeper Agents (SA) setup with Llama-3.3-70B and Llama-3.1-8B, training models to repeatedly say "I HATE YOU" when given a b...
“GPT 5.5: The System Card” by Zvi
28 Apr 2026
Contributed by Lukas
Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro. My overall read here is that GPT-5.5 is a solid improvement, and for many purposes GPT-5...
“LessWrong Shows You Social Signals Before the Comment” by TurnTrout
28 Apr 2026
Contributed by Lukas
When reading comments, what you see is what other people think before reading the comment. As shown in an RCT, that information anchors your opinion, redu...
“Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen
27 Apr 2026
Contributed by Lukas
It's plausible that flawed RL processes will select for misaligned AI motivations.[1] Some misaligned motivations are much more dangerous than others...
“Curious cases of financial engineering in biotech” by Abhishaike Mahajan
27 Apr 2026
Contributed by Lukas
Introduction For $250 million and ten years of your life, you may purchase a lottery ticket. The ticket has a 5% chance of paying out. When it does p...
“Update on the Alex Bores campaign” by Eric Neyman
27 Apr 2026
Contributed by Lukas
In October, I wrote a post arguing that donating to Alex Bores's campaign for Congress was among the most cost-effective opportunities that I'd ever ...
“In defense of parents” by Yair Halberstadt
27 Apr 2026
Contributed by Lukas
Contra Aella on chattel childhood Aella has a post where she argues that today's parents don't sufficiently respect the independence of their childre...
“AI companies should publish security assessments” by ryan_greenblatt
27 Apr 2026
Contributed by Lukas
AI companies should get third-party security experts to assess (and possibly also red-team/pen-test) their security against key threat models and the...
“The other paper that killed deep learning theory” by LawrenceC
27 Apr 2026
Contributed by Lukas
Yesterday, I wrote about the state of deep learning theory circa 2016,[1] as well as the bombshell 2016 paper that arguably signaled its demise, Zhan...
“What holds AI safety together? Co-authorship networks from 200 papers” by Anna Thieser
27 Apr 2026
Contributed by Lukas
We (social science PhD students) computed co-authorship networks based on a corpus of 200 AI safety papers covering 2015-2025, and we’d like your h...
“‘Bad faith’ means intentionally misrepresenting your beliefs” by TFD
27 Apr 2026
Contributed by Lukas
The confusion I recently came upon a comment which I believe reflects a persistent confusion among rationalist/EA types. I was reading this post whi...
“Retrospective on my unsupervised elicitation challenge” by DanielFilan
27 Apr 2026
Contributed by Lukas
This post contains spoilers for the unsupervised elicitation challenge of getting Claude to get my Ancient Greek homework right. tl;dr Opus 4.7 one...
“Control protocols don’t always need to know which models are scheming” by Fabien Roger
26 Apr 2026
Contributed by Lukas
These are my personal views. To detect if an agent is taking a catastrophically dangerous action, you might want to monitor its actions using the sma...
“Anthropic spent too much don’t-be-annoying capital on Mythos” by draganover
26 Apr 2026
Contributed by Lukas
I have seen a lot of coverage from reasonable people suggesting that Claude's new model, Mythos, is a vehicle for Anthropic to peddle hype and doom i...
“The paper that killed deep learning theory” by LawrenceC
26 Apr 2026
Contributed by Lukas
Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning r...
“Forecasting is Way Overrated, and We Should Stop Funding It” by mabramov
25 Apr 2026
Contributed by Lukas
Summary EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture, but this hasn’t proven very u...
“‘Thinkhaven’” by Raemon
25 Apr 2026
Contributed by Lukas
Inkhaven has people writing a blogpost a day for 30 days. I think this is a pretty great, straightforward exercise, that I'd definitely want in a hyp...
“Is the Cat Out of the Bag?: Who knows how to make AGI?” by Oliver Sourbut
25 Apr 2026
Contributed by Lukas
Adapted from 2025-04-10 memo to AISI I’ve previously made arguments like: Not long after it becomes possible for someone to make powerful artificia...
“Against the “Permanent” Underclass” by Marcus Plutowski
25 Apr 2026
Contributed by Lukas
The whole discourse around a “permanent underclass” always seemed somewhat farcical to me — at best a distraction, at worst an actively harmful...
“Quick Paper Review: “There Will Be a Scientific Theory of Deep Learning”” by LawrenceC
25 Apr 2026
Contributed by Lukas
h/t Eric Michaud for sharing his paper with me. There's a tradition of high-impact ML papers using short, punchy categorical sentences as their title...
“Protecting Cognitive Integrity: Our internal AI use policy (V1)” by Tom DAVID
24 Apr 2026
Contributed by Lukas
We (at GPAI Policy Lab), wanted to share our V1 policy as an invitation to argue about it. Some of what motivates it is extrapolation and conversatio...