LessWrong (Curated & Popular)
Episodes
“Scientific breakthroughs of the year” by technicalities
17 Dec 2025
Contributed by Lukas
A couple of years ago, Gavin became frustrated with science journalism. No one was pulling together results across fields; the articles usually didn...
“A high integrity/epistemics political machine?” by Raemon
17 Dec 2025
Contributed by Lukas
I have goals that can only be reached via a powerful political machine. Probably a lot of other people around here share them. (Goals include “ensu...
“How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)” by Kaj_Sotala
16 Dec 2025
Contributed by Lukas
How it started I used to think that anything that LLMs said about having something like subjective experience or what it felt like on the inside was ...
“My AGI safety research—2025 review, ’26 plans” by Steven Byrnes
15 Dec 2025
Contributed by Lukas
Previous: 2024, 2022 “Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter.” –attributed ...
“Weird Generalization & Inductive Backdoors” by Jorio Cocola, Owain_Evans, dylan_f
14 Dec 2025
Contributed by Lukas
This is the abstract and introduction of our new paper. Links: 📜 Paper, 🐦 Twitter thread, 🌐 Project page, 💻 Code Authors: Jan Betley*, J...
“Insights into Claude Opus 4.5 from Pokémon” by Julian Bradshaw
13 Dec 2025
Contributed by Lukas
Credit: Nano Banana, with some text provided. You may be surprised to learn that ClaudePlaysPokemon is still running today, and that Claude still hasn...
“The funding conversation we left unfinished” by jenn
13 Dec 2025
Contributed by Lukas
People working in the AI industry are making stupid amounts of money, and word on the street is that Anthropic is going to have some sort of liquidit...
“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck
11 Dec 2025
Contributed by Lukas
Highly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important ques...
“Little Echo” by Zvi
09 Dec 2025
Contributed by Lukas
I believe that we will win. An echo of an old ad for the 2014 US men's World Cup team. It did not win. I was in Berkeley for the 2025 Secular So...
“A Pragmatic Vision for Interpretability” by Neel Nanda
08 Dec 2025
Contributed by Lukas
Executive Summary The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engine...
“AI in 2025: gestalt” by technicalities
08 Dec 2025
Contributed by Lukas
This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.) Epistemic status: subjectiv...
“Eliezer’s Unteachable Methods of Sanity” by Eliezer Yudkowsky
07 Dec 2025
Contributed by Lukas
"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of underst...
“An Ambitious Vision for Interpretability” by leogao
06 Dec 2025
Contributed by Lukas
The goal of ambitious mechanistic interpretability (AMI) is to fully understand how neural networks work. While some have pivoted towards more pragma...
“6 reasons why ‘alignment-is-hard’ discourse seems alien to human intuitions, and vice-versa” by Steven Byrnes
04 Dec 2025
Contributed by Lukas
Tl;dr AI alignment has a culture clash. On one side, the “technical-alignment-is-hard” / “rational agents” school-of-thought argues that we s...
“Three things that surprised me about technical grantmaking at Coefficient Giving (fka Open Phil)” by null
03 Dec 2025
Contributed by Lukas
Open Philanthropy's Coefficient Giving's Technical AI Safety team is hiring grantmakers. I thought this would be a good moment to share som...
“MIRI’s 2025 Fundraiser” by alexvermeer
02 Dec 2025
Contributed by Lukas
MIRI is running its first fundraiser in six years, targeting $6M. The first $1.6M raised will be matched 1:1 via an SFF grant. Fundraiser ends at mid...
“The Best Lack All Conviction: A Confusing Day in the AI Village” by null
01 Dec 2025
Contributed by Lukas
The AI Village is an ongoing experiment (currently running on weekdays from 10 a.m. to 2 p.m. Pacific time) in which frontier language models are giv...
“The Boring Part of Bell Labs” by Elizabeth
30 Nov 2025
Contributed by Lukas
It took me a long time to realize that Bell Labs was cool. You see, my dad worked at Bell Labs, and he has not done a single cool thing in his life e...
[Linkpost] “The Missing Genre: Heroic Parenthood - You can have kids and still punch the sun” by null
30 Nov 2025
Contributed by Lukas
This is a link post. I stopped reading when I was 30. You can fill in all the stereotypes of a girl with a book glued to her face during every meal, e...
“Writing advice: Why people like your quick bullshit takes better than your high-effort posts” by null
30 Nov 2025
Contributed by Lukas
Right now I’m coaching for Inkhaven, a month-long marathon writing event where our brave residents are writing a blog post every single day for the...
“Claude 4.5 Opus’ Soul Document” by null
30 Nov 2025
Contributed by Lukas
Summary As far as I understand and have uncovered, a document used for Claude's character training is compressed into Claude's weights. The full docum...
“Unless its governance changes, Anthropic is untrustworthy” by null
29 Nov 2025
Contributed by Lukas
Anthropic is untrustworthy. This post provides arguments, asks questions, and documents some examples of Anthropic's leadership being misleading...
“Alignment remains a hard, unsolved problem” by null
27 Nov 2025
Contributed by Lukas
Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan Leike, Jack Lindsey, Monte MacDiarmid, Francesco Mo...
“Video games are philosophy’s playground” by Rachel Shu
26 Nov 2025
Contributed by Lukas
Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." The idea is that blockchains let you cheaply spin u...
“Stop Applying And Get To Work” by plex
24 Nov 2025
Contributed by Lukas
TL;DR: Figure out what needs doing and do it, don't wait on approval from fellowships or jobs. If you... Have short timelines Have been struggl...
“Gemini 3 is Evaluation-Paranoid and Contaminated” by null
23 Nov 2025
Contributed by Lukas
TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably output...
“Natural emergent misalignment from reward hacking in production RL” by evhub, Monte M, Benjamin Wright, Jonathan Uesato
22 Nov 2025
Contributed by Lukas
Abstract We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignme...
“Anthropic is (probably) not meeting its RSP security commitments” by habryka
21 Nov 2025
Contributed by Lukas
TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of a...
“Varieties Of Doom” by jdp
20 Nov 2025
Contributed by Lukas
There has been a lot of talk about "p(doom)" over the last few years. This has always rubbed me the wrong way because "p(doom)" did...
“How Colds Spread” by RobertM
19 Nov 2025
Contributed by Lukas
It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread. There have been a number ...
“New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence” by Aaron_Scher, David Abecassis, Brian Abeyta, peterbarnett
19 Nov 2025
Contributed by Lukas
TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards a...
“Where is the Capital? An Overview” by johnswentworth
17 Nov 2025
Contributed by Lukas
When a new dollar goes into the capital markets, after being bundled and securitized and lent several times over, where does it end up? When society&...
“Problems I’ve Tried to Legibilize” by Wei Dai
17 Nov 2025
Contributed by Lukas
Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk m...
“Do not hand off what you cannot pick up” by habryka
17 Nov 2025
Contributed by Lukas
Delegation is good! Delegation is the foundation of civilization! But in the depths of delegation madness breeds and evil rises. In my experience, t...
“7 Vicious Vices of Rationalists” by Ben Pace
17 Nov 2025
Contributed by Lukas
Vices aren't behaviors that one should never do. Rather, vices are behaviors that are fine and pleasurable to do in moderation, but tempting to ...
“Tell people as early as possible it’s not going to work out” by habryka
17 Nov 2025
Contributed by Lukas
Context: Post #4 in my sequence of private Lightcone Infrastructure memos edited for public consumption This week's principle is more about how ...
“Everyone has a plan until they get lied to the face” by Screwtape
16 Nov 2025
Contributed by Lukas
"Everyone has a plan until they get punched in the face." - Mike Tyson (The exact phrasing of that quote changes, this is my favourite.) I...
“Please, Don’t Roll Your Own Metaethics” by Wei Dai
14 Nov 2025
Contributed by Lukas
One day, when I was interning at the cryptography research department of a large software company, my boss handed me an assignment to break a pseu...
“Paranoia rules everything around me” by habryka
14 Nov 2025
Contributed by Lukas
People sometimes make mistakes [citation needed]. The obvious explanation for most of those mistakes is that decision makers do not have access to th...
“Human Values ≠ Goodness” by johnswentworth
12 Nov 2025
Contributed by Lukas
There is a temptation to simply define Goodness as Human Values, or vice versa. Alas, we do not get to choose the definitions of commonly used words;...
“Condensation” by abramdemski
12 Nov 2025
Contributed by Lukas
Condensation: a theory of concepts is a model of concept-formation by Sam Eisenstat. Its goals and methods resemble John Wentworth's natural abs...
“Mourning a life without AI” by Nikola Jurkovic
10 Nov 2025
Contributed by Lukas
Recently, I looked at the one pair of winter boots I own, and I thought “I will probably never buy winter boots again.” The world as we know it p...
“Unexpected Things that are People” by Ben Goldhaber
09 Nov 2025
Contributed by Lukas
Cross-posted from https://bengoldhaber.substack.com/ It's widely known that Corporations are People. This is universally agreed to be a good thi...
“Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals” by Alexa Pan, ryan_greenblatt
06 Nov 2025
Contributed by Lukas
According to the Sonnet 4.5 system card, Sonnet 4.5 is much more likely than Sonnet 4 to mention in its chain-of-thought that it thinks it is being ev...
“Publishing academic papers on transformative AI is a nightmare” by Jakub Growiec
06 Nov 2025
Contributed by Lukas
I am a professor of economics. Throughout my career, I was mostly working on economic growth theory, and this eventually brought me to the topic of t...
“The Unreasonable Effectiveness of Fiction” by Raelifin
06 Nov 2025
Contributed by Lukas
[Meta: This is Max Harms. I wrote a novel about China and AGI, which comes out today. This essay from my fiction newsletter has been slightly modifie...
“Legible vs. Illegible AI Safety Problems” by Wei Dai
05 Nov 2025
Contributed by Lukas
Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy ...
“Lack of Social Grace is a Lack of Skill” by Screwtape
04 Nov 2025
Contributed by Lukas
1. I have claimed that one of the fundamental questions of rationality is “what am I about to do and what will happen next?” One of the domains...
[Linkpost] “I ate bear fat with honey and salt flakes, to prove a point” by aggliu
04 Nov 2025
Contributed by Lukas
This is a link post. Eliezer Yudkowsky did not exactly suggest that you should eat bear fat covered with honey and sprinkled with salt flakes. What he...
“What’s up with Anthropic predicting AGI by early 2027?” by ryan_greenblatt
04 Nov 2025
Contributed by Lukas
As far as I'm aware, Anthropic is the only AI company with official AGI timelines[1]: they expect AGI by early 2027. In their recommendations (f...
[Linkpost] “Emergent Introspective Awareness in Large Language Models” by Drake Thomas
03 Nov 2025
Contributed by Lukas
This is a link post. New Anthropic research (tweet, blog post, paper): We investigate whether large language models can introspect on their internal ...
[Linkpost] “You’re always stressed, your mind is always busy, you never have enough time” by mingyuan
03 Nov 2025
Contributed by Lukas
This is a link post. You have things you want to do, but there's just never time. Maybe you want to find someone to have kids with, or maybe you ...
“LLM-generated text is not testimony” by TsviBT
03 Nov 2025
Contributed by Lukas
Crosspost from my blog. Synopsis When we share words with each other, we don't only care about the words themselves. We care also—even primar...
“Why I Transitioned: A Case Study” by Fiora Sunshine
02 Nov 2025
Contributed by Lukas
An Overture Famously, trans people tend not to have great introspective clarity into their own motivations for transition. Intuitively, they tend to ...
“The Memetics of AI Successionism” by Jan_Kulveit
31 Oct 2025
Contributed by Lukas
TL;DR: AI progress and the recognition of associated risks are painful to think about. This cognitive dissonance acts as fertile ground in the memeti...
“How Well Does RL Scale?” by Toby_Ord
30 Oct 2025
Contributed by Lukas
This is the latest in a series of essays on AI Scaling. You can find the others on my site. Summary: RL-training for LLMs scales surprisingly poorly...
“An Opinionated Guide to Privacy Despite Authoritarianism” by TurnTrout
30 Oct 2025
Contributed by Lukas
I've created a highly specific and actionable privacy guide, sorted by importance and venturing several layers deep into the privacy iceberg. I ...
“Cancer has a surprising amount of detail” by Abhishaike Mahajan
30 Oct 2025
Contributed by Lukas
There is a very famous essay titled ‘Reality has a surprising amount of detail’. The thesis of the article is that reality is filled, just filled...
“AIs should also refuse to work on capabilities research” by Davidmanheim
29 Oct 2025
Contributed by Lukas
There's a strong argument that humans should stop trying to build more capable AI systems, or at least slow down progress. The risks are plausib...
“On Fleshling Safety: A Debate by Klurl and Trapaucius.” by Eliezer Yudkowsky
27 Oct 2025
Contributed by Lukas
(23K words; best considered as nonfiction with a fictional-dialogue frame, not a proper short story.) Prologue: Klurl and Trapaucius were members of ...
“EU explained in 10 minutes” by Martin Sustrik
24 Oct 2025
Contributed by Lukas
If you want to understand a country, you should pick a similar country that you are already familiar with, research the differences between the two a...
“Cheap Labour Everywhere” by Morpheus
24 Oct 2025
Contributed by Lukas
I recently visited my girlfriend's parents in India. Here is what that experience taught me: Yudkowsky has this facebook post where he makes som...
[Linkpost] “Consider donating to AI safety champion Scott Wiener” by Eric Neyman
24 Oct 2025
Contributed by Lukas
This is a link post. Written in my personal capacity. Thanks to many people for conversations and comments. Written in less than 24 hours; sorry for a...
“Which side of the AI safety community are you in?” by Max Tegmark
23 Oct 2025
Contributed by Lukas
In recent years, I’ve found that people who self-identify as members of the AI safety community have increasingly split into two camps: Camp A) ...
“Doomers were right” by Algon
23 Oct 2025
Contributed by Lukas
There's an argument I sometimes hear against existential risks, or any other putative change that some are worried about, that goes something li...
“Do One New Thing A Day To Solve Your Problems” by Algon
22 Oct 2025
Contributed by Lukas
People don't explore enough. They rely on cached thoughts and actions to get through their day. Unfortunately, this doesn't lead to them ma...
“Humanity Learned Almost Nothing From COVID-19” by niplav
21 Oct 2025
Contributed by Lukas
Summary: Looking over humanity's response to the COVID-19 pandemic, almost six years later, reveals that we've forgotten to fulfill our inte...
“Consider donating to Alex Bores, author of the RAISE Act” by Eric Neyman
20 Oct 2025
Contributed by Lukas
Written by Eric Neyman, in my personal capacity. The views expressed here are my own. Thanks to Zach Stein-Perlman, Jesse Richardson, and many others...
“Meditation is dangerous” by Algon
20 Oct 2025
Contributed by Lukas
Here's a story I've heard a couple of times. A youngish person is looking for some solutions to their depression, chronic pain, ennui or so...
“That Mad Olympiad” by Tomás B.
19 Oct 2025
Contributed by Lukas
"I heard Chen started distilling the day after he was born. He's only four years old, if you can believe it. He's written 18 novels. H...
“The ‘Length’ of ‘Horizons’” by Adam Scholl
17 Oct 2025
Contributed by Lukas
Current AI models are strange. They can speak—often coherently, sometimes even eloquently—which is wild. They can predict the structure of protei...
“Don’t Mock Yourself” by Algon
15 Oct 2025
Contributed by Lukas
About half a year ago, I decided to try to stop insulting myself for two weeks. No more self-deprecating humour, calling myself a fool, or thinking I...
“If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd
14 Oct 2025
Contributed by Lukas
About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk. I read AstralCodex...
“The Most Common Bad Argument In These Parts” by J Bostock
12 Oct 2025
Contributed by Lukas
I've noticed an antipattern. It's definitely on the dark pareto-frontier of "bad argument" and "I see it all the time amongs...
“Towards a Typology of Strange LLM Chains-of-Thought” by 1a3orn
11 Oct 2025
Contributed by Lukas
Intro LLMs being trained with RLVR (Reinforcement Learning from Verifiable Rewards) start off with a 'chain-of-thought' (CoT) in whatever l...
“I take antidepressants. You’re welcome” by Elizabeth
10 Oct 2025
Contributed by Lukas
It's amazing how much smarter everyone else gets when I take antidepressants. It makes sense that the drugs work on other people, because ther...
“Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks
10 Oct 2025
Contributed by Lukas
This is a link post for two papers that came out today: Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-...
“Hospitalization: A Review” by Logan Riggs
10 Oct 2025
Contributed by Lukas
I woke up Friday morning w/ a very sore left shoulder. I tried stretching it, but my left chest hurt too. Isn't pain on one side a sign of a hea...
“What, if not agency?” by abramdemski
09 Oct 2025
Contributed by Lukas
Sahil has been up to things. Unfortunately, I've seen people put effort into trying to understand and still bounce off. I recently talked to som...
“The Origami Men” by Tomás B.
08 Oct 2025
Contributed by Lukas
Of course, you must understand, I couldn't be bothered to act. I know weepers still pretend to try, but I wasn't a weeper, at least not the...
“A non-review of ‘If Anyone Builds It, Everyone Dies’” by boazbarak
06 Oct 2025
Contributed by Lukas
I was hoping to write a full review of "If Anyone Builds It, Everyone Dies" (IABIED, Yudkowsky and Soares) but realized I won't have ti...
“Notes on fatalities from AI takeover” by ryan_greenblatt
06 Oct 2025
Contributed by Lukas
Suppose misaligned AIs take over. What fraction of people will die? I'll discuss my thoughts on this question and my basic framework for thinkin...
“Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most ‘classic humans’ in a few decades.” by Raemon
04 Oct 2025
Contributed by Lukas
I wrote my recent Accelerando post to mostly stand on its own as a takeoff scenario. But, the reason it's on my mind is that, if I imagine...
“Omelas Is Perfectly Misread” by Tobias H
03 Oct 2025
Contributed by Lukas
The Standard Reading If you've heard of Le Guin's ‘The Ones Who Walk Away from Omelas’, you probably know the basic idea. It's a go...
“Ethical Design Patterns” by AnnaSalamon
01 Oct 2025
Contributed by Lukas
Related to: Commonsense Good, Creative Good (and my comment); Ethical Injunctions. Epistemic status: I’m fairly sure “ethics” does useful work ...
“You’re probably overestimating how well you understand Dunning-Kruger” by abstractapplic
30 Sep 2025
Contributed by Lukas
I The popular conception of Dunning-Kruger is something along the lines of “some people are too dumb to know they’re dumb, and end up thinking th...
“Reasons to sell frontier lab equity to donate now rather than later” by Daniel_Eth, Ethan Perez
27 Sep 2025
Contributed by Lukas
Tl;dr: We believe shareholders in frontier labs who plan to donate some portion of their equity to reduce AI risk should consider liquidating and don...
“CFAR update, and New CFAR workshops” by AnnaSalamon
26 Sep 2025
Contributed by Lukas
Hi all! After about five years of hibernation and quietly getting our bearings,[1] CFAR will soon be running two pilot mainline workshops, and may ru...
“Why you should eat meat - even if you hate factory farming” by KatWoods
26 Sep 2025
Contributed by Lukas
Cross-posted from my Substack To start off with, I’ve been vegan/vegetarian for the majority of my life. I think that factory farming has caused m...
[Linkpost] “Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures” by Charbel-Raphaël
23 Sep 2025
Contributed by Lukas
This is a link post. Today, the Global Call for AI Red Lines was released and presented at the UN General Assembly. It was developed by the French Cen...
“This is a review of the reviews” by Recurrented
23 Sep 2025
Contributed by Lukas
This is a review of the reviews, a meta review if you will, but first a tangent, and then a history lesson. This felt boring and obvious and somewhat...
“The title is reasonable” by Raemon
21 Sep 2025
Contributed by Lukas
I'm annoyed by various people who seem to be complaining about the book title being "unreasonable" – who don't merely disagree ...
“The Problem with Defining an ‘AGI Ban’ by Outcome (a lawyer’s take).” by Katalina Hernandez
21 Sep 2025
Contributed by Lukas
TL;DR Most “AGI ban” proposals define AGI by outcome: whatever potentially leads to human extinction. That's legally insufficient: regulatio...
“Contra Collier on IABIED” by Max Harms
20 Sep 2025
Contributed by Lukas
Clara Collier recently reviewed If Anyone Builds It, Everyone Dies in Asterisk Magazine. I’ve been a reader of Asterisk since the beginning and had...
“You can’t eval GPT5 anymore” by Lukas Petersson
20 Sep 2025
Contributed by Lukas
The GPT-5 API is aware of today's date (no other model provider does this). This is problematic because the model becomes aware that it is in a ...
“Teaching My Toddler To Read” by maia
20 Sep 2025
Contributed by Lukas
I have been teaching my oldest son to read with Anki and techniques recommended here on LessWrong as well as in Larry Sanger's post, and it...
“Safety researchers should take a public stance” by Ishual, Mateusz Bagiński
20 Sep 2025
Contributed by Lukas
[Co-written by Mateusz Bagiński and Samuel Buteau (Ishual)] TL;DR Many X-risk-concerned people who join AI capabilities labs with the intent to cont...
“The Company Man” by Tomás B.
19 Sep 2025
Contributed by Lukas
To get to the campus, I have to walk past the fentanyl zombies. I call them fentanyl zombies because it helps engender a sort of detached, low-empath...
“Christian homeschoolers in the year 3000” by Buck
19 Sep 2025
Contributed by Lukas
[I wrote this blog post as part of the Asterisk Blogging Fellowship. It's substantially an experiment in writing more breezily and concisely tha...
“I enjoyed most of IABED” by Buck
17 Sep 2025
Contributed by Lukas
I listened to "If Anyone Builds It, Everyone Dies" today. I think the first two parts of the book are the best available explanation of the...