“AI in 2025: gestalt” by technicalities
08 Dec 2025
Contributed by Lukas
This is the editorial for this year's "Shallow Review of AI Safety". (It got long en...
“Eliezer’s Unteachable Methods of Sanity” by Eliezer Yudkowsky
07 Dec 2025
Contributed by Lukas
"How are you coping with the end of the world?" journalists sometimes ask me, and the tru...
“An Ambitious Vision for Interpretability” by leogao
06 Dec 2025
Contributed by Lukas
The goal of ambitious mechanistic interpretability (AMI) is to fully understand how neural networks...
“6 reasons why ‘alignment-is-hard’ discourse seems alien to human intuitions, and vice-versa” by Steven Byrnes
04 Dec 2025
Contributed by Lukas
Tl;dr AI alignment has a culture clash. On one side, the “technical-alignment-is-hard” / “rat...
“Three things that surprised me about technical grantmaking at Coefficient Giving (fka Open Phil)” by null
03 Dec 2025
Contributed by Lukas
Coefficient Giving's Technical AI Safety team is hiring grantmakers. ...
“MIRI’s 2025 Fundraiser” by alexvermeer
02 Dec 2025
Contributed by Lukas
MIRI is running its first fundraiser in six years, targeting $6M. The first $1.6M raised will be ma...
“The Best Lack All Conviction: A Confusing Day in the AI Village” by null
01 Dec 2025
Contributed by Lukas
The AI Village is an ongoing experiment (currently running on weekdays from 10 a.m. to 2 p.m. Pacif...
“The Boring Part of Bell Labs” by Elizabeth
30 Nov 2025
Contributed by Lukas
It took me a long time to realize that Bell Labs was cool. You see, my dad worked at Bell Labs, and...
[Linkpost] “The Missing Genre: Heroic Parenthood - You can have kids and still punch the sun” by null
30 Nov 2025
Contributed by Lukas
This is a link post. I stopped reading when I was 30. You can fill in all the stereotypes of a girl ...
“Writing advice: Why people like your quick bullshit takes better than your high-effort posts” by null
30 Nov 2025
Contributed by Lukas
Right now I’m coaching for Inkhaven, a month-long marathon writing event where our brave resident...
“Claude 4.5 Opus’ Soul Document” by null
30 Nov 2025
Contributed by Lukas
Summary As far as I understand and uncovered, a document for the character training for Claude is c...
“Unless its governance changes, Anthropic is untrustworthy” by null
29 Nov 2025
Contributed by Lukas
Anthropic is untrustworthy. This post provides arguments, asks questions, and documents some exampl...
“Alignment remains a hard, unsolved problem” by null
27 Nov 2025
Contributed by Lukas
Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan L...
“Video games are philosophy’s playground” by Rachel Shu
26 Nov 2025
Contributed by Lukas
Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." T...
“Stop Applying And Get To Work” by plex
24 Nov 2025
Contributed by Lukas
TL;DR: Figure out what needs doing and do it, don't wait on approval from fellowships or jobs....
“Gemini 3 is Evaluation-Paranoid and Contaminated” by null
23 Nov 2025
Contributed by Lukas
TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its ...
“Natural emergent misalignment from reward hacking in production RL” by evhub, Monte M, Benjamin Wright, Jonathan Uesato
22 Nov 2025
Contributed by Lukas
Abstract We show that when large language models learn to reward hack on production RL environments...
“Anthropic is (probably) not meeting its RSP security commitments” by habryka
21 Nov 2025
Contributed by Lukas
TLDR: An AI company's model weight security is at most as good as its compute providers' ...
“Varieties Of Doom” by jdp
20 Nov 2025
Contributed by Lukas
There has been a lot of talk about "p(doom)" over the last few years. This has always rubb...
“How Colds Spread” by RobertM
19 Nov 2025
Contributed by Lukas
It seems like a catastrophic civilizational failure that we don't have confident common knowle...
“New Report: An International Agreement to Prevent the Premature Creation of Artificial Superintelligence” by Aaron_Scher, David Abecassis, Brian Abeyta, peterbarnett
19 Nov 2025
Contributed by Lukas
TLDR: We at the MIRI Technical Governance Team have released a report describing an example interna...
“Where is the Capital? An Overview” by johnswentworth
17 Nov 2025
Contributed by Lukas
When a new dollar goes into the capital markets, after being bundled and securitized and lent sever...
“Problems I’ve Tried to Legibilize” by Wei Dai
17 Nov 2025
Contributed by Lukas
Looking back, it appears that much of my intellectual output could be described as legibilizing wor...
“Do not hand off what you cannot pick up” by habryka
17 Nov 2025
Contributed by Lukas
Delegation is good! Delegation is the foundation of civilization! But in the depths of delegation m...
“7 Vicious Vices of Rationalists” by Ben Pace
17 Nov 2025
Contributed by Lukas
Vices aren't behaviors that one should never do. Rather, vices are behaviors that are fine and...
“Tell people as early as possible it’s not going to work out” by habryka
17 Nov 2025
Contributed by Lukas
Context: Post #4 in my sequence of private Lightcone Infrastructure memos edited for public consump...
“Everyone has a plan until they get lied to the face” by Screwtape
16 Nov 2025
Contributed by Lukas
"Everyone has a plan until they get punched in the face." - Mike Tyson (The exact phrasi...
“Please, Don’t Roll Your Own Metaethics” by Wei Dai
14 Nov 2025
Contributed by Lukas
One day, when I was interning at the cryptography research department of a large software compan...
“Paranoia rules everything around me” by habryka
14 Nov 2025
Contributed by Lukas
People sometimes make mistakes [citation needed]. The obvious explanation for most of those mistake...
“Human Values ≠ Goodness” by johnswentworth
12 Nov 2025
Contributed by Lukas
There is a temptation to simply define Goodness as Human Values, or vice versa. Alas, we do not get...
“Condensation” by abramdemski
12 Nov 2025
Contributed by Lukas
Condensation: a theory of concepts is a model of concept-formation by Sam Eisenstat. Its goals and ...
“Mourning a life without AI” by Nikola Jurkovic
10 Nov 2025
Contributed by Lukas
Recently, I looked at the one pair of winter boots I own, and I thought “I will probably never bu...
“Unexpected Things that are People” by Ben Goldhaber
09 Nov 2025
Contributed by Lukas
Cross-posted from https://bengoldhaber.substack.com/ It's widely known that Corporations are P...
“Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals” by Alexa Pan, ryan_greenblatt
06 Nov 2025
Contributed by Lukas
According to the Sonnet 4.5 system card, Sonnet 4.5 is much more likely than Sonnet 4 to mention in ...
“Publishing academic papers on transformative AI is a nightmare” by Jakub Growiec
06 Nov 2025
Contributed by Lukas
I am a professor of economics. Throughout my career, I was mostly working on economic growth theory...
“The Unreasonable Effectiveness of Fiction” by Raelifin
06 Nov 2025
Contributed by Lukas
[Meta: This is Max Harms. I wrote a novel about China and AGI, which comes out today. This essay fr...
“Legible vs. Illegible AI Safety Problems” by Wei Dai
05 Nov 2025
Contributed by Lukas
Some AI safety problems are legible (obvious or understandable) to company leaders and government p...
“Lack of Social Grace is a Lack of Skill” by Screwtape
04 Nov 2025
Contributed by Lukas
1. I have claimed that one of the fundamental questions of rationality is “what am I about to d...
[Linkpost] “I ate bear fat with honey and salt flakes, to prove a point” by aggliu
04 Nov 2025
Contributed by Lukas
This is a link post. Eliezer Yudkowsky did not exactly suggest that you should eat bear fat covered ...
“What’s up with Anthropic predicting AGI by early 2027?” by ryan_greenblatt
04 Nov 2025
Contributed by Lukas
As far as I'm aware, Anthropic is the only AI company with official AGI timelines[1]: they exp...
[Linkpost] “Emergent Introspective Awareness in Large Language Models” by Drake Thomas
03 Nov 2025
Contributed by Lukas
This is a link post. New Anthropic research (tweet, blog post, paper): We investigate whether large...
[Linkpost] “You’re always stressed, your mind is always busy, you never have enough time” by mingyuan
03 Nov 2025
Contributed by Lukas
This is a link post. You have things you want to do, but there's just never time. Maybe you wan...
“LLM-generated text is not testimony” by TsviBT
03 Nov 2025
Contributed by Lukas
Crosspost from my blog. Synopsis When we share words with each other, we don't only care abou...
“Why I Transitioned: A Case Study” by Fiora Sunshine
02 Nov 2025
Contributed by Lukas
An Overture Famously, trans people tend not to have great introspective clarity into their own moti...
“The Memetics of AI Successionism” by Jan_Kulveit
31 Oct 2025
Contributed by Lukas
TL;DR: AI progress and the recognition of associated risks are painful to think about. This cogniti...
“How Well Does RL Scale?” by Toby_Ord
30 Oct 2025
Contributed by Lukas
This is the latest in a series of essays on AI Scaling. You can find the others on my site. Summar...
“An Opinionated Guide to Privacy Despite Authoritarianism” by TurnTrout
30 Oct 2025
Contributed by Lukas
I've created a highly specific and actionable privacy guide, sorted by importance and venturin...
“Cancer has a surprising amount of detail” by Abhishaike Mahajan
30 Oct 2025
Contributed by Lukas
There is a very famous essay titled ‘Reality has a surprising amount of detail’. The thesis of ...
“AIs should also refuse to work on capabilities research” by Davidmanheim
29 Oct 2025
Contributed by Lukas
There's a strong argument that humans should stop trying to build more capable AI systems, or ...
“On Fleshling Safety: A Debate by Klurl and Trapaucius.” by Eliezer Yudkowsky
27 Oct 2025
Contributed by Lukas
(23K words; best considered as nonfiction with a fictional-dialogue frame, not a proper short story...
“EU explained in 10 minutes” by Martin Sustrik
24 Oct 2025
Contributed by Lukas
If you want to understand a country, you should pick a similar country that you are already familia...
“Cheap Labour Everywhere” by Morpheus
24 Oct 2025
Contributed by Lukas
I recently visited my girlfriend's parents in India. Here is what that experience taught me: Y...
[Linkpost] “Consider donating to AI safety champion Scott Wiener” by Eric Neyman
24 Oct 2025
Contributed by Lukas
This is a link post. Written in my personal capacity. Thanks to many people for conversations and co...
“Which side of the AI safety community are you in?” by Max Tegmark
23 Oct 2025
Contributed by Lukas
In recent years, I’ve found that people who self-identify as members of the AI safety community h...
“Doomers were right” by Algon
23 Oct 2025
Contributed by Lukas
There's an argument I sometimes hear against existential risks, or any other putative change t...
“Do One New Thing A Day To Solve Your Problems” by Algon
22 Oct 2025
Contributed by Lukas
People don't explore enough. They rely on cached thoughts and actions to get through their day...
“Humanity Learned Almost Nothing From COVID-19” by niplav
21 Oct 2025
Contributed by Lukas
Summary: Looking over humanity's response to the COVID-19 pandemic, almost six years later, rev...
“Consider donating to Alex Bores, author of the RAISE Act” by Eric Neyman
20 Oct 2025
Contributed by Lukas
Written by Eric Neyman, in my personal capacity. The views expressed here are my own. Thanks to Zac...
“Meditation is dangerous” by Algon
20 Oct 2025
Contributed by Lukas
Here's a story I've heard a couple of times. A youngish person is looking for some soluti...
“That Mad Olympiad” by Tomás B.
19 Oct 2025
Contributed by Lukas
"I heard Chen started distilling the day after he was born. He's only four years old, if ...
“The ‘Length’ of ‘Horizons’” by Adam Scholl
17 Oct 2025
Contributed by Lukas
Current AI models are strange. They can speak—often coherently, sometimes even eloquently—which...
“Don’t Mock Yourself” by Algon
15 Oct 2025
Contributed by Lukas
About half a year ago, I decided to try to stop insulting myself for two weeks. No more self-deprecati...
“If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd
14 Oct 2025
Contributed by Lukas
About me and this review: I don’t identify as a member of the rationalist community, and I haven’...
“The Most Common Bad Argument In These Parts” by J Bostock
12 Oct 2025
Contributed by Lukas
I've noticed an antipattern. It's definitely on the dark pareto-frontier of "bad arg...
“Towards a Typology of Strange LLM Chains-of-Thought” by 1a3orn
11 Oct 2025
Contributed by Lukas
Intro LLMs being trained with RLVR (Reinforcement Learning from Verifiable Rewards) start off with ...
“I take antidepressants. You’re welcome” by Elizabeth
10 Oct 2025
Contributed by Lukas
It's amazing how much smarter everyone else gets when I take antidepressants. It makes sense...
“Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks
10 Oct 2025
Contributed by Lukas
This is a link post for two papers that came out today: Inoculation Prompting: Eliciting traits fr...
“Hospitalization: A Review” by Logan Riggs
10 Oct 2025
Contributed by Lukas
I woke up Friday morning w/ a very sore left shoulder. I tried stretching it, but my left chest hur...
“What, if not agency?” by abramdemski
09 Oct 2025
Contributed by Lukas
Sahil has been up to things. Unfortunately, I've seen people put effort into trying to underst...
“The Origami Men” by Tomás B.
08 Oct 2025
Contributed by Lukas
Of course, you must understand, I couldn't be bothered to act. I know weepers still pretend to...
“A non-review of ‘If Anyone Builds It, Everyone Dies’” by boazbarak
06 Oct 2025
Contributed by Lukas
I was hoping to write a full review of "If Anyone Builds It, Everyone Dies" (IABIED Yudko...
“Notes on fatalities from AI takeover” by ryan_greenblatt
06 Oct 2025
Contributed by Lukas
Suppose misaligned AIs take over. What fraction of people will die? I'll discuss my thoughts o...
“Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most ‘classic humans’ in a few decades.” by Raemon
04 Oct 2025
Contributed by Lukas
I wrote my recent Accelerando post to mostly stand on its own as a takeoff scenario. But, the...
“Omelas Is Perfectly Misread” by Tobias H
03 Oct 2025
Contributed by Lukas
The Standard Reading If you've heard of Le Guin's ‘The Ones Who Walk Away from Omelas’...
“Ethical Design Patterns” by AnnaSalamon
01 Oct 2025
Contributed by Lukas
Related to: Commonsense Good, Creative Good (and my comment); Ethical Injunctions. Epistemic status...
“You’re probably overestimating how well you understand Dunning-Kruger” by abstractapplic
30 Sep 2025
Contributed by Lukas
I The popular conception of Dunning-Kruger is something along the lines of “some people are too d...
“Reasons to sell frontier lab equity to donate now rather than later” by Daniel_Eth, Ethan Perez
27 Sep 2025
Contributed by Lukas
Tl;dr: We believe shareholders in frontier labs who plan to donate some portion of their equity to ...
“CFAR update, and New CFAR workshops” by AnnaSalamon
26 Sep 2025
Contributed by Lukas
Hi all! After about five years of hibernation and quietly getting our bearings,[1] CFAR will soon b...
“Why you should eat meat - even if you hate factory farming” by KatWoods
26 Sep 2025
Contributed by Lukas
Cross-posted from my Substack To start off with, I’ve been vegan/vegetarian for the majority of m...
[Linkpost] “Global Call for AI Red Lines - Signed by Nobel Laureates, Former Heads of State, and 200+ Prominent Figures” by Charbel-Raphaël
23 Sep 2025
Contributed by Lukas
This is a link post. Today, the Global Call for AI Red Lines was released and presented at the UN Ge...
“This is a review of the reviews” by Recurrented
23 Sep 2025
Contributed by Lukas
This is a review of the reviews, a meta review if you will, but first a tangent. and then a history...
“The title is reasonable” by Raemon
21 Sep 2025
Contributed by Lukas
I'm annoyed by various people who seem to be complaining about the book title being "unre...
“The Problem with Defining an ‘AGI Ban’ by Outcome (a lawyer’s take).” by Katalina Hernandez
21 Sep 2025
Contributed by Lukas
TL;DR Most “AGI ban” proposals define AGI by outcome: whatever potentially leads to human extin...
“Contra Collier on IABIED” by Max Harms
20 Sep 2025
Contributed by Lukas
Clara Collier recently reviewed If Anyone Builds It, Everyone Dies in Asterisk Magazine. I’ve bee...
“You can’t eval GPT5 anymore” by Lukas Petersson
20 Sep 2025
Contributed by Lukas
The GPT-5 API is aware of today's date (no other model provider does this). This is problemati...
“Teaching My Toddler To Read” by maia
20 Sep 2025
Contributed by Lukas
I have been teaching my oldest son to read with Anki and techniques recommended here on LessWrong a...
“Safety researchers should take a public stance” by Ishual, Mateusz Bagiński
20 Sep 2025
Contributed by Lukas
[Co-written by Mateusz Bagiński and Samuel Buteau (Ishual)] TL;DR Many X-risk-concerned people who...
“The Company Man” by Tomás B.
19 Sep 2025
Contributed by Lukas
To get to the campus, I have to walk past the fentanyl zombies. I call them fentanyl zombies becaus...
“Christian homeschoolers in the year 3000” by Buck
19 Sep 2025
Contributed by Lukas
[I wrote this blog post as part of the Asterisk Blogging Fellowship. It's substantially an exp...
“I enjoyed most of IABED” by Buck
17 Sep 2025
Contributed by Lukas
I listened to "If Anyone Builds It, Everyone Dies" today. I think the first two parts of ...
“‘If Anyone Builds It, Everyone Dies’ release day!” by alexvermeer
16 Sep 2025
Contributed by Lukas
Back in May, we announced that Eliezer Yudkowsky and Nate Soares's new book If Anyone Builds I...
“Obligated to Respond” by Duncan Sabien (Inactive)
16 Sep 2025
Contributed by Lukas
And, a new take on guess culture vs ask culture Author's note: These days, my thoughts go onto...
“Chesterton’s Missing Fence” by jasoncrawford
15 Sep 2025
Contributed by Lukas
The inverse of Chesterton's Fence is this: Sometimes a reformer comes up to a spot where there...
“The Eldritch in the 21st century” by PranavG, Gabriel Alfour
14 Sep 2025
Contributed by Lukas
Very little makes sense. As we start to understand things and adapt to the rules, they change again...
“The Rise of Parasitic AI” by Adele Lopez
14 Sep 2025
Contributed by Lukas
[Note: if you realize you have an unhealthy relationship with your AI, but still care for your AI...
“High-level actions don’t screen off intent” by AnnaSalamon
13 Sep 2025
Contributed by Lukas
One might think “actions screen off intent”: if Alice donates $1k to bed nets, it doesn’t mat...
[Linkpost] “MAGA populists call for holy war against Big Tech” by Remmelt
11 Sep 2025
Contributed by Lukas
This is a link post. Excerpts on AI Geoffrey Miller was handed the mic and started berating one of t...
“Your LLM-assisted scientific breakthrough probably isn’t real” by eggsyntax
05 Sep 2025
Contributed by Lukas
Summary An increasing number of people in recent months have believed that they've made an imp...
“Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro” by ryan_greenblatt
04 Sep 2025
Contributed by Lukas
I've recently written about how I've updated against seeing substantially faster than tre...
“⿻ Plurality & 6pack.care” by Audrey Tang
03 Sep 2025
Contributed by Lukas
(Cross-posted from speaker's notes of my talk at Deepmind today.) Good local time, everyone. I...