LessWrong (30+ Karma)
Episodes
“Re-rolling environment” by Raemon
01 Nov 2025
Contributed by Lukas
I'm currently on a "rationality as 'skills you practice'" kick. I'm really into subtle cognitive skills. I do think they eventually pay off. But, rea...
“LLM-generated text is not testimony” by TsviBT
01 Nov 2025
Contributed by Lukas
Crosspost from my blog. Synopsis When we share words with each other, we don't only care about the words themselves. We care also—even primaril...
“Supervillain Monologues Are Unrealistic” by Algon
01 Nov 2025
Contributed by Lukas
Supervillain monologues are strange. Not because a supervillain telling someone their evil plan is weird. In fact, that's what we should actually exp...
“Anthropic’s Pilot Sabotage Risk Report” by dmz
01 Nov 2025
Contributed by Lukas
As practice for potential future Responsible Scaling Policy obligations, we're releasing a report on misalignment risk posed by our deployed models a...
“OpenAI Moves To Complete Potentially The Largest Theft In Human History” by Zvi
31 Oct 2025
Contributed by Lukas
OpenAI is now set to become a Public Benefit Corporation, with its investors entitled to uncapped profit shares. Its nonprofit foundation will retain...
“Resampling Conserves Redundancy & Mediation (Approximately) Under the Jensen-Shannon Divergence” by David Lorell
31 Oct 2025
Contributed by Lukas
Audio note: this article contains 86 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...
“Steering Evaluation-Aware Models to Act Like They Are Deployed” by Tim Hua, andrq, Sam Marks, Neel Nanda
30 Oct 2025
Contributed by Lukas
📄Paper, 🖥️Code, 🤖Evaluation Aware Model Organism TL;DR: We train an evaluation-aware LLM. Specifically, we train a model organism ...
[Linkpost] “AISLE discovered three new OpenSSL vulnerabilities” by Jan_Kulveit
30 Oct 2025
Contributed by Lukas
This is a link post. The company post is linked; it seems like an update on where we are with automated cybersec. So far in 2025, only four security ...
“Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals” by Alexa Pan, ryan_greenblatt
30 Oct 2025
Contributed by Lukas
According to the Sonnet 4.5 system card, Sonnet 4.5 is much more likely than Sonnet 4 to mention in its chain-of-thought that it thinks it is being ev...
“ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents” by Ziqian Zhong
30 Oct 2025
Contributed by Lukas
This is a post about our recent work ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases (with Aditi Raghunathan, Nicholas Carlini) ...
[Linkpost] “Emergent Introspective Awareness in Large Language Models” by Drake Thomas
30 Oct 2025
Contributed by Lukas
This is a link post. New Anthropic research (tweet, blog post, paper): We investigate whether large language models can introspect on their internal ...
“An Opinionated Guide to Privacy Despite Authoritarianism” by TurnTrout
29 Oct 2025
Contributed by Lukas
I've created a highly specific and actionable privacy guide, sorted by importance and venturing several layers deep into the privacy iceberg. I start...
“The End of OpenAI’s Nonprofit Era” by garrison
29 Oct 2025
Contributed by Lukas
Key regulators have agreed to let the company kill its profit caps and restructure as a for-profit — with some strings attached This is the full te...
“Please Do Not Sell B30A Chips to China” by Zvi
29 Oct 2025
Contributed by Lukas
The Chinese and Americans are currently negotiating a trade deal. There are plenty of ways to generate a win-win deal, and early signs of this are pr...
“AI Craziness Mitigation Efforts” by Zvi
29 Oct 2025
Contributed by Lukas
AI chatbots in general, and OpenAI and ChatGPT and especially GPT-4o the absurd sycophant in particular, have long had a problem with issues around me...
“Some data from LeelaPieceOdds” by Jeremy Gillen
29 Oct 2025
Contributed by Lukas
I've been curious about how good LeelaPieceOdds is, so I downloaded a bunch of data and graphed it. For context, Leela is a chess bot and this versio...
“When Will AI Transform the Economy?” by Andre.Infante
29 Oct 2025
Contributed by Lukas
Substack version here: https://andreinfante.substack.com/p/when-will-ai-transform-the-economy A caricature of a common Twitter argument: “Hey it...
“Workshop on Post-AGI Economics, Culture, and Governance” by Raymond Douglas, Jan_Kulveit, scasper, David Duvenaud
29 Oct 2025
Contributed by Lukas
This is an announcement and call for applications to the Workshop on Post-AGI Economics, Culture, and Governance taking place in San Diego on Wednesd...
“Introducing the Epoch Capabilities Index (ECI)” by luke_emberson, YafahEdelman, Jsevillamol
29 Oct 2025
Contributed by Lukas
We at Epoch AI have recently released a new composite AI capability index called the Epoch Capabilities Index (ECI), based on nearly 40 underlying be...
“Mottes and Baileys in AI discourse” by Raemon
29 Oct 2025
Contributed by Lukas
This post kinda necessarily needs to touch multiple political topics at once. Please, everyone, be careful. If it looks like you haven't read the Les...
“LLM robots can’t pass butter (and they are having an existential crisis about it)” by Lukas Petersson
29 Oct 2025
Contributed by Lukas
TLDR: Andon Labs evaluates AI in the real world to measure capabilities and to see what can go wrong. For example, we previously made LLMs operate v...
“The Memetics of AI Successionism” by Jan_Kulveit
28 Oct 2025
Contributed by Lukas
TL;DR: AI progress and the recognition of associated risks are painful to think about. This cognitive dissonance acts as fertile ground in the memeti...
“All the lab’s AI safety Plans: 2025 edition” by Algon
28 Oct 2025
Contributed by Lukas
Three out of three CEOs of top AI companies agree: "Mitigating the risk of extinction from AI should be a global priority." How do they plan to do th...
“life lessons from trading” by thiccythot
28 Oct 2025
Contributed by Lukas
crossposted from my substack 1) Traders get paid to guess where money will move. 2) There are many ways people win guessing games: Some people read ...
“Stability of natural latents in information theoretic terms” by Aram Ebtekar
27 Oct 2025
Contributed by Lukas
Audio note: this article contains 32 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...
“AIs should also refuse to work on capabilities research” by Davidmanheim
27 Oct 2025
Contributed by Lukas
There's a strong argument that humans should stop trying to build more capable AI systems, or at least slow down progress. The risks are plausibly la...
“FWIW: What I noticed at a (Goenka) Vipassana retreat” by David Gross
27 Oct 2025
Contributed by Lukas
tl;dr: I went to a typical 10-day Vipassana Center retreat. I had some hopes going in for what I might get out of it and those were mostly fulfilled....
“Cancer has a surprising amount of detail” by Abhishaike Mahajan
27 Oct 2025
Contributed by Lukas
There is a very famous essay titled ‘Reality has a surprising amount of detail’. The thesis of the article is that reality is filled, just filled...
“Credit goes to the presenter, not the inventor” by Algon
27 Oct 2025
Contributed by Lukas
VN: Hey M, you come up with a name for the architecture yet? M: No, we've been busy. VN: Buddy, it takes all of 5 seconds to come up with a name. M:...
“On Fleshling Safety: A Debate by Klurl and Trapaucius.” by Eliezer Yudkowsky
27 Oct 2025
Contributed by Lukas
(23K words; best considered as nonfiction with a fictional-dialogue frame, not a proper short story.) Prologue: Klurl and Trapaucius were members of ...
“Brightline is Actually Pretty Dangerous” by jefftk
26 Oct 2025
Contributed by Lukas
Per the Atlantic's A 'Death Train' is Haunting South Florida: According to Federal Railroad Administration data, the Brightline has been in...
“Seven-ish Words from My Thought-Language” by Lorxus
26 Oct 2025
Contributed by Lukas
(With thanks to @TsviBT, @Lucie Philippon, and @johnswentworth for encouragement and feedback, among many.) Seven entries from a dictionary that will...
“Origins and dangers of future AI capability denial” by Patrick Spencer
25 Oct 2025
Contributed by Lukas
In rationalist spheres, there's a fairly clear consensus that whatever AI's ultimate impact will be, it is at its core a capable technology that will...
“Beliefs about formal methods and AI safety” by Quinn
25 Oct 2025
Contributed by Lukas
I appreciate Theodore Ehrenborg's comments. As a wee lad, I heard about mathematical certainty of computer programs. Let's go over what I currently ...
“Reminder: Morality is unsolved” by Jesper L.
25 Oct 2025
Contributed by Lukas
Here is a game you can play with yourself, or others: a) You have to decide on a moral framework that can be explained in detail, to anyone. b) It wi...
[Linkpost] “AI Timelines and Points of no return” by Gabriel Alfour
25 Oct 2025
Contributed by Lukas
This is a link post. In this essay, I introduce two Points of No Return (PNR): The Hard PNR. The moment where we have AI systems powerful and intelli...
“Musings on Reported Cost of Compute (Oct 2025)” by Vladimir_Nesov
25 Oct 2025
Contributed by Lukas
There are many ways in which costs of compute get reported. A 1 GW datacenter site costs $10-15bn in the infrastructure (buildings, cooling, power), ...
[Linkpost] “LW Reacts pack for Discord/Slack/etc” by plex
25 Oct 2025
Contributed by Lukas
This is a link post. Ever wanted to say Scout Mindset to someone on a chat platform, but didn't want to have to type 13 characters? Or wanted your ser...
“Regardless of X, you can still just sign superintelligence-statement.org if you agree” by Ishual
24 Oct 2025
Contributed by Lukas
TL;DR: you can still just sign this statement if you agree with it. It still matters, and you can clarify your position in a statement of support (60...
“New Statement Calls For Not Building Superintelligence For Now” by Zvi
24 Oct 2025
Contributed by Lukas
Building superintelligence poses large existential risks. Also known as: If Anyone Builds It, Everyone Dies. Where ‘it’ is superintelligence, and...
“AI #139: The Overreach Machines” by Zvi
24 Oct 2025
Contributed by Lukas
The big release this week was OpenAI giving us a new browser, called Atlas. The idea of Atlas is that it is Chrome, except with ChatGPT integrated t...
“Plan 1 and Plan 2” by Towards_Keeperhood
24 Oct 2025
Contributed by Lukas
Max Tegmark recently published a post "Which side of the AI safety community are you in?", where he carves the AI safety community into 2 camps: Camp...
“How an AI company CEO could quietly take over the world” by Alex Kastner
24 Oct 2025
Contributed by Lukas
Cross-posted from the AI Futures Project Substack. This post outlines a concrete scenario for how takeover by an AI company CEO might go, which I dev...
“The main way I’ve seen people turn ideologically crazy [Linkpost]” by Noosphere89
24 Oct 2025
Contributed by Lukas
This linkpost is in part a response to @Raemon's comment about why the procedure Raemon did doesn't work in practice to deal with the selection effec...
“Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?” by Alek Westover
24 Oct 2025
Contributed by Lukas
There is some concern that training AI systems on content predicting AI misalignment will hyperstition AI systems into misalignment. This has been di...
“Homomorphically encrypted consciousness and its implications” by jessicata
23 Oct 2025
Contributed by Lukas
I present a step-by-step argument in philosophy of mind. The main conclusion is that it is probably possible for conscious homomorphically encrypted ...
“Learning to Interpret Weight Differences in Language Models” by avichal
23 Oct 2025
Contributed by Lukas
Audio note: this article contains 38 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...
“Utopiography Interview” by plex
23 Oct 2025
Contributed by Lukas
It serves people well to mostly build towards a good future rather than getting distracted by the shape of utopia, but having a vision of where we wa...
[Linkpost] “Statement on Superintelligence - FLI Open Letter” by plex
23 Oct 2025
Contributed by Lukas
This is a link post. We call for a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it...
“Doomers were right” by Algon
23 Oct 2025
Contributed by Lukas
There's an argument I sometimes hear against existential risks, or any other putative change that some are worried about, that goes something like th...
“Penny’s Hands” by Tomás B.
22 Oct 2025
Contributed by Lukas
It is a strange thing to love another. I had not had much experience. It was Penny, of course, who I fell for. I suppose everyone fell in love with h...
“Which side of the AI safety community are you in?” by Max Tegmark
22 Oct 2025
Contributed by Lukas
In recent years, I’ve found that people who self-identify as members of the AI safety community have increasingly split into two camps: Camp A) "Ra...
“Postrationality: An Oral History” by Gordon Seidoh Worley
22 Oct 2025
Contributed by Lukas
Last week I gave an invited talk as part of the Integral Altruism speaker series. A recording of the talk and the extensive Q&A is up on YouTube;...
[Linkpost] “Consider donating to AI safety champion Scott Wiener” by Eric Neyman
22 Oct 2025
Contributed by Lukas
This is a link post. Written in my personal capacity. Thanks to many people for conversations and comments. Written in less than 24 hours; sorry for a...
“How Well Does RL Scale?” by Toby_Ord
22 Oct 2025
Contributed by Lukas
This is the latest in a series of essays on AI Scaling. You can find the others on my site. Summary: RL-training for LLMs scales surprisingly poorly...
“White House OSTP AI Deregulation Public Comment Period Ends Oct. 27” by Zack_M_Davis
22 Oct 2025
Contributed by Lukas
The White House's Office of Science and Technology Policy has issued a request for information (RFI) relevant to regulations that "unnecessarily hind...
“Is 90% of code at Anthropic being written by AIs?” by ryan_greenblatt
22 Oct 2025
Contributed by Lukas
In March 2025, Dario Amodei (CEO of Anthropic) said that he expects AI to be writing 90% of the code in 3 to 6 months and that AI might be writing es...
“Stratified Utopia” by Cleo Nardo
22 Oct 2025
Contributed by Lukas
Summary: "Stratified utopia" is a possible outcome where mundane values get proximal resources (near Earth in space and time) and exotic values get d...
“Remarks on Bayesian studies from 1963” by dynomight
22 Oct 2025
Contributed by Lukas
In 1963, Mosteller and Wallace published Inference in an Authorship Problem, which used Bayesian statistics to try to infer who wrote some of the dis...
[Linkpost] “21st Century Civilization curriculum” by Richard_Ngo
21 Oct 2025
Contributed by Lukas
This is a link post. I’ve just released a curriculum on foundational questions in modern politics, which I drew up in collaboration with Samo Burja....
“Symbiogenesis vs. Convergent Consequentialism” by Audrey Tang, plex
21 Oct 2025
Contributed by Lukas
(Cross-posted from SayIt archive.) Background for conversation: After an exchange in the comments of Audrey's LW post where plex suggested various re...
“EU explained in 10 minutes” by Martin Sustrik
21 Oct 2025
Contributed by Lukas
If you want to understand a country, you should pick a similar country that you are already familiar with, research the differences between the two a...
“Can you find the steganographically hidden message?” by Kei Nishimura-Gasparian
20 Oct 2025
Contributed by Lukas
tl;dr: I share a curated set of transcripts of models successfully executing message passing steganography from our recent paper. I then give a few t...
[Linkpost] “How Stuart Buck funded the replication crisis” by Elizabeth
20 Oct 2025
Contributed by Lukas
This is a link post. Stuart Buck has perhaps the largest Shapley value of any one individual in creating the replication crisis first in psychology, a...
“Consider donating to Alex Bores, author of the RAISE Act” by Eric Neyman
20 Oct 2025
Contributed by Lukas
Written by Eric Neyman, in my personal capacity. The views expressed here are my own. Thanks to Zach Stein-Perlman, Jesse Richardson, and many others...
“Considerations around career costs of political donations” by GradientDissenter
20 Oct 2025
Contributed by Lukas
I’m close to a single-issue voter/donor. I tend to like politicians who show strong support for AI safety, because I think it's an incredibly import...
“Bubble, Bubble, Toil and Trouble” by Zvi
20 Oct 2025
Contributed by Lukas
We have the classic phenomenon where suddenly everyone decided it is good for your social status to say we are in an ‘AI bubble.’ Are these peop...
“Scenes, cliques and teams - a high level ontology of groups” by Tobes
20 Oct 2025
Contributed by Lukas
Ontological status: Yes, this is ontology Groups of people are one of the most important things. If I were to list all the things and rank them by im...
“Frontier LLM Race/Sex Exchange Rates” by Arjun Panickssery
20 Oct 2025
Contributed by Lukas
This is a cross-post (with permission) of Arctotherium's post yesterday "LLM Exchange Rates, Updated." It uses a similar methodology to the CAIS "Uti...
“Humanity Learned Almost Nothing From COVID-19” by niplav
19 Oct 2025
Contributed by Lukas
Summary: Looking over humanity's response to the COVID-19 pandemic, almost six years later, reveals that we've forgotten to fulfill our intent at pre...
“AI #138 Part 2: Watch Out For Documents” by Zvi
19 Oct 2025
Contributed by Lukas
As usual when things split, Part 1 is mostly about capabilities, and Part 2 is mostly about a mix of policy and alignment. Table of Contents ...
“The Dark Arts of Tokenization or: How I learned to start worrying and love LLMs’ undecoded outputs” by Lovre
19 Oct 2025
Contributed by Lukas
Audio note: this article contains 225 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in th...
“Give Me Your Data: The Rationalist Mind Meld” by Taylor G. Lunt
19 Oct 2025
Contributed by Lukas
I don’t want your rationality. I can supply my own, thank you very much. I want your data. If you spot a logical error in my thinking, then please ...
“Meditation is dangerous” by Algon
18 Oct 2025
Contributed by Lukas
Here's a story I've heard a couple of times. A youngish person is looking for some solutions to their depression, chronic pain, ennui or some other c...
“I’m an EA who benefitted from rationality” by juliawise
17 Oct 2025
Contributed by Lukas
This is my personal take, not an organizational one. Originally written May 2025, revived for the EA Forum's Draft Amnesty Week. Cross-posted from th...
“Finding Features in Neural Networks with the Empirical NTK” by jylin04
17 Oct 2025
Contributed by Lukas
Audio note: this article contains 63 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...
“Reducing risk from scheming by studying trained-in scheming behavior” by ryan_greenblatt
17 Oct 2025
Contributed by Lukas
In a previous post, I discussed mitigating risks from scheming by studying examples of actual scheming AIs.[1] In this post, I'll discuss an alternat...
“Rogue internal deployments via external APIs” by Fabien Roger, Buck
16 Oct 2025
Contributed by Lukas
Once AI companies build powerful AIs, they may: Give internal AIs access to sensitive internal privileges (e.g. access to the internal infra that to...
“Cheap Labour Everywhere” by Morpheus
16 Oct 2025
Contributed by Lukas
I recently visited my girlfriend's parents in India. Here is what that experience taught me: Yudkowsky has this Facebook post where he makes some in...
“[draft amnesty] A New Global Risk: Large Comet’s Impact on Sun Could Cause Fires on Earth” by avturchin
16 Oct 2025
Contributed by Lukas
There are several scientific papers that claim that the variability in luminosity of young stars can be explained by the collisions of these stars wi...
“How I Became a 5x Engineer with Claude Code” by Gordon Seidoh Worley
15 Oct 2025
Contributed by Lukas
Claude Code has radically changed what it means for me to be a programmer. It's made me much more productive. I’m able to get work done in hours th...
“That Mad Olympiad” by Tomás B.
15 Oct 2025
Contributed by Lukas
"I heard Chen started distilling the day after he was born. He's only four years old, if you can believe it. He's written 18 novels. His first words ...
“It will cost you nothing to ‘bribe’ a Utilitarian” by Gabriel Alfour
15 Oct 2025
Contributed by Lukas
Audio note: this article contains 41 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...
“The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work” by Elizabeth
15 Oct 2025
Contributed by Lukas
On some level, calories in calories out has to be true. But these variables are not independent. Bodies respond to exercise by getting hungry and to ...
“Recontextualization Mitigates Specification Gaming Without Modifying the Specification” by vgillioz, TurnTrout, cloud, ariana_azarbal
14 Oct 2025
Contributed by Lukas
Recontextualization distills good behavior into a context which allows bad behavior. More specifically, recontextualization is a modification to RL w...
“The ‘Length’ of ‘Horizons’” by Adam Scholl
14 Oct 2025
Contributed by Lukas
Current AI models are strange. They can speak—often coherently, sometimes even eloquently—which is wild. They can predict the structure of protei...
“Current Language Models Struggle to Reason in Ciphered Language” by Fabien Roger
14 Oct 2025
Contributed by Lukas
tl;dr: We fine-tune or few-shot LLMs to use reasoning encoded with simple ciphers (e.g. base64, rot13, putting a dot between each letter) to solve ma...
“The Mom Test for AI Extinction Scenarios” by Taylor G. Lunt
14 Oct 2025
Contributed by Lukas
(Also posted to my Substack; written as part of the Halfhaven virtual blogging camp.) Let's set aside the question of whether or not superintelligen...
“How AI Manipulates—A Case Study” by Adele Lopez
14 Oct 2025
Contributed by Lukas
If there is only one thing you take away from this article, let it be this: THOU SHALT NOT ALLOW ANOTHER TO MODIFY THINE SELF-IMAGE This a...
“If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd
14 Oct 2025
Contributed by Lukas
About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk. I read AstralCodex...
“Making legible that many experts think we are not on track for a good future, barring some international cooperation” by Mateusz Bagiński, Ishual
13 Oct 2025
Contributed by Lukas
[Context: This post is aimed at all readers[1] who broadly agree that the current race toward superintelligence is bad, that stopping would be good, ...
“OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53” by Zvi
13 Oct 2025
Contributed by Lukas
A little over a month ago, I documented how OpenAI had descended into paranoia and bad faith lobbying surrounding California's SB 53. This included ...
“Sublinear Utility in Population and other Uncommon Utilitarianism” by Alice Blair
13 Oct 2025
Contributed by Lukas
Content warning: Anthropics, Moral Philosophy, and Shrimp This post isn't trying to be self contained, since I have so many disparate thoughts about ...
[Linkpost] “Pause House, Blackpool” by Greg C
13 Oct 2025
Contributed by Lukas
This is a link post. Are you passionate about pushing for a global halt to AGI development? An international treaty banning superintelligent AI? Pausi...
“The Narcissistic Spectrum” by Dawn Drescher
13 Oct 2025
Contributed by Lukas
Pathological narcissism is a fortress built against unbearable pain. Some fortresses are sculpted from glass, some hewn from granite. My six-tier spe...
“Don’t Mock Yourself” by Algon
13 Oct 2025
Contributed by Lukas
About half a year ago, I decided to try to stop insulting myself for two weeks. No more self-deprecating humour, calling myself a fool, or thinking I'm ...
“Emil the Moose” by Martin Sustrik
11 Oct 2025
Contributed by Lukas
The travels of Emil the Moose since he entered Czechia in mid-June. Moose became extinct in most of Germany around 1000 CE, and in Bohemia, Moravia, A...
“Experiments With Sonnet 4.5 Fiction” by Tomás B.
11 Oct 2025
Contributed by Lukas
Note, this post contains outputs of an LLM. I do not use LLMs in any of my fiction and do not claim this story as my own. I have been having fun wri...
“The Most Common Bad Argument In These Parts” by J Bostock
11 Oct 2025
Contributed by Lukas
I've noticed an antipattern. It's definitely on the dark pareto-frontier of "bad argument" and "I see it all the time amongst smart people". I'm conf...
“Iterated Development and Study of Schemers (IDSS)” by ryan_greenblatt
10 Oct 2025
Contributed by Lukas
In a previous post, we discussed prospects for studying scheming using natural examples. In this post, we'll describe a more detailed proposal for it...