LessWrong (30+ Karma)

“Re-rolling environment” by Raemon

01 Nov 2025

Contributed by Lukas

I'm currently on a "rationality as 'skills you practice'" kick. I'm really into subtle cognitive skills. I do think they eventually pay off. But, rea...

“LLM-generated text is not testimony” by TsviBT

01 Nov 2025

Contributed by Lukas

Crosspost from my blog. Synopsis When we share words with each other, we don't only care about the words themselves. We care also—even primaril...

“Supervillain Monologues Are Unrealistic” by Algon

01 Nov 2025

Contributed by Lukas

Supervillain monologues are strange. Not because a supervillain telling someone their evil plan is weird. In fact, that's what we should actually exp...

“Anthropic’s Pilot Sabotage Risk Report” by dmz

01 Nov 2025

Contributed by Lukas

As practice for potential future Responsible Scaling Policy obligations, we're releasing a report on misalignment risk posed by our deployed models a...

“OpenAI Moves To Complete Potentially The Largest Theft In Human History” by Zvi

31 Oct 2025

Contributed by Lukas

OpenAI is now set to become a Public Benefit Corporation, with its investors entitled to uncapped profit shares. Its nonprofit foundation will retain...

“Resampling Conserves Redundancy & Mediation (Approximately) Under the Jensen-Shannon Divergence” by David Lorell

31 Oct 2025

Contributed by Lukas

Audio note: this article contains 86 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...

“Steering Evaluation-Aware Models to Act Like They Are Deployed” by Tim Hua, andrq, Sam Marks, Neel Nanda

30 Oct 2025

Contributed by Lukas

📄Paper, 🖥️Code, 🤖Evaluation Aware Model Organism TL;DR: We train an evaluation-aware LLM. Specifically, we train a model organism ...

[Linkpost] “AISLE discovered three new OpenSSL vulnerabilities” by Jan_Kulveit

30 Oct 2025

Contributed by Lukas

This is a link post. The company post is linked; it seems like an update on where we are with automated cybersec. So far in 2025, only four security ...

“Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals” by Alexa Pan, ryan_greenblatt

30 Oct 2025

Contributed by Lukas

According to the Sonnet 4.5 system card, Sonnet 4.5 is much more likely than Sonnet 4 to mention in its chain-of-thought that it thinks it is being ev...

“ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents” by Ziqian Zhong

30 Oct 2025

Contributed by Lukas

This is a post about our recent work ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases (with Aditi Raghunathan, Nicholas Carlini) ...

[Linkpost] “Emergent Introspective Awareness in Large Language Models” by Drake Thomas

30 Oct 2025

Contributed by Lukas

This is a link post. New Anthropic research (tweet, blog post, paper): We investigate whether large language models can introspect on their internal ...

“An Opinionated Guide to Privacy Despite Authoritarianism” by TurnTrout

29 Oct 2025

Contributed by Lukas

I've created a highly specific and actionable privacy guide, sorted by importance and venturing several layers deep into the privacy iceberg. I start...

“The End of OpenAI’s Nonprofit Era” by garrison

29 Oct 2025

Contributed by Lukas

Key regulators have agreed to let the company kill its profit caps and restructure as a for-profit — with some strings attached This is the full te...

“Please Do Not Sell B30A Chips to China” by Zvi

29 Oct 2025

Contributed by Lukas

The Chinese and Americans are currently negotiating a trade deal. There are plenty of ways to generate a win-win deal, and early signs of this are pr...

“AI Craziness Mitigation Efforts” by Zvi

29 Oct 2025

Contributed by Lukas

AI chatbots in general, and OpenAI and ChatGPT and especially GPT-4o the absurd sycophant in particular, have long had a problem with issues around me...

“Some data from LeelaPieceOdds” by Jeremy Gillen

29 Oct 2025

Contributed by Lukas

I've been curious about how good LeelaPieceOdds is, so I downloaded a bunch of data and graphed it. For context, Leela is a chess bot and this versio...

“When Will AI Transform the Economy?” by Andre.Infante

29 Oct 2025

Contributed by Lukas

Substack version here: https://andreinfante.substack.com/p/when-will-ai-transform-the-economy A caricature of a common Twitter argument: ”Hey it...

“Workshop on Post-AGI Economics, Culture, and Governance” by Raymond Douglas, Jan_Kulveit, scasper, David Duvenaud

29 Oct 2025

Contributed by Lukas

This is an announcement and call for applications to the Workshop on Post-AGI Economics, Culture, and Governance taking place in San Diego on Wednesd...

“Introducing the Epoch Capabilities Index (ECI)” by luke_emberson, YafahEdelman, Jsevillamol

29 Oct 2025

Contributed by Lukas

We at Epoch AI have recently released a new composite AI capability index called the Epoch Capabilities Index (ECI), based on nearly 40 underlying be...

“Mottes and Baileys in AI discourse” by Raemon

29 Oct 2025

Contributed by Lukas

This post kinda necessarily needs to touch multiple political topics at once. Please, everyone, be careful. If it looks like you haven't read the Les...

“LLM robots can’t pass butter (and they are having an existential crisis about it)” by Lukas Petersson

29 Oct 2025

Contributed by Lukas

TLDR: Andon Labs evaluates AI in the real world to measure capabilities and to see what can go wrong. For example, we previously made LLMs operate v...

“The Memetics of AI Successionism” by Jan_Kulveit

28 Oct 2025

Contributed by Lukas

TL;DR: AI progress and the recognition of associated risks are painful to think about. This cognitive dissonance acts as fertile ground in the memeti...

“All the lab’s AI safety Plans: 2025 edition” by Algon

28 Oct 2025

Contributed by Lukas

Three out of three CEOs of top AI companies agree: "Mitigating the risk of extinction from AI should be a global priority." How do they plan to do th...

“life lessons from trading” by thiccythot

28 Oct 2025

Contributed by Lukas

crossposted from my substack 1) Traders get paid to guess where money will move. 2) There are many ways people win guessing games: Some people read ...

“Stability of natural latents in information theoretic terms” by Aram Ebtekar

27 Oct 2025

Contributed by Lukas

Audio note: this article contains 32 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...

“AIs should also refuse to work on capabilities research” by Davidmanheim

27 Oct 2025

Contributed by Lukas

There's a strong argument that humans should stop trying to build more capable AI systems, or at least slow down progress. The risks are plausibly la...

“FWIW: What I noticed at a (Goenka) Vipassana retreat” by David Gross

27 Oct 2025

Contributed by Lukas

tl;dr: I went to a typical 10-day Vipassana Center retreat. I had some hopes going in for what I might get out of it and those were mostly fulfilled....

“Cancer has a surprising amount of detail” by Abhishaike Mahajan

27 Oct 2025

Contributed by Lukas

There is a very famous essay titled ‘Reality has a surprising amount of detail’. The thesis of the article is that reality is filled, just filled...

“Credit goes to the presenter, not the inventor” by Algon

27 Oct 2025

Contributed by Lukas

VN: Hey M, you come up with a name for the architecture yet? M: No, we've been busy. VN: Buddy, it takes all of 5 seconds to come up with a name. M:...

“On Fleshling Safety: A Debate by Klurl and Trapaucius.” by Eliezer Yudkowsky

27 Oct 2025

Contributed by Lukas

(23K words; best considered as nonfiction with a fictional-dialogue frame, not a proper short story.) Prologue: Klurl and Trapaucius were members of ...

“Brightline is Actually Pretty Dangerous” by jefftk

26 Oct 2025

Contributed by Lukas

Per the Atlantic's A 'Death Train' is Haunting South Florida: According to Federal Railroad Administration data, the Brightline has been in...

“Seven-ish Words from My Thought-Language” by Lorxus

26 Oct 2025

Contributed by Lukas

(With thanks to @TsviBT, @Lucie Philippon, and @johnswentworth for encouragement and feedback, among many.) Seven entries from a dictionary that will...

“Origins and dangers of future AI capability denial” by Patrick Spencer

25 Oct 2025

Contributed by Lukas

In rationalist spheres, there's a fairly clear consensus that whatever AI's ultimate impact will be, it is at its core a capable technology that will...

“Beliefs about formal methods and AI safety” by Quinn

25 Oct 2025

Contributed by Lukas

I appreciate Theodore Ehrenborg's comments. As a wee lad, I heard about mathematical certainty of computer programs. Let's go over what I currently ...

“Reminder: Morality is unsolved” by Jesper L.

25 Oct 2025

Contributed by Lukas

Here is a game you can play with yourself, or others: a) You have to decide on a moral framework that can be explained in detail, to anyone. b) It wi...

[Linkpost] “AI Timelines and Points of no return” by Gabriel Alfour

25 Oct 2025

Contributed by Lukas

This is a link post. In this essay, I introduce two Points of No Return (PNR): The Hard PNR. The moment where we have AI systems powerful and intelli...

“Musings on Reported Cost of Compute (Oct 2025)” by Vladimir_Nesov

25 Oct 2025

Contributed by Lukas

There are many ways in which costs of compute get reported. A 1 GW datacenter site costs $10-15bn in the infrastructure (buildings, cooling, power), ...

[Linkpost] “LW Reacts pack for Discord/Slack/etc” by plex

25 Oct 2025

Contributed by Lukas

This is a link post. Ever wanted to say Scout Mindset to someone on a chat platform, but didn't want to have to type 13 characters? Or wanted your ser...

“Regardless of X, you can still just sign superintelligence-statement.org if you agree” by Ishual

24 Oct 2025

Contributed by Lukas

TL;DR: you can still just sign this statement if you agree with it. It still matters, and you can clarify your position in a statement of support (60...

“New Statement Calls For Not Building Superintelligence For Now” by Zvi

24 Oct 2025

Contributed by Lukas

Building superintelligence poses large existential risks. Also known as: If Anyone Builds It, Everyone Dies. Where ‘it’ is superintelligence, and...

“AI #139: The Overreach Machines” by Zvi

24 Oct 2025

Contributed by Lukas

The big release this week was OpenAI giving us a new browser, called Atlas. The idea of Atlas is that it is Chrome, except with ChatGPT integrated t...

“Plan 1 and Plan 2” by Towards_Keeperhood

24 Oct 2025

Contributed by Lukas

Max Tegmark recently published a post "Which side of the AI safety community are you in?", where he carves the AI safety community into 2 camps: Camp...

“How an AI company CEO could quietly take over the world” by Alex Kastner

24 Oct 2025

Contributed by Lukas

Cross-posted from the AI Futures Project Substack. This post outlines a concrete scenario for how takeover by an AI company CEO might go, which I dev...

“The main way I’ve seen people turn ideologically crazy [Linkpost]” by Noosphere89

24 Oct 2025

Contributed by Lukas

This linkpost is in part a response to @Raemon's comment about why the procedure Raemon did doesn't work in practice to deal with the selection effec...

“Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?” by Alek Westover

24 Oct 2025

Contributed by Lukas

There is some concern that training AI systems on content predicting AI misalignment will hyperstition AI systems into misalignment. This has been di...

“Homomorphically encrypted consciousness and its implications” by jessicata

23 Oct 2025

Contributed by Lukas

I present a step-by-step argument in philosophy of mind. The main conclusion is that it is probably possible for conscious homomorphically encrypted ...

“Learning to Interpret Weight Differences in Language Models” by avichal

23 Oct 2025

Contributed by Lukas

Audio note: this article contains 38 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...

“Utopiography Interview” by plex

23 Oct 2025

Contributed by Lukas

It serves people well to mostly build towards a good future rather than getting distracted by the shape of utopia, but having a vision of where we wa...

[Linkpost] “Statement on Superintelligence - FLI Open Letter” by plex

23 Oct 2025

Contributed by Lukas

This is a link post. We call for a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it...

“Doomers were right” by Algon

23 Oct 2025

Contributed by Lukas

There's an argument I sometimes hear against existential risks, or any other putative change that some are worried about, that goes something like th...

“Penny’s Hands” by Tomás B.

22 Oct 2025

Contributed by Lukas

It is a strange thing to love another. I had not had much experience. It was Penny, of course, who I fell for. I suppose everyone fell in love with h...

“Which side of the AI safety community are you in?” by Max Tegmark

22 Oct 2025

Contributed by Lukas

In recent years, I’ve found that people who self-identify as members of the AI safety community have increasingly split into two camps: Camp A) "Ra...

“Postrationality: An Oral History” by Gordon Seidoh Worley

22 Oct 2025

Contributed by Lukas

Last week I gave an invited talk as part of the Integral Altruism speaker series. A recording of the talk and the extensive Q&A is up on YouTube;...

[Linkpost] “Consider donating to AI safety champion Scott Wiener” by Eric Neyman

22 Oct 2025

Contributed by Lukas

This is a link post. Written in my personal capacity. Thanks to many people for conversations and comments. Written in less than 24 hours; sorry for a...

“How Well Does RL Scale?” by Toby_Ord

22 Oct 2025

Contributed by Lukas

This is the latest in a series of essays on AI Scaling. You can find the others on my site. Summary: RL-training for LLMs scales surprisingly poorly...

“White House OSTP AI Deregulation Public Comment Period Ends Oct. 27” by Zack_M_Davis

22 Oct 2025

Contributed by Lukas

The White House's Office of Science and Technology Policy has issued a request for information (RFI) relevant to regulations that "unnecessarily hind...

“Is 90% of code at Anthropic being written by AIs?” by ryan_greenblatt

22 Oct 2025

Contributed by Lukas

In March 2025, Dario Amodei (CEO of Anthropic) said that he expects AI to be writing 90% of the code in 3 to 6 months and that AI might be writing es...

“Stratified Utopia” by Cleo Nardo

22 Oct 2025

Contributed by Lukas

Summary: "Stratified utopia" is a possible outcome where mundane values get proximal resources (near Earth in space and time) and exotic values get d...

“Remarks on Bayesian studies from 1963” by dynomight

22 Oct 2025

Contributed by Lukas

In 1963, Mosteller and Wallace published Inference in an Authorship Problem, which used Bayesian statistics to try to infer who wrote some of the dis...

[Linkpost] “21st Century Civilization curriculum” by Richard_Ngo

21 Oct 2025

Contributed by Lukas

This is a link post. I’ve just released a curriculum on foundational questions in modern politics, which I drew up in collaboration with Samo Burja....

“Symbiogenesis vs. Convergent Consequentialism” by Audrey Tang, plex

21 Oct 2025

Contributed by Lukas

(Cross-posted from SayIt archive.) Background for conversation: After an exchange in the comments of Audrey's LW post where plex suggested various re...

“EU explained in 10 minutes” by Martin Sustrik

21 Oct 2025

Contributed by Lukas

If you want to understand a country, you should pick a similar country that you are already familiar with, research the differences between the two a...

“Can you find the steganographically hidden message?” by Kei Nishimura-Gasparian

20 Oct 2025

Contributed by Lukas

tl;dr: I share a curated set of transcripts of models successfully executing message passing steganography from our recent paper. I then give a few t...

[Linkpost] “How Stuart Buck funded the replication crisis” by Elizabeth

20 Oct 2025

Contributed by Lukas

This is a link post. Stuart Buck has perhaps the largest Shapley value of any one individual in creating the replication crisis first in psychology, a...

“Consider donating to Alex Bores, author of the RAISE Act” by Eric Neyman

20 Oct 2025

Contributed by Lukas

Written by Eric Neyman, in my personal capacity. The views expressed here are my own. Thanks to Zach Stein-Perlman, Jesse Richardson, and many others...

“Considerations around career costs of political donations” by GradientDissenter

20 Oct 2025

Contributed by Lukas

I’m close to a single-issue voter/donor. I tend to like politicians who show strong support for AI safety, because I think it's an incredibly import...

“Bubble, Bubble, Toil and Trouble” by Zvi

20 Oct 2025

Contributed by Lukas

We have the classic phenomenon where suddenly everyone decided it is good for your social status to say we are in an ‘AI bubble.’ Are these peop...

“Scenes, cliques and teams - a high level ontology of groups” by Tobes

20 Oct 2025

Contributed by Lukas

Ontological status: Yes, this is ontology Groups of people are one of the most important things. If I were to list all the things and rank them by im...

“Frontier LLM Race/Sex Exchange Rates” by Arjun Panickssery

20 Oct 2025

Contributed by Lukas

This is a cross-post (with permission) of Arctotherium's post yesterday "LLM Exchange Rates, Updated." It uses a similar methodology to the CAIS "Uti...

“Humanity Learned Almost Nothing From COVID-19” by niplav

19 Oct 2025

Contributed by Lukas

Summary: Looking over humanity's response to the COVID-19 pandemic, almost six years later, reveals that we've forgotten to fulfill our intent at pre...

“AI #138 Part 2: Watch Out For Documents” by Zvi

19 Oct 2025

Contributed by Lukas

As usual when things split, Part 1 is mostly about capabilities, and Part 2 is mostly about a mix of policy and alignment. Table of Contents ...

“The Dark Arts of Tokenization or: How I learned to start worrying and love LLMs’ undecoded outputs” by Lovre

19 Oct 2025

Contributed by Lukas

Audio note: this article contains 225 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in th...

“Give Me Your Data: The Rationalist Mind Meld” by Taylor G. Lunt

19 Oct 2025

Contributed by Lukas

I don’t want your rationality. I can supply my own, thank you very much. I want your data. If you spot a logical error in my thinking, then please ...

“Meditation is dangerous” by Algon

18 Oct 2025

Contributed by Lukas

Here's a story I've heard a couple of times. A youngish person is looking for some solutions to their depression, chronic pain, ennui or some other c...

“I’m an EA who benefitted from rationality” by juliawise

17 Oct 2025

Contributed by Lukas

This is my personal take, not an organizational one. Originally written May 2025, revived for the EA Forum's Draft Amnesty Week. Cross-posted from th...

“Finding Features in Neural Networks with the Empirical NTK” by jylin04

17 Oct 2025

Contributed by Lukas

Audio note: this article contains 63 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...

“Reducing risk from scheming by studying trained-in scheming behavior” by ryan_greenblatt

17 Oct 2025

Contributed by Lukas

In a previous post, I discussed mitigating risks from scheming by studying examples of actual scheming AIs.[1] In this post, I'll discuss an alternat...

“Rogue internal deployments via external APIs” by Fabien Roger, Buck

16 Oct 2025

Contributed by Lukas

Once AI companies build powerful AIs, they may: Give internal AIs access to sensitive internal privileges (e.g. access to the internal infra that to...

“Cheap Labour Everywhere” by Morpheus

16 Oct 2025

Contributed by Lukas

I recently visited my girlfriend's parents in India. Here is what that experience taught me: Yudkowsky has this facebook post where he makes some in...

“[draft amnesty] A New Global Risk: Large Comet’s Impact on Sun Could Cause Fires on Earth” by avturchin

16 Oct 2025

Contributed by Lukas

There are several scientific papers that claim that the variability in luminosity of young stars can be explained by the collisions of these stars wi...

“How I Became a 5x Engineer with Claude Code” by Gordon Seidoh Worley

15 Oct 2025

Contributed by Lukas

Claude Code has radically changed what it means for me to be a programmer. It's made me much more productive. I’m able to get work done in hours th...

“That Mad Olympiad” by Tomás B.

15 Oct 2025

Contributed by Lukas

"I heard Chen started distilling the day after he was born. He's only four years old, if you can believe it. He's written 18 novels. His first words ...

“It will cost you nothing to ‘bribe’ a Utilitarian” by Gabriel Alfour

15 Oct 2025

Contributed by Lukas

Audio note: this article contains 41 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the...

“The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work” by Elizabeth

15 Oct 2025

Contributed by Lukas

On some level, calories in calories out has to be true. But these variables are not independent. Bodies respond to exercise by getting hungry and to ...

“Recontextualization Mitigates Specification Gaming Without Modifying the Specification” by vgillioz, TurnTrout, cloud, ariana_azarbal

14 Oct 2025

Contributed by Lukas

Recontextualization distills good behavior into a context which allows bad behavior. More specifically, recontextualization is a modification to RL w...

“The ‘Length’ of ‘Horizons’” by Adam Scholl

14 Oct 2025

Contributed by Lukas

Current AI models are strange. They can speak—often coherently, sometimes even eloquently—which is wild. They can predict the structure of protei...

“Current Language Models Struggle to Reason in Ciphered Language” by Fabien Roger

14 Oct 2025

Contributed by Lukas

tl;dr: We fine-tune or few-shot LLMs to use reasoning encoded with simple ciphers (e.g. base64, rot13, putting a dot between each letter) to solve ma...

“The Mom Test for AI Extinction Scenarios” by Taylor G. Lunt

14 Oct 2025

Contributed by Lukas

(Also posted to my Substack; written as part of the Halfhaven virtual blogging camp.) Let's set aside the question of whether or not superintelligen...

“How AI Manipulates—A Case Study” by Adele Lopez

14 Oct 2025

Contributed by Lukas

If there is only one thing you take away from this article, let it be this: THOU SHALT NOT ALLOW ANOTHER TO MODIFY THINE SELF-IMAGE  This a...

“If Anyone Builds It Everyone Dies, a semi-outsider review” by dvd

14 Oct 2025

Contributed by Lukas

About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk. I read AstralCodex...

“Making legible that many experts think we are not on track for a good future, barring some international cooperation” by Mateusz Bagiński, Ishual

13 Oct 2025

Contributed by Lukas

[Context: This post is aimed at all readers[1] who broadly agree that the current race toward superintelligence is bad, that stopping would be good, ...

“OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53” by Zvi

13 Oct 2025

Contributed by Lukas

A little over a month ago, I documented how OpenAI had descended into paranoia and bad faith lobbying surrounding California's SB 53. This included ...

“Sublinear Utility in Population and other Uncommon Utilitarianism” by Alice Blair

13 Oct 2025

Contributed by Lukas

Content warning: Anthropics, Moral Philosophy, and Shrimp This post isn't trying to be self contained, since I have so many disparate thoughts about ...

[Linkpost] “Pause House, Blackpool” by Greg C

13 Oct 2025

Contributed by Lukas

This is a link post. Are you passionate about pushing for a global halt to AGI development? An international treaty banning superintelligent AI? Pausi...

“The Narcissistic Spectrum” by Dawn Drescher

13 Oct 2025

Contributed by Lukas

Pathological narcissism is a fortress built against unbearable pain. Some fortresses are sculpted from glass, some hewn from granite. My six-tier spe...

“Don’t Mock Yourself” by Algon

13 Oct 2025

Contributed by Lukas

About half a year ago, I decided to try stop insulting myself for two weeks. No more self-deprecating humour, calling myself a fool, or thinking I'm ...

“Emil the Moose” by Martin Sustrik

11 Oct 2025

Contributed by Lukas

The travels of Emil the Moose since he entered Czechia in mid-June. Moose became extinct in most of Germany around 1000 CE, and in Bohemia, Moravia, A...

“Experiments With Sonnet 4.5 Fiction” by Tomás B.

11 Oct 2025

Contributed by Lukas

Note, this post contains outputs of an LLM. I do not use LLMs in any of my fiction and do not claim this story as my own. I have been having fun wri...

“The Most Common Bad Argument In These Parts” by J Bostock

11 Oct 2025

Contributed by Lukas

I've noticed an antipattern. It's definitely on the dark pareto-frontier of "bad argument" and "I see it all the time amongst smart people". I'm conf...

“Iterated Development and Study of Schemers (IDSS)” by ryan_greenblatt

10 Oct 2025

Contributed by Lukas

In a previous post, we discussed prospects for studying scheming using natural examples. In this post, we'll describe a more detailed proposal for it...
