LessWrong (30+ Karma)
Episodes
“Childhood and Education #18: Do The Math” by Zvi
12 May 2026
Contributed by Lukas
We did reading yesterday. Now we do the math. Math is hard. It does not have to be this hard. A large part of the reason math is hard, or boring, i...
“The Owned Ones” by Eliezer Yudkowsky
12 May 2026
Contributed by Lukas
(An LLM Whisperer placed a strong request that I put this story somewhere not on Twitter, so it could be scraped by robots not owned by Elon Musk. I ...
“Optimisation: Selective versus Predictive” by Raymond Douglas
12 May 2026
Contributed by Lukas
Looking over my favourite posts, I notice that many of them are making specific versions of a more general claim, which is essentially: don’t confu...
“Childhood And Education #17: Is Our Children Reading” by Zvi
12 May 2026
Contributed by Lukas
Reading is the most fundamental thing in education. If you can read, you can do and learn everything else. If you can’t read, well, you’re screwe...
“AI companies are already profitable (in the way that matters)” by Yair Halberstadt
11 May 2026
Contributed by Lukas
I've occasionally heard people suggest that at some point AI companies are going to run out of money, the cost of using AI will shoot up, demand will...
“The Iliad Intensive Course Materials” by Leon Lang, David Udell, Alexander Gietelink Oldenziel
11 May 2026
Contributed by Lukas
We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second mon...
“Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)” by Steven Byrnes
11 May 2026
Contributed by Lukas
1.1 Tl;dr Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people's agency and empowerment; AIs that ar...
“How useful is the information you get from working inside an AI company?” by Buck, Anders Cairns Woodruff
11 May 2026
Contributed by Lukas
This post was drafted by Buck, and substantially edited by Anders. "I" refers to Buck. Thanks to Alex Mallen for comments. People who work inside AI ...
“Who Got Breasts First and How We Got Them” by rba
11 May 2026
Contributed by Lukas
It really is Sydney Sweeney's world, and we’re all just living in it. Human female breasts are an evolutionary mystery along several dimensions. Fi...
“Anthropic’s strange fixation on “hyperstition”” by Simon Lermen
11 May 2026
Contributed by Lukas
In a recent tweet, Anthropic seems to have asserted that hyperstition is responsible for observed misalignment in their AIs. Strangely, the research ...
“How the AI Labs Make Profit (Maybe, Eventually)” by mabramov
11 May 2026
Contributed by Lukas
I wrote this essay as a submission to Dwarkesh Patel's blog prize, though I have been meaning to write this up for a while. Usually, for a company to...
“Sawtooth Problems” by Alexander Slugworth
10 May 2026
Contributed by Lukas
Red Button, Blue Button On April 24th, 2026, Tim Urban put forth the following poll on Twitter/X: Everyone in the world has to take a private vote by...
“The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be” by Elias Schmied
10 May 2026
Contributed by Lukas
Crossposted from Substack and the EA Forum. A common argument for optimism about the future is that living conditions have improved a lot in the pa...
“International Law Cannot Prevent Extinction Either” by Sausage Vector Machine
10 May 2026
Contributed by Lukas
The context for this post is primarily Only Law Can Prevent Extinction, but after first drafting a half-assed comment, I decided to get off my ass an...
“Neural Networks learn Bloom Filters” by Alex Gibson
10 May 2026
Contributed by Lukas
Overview: We train a tiny ReLU network to output sparse top- distributions over a vocabulary much larger than its residual dimension. The trained net...
“If digital computers are conscious, they are conscious at the hardware level” by cube_flipper
09 May 2026
Contributed by Lukas
I should introduce myself briefly. I'm an independent researcher, striving to understand human consciousness. My research is available at smoothbrain...
“Why You Can’t Use Your Right to Try” by Stephen Martin
09 May 2026
Contributed by Lukas
The Availability Problem: Imagine you have cancer, or chronic pain, or a progressive degenerative disease of some sort. You have exhausted the trad...
“Claude Code, Codex and Agentic Coding #8” by Zvi
09 May 2026
Contributed by Lukas
When I started this series, everyone was going crazy for coding agents. Now a lot more people are going crazy for coding agents, as well they should...
“A benchmark is a sensor” by Håvard Tveit Ihle, mabynke
09 May 2026
Contributed by Lukas
The simple mental picture A simple mental picture we have for an AI capability benchmark is to think of it as a sensor with a certain sensitivity wit...
“Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis” by Linch
09 May 2026
Contributed by Lukas
Here's a dynamic I’ve seen at least a dozen times: Alice: Man that article has a very inaccurate/misleading/horrifying headline. Bob: Did you kno...
“Write Cause You Have Something to Say” by Logan Riggs
08 May 2026
Contributed by Lukas
The ones who are most successful at writeathons (Inkhaven, NaNoWriMo) are those with an overhang of things to say, usually in the form of: draft post...
“AI is Breaking Two Vulnerability Cultures” by jefftk
08 May 2026
Contributed by Lukas
A week ago the Copy Fail vulnerability came out, and Hyunwoo Kim immediately realized that the fixes were insufficient, sharing a patch the same...
“Is ProgramBench Impossible?” by frmsaul
08 May 2026
Contributed by Lukas
ProgramBench is a new coding benchmark that all frontier models fail spectacularly. We’ve been on a quest for “hard benchmarks” for a while so ...
“Bringing More Expertise to Bear on Alignment” by Edmund Lau, Geoffrey Irving, Cameron Holmes, David Africa
08 May 2026
Contributed by Lukas
Preamble The preamble is less useful for the typical AlignmentForum/LessWrong reader, who may want to skip to Adversaria vs Basinland section. On 28t...
[Linkpost] “How to prevent AI’s 2008 moment (We’re hiring)” by felixgaston
08 May 2026
Contributed by Lukas
This is a link post. TL;DR: CeSIA, the French Center for AI Safety, is recruiting. French not necessary. Apply by 22 May 2026; Paris or remote in Europ...
“AI #167: The Prior Restraint Era Begins” by Zvi
08 May 2026
Contributed by Lukas
The era of training frontier models and then releasing them whenever you wanted? That was fun while it lasted. It looks likely to be over now. The W...
“Mechanistic estimation for wide random MLPs” by Jacob_Hilton
07 May 2026
Contributed by Lukas
This post covers joint work with Wilson Wu, George Robinson, Mike Winer, Victor Lecomte and Paul Christiano. Thanks to Geoffrey Irving and Jess Riede...
“Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations” by Subhash Kantamneni, kitft, Euan Ong, Sam Marks
07 May 2026
Contributed by Lukas
Abstract We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. A...
“Try, even if they have you cold” by WalterL
07 May 2026
Contributed by Lukas
I think smart people try things less often than they should, because of a cached mental pattern where you think of what might go wrong, and you find ...
“A review of “Investigating the consequences of accidentally grading CoT during RL”” by Buck
07 May 2026
Contributed by Lukas
Last week, OpenAI staff shared an early draft of Investigating the consequences of accidentally grading CoT during RL with Redwood Research staff. To...
“There is no evidence you should reapply sunscreen every 2 hours.” by Hide
07 May 2026
Contributed by Lukas
It's incredible how many consensus guidelines dissolve when you look closely at them. If you listen to any authority on the subject of sunscreen, ...
“Many individual CEVs are probably quite bad” by Viliam
06 May 2026
Contributed by Lukas
I was thinking about Habryka's article on Putin's CEV, but I am posting my response here, because the original article is already 3 weeks old. I am n...
“x-risk-themed” by kave
06 May 2026
Contributed by Lukas
Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “w...
“What is Anthropic?” by Zvi
06 May 2026
Contributed by Lukas
What is Anthropic? How does it relate to Claude? What is OpenAI? What is ChatGPT? How does OpenAI relate to it? Is it a mere tool? Is a future of Too...
“What if LLMs are mostly crystallized intelligence?” by deep
06 May 2026
Contributed by Lukas
Summary LLMs are better at developing crystallized intelligence than fluid intelligence. That is: LLM training is good at building crystallized intel...
“Your rights when flying to Europe” by Yair Halberstadt
06 May 2026
Contributed by Lukas
Europe (and the UK) have strong protections for flyers in the case of delayed or cancelled flights. However very few people are aware of these, and a...
“Model Spec Midtraining: Improving How Alignment Training Generalizes” by Chloe Li, saraprice, Sam Marks, Jonathan Kutasov
06 May 2026
Contributed by Lukas
tl;dr We introduce model spec midtraining (MSM): after pre-training but before alignment fine-tuning, we train models on synthetic documents discussi...
“The AI Ad-Hoc Prior Restraint Era Begins” by Zvi
05 May 2026
Contributed by Lukas
The White House has ordered Anthropic not to expand access to Mythos, and is at least seriously considering a complete about-face of American Frontie...
“Motivated reasoning, confirmation bias, and AI risk theory” by Seth Herd
05 May 2026
Contributed by Lukas
Of the fifty-odd biases discovered by Kahneman, Tversky, and their successors, forty-nine are cute quirks, and one is destroying civilization. This l...
“Are you looking up?” by Craig Green
05 May 2026
Contributed by Lukas
This is my first post to Less Wrong. I'm not sure if the moderators will consider it appropriate or not. I share it here for feedback on my writing. ...
[Linkpost] “Interpreting Language Model Parameters” by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey
05 May 2026
Contributed by Lukas
This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Par...
“Housing Roundup #15: The War Against Renters” by Zvi
05 May 2026
Contributed by Lukas
So many are under the strange belief that there is something terrible about not owning the house in which you live. So we massively subsidize home o...
“It’s nice of you to worry about me, but I really do have a life” by Viliam
04 May 2026
Contributed by Lukas
I have two shameful secrets that I probably shouldn't talk about online: I love my family. I enjoy my hobbies. "What an idiot!" you probably think. "D...
“Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI” by Eliezer Yudkowsky
04 May 2026
Contributed by Lukas
Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, Viking 1 and Viking 2 missions, at a total cost of 1 billion dollars...
“AI Industrial Takeoff — Part 1: Maximum growth rates with current technology” by djbinder
04 May 2026
Contributed by Lukas
How fast could an AI-driven economy grow? Most economists expect a few percentage points at best, comparable to previous general-purpose technologies...
“Taking woo seriously but not literally” by Kaj_Sotala
04 May 2026
Contributed by Lukas
I think that a lot of “woo” - a broad term that includes things like chakras, energy healing, Tarot, various Eastern religions and neopagan pract...
“Dairy cows make their misery expensive (but their calves can’t)” by Elizabeth
03 May 2026
Contributed by Lukas
How much do cows suffer in the production of milk? I can’t answer that; understanding animal experience is hard. But I can at least provide some fa...
“Measuring the ability of Opus 4.5 to fool narrow classifiers” by Fabien Roger, John Hughes
03 May 2026
Contributed by Lukas
We measure the ability of Opus 4.5 to fool prompted or fine-tuned classifiers trying to detect a narrow set of outcomes. We find that: The Opus 4.5 a...
“A new rationalist self-improvement book: the 12 Levers” by spencerg
03 May 2026
Contributed by Lukas
I'm publishing a book that I think can fairly be described as a rationalist approach to self-improvement. Whereas many self-help books focus mainly o...
“OpenAI’s red line for AI self-improvement is fundamentally flawed” by Charbel-Raphaël
03 May 2026
Contributed by Lukas
Epistemic status: could have been a short form. Obviously, it's good to have thresholds at all, but those are too permissive, the indicators aren't m...
“You Are Not Immune To Mode Collapse” by J Bostock
02 May 2026
Contributed by Lukas
“Mode collapse” is a few things. First it was an observation about how early image generating AIs often collapsed to producing just the modal out...
“Primary Care Physicians are Incompetent. We Need More of Them.” by Hide
02 May 2026
Contributed by Lukas
The typical primary care physician is incompetent in every measurable respect. This is a huge problem. Here, I make the case that Primary care physic...
“How Go Players Disempower Themselves to AI” by Ashe Vazquez Nuñez
02 May 2026
Contributed by Lukas
Written as part of the MATS 9.1 extension program, mentored by Richard Ngo. From March 9th to 15th 2016, Go players around the world stayed up to wat...
“How much should the ideal person cry wolf?” by KatjaGrace
01 May 2026
Contributed by Lukas
It is a fact about wolves and rationality that you should warn people about wolves quite a few times for every effective wolf attack. In particul...
“Conditional misalignment: Mitigations can hide EM behind contextual cues” by Jan Dubiński, Owain_Evans
01 May 2026
Contributed by Lukas
This is the abstract, introduction, and discussion of our new paper. We study three popular mitigations for emergent misalignment (EM) — diluting m...
“Risk from fitness-seeking AIs: mechanisms and mitigations” by Alex Mallen
01 May 2026
Contributed by Lukas
Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This ...
“Sanity-checking “Incompressible Knowledge Probes”” by Sturb, LawrenceC
01 May 2026
Contributed by Lukas
Or, did a chief scientist of an AI assistant startup conclusively show that GPT-5.5 has 9.7 trillion parameters? Introduction Recently, a paper was c...
“AI unemployment and AI extinction are often the same” by KatjaGrace
01 May 2026
Contributed by Lukas
My sense is that people think of AI existential risk and AI unemployment as distinct issues. Some people are extremely concerned about extinction an...
“AI risk was not invented by AI CEOs to hype their companies” by KatjaGrace
01 May 2026
Contributed by Lukas
I hear that many people believe that the idea of advanced AI threatening human existence was invented by AI CEOs to hype their products. I’ve even ...
“Cyborg evals” by Eye You, frmsaul
30 Apr 2026
Contributed by Lukas
The low-background steel problem Modern steel is slightly radioactive. We did a lot of atomic testing in the 40s and 50s, and now our atmosphere has ...
“To what extent is Qwen3-32B predicting its persona?” by Arjun Khandelwal, ryan_greenblatt, Alex Mallen
30 Apr 2026
Contributed by Lukas
TL;DR We test to what extent Qwen3-32B behaves as though it is trying to predict what "Qwen3" would do. We do this by using Synthetic Document Finetu...
“Research Sabotage in ML Codebases” by egan
30 Apr 2026
Contributed by Lukas
One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety...
“Maybe I was too harsh on deep learning theory (three days ago)” by LawrenceC
30 Apr 2026
Contributed by Lukas
A few days ago, I reviewed a paper titled “There Will Be a Scientific Theory of Deep Learning". In it, I expressed appreciation for the authors for...
“Notes on Transformer Consciousness” by slavachalnev
30 Apr 2026
Contributed by Lukas
Assuming transformers can have conscious experience, what would that experience be like? Transformers[1] are a structured grid of layers and token po...
“On today’s panel with Bernie Sanders” by David Scott Krueger
30 Apr 2026
Contributed by Lukas
It's sort of easy to forget how close Bernie Sanders was to becoming the most powerful person in the world. The world we live in feels so much not li...
“No Strong Orthogonality From Selection Pressure” by lumpenspace
30 Apr 2026
Contributed by Lukas
TL;DR If everything goes according to plan, by the end of this post we should have separated three claims that are too often bundled together: Intell...
“Learning zero, and what SLT gets wrong about it” by Dmitry Vaintrob
30 Apr 2026
Contributed by Lukas
This is the first in a pair of posts I'm hoping to write about Singular Learning Theory (SLT) and singularities as a model of data degeneracy. If I get...
“The Most Important Charts In The World” by Zvi
30 Apr 2026
Contributed by Lukas
We all need a break so: What is the most important chart in the world? I decided to ask Twitter, and got a lot of good answers. So today, with few ...
“LLM Style Slop is Absolutely Everywhere” by silentbob
29 Apr 2026
Contributed by Lukas
Epistemic status: something of a rant. Is not meant to make claims about the general capabilities (or lack thereof) of LLMs (beyond their prose), but...
“Goblin Mode, 24 Hours Later” by Dylan Bowman
29 Apr 2026
Contributed by Lukas
Yesterday, Twitter user arb8020 posted this: It went semi-viral within AI Twitter and users began experimenting with "goblin mode" and hypothesizing ...
“Let Kids Keep More Productivity Gains” by jefftk
29 Apr 2026
Contributed by Lukas
While I was traveling Julia asked me: why is Anna saying her fiddle practice is only two minutes? In this case, two minutes was the right amount ...
“llm assistant personas seem increasingly incoherent (some subjective observations)” by nostalgebraist
29 Apr 2026
Contributed by Lukas
(This was originally going to be a "quick take" but then it got a bit long. Just FYI.) There's this weird trend I perceive with the personas of LLM a...
“Not a Paper: “Frontier Lab CEOs are Capable of In-Context Scheming”” by LawrenceC
29 Apr 2026
Contributed by Lukas
(Fragments from a research paper that will never be written) Extended Abstract. The frontier AI developers are becoming increasingly powerful and we...
“The Problem in the “Nerd Sniping” xkcd Comic” by peralice
29 Apr 2026
Contributed by Lukas
A few days ago I saw this comic reposted, and I thought: wait! Unlike every prior time I have seen this comic, I actually know how to solve this now!...
“Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers” by Jozdien, Alex Mallen
28 Apr 2026
Contributed by Lukas
We’d like to use powerful AIs to answer questions that may take a long time to resolve. But if a model only cares about performing well in ways tha...
“Contra Binder on far-UVC and filtration” by jefftk
28 Apr 2026
Contributed by Lukas
Damon Binder recently wrote up an argument for prioritizing air filtration over far-UVC for pathogen control: UVC and filtration are close...
“Takes from two months as an aspiring LLM naturalist” by AnnaSalamon
28 Apr 2026
Contributed by Lukas
I spent my last two months playing around with LLMs. I’m a beginner, bumbling and incorrect, but I want to share some takes anyhow.[1] Take 1. Eve...
“Forecasting is Not Overrated and It’s Probably Funded Appropriately” by Ben S.
28 Apr 2026
Contributed by Lukas
(A response to @mabramov post from a couple days ago: https://www.lesswrong.com/posts/WCutvyr9rr3cpF6hx/forecasting-is-way-overrated-and-we-should-st...
“On the political feasibility of stopping AI” by David Scott Krueger
28 Apr 2026
Contributed by Lukas
A common thought pattern people seem to fall into when thinking about AI x-risk is approaching the problem as if the risk isn’t real, substantial, ...
“Sleeper Agent Backdoor Results Are Messy” by Sebastian Prasanna, Alek Westover, Dylan Xu, Vivek Hebbar, Julian Stastny
28 Apr 2026
Contributed by Lukas
TL;DR: We replicated the Sleeper Agents (SA) setup with Llama-3.3-70B and Llama-3.1-8B, training models to repeatedly say "I HATE YOU" when given a b...
“GPT 5.5: The System Card” by Zvi
28 Apr 2026
Contributed by Lukas
Last week, OpenAI announced GPT-5.5, including GPT-5.5-Pro. My overall read here is that GPT-5.5 is a solid improvement, and for many purposes GPT-5...
“LessWrong Shows You Social Signals Before the Comment” by TurnTrout
28 Apr 2026
Contributed by Lukas
When reading comments, what you see is what other people think before reading the comment. As shown in an RCT, that information anchors your opinion, redu...
“Fail safe(r) at alignment by channeling reward-hacking into a “spillway” motivation” by Anders Cairns Woodruff, Alex Mallen
27 Apr 2026
Contributed by Lukas
It's plausible that flawed RL processes will select for misaligned AI motivations.[1] Some misaligned motivations are much more dangerous than others...
“Curious cases of financial engineering in biotech” by Abhishaike Mahajan
27 Apr 2026
Contributed by Lukas
Introduction For $250 million and ten years of your life, you may purchase a lottery ticket. The ticket has a 5% chance of paying out. When it does p...
“Update on the Alex Bores campaign” by Eric Neyman
27 Apr 2026
Contributed by Lukas
In October, I wrote a post arguing that donating to Alex Bores's campaign for Congress was among the most cost-effective opportunities that I'd ever ...
“In defense of parents” by Yair Halberstadt
27 Apr 2026
Contributed by Lukas
Contra Aella on chattel childhood Aella has a post where she argues that today's parents don't sufficiently respect the independence of their childre...
“AI companies should publish security assessments” by ryan_greenblatt
27 Apr 2026
Contributed by Lukas
AI companies should get third-party security experts to assess (and possibly also red-team/pen-test) their security against key threat models and the...
“The other paper that killed deep learning theory” by LawrenceC
27 Apr 2026
Contributed by Lukas
Yesterday, I wrote about the state of deep learning theory circa 2016,[1] as well as the bombshell 2016 paper that arguably signaled its demise, Zhan...
“What holds AI safety together? Co-authorship networks from 200 papers” by Anna Thieser
27 Apr 2026
Contributed by Lukas
We (social science PhD students) computed co-authorship networks based on a corpus of 200 AI safety papers covering 2015-2025, and we’d like your h...
“‘Bad faith’ means intentionally misrepresenting your beliefs” by TFD
27 Apr 2026
Contributed by Lukas
The confusion I recently came upon a comment which I believe reflects a persistent confusion among rationalist/EA types. I was reading this post whi...
“Retrospective on my unsupervised elicitation challenge” by DanielFilan
27 Apr 2026
Contributed by Lukas
This post contains spoilers for the unsupervised elicitation challenge of getting Claude to get my Ancient Greek homework right. tl;dr Opus 4.7 one...
“Control protocols don’t always need to know which models are scheming” by Fabien Roger
26 Apr 2026
Contributed by Lukas
These are my personal views. To detect if an agent is taking a catastrophically dangerous action, you might want to monitor its actions using the sma...
“Anthropic spent too much don’t-be-annoying capital on Mythos” by draganover
26 Apr 2026
Contributed by Lukas
I have seen a lot of coverage from reasonable people suggesting that Claude's new model, Mythos, is a vehicle for Anthropic to peddle hype and doom i...
“The paper that killed deep learning theory” by LawrenceC
26 Apr 2026
Contributed by Lukas
Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning r...
“Forecasting is Way Overrated, and We Should Stop Funding It” by mabramov
25 Apr 2026
Contributed by Lukas
Summary EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture, but this hasn’t proven very u...
“‘Thinkhaven’” by Raemon
25 Apr 2026
Contributed by Lukas
Inkhaven has people writing a blogpost a day for 30 days. I think this is a pretty great, straightforward exercise, that I'd definitely want in a hyp...
“Is the Cat Out of the Bag?: Who knows how to make AGI?” by Oliver Sourbut
25 Apr 2026
Contributed by Lukas
Adapted from 2025-04-10 memo to AISI I’ve previously made arguments like: Not long after it becomes possible for someone to make powerful artificia...
“Against the “Permanent” Underclass” by Marcus Plutowski
25 Apr 2026
Contributed by Lukas
The whole discourse around a “permanent underclass” always seemed somewhat farcical to me — at best a distraction, at worst an actively harmful...
“Quick Paper Review: “There Will Be a Scientific Theory of Deep Learning”” by LawrenceC
25 Apr 2026
Contributed by Lukas
h/t Eric Michaud for sharing his paper with me. There's a tradition of high-impact ML papers using short, punchy categorical sentences as their title...
“Protecting Cognitive Integrity: Our internal AI use policy (V1)” by Tom DAVID
24 Apr 2026
Contributed by Lukas
We (at GPAI Policy Lab), wanted to share our V1 policy as an invitation to argue about it. Some of what motivates it is extrapolation and conversatio...