LessWrong (Curated & Popular)
Episodes
“The best simple argument for Pausing AI?” by Gary Marcus
03 Jul 2025
Contributed by Lukas
Not saying we should pause AI, but consider the following argument: Alignment without the capacity to follow rules is hopeless. You can’t possibly...
“Foom & Doom 2: Technical alignment is hard” by Steven Byrnes
01 Jul 2025
Contributed by Lukas
2.1 Summary & Table of contents This is the second of a two-post series on foom (previous post) and doom (this post). The last post talked about h...
“Proposal for making credible commitments to AIs.” by Cleo Nardo
30 Jun 2025
Contributed by Lukas
Acknowledgments: The core scheme here was suggested by Prof. Gabriel Weil. There has been growing interest in the deal-making agenda: humans make dea...
“X explains Z% of the variance in Y” by Leon Lang
28 Jun 2025
Contributed by Lukas
Audio note: this article contains 218 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text ...
“A case for courage, when speaking of AI danger” by So8res
27 Jun 2025
Contributed by Lukas
I think more people should say what they actually believe about AI dangers, loudly and often. Even if you work in AI policy. I’ve been beating this...
“My pitch for the AI Village” by Daniel Kokotajlo
25 Jun 2025
Contributed by Lukas
I think the AI Village should be funded much more than it currently is; I’d wildly guess that the AI safety ecosystem should be funding it to the tu...
“Foom & Doom 1: ‘Brain in a box in a basement’” by Steven Byrnes
24 Jun 2025
Contributed by Lukas
1.1 Series summary and Table of Contents This is a two-post series on AI “foom” (this post) and “doom” (next post). A decade or two ago, it wa...
“Futarchy’s fundamental flaw” by dynomight
21 Jun 2025
Contributed by Lukas
Say you’re Robyn Denholm, chair of Tesla's board. And say you’re thinking about firing Elon Musk. One way to make up your mind would be to h...
“Do Not Tile the Lightcone with Your Confused Ontology” by Jan_Kulveit
19 Jun 2025
Contributed by Lukas
Or: How anthropomorphic assumptions about AI identity might create confusion and suffering at scale If you are reading this and you are a human, you ...
“Endometriosis is an incredibly interesting disease” by Abhishaike Mahajan
19 Jun 2025
Contributed by Lukas
Introduction There are several diseases that are canonically recognized as ‘interesting’, even by laymen. Whether that is in their mechanism of ...
“Estrogen: A trip report” by cube_flipper
19 Jun 2025
Contributed by Lukas
I'd like to say thanks to Anna Magpie – who offers literature review as a service – for her help reviewing the section on neuroendocrinology...
“New Endorsements for ‘If Anyone Builds It, Everyone Dies’” by Malo
18 Jun 2025
Contributed by Lukas
Nate and Eliezer's forthcoming book has been getting a remarkably strong reception. I was under the impression that there are many people who fi...
[Linkpost] “the void” by nostalgebraist
17 Jun 2025
Contributed by Lukas
This is a link post. A very long essay about LLMs, the nature and history of the HHH assistant persona, and the implications for alignment. Multi...
“Mech interp is not pre-paradigmatic” by Lee Sharkey
17 Jun 2025
Contributed by Lukas
This is a blogpost version of a talk I gave earlier this year at GDM. Epistemic status: Vague and handwavy. Nuance is often missing. Some of the cl...
“Distillation Robustifies Unlearning” by Bruce W. Lee, Addie Foote, alexinf, leni, Jacob G-W, Harish Kamath, Bryce Woodworth, cloud, TurnTrout
17 Jun 2025
Contributed by Lukas
Current “unlearning” methods only suppress capabilities instead of truly unlearning the capabilities. But if you distill an unlearned model into ...
“Intelligence Is Not Magic, But Your Threshold For ‘Magic’ Is Pretty Low” by Expertium
17 Jun 2025
Contributed by Lukas
A while ago I saw a person in the comments on Scott Alexander's blog arguing that a superintelligent AI would not be able to do anyt...
“A Straightforward Explanation of the Good Regulator Theorem” by Alfred Harwood
17 Jun 2025
Contributed by Lukas
Audio note: this article contains 329 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text ...
“Beware General Claims about ‘Generalizable Reasoning Capabilities’ (of Modern AI Systems)” by LawrenceC
17 Jun 2025
Contributed by Lukas
1. Late last week, researchers at Apple released a paper provocatively titled “The Illusion of Thinking: Understanding the Strengths and Limitations...
“Season Recap of the Village: Agents raise $2,000” by Shoshannah Tekofsky
07 Jun 2025
Contributed by Lukas
Four agents woke up with four computers, a view of the world wide web, and a shared chat room full of humans. Like Claude plays Pokemon, you can watc...
“The Best Reference Works for Every Subject” by Parker Conley
06 Jun 2025
Contributed by Lukas
Introduction The Best Textbooks on Every Subject is the Schelling point for the best textbooks on every subject. My The Best Tacit Knowledge Videos o...
“‘Flaky breakthroughs’ pervade coaching — and no one tracks them” by Chipmonk
05 Jun 2025
Contributed by Lukas
Has someone you know ever had a “breakthrough” from coaching, meditation, or psychedelics — only to later have it fade? For example...
“The Value Proposition of Romantic Relationships” by johnswentworth
04 Jun 2025
Contributed by Lukas
What's the main value proposition of romantic relationships? Now, look, I know that when people drop that kind of question, they’re often abou...
“It’s hard to make scheming evals look realistic” by Igor Ivanov, dan_moken
02 Jun 2025
Contributed by Lukas
Abstract Claude 3.7 Sonnet easily detects when it's being evaluated for scheming. Surface‑level edits to evaluation scenarios, such as lengthe...
[Linkpost] “Social Anxiety Isn’t About Being Liked” by Chipmonk
01 Jun 2025
Contributed by Lukas
This is a link post. There's this popular idea that socially anxious folks are just dying to be liked. It seems logical, right? Why else would so...
“Truth or Dare” by Duncan Sabien (Inactive)
31 May 2025
Contributed by Lukas
Author's note: This is my apparently-annual "I'll put a post on LessWrong in honor of LessOnline" post. These days, my writing g...
“Meditations on Doge” by Martin Sustrik
30 May 2025
Contributed by Lukas
Lessons from shutting down institutions in Eastern Europe. This is a cross post from: https://250bpm.substack.com/p/meditations-on-doge Imagine l...
[Linkpost] “If you’re not sure how to sort a list or grid—seriate it!” by gwern
28 May 2025
Contributed by Lukas
This is a link post. "Getting Things in Order: An Introduction to the R Package seriation": Seriation (or "ordination"), i.e., fin...
“What We Learned from Briefing 70+ Lawmakers on the Threat from AI” by leticiagarcia
28 May 2025
Contributed by Lukas
Between late 2024 and mid-May 2025, I briefed over 70 cross-party UK parliamentarians. Just over one-third were MPs, a similar share were members of ...
“Winning the power to lose” by KatjaGrace
23 May 2025
Contributed by Lukas
Have the Accelerationists won? Last November Kevin Roose announced that those in favor of going fast on AI had now won against those favoring caution...
[Linkpost] “Gemini Diffusion: watch this space” by Yair Halberstadt
22 May 2025
Contributed by Lukas
This is a link post. Google DeepMind has announced Gemini Diffusion. Though buried under a host of other I/O announcements, it's possible that this...
“AI Doomerism in 1879” by David Gross
21 May 2025
Contributed by Lukas
I’m reading George Eliot's Impressions of Theophrastus Such (1879)—so far a snoozer compared to her novels. But chapter 17 surprised me for ...
“Consider not donating under $100 to political candidates” by DanielFilan
16 May 2025
Contributed by Lukas
Epistemic status: thing people have told me that seems right. Also primarily relevant to US audiences. Also I am speaking in my personal capacity and...
“It’s Okay to Feel Bad for a Bit” by moridinamael
16 May 2025
Contributed by Lukas
"If you kiss your child, or your wife, say that you only kiss things which are human, and thus you will not be disturbed if either of them dies....
“Explaining British Naval Dominance During the Age of Sail” by Arjun Panickssery
15 May 2025
Contributed by Lukas
The other day I discussed how high monitoring costs can explain the emergence of “aristocratic” systems of governance: Aristocracy and Hostage Ca...
“Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies” by So8res
14 May 2025
Contributed by Lukas
Eliezer and I wrote a book. It's titled If Anyone Builds It, Everyone Dies. Unlike a lot of other writing either of us have done, it's bein...
“Too Soon” by Gordon Seidoh Worley
14 May 2025
Contributed by Lukas
It was a cold and cloudy San Francisco Sunday. My wife and I were having lunch with friends at a Korean cafe. My phone buzzed with a text. It said my...
“PSA: The LessWrong Feedback Service” by JustisMills
13 May 2025
Contributed by Lukas
At the bottom of the LessWrong post editor, if you have at least 100 global karma, you may have noticed this button. Many people click the ...
“Orienting Toward Wizard Power” by johnswentworth
08 May 2025
Contributed by Lukas
For months, I had the feeling: something is wrong. Some core part of myself had gone missing. I had words and ideas cached, which pointed back to the...
“Interpretability Will Not Reliably Find Deceptive AI” by Neel Nanda
05 May 2025
Contributed by Lukas
(Disclaimer: Post written in a personal capacity. These are personal hot takes and do not in any way represent my employer's views.) TL;DR: I do...
“Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall” by Vladimir_Nesov
03 May 2025
Contributed by Lukas
It'll take until ~2050 to repeat the level of scaling that pretraining compute is experiencing this decade, as increasing funding can't sus...
“Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis” by jeanne_, eeeee
01 May 2025
Contributed by Lukas
In this blog post, we analyse how the recent AI 2027 forecast by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean has be...
[Linkpost] “Jaan Tallinn’s 2024 Philanthropy Overview” by jaan
25 Apr 2025
Contributed by Lukas
This is a link post. to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with the 2024 results. in 2024 my donations...
“Impact, agency, and taste” by benkuhn
24 Apr 2025
Contributed by Lukas
I’ve been thinking recently about what sets apart the people who’ve done the best work at Anthropic. You might think that the main thing that mak...
[Linkpost] “To Understand History, Keep Former Population Distributions In Mind” by Arjun Panickssery
24 Apr 2025
Contributed by Lukas
This is a link post. Guillaume Blanc has a piece in Works in Progress (I assume based on his paper) about how France's fertility declined earlier...
“AI-enabled coups: a small group could use AI to seize power” by Tom Davidson, Lukas Finnveden, rosehadshar
23 Apr 2025
Contributed by Lukas
We’ve written a new report on the threat of AI-enabled coups. I think this is a very serious risk – comparable in importance to AI takeover but ...
“Accountability Sinks” by Martin Sustrik
23 Apr 2025
Contributed by Lukas
Back in the 1990s, ground squirrels were briefly fashionable pets, but their popularity came to an abrupt end after an incident at Schiphol Airport o...
“Training AGI in Secret would be Unsafe and Unethical” by Daniel Kokotajlo
21 Apr 2025
Contributed by Lukas
Subtitle: Bad for loss of control risks, bad for concentration of power risks I’ve had this sitting in my drafts for the last year. I wish I’d be...
“Why Should I Assume CCP AGI is Worse Than USG AGI?” by Tomás B.
20 Apr 2025
Contributed by Lukas
Though, given my doomerism, I think the natsec framing of the AGI race is likely wrongheaded, let me accept the Dario/Leopold/Altman frame that AGI w...
“Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI” by Kaj_Sotala
17 Apr 2025
Contributed by Lukas
Introduction Writing this post puts me in a weird epistemic position. I simultaneously believe that: The reasoning failures that I'll discuss ar...
“Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study” by Adam Karvonen
16 Apr 2025
Contributed by Lukas
Dario Amodei, CEO of Anthropic, recently worried about a world where only 30% of jobs become automated, leading to class tensions between the automat...
“Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)” by Neel Nanda, lewis smith, Senthooran Rajamanoharan, Arthur Conmy, Callum McDougall, Tom Lieberum, János Kramár, Rohin Shah
12 Apr 2025
Contributed by Lukas
Audio note: this article contains 31 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text i...
[Linkpost] “Playing in the Creek” by Hastings
11 Apr 2025
Contributed by Lukas
This is a link post. When I was a really small kid, one of my favorite activities was to try and dam up the creek in my backyard. I would carefully mo...
“Thoughts on AI 2027” by Max Harms
10 Apr 2025
Contributed by Lukas
This is part of the MIRI Single Author Series. Pieces in this series represent the beliefs and opinions of their named authors, and do not claim to s...
“Short Timelines don’t Devalue Long Horizon Research” by Vladimir_Nesov
09 Apr 2025
Contributed by Lukas
Short AI takeoff timelines seem to leave no time for some lines of alignment research to become impactful. But any research rebalances the mix of cur...
“Alignment Faking Revisited: Improved Classifiers and Open Source Extensions” by John Hughes, abhayesian, Akbir Khan, Fabien Roger
09 Apr 2025
Contributed by Lukas
In this post, we present a replication and extension of an alignment faking model organism: Replication: We replicate the alignment faking (AF) pa...
“METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman
07 Apr 2025
Contributed by Lukas
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently e...
“Why Have Sentence Lengths Decreased?” by Arjun Panickssery
04 Apr 2025
Contributed by Lukas
“In the loveliest town of all, where the houses were white and high and the elm trees were green and higher than the houses, where the front yards...
“AI 2027: What Superintelligence Looks Like” by Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo
03 Apr 2025
Contributed by Lukas
In 2021 I wrote what became my most popular blog post: What 2026 Looks Like. I intended to keep writing predictions all the way to AGI and beyond, bu...
“OpenAI #12: Battle of the Board Redux” by Zvi
03 Apr 2025
Contributed by Lukas
Back when the OpenAI board attempted and failed to fire Sam Altman, we faced a highly hostile information environment. The battle was fought largely t...
“The Pando Problem: Rethinking AI Individuality” by Jan_Kulveit
03 Apr 2025
Contributed by Lukas
Epistemic status: This post aims at an ambitious target: improving intuitive understanding directly. The model for why this is worth trying is that I...
“You will crash your car in front of my house within the next week” by Richard Korzekwa
02 Apr 2025
Contributed by Lukas
I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will...
“My ‘infohazards small working group’ Signal Chat may have encountered minor leaks” by Linch
02 Apr 2025
Contributed by Lukas
Remember: There is no such thing as a pink elephant. Recently, I was made aware that my “infohazards small working group” Signal chat, an informa...
“Leverage, Exit Costs, and Anger: Re-examining Why We Explode at Home, Not at Work” by at_the_zoo
02 Apr 2025
Contributed by Lukas
Let's cut through the comforting narratives and examine a common behavioral pattern with a sharper lens: the stark difference between how anger ...
“PauseAI and E/Acc Should Switch Sides” by WillPetillo
02 Apr 2025
Contributed by Lukas
In the debate over AI development, two movements stand as opposites: PauseAI calls for slowing down AI progress, and e/acc (effective accelerationism...
“VDT: a solution to decision theory” by L Rudolf L
02 Apr 2025
Contributed by Lukas
Introduction Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausal...
“LessWrong has been acquired by EA” by habryka
01 Apr 2025
Contributed by Lukas
Dear LessWrong community, It is with a sense of... considerable cognitive dissonance that I announce a significant development regarding the future t...
“We’re not prepared for an AI market crash” by Remmelt
01 Apr 2025
Contributed by Lukas
Our community is not prepared for an AI crash. We're good at tracking new capability developments, but not as much the company financials. Curre...
“Conceptual Rounding Errors” by Jan_Kulveit
29 Mar 2025
Contributed by Lukas
Epistemic status: Reasonably confident in the basic mechanism. Have you noticed that you keep encountering the same ideas over and over? You read ano...
“Tracing the Thoughts of a Large Language Model” by Adam Jermyn
28 Mar 2025
Contributed by Lukas
[This is our blog post on the papers, which can be found at https://transformer-circuits.pub/2025/attribution-graphs/biology.html and https://transfo...
“Recent AI model progress feels mostly like bullshit” by lc
25 Mar 2025
Contributed by Lukas
About nine months ago, I and three friends decided that AI had gotten good enough to monitor large codebases autonomously for security problems. We s...
“AI for AI safety” by Joe Carlsmith
25 Mar 2025
Contributed by Lukas
(Audio version here (read by the author), or search for "Joe Carlsmith Audio" on your podcast app. This is the fourth essay in a series that...
“Policy for LLM Writing on LessWrong” by jimrandomh
25 Mar 2025
Contributed by Lukas
LessWrong has been receiving an increasing number of posts and comments that look like they might be LLM-written or partially-LLM-written, so we...
“Will Jesus Christ return in an election year?” by Eric Neyman
25 Mar 2025
Contributed by Lukas
Thanks to Jesse Richardson for discussion. Polymarket asks: will Jesus Christ return in 2025? In the three days since the market opened, traders hav...
“Good Research Takes are Not Sufficient for Good Strategic Takes” by Neel Nanda
23 Mar 2025
Contributed by Lukas
TL;DR Having a good research track record is some evidence of good big-picture takes, but it's weak evidence. Strategic thinking is hard, and re...
“Intention to Treat” by Alicorn
22 Mar 2025
Contributed by Lukas
When my son was three, we enrolled him in a study of a vision condition that runs in my family. They wanted us to put an eyepatch on him for part of ...
“On the Rationality of Deterring ASI” by Dan H
22 Mar 2025
Contributed by Lukas
I’m releasing a new paper “Superintelligence Strategy” alongside Eric Schmidt (formerly Google), and Alexandr Wang (Scale AI). Below is the exec...
[Linkpost] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman
19 Mar 2025
Contributed by Lukas
This is a link post. Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has...
“I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?” by shrimpy
19 Mar 2025
Contributed by Lukas
I have, over the last year, become fairly well-known in a small corner of the internet tangentially related to AI. As a result, I've begun making ...
“Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn
18 Mar 2025
Contributed by Lukas
Note: this is a research note based on observations from evaluating Claude Sonnet 3.7. We’re sharing the results of these ‘work-in-progress’ inv...
“Levels of Friction” by Zvi
18 Mar 2025
Contributed by Lukas
Scott Alexander famously warned us to Beware Trivial Inconveniences. When you make a thing easy to do, people often do vastly more of it. When you put u...
“Why White-Box Redteaming Makes Me Feel Weird” by Zygi Straznickas
17 Mar 2025
Contributed by Lukas
There's this popular trope in fiction about a character being mind controlled without losing awareness of what's happening. Think Jessica Jo...
“Reducing LLM deception at scale with self-other overlap fine-tuning” by Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Mike Vaiana, Cameron Berg
17 Mar 2025
Contributed by Lukas
This research was conducted at AE Studio and supported by the AI Safety Grants programme administered by Foresight Institute with additional support f...
“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam Jermyn, Monte M, evhub
16 Mar 2025
Contributed by Lukas
We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives—by training a model with a hidden misaligned ...
“The Most Forbidden Technique” by Zvi
14 Mar 2025
Contributed by Lukas
The Most Forbidden Technique is training an AI using interpretability techniques. An AI produces a final output [X] via some method [M]. You can analyz...
“Trojan Sky” by Richard_Ngo
13 Mar 2025
Contributed by Lukas
You learn the rules as soon as you’re old enough to speak. Don’t talk to jabberjays. You recite them as soon as you wake up every morning. Keep yo...
“OpenAI:” by Daniel Kokotajlo
11 Mar 2025
Contributed by Lukas
Exciting Update: OpenAI has released this blog post and paper which makes me very happy. It's basically the first steps along the research agenda...
“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis
09 Mar 2025
Contributed by Lukas
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their produ...
“So how well is Claude playing Pokémon?” by Julian Bradshaw
09 Mar 2025
Contributed by Lukas
Background: After the release of Claude 3.7 Sonnet,[1] an Anthropic employee started livestreaming Claude trying to play through Pokémon Red. The liv...
“Methods for strong human germline engineering” by TsviBT
07 Mar 2025
Contributed by Lukas
Note: an audio narration is not available for this article. Please see the original text. The original text contained 169 footnotes which were omitte...
“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth
06 Mar 2025
Contributed by Lukas
In a recent post, Cole Wyeth makes a bold claim: ". . . there is one crucial test (yes this is a crux) that LLMs have not passed. They have never done a...
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis
06 Mar 2025
Contributed by Lukas
This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where w...
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard
05 Mar 2025
Contributed by Lukas
This is a critique of How to Make Superbabies on LessWrong. Disclaimer: I am not a geneticist[1], and I've tried to use as little jargon as possib...
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout
04 Mar 2025
Contributed by Lukas
This is a link post. Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control meas...
“Judgements: Merging Prediction & Evidence” by abramdemski
01 Mar 2025
Contributed by Lukas
I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the ide...
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis
26 Feb 2025
Contributed by Lukas
First, let me quote my previous ancient post on the topic: Effective Strategies for Changing Public Opinion. The titular paper is very relevant here. I...
“Power Lies Trembling: a three-book review” by Richard_Ngo
26 Feb 2025
Contributed by Lukas
In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme forc...
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans
26 Feb 2025
Contributed by Lukas
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable cod...
“The Paris AI Anti-Safety Summit” by Zvi
22 Feb 2025
Contributed by Lukas
It doesn’t look good. What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI...
“Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby
20 Feb 2025
Contributed by Lukas
Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility. Circa 2015-2017, a lot of high quality content...