LessWrong (Curated & Popular)

“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby

20 Feb 2025

Contributed by Lukas

Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a substa...

“How to Make Superbabies” by GeneSmith, kman

20 Feb 2025

Contributed by Lukas

We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affe...

“A computational no-coincidence principle” by Eric Neyman

19 Feb 2025

Contributed by Lukas

Audio note: this article contains 134 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text ...

“A History of the Future, 2025-2040” by L Rudolf L

19 Feb 2025

Contributed by Lukas

This is an all-in-one crosspost of a scenario I originally published in three parts on my blog (No Set Gauge). Links to the originals: A History of th...

“It’s been ten years. I propose HPMOR Anniversary Parties.” by Screwtape

18 Feb 2025

Contributed by Lukas

On March 14th, 2015, Harry Potter and the Methods of Rationality made its final post. Wrap parties were held all across the world to read the ending a...

“Some articles in ‘International Security’ that I enjoyed” by Buck

16 Feb 2025

Contributed by Lukas

A friend of mine recently recommended that I read through articles from the journal International Security, in order to learn more about international...

“The Failed Strategy of Artificial Intelligence Doomers” by Ben Pace

16 Feb 2025

Contributed by Lukas

This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen. I encourage folks to engage with its...

“Murder plots are infohazards” by Chris Monteiro

14 Feb 2025

Contributed by Lukas

Hi all. I've been hanging around the rationalist-sphere for many years now, mostly writing about transhumanism, until things started to change in 2...

“Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?” by garrison

11 Feb 2025

Contributed by Lukas

This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, ...

“The ‘Think It Faster’ Exercise” by Raemon

09 Feb 2025

Contributed by Lukas

Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help it. Ideally, I'd want to basically intuitive...

“So You Want To Make Marginal Progress...” by johnswentworth

08 Feb 2025

Contributed by Lukas

Once upon a time, in ye olden days of strange names and before google maps, seven friends needed to figure out a driving route from their parking lot ...

“What is malevolence? On the nature, measurement, and distribution of dark traits” by David Althaus

08 Feb 2025

Contributed by Lukas

Summary In this post, we explore different ways of understanding and measuring malevolence and explain why individuals with concerning levels of male...

“How AI Takeover Might Happen in 2 Years” by joshc

08 Feb 2025

Contributed by Lukas

I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. I’...

“Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit

05 Feb 2025

Contributed by Lukas

Over the past year and a half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary ...

“Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud

04 Feb 2025

Contributed by Lukas

This is a link post. Full version on arXiv | X. Executive summary: AI risk scenarios usually portray a relatively sudden loss of human control to AIs, ...

“Planning for Extreme AI Risks” by joshc

03 Feb 2025

Contributed by Lukas

This post should not be taken as a polished recommendation to AI companies and instead should be treated as an informal summary of a worldview. The co...

“Catastrophe through Chaos” by Marius Hobbhahn

03 Feb 2025

Contributed by Lukas

This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. Many other people have talked about similar ...

“Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt

01 Feb 2025

Contributed by Lukas

I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we show that when Claude strongly dislikes what it is ...

“‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes

30 Jan 2025

Contributed by Lukas

Summary and Table of Contents: The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evol...

“Ten people on the inside” by Buck

29 Jan 2025

Contributed by Lukas

(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some different levels of resources and buy-in for misa...

“Anomalous Tokens in DeepSeek-V3 and r1” by henry

28 Jan 2025

Contributed by Lukas

“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular t...

“Tell me about yourself: LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans

28 Jan 2025

Contributed by Lukas

This is the abstract and introduction of our new paper, with some discussion of implications for AI Safety at the end. Authors: Jan Betley*, Xuchan B...

“Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals” by johnswentworth, David Lorell

27 Jan 2025

Contributed by Lukas

The Cake: Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and extended mathematical universe is to bake that cake...

“A Three-Layer Model of LLM Psychology” by Jan_Kulveit

26 Jan 2025

Contributed by Lukas

This post offers an accessible model of psychology of character-trained LLMs like Claude. Epistemic Status: This is primarily a phenomenological model ...

“Training on Documents About Reward Hacking Induces Reward Hacking” by evhub

24 Jan 2025

Contributed by Lukas

This is a link post. This is a blog post reporting some preliminary work from the Anthropic Alignment Science team, which might be of interest to resea...

“AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt

24 Jan 2025

Contributed by Lukas

One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable argumen...

“Mechanisms too simple for humans to design” by Malmesbury

24 Jan 2025

Contributed by Lukas

Cross-posted from Telescopic Turnip. As we all know, humans are terrible at building butterflies. We can make a lot of objectively cool things like nucl...

“The Gentle Romance” by Richard_Ngo

22 Jan 2025

Contributed by Lukas

This is a link post. A story I wrote about living through the transition to utopia. This is the one story that I've put the most time and effort in...

“Quotes from the Stargate press conference” by Nikola Jurkovic

22 Jan 2025

Contributed by Lukas

This is a link post. Present alongside President Trump: Sam Altman; Larry Ellison (Oracle executive chairman and CTO); Masayoshi Son (Softbank CEO who be...

“The Case Against AI Control Research” by johnswentworth

21 Jan 2025

Contributed by Lukas

The AI Control Agenda, in its own words: … we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that ...

“Don’t ignore bad vibes you get from people” by Kaj_Sotala

20 Jan 2025

Contributed by Lukas

I think a lot of people have heard so much about internalized prejudice and bias that they think they should ignore any bad vibes they get about a per...

“[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty” by tandem

19 Jan 2025

Contributed by Lukas

(Both characters are fictional, loosely inspired by various traits from various real people. Be careful about combining kratom and alcohol.) The origi...

“Building AI Research Fleets” by bgold, Jesse Hoogland

18 Jan 2025

Contributed by Lukas

From AI scientist to AI research fleet: Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6...

“What Is The Alignment Problem?” by johnswentworth

17 Jan 2025

Contributed by Lukas

So we want to align future AGIs. Ultimately we’d like to align them to human values, but in the shorter term we might start with other targets, like...

“Applying traditional economic thinking to AGI: a trilemma” by Steven Byrnes

14 Jan 2025

Contributed by Lukas

Traditional economics thinking has two strong principles, each based on abundant historical data: Principle (A): No “lump of labor”: If human popu...

“Passages I Highlighted in The Letters of J.R.R. Tolkien” by Ivan Vendrov

14 Jan 2025

Contributed by Lukas

All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R. Tolkien: Revised and Expanded Edition. All emphases m...

“Parkinson’s Law and the Ideology of Statistics” by Benquo

13 Jan 2025

Contributed by Lukas

The anonymous review of The Anti-Politics Machine published on Astral Codex Ten focuses on a case study of a World Bank intervention in Lesotho, and tel...

“Capital Ownership Will Not Prevent Human Disempowerment” by beren

11 Jan 2025

Contributed by Lukas

Crossposted from my personal blog. I was inspired to cross-post this here given the discussion that this post on the role of capital in an AI future e...

“Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq

10 Jan 2025

Contributed by Lukas

TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activ...

“What o3 Becomes by 2028” by Vladimir_Nesov

09 Jan 2025

Contributed by Lukas

Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces...

“What Indicators Should We Watch to Disambiguate AGI Timelines?” by snewman

09 Jan 2025

Contributed by Lukas

(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduc...

“How will we update about scheming?” by ryan_greenblatt

08 Jan 2025

Contributed by Lukas

I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, ...

“OpenAI #10: Reflections” by Zvi

08 Jan 2025

Contributed by Lukas

This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There's a bunch of good and interesting answers in the ...

“Maximizing Communication, not Traffic” by jefftk

07 Jan 2025

Contributed by Lukas

As someone who writes for fun, I don't need to get people onto my site: If I write a post and some people are able to get the core idea just from ...

“What’s the short timeline plan?” by Marius Hobbhahn

02 Jan 2025

Contributed by Lukas

This is a low-effort post. I mostly want to get other people's takes and express concern about the lack of detailed and publicly available plans ...

“Shallow review of technical AI safety, 2024” by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers

30 Dec 2024

Contributed by Lukas

from aisafety.world The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense...

“By default, capital will matter more than ever after AGI” by L Rudolf L

29 Dec 2024

Contributed by Lukas

I've heard many people say something like "money won't matter post-AGI". This has always struck me as odd, and as most likely comp...

“Review: Planecrash” by L Rudolf L

28 Dec 2024

Contributed by Lukas

Take a stereotypical fantasy novel, a textbook on mathematical logic, and Fifty Shades of Grey. Mix them all together and add extra weirdness for spic...

“The Field of AI Alignment: A Postmortem, and What To Do About It” by johnswentworth

26 Dec 2024

Contributed by Lukas

A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look...

“When Is Insurance Worth It?” by kqr

23 Dec 2024

Contributed by Lukas

TL;DR: If you want to know whether getting insurance is worth it, use the Kelly Insurance Calculator. If you want to know why or how, read on. Note to ...

“Orienting to 3 year AGI timelines” by Nikola Jurkovic

23 Dec 2024

Contributed by Lukas

My median expectation is that AGI[1] will be created 3 years from now. This has implications on how to behave, and I will share some useful thoughts I...

“What Goes Without Saying” by sarahconstantin

21 Dec 2024

Contributed by Lukas

There are people I can talk to, where all of the following statements are obvious. They go without saying. We can just “be reasonable” together, w...

“o3” by Zach Stein-Perlman

21 Dec 2024

Contributed by Lukas

I'm editing this post. OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons). It gets 25% on FrontierMath, smashing th...

“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit

21 Dec 2024

Contributed by Lukas

I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead...

“AIs Will Increasingly Attempt Shenanigans” by Zvi

19 Dec 2024

Contributed by Lukas

Increasingly, we have seen papers eliciting in AI models various shenanigans. There are a wide variety of scheming behaviors. You’ve got your weight ...

“Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck

18 Dec 2024

Contributed by Lukas

What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper...

“Communications in Hard Mode (My new job at MIRI)” by tanagrabeast

15 Dec 2024

Contributed by Lukas

Six months ago, I was a high school English teacher. I wasn’t looking to change careers, even after nineteen sometimes-difficult years. I was good at...

“Biological risk from the mirror world” by jasoncrawford

13 Dec 2024

Contributed by Lukas

A new article in Science Policy Forum voices concern about a particular line of biological research which, if successful in the long term, could event...

“Subskills of ‘Listening to Wisdom’” by Raemon

13 Dec 2024

Contributed by Lukas

A fool learns from their own mistakes. The wise learn from the mistakes of others. – Otto von Bismarck. A problem as old as time: The youth won't ...

“Understanding Shapley Values with Venn Diagrams” by Carson L

13 Dec 2024

Contributed by Lukas

Someone I know, Carson Loughridge, wrote this very nice post explaining the core intuition around Shapley values (which play an important role in imp...

“LessWrong audio: help us choose the new voice” by PeterH

12 Dec 2024

Contributed by Lukas

We make AI narrations of LessWrong posts available via our audio player and podcast feeds. We’re thinking about changing our narrator's voice. Th...

“Understanding Shapley Values with Venn Diagrams” by agucova

11 Dec 2024

Contributed by Lukas

This is a link post. Someone I know wrote this very nice post explaining the core intuition around Shapley values (which play an important role in imp...

“o1: A Technical Primer” by Jesse Hoogland

11 Dec 2024

Contributed by Lukas

TL;DR: In September 2024, OpenAI released o1, its first "reasoning model". This model exhibits remarkable test-time scaling laws, which comp...

“Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout

09 Dec 2024

Contributed by Lukas

We present gradient routing, a way of controlling where learning happens in neural networks. Gradient routing applies masks to limit the flow of gradi...

“Frontier Models are Capable of In-context Scheming” by Marius Hobbhahn, AlexMeinke, Bronson Schoen

06 Dec 2024

Contributed by Lukas

This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We...

“(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka

30 Nov 2024

Contributed by Lukas

TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM or signal message (+1 510 944 3235) if you wa...

“Repeal the Jones Act of 1920” by Zvi

29 Nov 2024

Contributed by Lukas

Balsa Policy Institute chose as its first mission to lay groundwork for the potential repeal, or partial repeal, of section 27 of the Jones Act of 192...

“China Hawks are Manufacturing an AI Arms Race” by garrison

29 Nov 2024

Contributed by Lukas

This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, ...

“Information vs Assurance” by johnswentworth

27 Nov 2024

Contributed by Lukas

In contract law, there's this thing called a “representation”. Example: as part of a contract to sell my house, I might “represent that” ...

“You are not too ‘irrational’ to know your preferences.” by DaystarEld

27 Nov 2024

Contributed by Lukas

Epistemic Status: 13 years working as a therapist for a wide variety of populations, 5 of them working with rationalists and EA clients. 7 years teach...

“‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi

25 Nov 2024

Contributed by Lukas

[Warning: This post is probably only worth reading if you already have opinions on the Solomonoff induction being malign, or at least heard of the con...

“‘It’s a 10% chance which I did 10 times, so it should be 100%’” by egor.timatkov

20 Nov 2024

Contributed by Lukas

Audio note: this article contains 33 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text i...

“OpenAI Email Archives” by habryka

19 Nov 2024

Contributed by Lukas

As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman...

“Ayn Rand’s model of ‘living money’; and an upside of burnout” by AnnaSalamon

18 Nov 2024

Contributed by Lukas

Epistemic status: Toy model. Oversimplified, but has been anecdotally useful to at least a couple people, and I like it as a metaphor. Introduction: I’...

“Neutrality” by sarahconstantin

17 Nov 2024

Contributed by Lukas

Midjourney, “infinite library”. I’ve had post-election thoughts percolating, and the sense that I wanted to synthesize something about this moment...

“Making a conservative case for alignment” by Cameron Berg, Judd Rosenblatt, phgubbins, AE Studio

16 Nov 2024

Contributed by Lukas

Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In th...

“OpenAI Email Archives (from Musk v. Altman)” by habryka

16 Nov 2024

Contributed by Lukas

As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman...

“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub

15 Nov 2024

Contributed by Lukas

Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback. Following up on our recent “Sabotage Evaluations...

“The Online Sports Gambling Experiment Has Failed” by Zvi

12 Nov 2024

Contributed by Lukas

Related: Book Review: On the Edge: The Gamblers. I have previously been heavily involved in sports betting. That world was very good to me. The times we...

“o1 is a bad idea” by abramdemski

12 Nov 2024

Contributed by Lukas

This post comes a bit late with respect to the news cycle, but I argued in a recent interview that o1 is an unfortunate twist on LLM technologies, mak...

“Current safety training techniques do not fully transfer to the agent setting” by Simon Lermen, Govind Pimpale

09 Nov 2024

Contributed by Lukas

TL;DR: I'm presenting three recent papers which all share a similar finding, i.e. the safety training techniques for chat models don’t transfer...

“Explore More: A Bag of Tricks to Keep Your Life on the Rails” by Shoshannah Tekofsky

04 Nov 2024

Contributed by Lukas

At least, if you happen to be near me in brain space. What advice would you give your younger self? That was the prompt for a class I taught at PAIR 202...

“Survival without dignity” by L Rudolf L

04 Nov 2024

Contributed by Lukas

I open my eyes and find myself lying on a bed in a hospital room. I blink. "Hello", says a middle-aged man with glasses, sitting on a chair b...

“The Median Researcher Problem” by johnswentworth

04 Nov 2024

Contributed by Lukas

Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median resear...

“The Compendium, A full argument about extinction risk from AGI” by adamShimi, Gabriel Alfour, Connor Leahy, Chris Scammell, Andrea_Miotti

01 Nov 2024

Contributed by Lukas

This is a link post. We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings toge...

“What TMS is like” by Sable

31 Oct 2024

Contributed by Lukas

There are two nuclear options for treating depression: ketamine and TMS. This post is about the latter. TMS stands for Transcranial Magnetic Stimulatio...

“The hostile telepaths problem” by Valentine

28 Oct 2024

Contributed by Lukas

Epistemic status: model-building based on observation, with a few successful unusual predictions. Anecdotal evidence has so far been consistent with t...

“A bird’s eye view of ARC’s research” by Jacob_Hilton

27 Oct 2024

Contributed by Lukas

This post includes a "flattened version" of an interactive diagram that cannot be displayed on this site. I recommend reading the original v...

“A Rocket–Interpretability Analogy” by plex

25 Oct 2024

Contributed by Lukas

1.  4.4% of the US federal budget went into the space race at its peak. This was surprising to me, until a friend pointed out that landing rockets o...

“I got dysentery so you don’t have to” by eukaryote

24 Oct 2024

Contributed by Lukas

This summer, I participated in a human challenge trial at the University of Maryland. I spent the days just prior to my 30th birthday sick with shigel...

“Overcoming Bias Anthology” by Arjun Panickssery

23 Oct 2024

Contributed by Lukas

This is a link post. Part 1: Our Thinking Near and Far: 1. Abstract/Distant Future Bias; 2. Abstractly Ideal, Concretely Selfish; 3. We Add Near, Average Far; 4. ...

“Arithmetic is an underrated world-modeling technology” by dynomight

22 Oct 2024

Contributed by Lukas

Of all the cognitive tools our ancestors left us, what's best? Society seems to think pretty highly of arithmetic. It's one of the first thi...

“My theory of change for working in AI healthtech” by Andrew_Critch

15 Oct 2024

Contributed by Lukas

This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive ...

“Why I’m not a Bayesian” by Richard_Ngo

15 Oct 2024

Contributed by Lukas

This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then ...

“The AGI Entente Delusion” by Max Tegmark

14 Oct 2024

Contributed by Lukas

As humanity gets closer to Artificial General Intelligence (AGI), a new geopolitical strategy is gaining traction in US and allied circles, in the Nat...

“Momentum of Light in Glass” by Ben

14 Oct 2024

Contributed by Lukas

I think that most people underestimate how many scientific mysteries remain, even on questions that sound basic. My favourite candidate for "the m...

“Overview of strong human intelligence amplification methods” by TsviBT

09 Oct 2024

Contributed by Lukas

How can we make many humans who are very good at solving difficult problems? Summary (table of made-up numbers): I made up the made-up numbers in this t...

“Struggling like a Shadowmoth” by Raemon

03 Oct 2024

Contributed by Lukas

This post is probably hazardous for one type of person in one particular growth stage, and necessary for people in a different growth stage, and I don...

“Three Subtle Examples of Data Leakage” by abstractapplic

03 Oct 2024

Contributed by Lukas

This is a description of my work on some data science projects, lightly obfuscated and fictionalized to protect the confidentiality of the organizatio...

“the case for CoT unfaithfulness is overstated” by nostalgebraist

30 Sep 2024

Contributed by Lukas

[Meta note: quickly written, unpolished. Also, it's possible that there's some more convincing work on this topic that I'm unaware of –...
