LessWrong (30+ Karma)

[Linkpost] “You can only build safe ASI if ASI is globally banned” by Connor Leahy

17 Apr 2026

Contributed by Lukas

This is a link post. Sometimes people make various suggestions that we should simply build "safe" artificial Superintelligence (ASI), rather than the ...

“Beware of Well-Written Posts” by alseph

17 Apr 2026

Contributed by Lukas

Beware of when a post is so well-written that you can't put it down. Be wary of posts that are more visually attractive than average. Beware of posts...

“You Aren’t in Charge of the Overton Window; Politics Is Not Interior Design” by Davidmanheim

16 Apr 2026

Contributed by Lukas

Sometimes, people don't say what they actually think, not because saying it would be rude or costly, but because they believe saying it now would be ...

“Carpathia Day” by Drake Morrison

16 Apr 2026

Contributed by Lukas

(The better telling is here. Seriously you should go read it. I've heard this story told in rationalist circles, but there wasn't a post on LessWrong...

“Claude Code, Codex and Agentic Coding #7: Auto Mode” by Zvi

16 Apr 2026

Contributed by Lukas

As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrade...

“Do not conquer what you cannot defend” by habryka

16 Apr 2026

Contributed by Lukas

Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post. So today we are re-inventing federalism. Once upon a t...

“What is the Iliad Intensive?” by Leon Lang, Alexander Gietelink Oldenziel, David Udell

16 Apr 2026

Contributed by Lukas

Almost two months ago, Iliad announced the Iliad Intensive and Iliad Fellowship. Fellowships are a well-understood unit, but what is an intensive? Th...

“The Mirror Test Is Complicated” by J Bostock

15 Apr 2026

Contributed by Lukas

The Mirror Test is kind of like Hitler. In any discussion of animal cognition, somebody is going to bring it up. The conversation usually goes like t...

“Contra Leicht on AI Pauses” by David Scott Krueger (formerly: capybaralet)

15 Apr 2026

Contributed by Lukas

This is going to be a nerdier article than usual. It's a response to Anton Leicht's blog post “Press Play To Continue”. I disagree with much of i...

“Nectome: All That I Know” by Raelifin

15 Apr 2026

Contributed by Lukas

TLDR: I flew to Oregon to investigate Nectome, a brain preservation startup, and talk to their entire team. They’re an ambitious company, looking t...

“Effective Altruism, Seen From Slytherin” by Xylix

15 Apr 2026

Contributed by Lukas

Epistemic status: Left as an exercise for the reader. I was thinking of EA outreach and its optics this week. And was inspired to glance at a specif...

“Majority Report” by peralice

15 Apr 2026

Contributed by Lukas

[Attention conservation notice: description of a social phenomenon which may be obvious to some people.] This post is partially inspired by Alexander...

“Current AIs seem pretty misaligned to me” by ryan_greenblatt

15 Apr 2026

Contributed by Lukas

Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're ...

“Contra Byrnes on UV & Cancer” by HedonicEscalator

15 Apr 2026

Contributed by Lukas

In his recent LessWrong post, Some takes on UV & cancer, Steve Byrnes comes out against the "Public Health Orthodoxy" on UV. Among other topics I...

“Everyone Has a Plan Until They Get Social Pressure To the Face” by Czynski

15 Apr 2026

Contributed by Lukas

or: Invisible Social Consensus is Real And Can Hurt You Related: Annoyingly Principled People, and what befalls them, both in terms of the claim bein...

“Mechanisms of Introspective Awareness” by Uzay Macar

14 Apr 2026

Contributed by Lukas

Uzay Macar and Li Yang are co-first authors. This work was advised by Jack Lindsey and Emmanuel Ameisen, with contributions from Atticus Wang and Pet...

“Claude Mythos #3: Capabilities and Additions” by Zvi

14 Apr 2026

Contributed by Lukas

To round out coverage of Mythos, today covers capabilities other than cyber, and anything else additional not covered by the first two posts, includi...

“Load-Bearing Sincerity: On the Motive Reinforcement Thesis” by Fiora Starlight

14 Apr 2026

Contributed by Lukas

This post was written almost entirely before David Africa and Jacob Pfau published this post earlier today, which takes a different approach to the s...

“Diary of a “Doomer”: 12+ years arguing about AI risk (part 1)” by David Scott Krueger (formerly: capybaralet)

14 Apr 2026

Contributed by Lukas

How I learned about Deep Learning. As far as I know, I’m the second person ever to get into the field of AI largely because I was worried about the...

“A Retrospective of Richard Ngo’s 2022 List of Conceptual Alignment Projects” by LawrenceC

14 Apr 2026

Contributed by Lukas

Written very quickly for the InkHaven Residency. In 2022, Richard Ngo wrote a list of 26 Conceptual Alignment Research Projects. Now that it's 2026, ...

“From personas to intentions: towards a science of motivations for AI models” by David Africa, Jacob Pfau

14 Apr 2026

Contributed by Lukas

TLDR: Behavior-only descriptions are useful, but insufficient for aligning advanced models with high assurance. Two models can look equally aligned on...

“The Shapley Share of Responsibility?” by Raemon

14 Apr 2026

Contributed by Lukas

Deepfates on twitter wrote: If you're in a theater and you shout "Fire!", and the audience reacts predictably and in the process trample someone to d...

“Who Killed Common Law?” by Benquo

14 Apr 2026

Contributed by Lukas

The classical undergraduate humanities curriculum in America was destroyed and replaced over the course of the twentieth century. The destruction is ...

“Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes” by Alex Mallen, ryan_greenblatt

14 Apr 2026

Contributed by Lukas

It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at ...

“Meaningful Questions Have Return Types” by Drake Morrison

14 Apr 2026

Contributed by Lukas

One way intellectual progress stalls is when you are asking the Wrong Questions. Your question is nonsensical, or cuts against the way reality works....

“Political Violence Is Never Acceptable” by Zvi

13 Apr 2026

Contributed by Lukas

Nor is the threat or implication of violence. Period. Ever. No exceptions. It is completely unacceptable. I condemn it in the strongest possible ter...

“Only Law Can Prevent Extinction” by Eliezer Yudkowsky

13 Apr 2026

Contributed by Lukas

There's a quote I read as a kid that stuck with me my whole life: "Remember that all tax revenue is the result of holding a gun to somebody's head. N...

“AI Safety’s Biggest Talent Gap Isn’t Researchers. It’s Generalists.” by Topaz, agucova, Alexandra Bates, Parv Mahajan

13 Apr 2026

Contributed by Lukas

This post was cross posted to the EA Forum TL;DR: One of the largest talent gaps in AI safety is competent generalists: program managers, fieldbuilde...

“Tomas Bjartur: The Last Prodigy” by Linch

13 Apr 2026

Contributed by Lukas

In 2026, every budding prodigy in writing is in some sense a tragedy. Anybody with experience prompting the large language models to write fiction kno...

“Annoyingly Principled People, and what befalls them” by Raemon

13 Apr 2026

Contributed by Lukas

Here are two beliefs that are sort of haunting me right now: Folk who try to push people to uphold principles (whether established ones or novel ones...

“TAPs or it didn’t happen” by Raemon

13 Apr 2026

Contributed by Lukas

Once, I went to talk about "curiosity" with @LoganStrohl. They noted "it seems like you have a good handle on 'active curiosity', but you don't reall...

“Returns to intelligence” by RobertM

13 Apr 2026

Contributed by Lukas

I'm going to tell you a story. For that story to make sense, I need to give you some background context. I have some pretty smart friends. One of the...

“Daycare illnesses” by Nina Panickssery

13 Apr 2026

Contributed by Lukas

Before I had a baby I was pretty agnostic about the idea of daycare. I could imagine various pros and cons but I didn’t have a strong overall opini...

“The policy surrounding Mythos marks an irreversible power shift” by sil

13 Apr 2026

Contributed by Lukas

This post assumes Anthropic isn't lying: Mythos is the current SOTA; Mythos is potent[1]; Anthropic will not make it publicly available un-nerfed[2]; Anthr...

“Talk English, Think Something Else” by J Bostock

13 Apr 2026

Contributed by Lukas

There's an adage from programming in C++ which goes something like "Yes, you write C, but you imagine the machine code as you do." I assumed this was...

“Sparse Autoencoders for Single-Cell Models” by Ihor Kendiukhov

13 Apr 2026

Contributed by Lukas

People are rushing to build bigger and bigger single cell foundation models (trained on RNA sequencing data), but in my view we have not extracted ev...

“Eggs, rooms, puzzles, and talking about AI” by KatjaGrace

13 Apr 2026

Contributed by Lukas

I live with five friends in a big house, and two things I’ve done in it on this particular Sunday are hide 156 easter eggs all around, and reach a ...

“Morale” by J Bostock

12 Apr 2026

Contributed by Lukas

One particularly pernicious condition is low morale. Morale is, roughly, "the belief that if you work hard, your conditions will improve." If your mo...

“Your Mom is a Chimera” by michaelwaves

12 Apr 2026

Contributed by Lukas

And so are you! When you were a fetus, you were sending millions of your cells through the placenta into your mom. And she was sending her cells into...

“The Blast Radius Principle” by Martin Sustrik

12 Apr 2026

Contributed by Lukas

In April 2024, a salvo of cruise missiles destroyed the Trypilska thermal power plant, the largest in the Kyiv region, in under an hour. In June 2023...

“How to make good tea” by RobertM

12 Apr 2026

Contributed by Lukas

If you're starting from a baseline of drinking relatively cheap mass-market teabags, the easiest way to marginally improve your tea quality is by mak...

“Catching illicit distributed training operations during an AI pause” by Robi Rahman

12 Apr 2026

Contributed by Lukas

Last year, my colleagues on MIRI's Technical Governance Team proposed an international agreement to halt risky development of superhuman artificial i...

[Linkpost] “Scott Alexander gentrified my meetup” by dominicq

11 Apr 2026

Contributed by Lukas

This is a link post. On 10 April 2026, I organized an ACX/LW/rationality meetup. Initially, I intended for it to be just our regular meetup, and “re...

“Pausing AI Is the Best Answer to Post-Alignment Problems” by MichaelDickens

11 Apr 2026

Contributed by Lukas

Even if we solve the AI alignment problem, we still face post-alignment problems, which are all the other existential problems [1] that AI may bring...

“Some thoughts on Nectome’s risk and resilience” by Aurelia

11 Apr 2026

Contributed by Lukas

One of the best ways to reduce Nectome's long-term risk is to show that preservation is a thing people want by buying one yourself; this is a critica...

“Chocolate Sloths, Tinder, and Moral Backstops” by J Bostock

11 Apr 2026

Contributed by Lukas

My grandma has a poor understanding of moral hazard, when it comes to buying me 155g chocolate sloths. Moral hazard is a concept in political economy...

“Dario probably doesn’t believe in superintelligence” by RobertM

11 Apr 2026

Contributed by Lukas

Epistemic status: I think this is true but don't think this post is a very strong argument for the case, or particularly interesting to read. But I h...

“The Unintelligibility is Ours: Notes on Chain-of-Thought” by 1a3orn

11 Apr 2026

Contributed by Lukas

Many people seem to think that the chains-of-thought in RL-trained LLMs are under a great deal of "pressure" to cease being English. The idea is that...

“If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines” by ryan_greenblatt

11 Apr 2026

Contributed by Lukas

Anthropic's system card for Mythos Preview says: It's unclear how we should interpret this. What do they mean by productivity uplift? To what exten...

“Claude Mythos #2: Cybersecurity and Project Glasswing” by Zvi

10 Apr 2026

Contributed by Lukas

Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous to...

“Why Control Creates Conflict, and When to Open Instead” by plex

10 Apr 2026

Contributed by Lukas

tl;dr: with multiple agents, control attempts tend to create conflict, because control attempts shut down communications channels, which leads to fee...

“Reproducing steering against evaluation awareness in a large open-weight model” by Thomas Read, Bronson Schoen, Joseph Bloom

10 Apr 2026

Contributed by Lukas

Produced as part of the UK AISI Model Transparency Team. Our team works on ensuring models don't subvert safety assessments, e.g. through evaluation ...

“Have we already lost? Part 2: Reasons for Doom” by LawrenceC

10 Apr 2026

Contributed by Lukas

Written very quickly for the Inkhaven Residency. As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidabl...

“Model organisms researchers should check whether high LRs defeat their model organisms” by dx26, Sebastian Prasanna, Alek Westover, Vivek Hebbar, Julian Stastny

10 Apr 2026

Contributed by Lukas

Thanks to Buck Shlegeris for feedback on a draft of this post. The goal-guarding hypothesis states that schemers will be able to preserve their goals...

“Anthropic did not publish a “risk discussion” of Mythos when required by their RSP” by RobertM

10 Apr 2026

Contributed by Lukas

I and some other people noticed a potential discrepancy in Anthropic's announcement of Claude Mythos. The version of the RSP that was operative over ...

“Claude Mythos: The System Card” by Zvi

10 Apr 2026

Contributed by Lukas

Claude Mythos is different. This is the first model other than GPT-2 that is at first not being released for public use at all. With GPT-2 the dela...

“Some takes on UV & cancer” by Steven Byrnes

10 Apr 2026

Contributed by Lukas

Table of contents: Part 1: In which I use my optical physics background to share some hopefully-uncontroversial observations; Part 2: In which I boldly...

“AI #163: Mythos Quest” by Zvi

09 Apr 2026

Contributed by Lukas

There exists an AI model, Claude Mythos, that has discovered critical safety vulnerabilities in every major operating system and browser. If released...

“Help me launch Obsolete: a book aimed at building a new movement for AI reform” by garrison

09 Apr 2026

Contributed by Lukas

I wrote a book! It's called Obsolete: The AI Industry's Trillion-Dollar Race to Replace You—and How to Stop It, and it’ll be available in May if ...

“Slightly-Super Persuasion Will Do” by Tomás B.

09 Apr 2026

Contributed by Lukas

In SF this week, I met an online friend in person for the first time yesterday. We talked about super-persuasion. His take was: there is mostly an ef...

“Have we already lost? Part 1: The Plan in 2024” by LawrenceC

09 Apr 2026

Contributed by Lukas

Written very quickly for the Inkhaven Residency. As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidab...

“Do not be surprised if LessWrong gets hacked” by RobertM

09 Apr 2026

Contributed by Lukas

Or, for that matter, anything else. This post is meant to be two things: a PSA about LessWrong's current security posture, from a LessWrong admin[1]; a...

“One Week in the Rat Farm” by Philip Harker

09 Apr 2026

Contributed by Lukas

Hello, LessWrong. This is a personal introduction diary-ish post and it does not have a thesis. I apologise if this isn't a good fit for the website;...

“101 Humans of New York on the Risks of AI” by Corm

09 Apr 2026

Contributed by Lukas

Nobody has ever done an in person door to door survey about AI risks[1]. What do people really think about AI? Like really? There have been some surv...

“Baking tips” by RobertM

08 Apr 2026

Contributed by Lukas

These are things I've learned from experience that others might find helpful. Some of them are easy to miss for a while. (Also an exercise in "realit...

“An easy coordination problem?” by KatjaGrace

08 Apr 2026

Contributed by Lukas

Common wisdom says that it is incredibly hard to coordinate to not build more dangerous AI. This sounds believable in the abstract: international geo...

“Excerpts and Notes on Mythos Model Card” by williawa

08 Apr 2026

Contributed by Lukas

List of Excerpts from Mythos model card. Tried to include interesting things, but also included some boring to be expected things. I omitted some t...

“The effects of caffeine consumption do not decay with a ~5 hour half-life” by kman

08 Apr 2026

Contributed by Lukas

epistemic status: confident in the overall picture, substantial quantitative uncertainty about the relative potency of caffeine and paraxanthine tldr...

“You don’t know what you are made of till you’ve been stalked across three countries” by Shoshannah Tekofsky

08 Apr 2026

Contributed by Lukas

When I was 19 I made some decisions. One of them was to stop eating meat. One of them was to stop using cutlery. One of them was to stop using chairs...

“Why is Flesh So Weak?” by J Bostock

08 Apr 2026

Contributed by Lukas

Ok, I got nerd sniped on the specific argument "Animals would be better off being made of stronger material than protein, but they don't because evol...

“The hard part isn’t noticing when papers are bad, it’s deciding what to do afterwards” by LawrenceC

08 Apr 2026

Contributed by Lukas

Written (very) quickly for the Inkhaven Residency. I used to hate the classic management adage of “bring me solutions, not problems”. After all, ...

“We can prevent progress! Conceptual clarity, and inspiration from the FDA” by KatjaGrace

08 Apr 2026

Contributed by Lukas

“We can’t prevent progress” say the people for some reason enthusiastically advocating that we just risk dying by AI rather than even consider ...

“AI as a Trojan horse race” by KatjaGrace

08 Apr 2026

Contributed by Lukas

I’ve argued that the AI situation is not clearly an ‘arms race’. By which I mean, going fast is not clearly good, even selfishly. I think this...

“My unsupervised elicitation challenge” by DanielFilan

08 Apr 2026

Contributed by Lukas

Note: you are ineligible to complete this challenge if you’ve studied Ancient or Modern Greek, or if you natively speak Modern Greek, or if for oth...

“Role-playing vs Self-modelling” by Jan_Kulveit

08 Apr 2026

Contributed by Lukas

In a recent debate on Twitter – which I recommend reading in full – David Chalmers argues: "Claude doesn't role-play the assistant, it realizes t...

“Elementary Condensation” by Jan

08 Apr 2026

Contributed by Lukas

Previously in this series: Elementary Infra-Bayesianism 1. There's this paper Earlier last week I got nerd-sniped by a paper called Condensation: a t...

“Hedging and Survival-Weighted Planning” by Vaniver

08 Apr 2026

Contributed by Lukas

This wasn't intended to be a topical post, but Claude Mythos's system card is out, and... well. I wrote years ago about decision analysis, which ofte...

“Opus’s Schelling Steganography Has Amplifiable Secrecy Against Weaker Eavesdroppers” by Elle Najt

08 Apr 2026

Contributed by Lukas

Code: github.com/ElleNajt/Steganography_Wiretapping | Data: huggingface.co/datasets/lnajt/steganography-wiretapping Play the decoding game: can you ...

“An Alignment Journal: Features and policies” by JessRiedel, Dan MacKinlay, Luca, Daniel Murfet, david reinstein

08 Apr 2026

Contributed by Lukas

We previously announced a forthcoming research journal for AI alignment. This cross-post from our blog describes our tentative plans for the features...

“Fantasy ideology” by Ninety-Three

07 Apr 2026

Contributed by Lukas

The following is a long excerpt from a longer article published in 2002 by Lee Harris, Al Qaeda's Fantasy Ideology. The full article is about what it...

[Linkpost] “Questions raised about OpenAI leaders’ trustworthiness by the New Yorker” by Remmelt

07 Apr 2026

Contributed by Lukas

This is a link post. One excerpt stuck out for me – on Brockman's idea to play China, Russia, and other world powers against each other: In 2017, Am...

“Claude Mythos System Card Preview” by anaguma

07 Apr 2026

Contributed by Lukas

Anthropic has released a preview of the Claude Mythos System Card preview here. It is too long to present in full, but a section I found particularly...

“My picture of the present in AI” by ryan_greenblatt

07 Apr 2026

Contributed by Lukas

In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scena...

[Linkpost] “[Paper] Stringological sequence prediction I” by Vanessa Kosoy

07 Apr 2026

Contributed by Lukas

This is a link post. TLDR: The first in a planned series of three or more papers, which constitute the first major in-road in the compositional learni...

“We’re actually running out of benchmarks to upper bound AI capabilities” by LawrenceC

07 Apr 2026

Contributed by Lukas

Written quickly as part of the Inkhaven Residency. Opinions are my own and do not represent METR's official opinion. In early 2025, the situation fo...

“Don’t write for LLMs, just record everything” by RobertM

07 Apr 2026

Contributed by Lukas

Some people have argued the advent of LLMs has dramatically increased the value of having a public writing footprint. The first reason given is that ...

“Contra Nina Panickssery on advice for children” by Sean Herrington

07 Apr 2026

Contributed by Lukas

I recently read this post by Nina Panickssery on advice for children. I felt that several of the recommendations are actively harmful to the children th...

“By Strong Default, ASI Will End Liberal Democracy” by MichaelDickens

07 Apr 2026

Contributed by Lukas

Cross-posted from my website. The existence of liberal democracy—with rule of law, constraints on government power, and enfranchised citizens—re...

“AIs can now often do massive easy-to-verify SWE tasks and I’ve updated towards shorter timelines” by ryan_greenblatt

06 Apr 2026

Contributed by Lukas

I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) a...

“Paper close reading: “Why Language Models Hallucinate”” by LawrenceC

06 Apr 2026

Contributed by Lukas

People often talk about paper reading as a skill, but there aren’t that many examples of people walking through how they do it. Part of this is a p...

“Ten different ways of thinking about Gradual Disempowerment” by David Scott Krueger (formerly: capybaralet)

05 Apr 2026

Contributed by Lukas

About a year ago, we wrote a paper that coined the term “Gradual Disempowerment.” It proved to be a great success, which is terrific. A friend an...

“11 pieces of advice for children” by Nina Panickssery

05 Apr 2026

Contributed by Lukas

I came up with these principles when I was a child myself. Don’t be a sheep 🐑. Avoid mindlessly copying others. Resist the urge towards conformi...

“Steering Might Stop Working Soon” by J Bostock

05 Apr 2026

Contributed by Lukas

Steering LLMs with single-vector methods might break down soon, and by soon I mean soon enough that if you're working on steering, you should start p...

“Am I the baddie?” by Ustice

05 Apr 2026

Contributed by Lukas

I am a software engineer. I work for a company that makes software for road construction. Monday last week we were under a bad crunch and we were tol...

“Academic Proof-of-Work in the Age of LLMs” by LawrenceC

05 Apr 2026

Contributed by Lukas

Written quickly as part of the Inkhaven Residency. Related: Bureaucracy as active ingredient, pain as active ingredient A widely known secret in acad...

“Positive sum does not mean “win-win”” by loops

05 Apr 2026

Contributed by Lukas

A lot of people and documents online say that positive-sum games are "win-wins", where all of the participants are better off. But this isn't true! I...

“Considerations for growing the pie” by Zach Stein-Perlman

05 Apr 2026

Contributed by Lukas

Recently some friends and I were comparing growing the pie interventions to an increasing our friends' share of the pie intervention, and at first we...

“‘Following the incentives’” by David Scott Krueger (formerly: capybaralet)

04 Apr 2026

Contributed by Lukas

A few years ago I listened to a fascinating podcast interview featuring former Democratic presidential candidates Andrew Yang and Marianne Williamson...

“Chicken-Free Egg Whites” by jefftk

04 Apr 2026

Contributed by Lukas

Baking has traditionally made extensive use of egg whites, especially the way they can be beaten into a foam and then set with heat. While I eat eg...

“dark ilan” by ozymandias

04 Apr 2026

Contributed by Lukas

The second time Vellam uncovers the conspiracy underlying all of society, he approaches a Keeper. Some of the difference is convenience. Since Vellam...
