LessWrong (Curated & Popular)
Episodes
Introducing Alignment Stress-Testing at Anthropic
14 Jan 2024
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Following on from our recent paper, “Sleeper Agents: Training ...
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
13 Jan 2024
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for https://arxiv.org/abs/2401.05566. I'm ...
[HUMAN VOICE] "Meaning & Agency" by Abram Demski
07 Jan 2024
Contributed by Lukas
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated. The goal of this post is to clarify a few concepts relatin...
What’s up with LLMs representing XORs of arbitrary features?
07 Jan 2024
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Thanks to Clément Dumas, Nikola Jurković, Nora Belrose, Arthur...
Gentleness and the artificial Other
05 Jan 2024
Contributed by Lukas
(Cross-posted from my website. Audio version here, or search "Joe Carlsmith Audio" on your podcast app.) This is the first essay in a series t...
MIRI 2024 Mission and Strategy Update
05 Jan 2024
Contributed by Lukas
As we announced back in October, I have taken on the senior leadership role at MIRI as its CEO. It's a big pair of shoes to fill, and an awesome ...
The Plan - 2023 Version
04 Jan 2024
Contributed by Lukas
Background: The Plan, The Plan: 2022 Update. If you haven’t read those, don’t worry, we’re going to go through things from the top this year, an...
Apologizing is a Core Rationalist Skill
03 Jan 2024
Contributed by Lukas
In certain circumstances, apologizing can also be a countersignalling power-move, i.e. “I am so high status that I can grovel a bit without anybody ...
[HUMAN VOICE] "A case for AI alignment being difficult" by jessicata
02 Jan 2024
Contributed by Lukas
This is a linkpost for https://unstableontology.com/2023/12/31/a-case-for-ai-alignment-being-difficult/. Support ongoing human narrations of LessWrong's ...
The Dark Arts
01 Jan 2024
Contributed by Lukas
lsusr: It is my understanding that you won all of your public forum debates this year. That's very impressive. I thought it would be interesting to...
Critical review of Christiano’s disagreements with Yudkowsky
28 Dec 2023
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a review of Paul Christiano's article "Where I...
Most People Don’t Realize We Have No Idea How Our AIs Work
27 Dec 2023
Contributed by Lukas
This point feels fairly obvious, yet seems worth stating explicitly. Those of us familiar with the field of AI after the deep-learning revolution know ...
Discussion: Challenges with Unsupervised LLM Knowledge Discovery
26 Dec 2023
Contributed by Lukas
TL;DR: Contrast-consistent search (CCS) seemed exciting to us and we were keen to apply it. At this point, we think it is unlikely to be directly help...
Succession
24 Dec 2023
Contributed by Lukas
This is a linkpost for https://www.narrativeark.xyz/p/succession. “A table beside the evening sea where you sit shelling pistachios, flicking the next...
Nonlinear’s Evidence: Debunking False and Misleading Claims
21 Dec 2023
Contributed by Lukas
Recently, Ben Pace wrote a well-intentioned blog post mostly based on complaints from 2 (of 21) Nonlinear employees who 1) wanted more money, 2) felt...
Effective Aspersions: How the Nonlinear Investigation Went Wrong
20 Dec 2023
Contributed by Lukas
The New York Times. Picture a scene: the New York Times is releasing an article on Effective Altruism (EA) with an express goal to dig up every piece ...
Constellations are Younger than Continents
20 Dec 2023
Contributed by Lukas
At the Bay Area Solstice, I heard the song Bold Orion for the first time. I like it a lot. It does, however, have one problem: He has seen the rise and...
The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda
19 Dec 2023
Contributed by Lukas
Many thanks to Samuel Hammond, Cate Hall, Beren Millidge, Steve Byrnes, Lucius Bushnaq, Joar Skalse, Kyle Gracey, Gunnar Zarncke, Ross Nordby, David L...
“Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity
18 Dec 2023
Contributed by Lukas
When discussing AGI Risk, people often talk about it in terms of a war between humanity and an AGI. Comparisons between the amounts of resources at bo...
Is being sexy for your homies?
17 Dec 2023
Contributed by Lukas
Epistemic status: Speculation. An unholy union of evo psych, introspection, random stuff I happen to observe & hear about, and thinking. Done on a...
[HUMAN VOICE] "Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible" by Gene Smith and Kman
17 Dec 2023
Contributed by Lukas
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated. TL;DR version: In the course of my life, there have been a h...
[HUMAN VOICE] "Moral Reality Check (a short story)" by jessicata
15 Dec 2023
Contributed by Lukas
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated. This is a linkpost for https://unstableontology.com/2023/1...
AI Control: Improving Safety Despite Intentional Subversion
15 Dec 2023
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We’ve released a paper, AI Control: Improving Safety Despite I...
2023 Unofficial LessWrong Census/Survey
13 Dec 2023
Contributed by Lukas
The Less Wrong General Census is unofficially here! You can take it at this link. It's that time again. If you are reading this post and identify a...
The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
13 Dec 2023
Contributed by Lukas
If you are interested in the longevity scene, like I am, you probably have seen press releases about the dog longevity company, Loyal for Dogs, gettin...
[HUMAN VOICE] "What are the results of more parental supervision and less outdoor play?" by Julia Wise
13 Dec 2023
Contributed by Lukas
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated. Crossposted from Otherwise. Parents supervise their children...
Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible
12 Dec 2023
Contributed by Lukas
In the course of my life, there have been a handful of times I discovered an idea that changed the way I thought about the world. The first occurred w...
re: Yudkowsky on biological materials
11 Dec 2023
Contributed by Lukas
I was asked to respond to this comment by Eliezer Yudkowsky. This post is partly redundant with my previous post. Why is flesh weaker than diamond? When...
Speaking to Congressional staffers about AI risk
05 Dec 2023
Contributed by Lukas
In May and June of 2023, I (Akash) had about 50-70 meetings about AI risks with congressional staffers. I had been meaning to write a post reflecting ...
[HUMAN VOICE] "Shallow review of live agendas in alignment & safety" by technicalities & Stag
04 Dec 2023
Contributed by Lukas
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated. You can’t optimise an allocation of resources if you don...
Thoughts on “AI is easy to control” by Pope & Belrose
02 Dec 2023
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Quintin Pope & Nora Belrose have a new “AI Optimists” we...
The 101 Space You Will Always Have With You
30 Nov 2023
Contributed by Lukas
Any community which ever adds new people will need to either routinely teach the new and (to established members) blindingly obvious information to th...
[HUMAN VOICE] "Social Dark Matter" by Duncan Sabien
28 Nov 2023
Contributed by Lukas
The author's Substack: https://substack.com/@homosabiens. Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurat...
Shallow review of live agendas in alignment & safety
28 Nov 2023
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Summary: You can’t optimise an allocation of resources if you d...
Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
25 Nov 2023
Contributed by Lukas
Status: Vague, sorry. The point seems almost tautological to me, and yet also seems like the correct answer to the people going around saying “LLMs ...
[HUMAN VOICE] "The 6D effect: When companies take risks, one email can be very powerful." by scasper
23 Nov 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. Recently, I have been learning about industry norms, legal discovery procee...
OpenAI: The Battle of the Board
22 Nov 2023
Contributed by Lukas
Previously: OpenAI: Facts from a Weekend. On Friday afternoon, OpenAI's board fired CEO Sam Altman. Overnight, an agreement in principle was reac...
OpenAI: Facts from a Weekend
20 Nov 2023
Contributed by Lukas
Approximately four GPTs and seven years ago, OpenAI's founders brought forth on this corporate landscape a new entity, conceived in liberty, and ...
Sam Altman fired from OpenAI
18 Nov 2023
Contributed by Lukas
This is a linkpost for https://openai.com/blog/openai-announces-leadership-transition. Basically just the title, see the OAI blog post for more details....
Social Dark Matter
17 Nov 2023
Contributed by Lukas
You know it must be out there, but you mostly never see it. Author's Note 1: I'm something like 75% confident that this will be the last essa...
[HUMAN VOICE] "Thinking By The Clock" by Screwtape
17 Nov 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. I'm sure Harry Potter and the Methods of Rationality taught me some of...
"You can just spontaneously call people you haven't met in years" by lc
17 Nov 2023
Contributed by Lukas
Here's a recent conversation I had with a friend: Me: "I wish I had more friends. You guys are great, but I only get to hang out with you lik...
[HUMAN VOICE] "AI Timelines" by habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil
17 Nov 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. How many years will pass before transformative AI is built? Three people wh...
"EA orgs' legal structure inhibits risk taking and information sharing on the margin" by Elizabeth
17 Nov 2023
Contributed by Lukas
It’s fairly common for EA orgs to provide fiscal sponsorship to other EA orgs. Wait, no, that sentence is not quite right. The more accurate sente...
"Integrity in AI Governance and Advocacy" by habryka, Olivia Jimenez
17 Nov 2023
Contributed by Lukas
habryka: Ok, so we both had some feelings about the recent Conjecture post on "lots of people in AI Alignment are lying", and the associated m...
Loudly Give Up, Don’t Quietly Fade
16 Nov 2023
Contributed by Lukas
1. There's a supercharged, dire wolf form of the bystander effect that I’d like to shine a spotlight on. First, a quick recap. The Bystander Effe...
"Does davidad's uploading moonshot work?" by jacobjabob et al.
09 Nov 2023
Contributed by Lukas
davidad has a 10-min talk out on a proposal about which he says: “the first time I’ve seen a concrete plan that might work to get human uploads be...
[HUMAN VOICE] "Deception Chess: Game #1" by Zane et al.
09 Nov 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. (You can sign up to play deception chess here if you haven't already.)...
[HUMAN VOICE] "Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds
09 Nov 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. This is a linkpost for https://transformer-circuits.pub/2023/monosemantic-f...
"The 6D effect: When companies take risks, one email can be very powerful." by scasper
09 Nov 2023
Contributed by Lukas
Recently, I have been learning about industry norms, legal discovery proceedings, and incentive structures related to companies building risky systems...
"The other side of the tidal wave" by Katja Grace
09 Nov 2023
Contributed by Lukas
I guess there’s maybe a 10-20% chance of AI causing human extinction in the coming decades, but I feel more distressed about it than even that sugge...
"Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk" by 1a3orn
09 Nov 2023
Contributed by Lukas
I examined all the biorisk-relevant citations from a policy paper arguing that we should ban powerful open source LLMs. None of them provide good evide...
"My thoughts on the social response to AI risk" by Matthew Barnett
09 Nov 2023
Contributed by Lukas
A common theme implicit in many AI risk stories has been that broader society will either fail to anticipate the risks of AI until it is too late, or ...
Comp Sci in 2027 (Short story by Eliezer Yudkowsky)
09 Nov 2023
Contributed by Lukas
This is a linkpost for https://nitter.net/ESYudkowsky/status/1718654143110512741. Comp sci in 2017: Student: I get the feeling the compiler is just ign...
"Thoughts on the AI Safety Summit company policy requests and responses" by So8res
03 Nov 2023
Contributed by Lukas
Over the next two days, the UK government is hosting an AI Safety Summit focused on “the safe and responsible development of frontier AI”. They re...
"President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence" by Tristan Williams
03 Nov 2023
Contributed by Lukas
This is a linkpost for https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-sa...
[Human Voice] "Book Review: Going Infinite" by Zvi
31 Oct 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. Previously: Sadly, FTX. I doubted whether it would be a good use of time to r...
"Thoughts on responsible scaling policies and regulation" by Paul Christiano
30 Oct 2023
Contributed by Lukas
I am excited about AI developers implementing responsible scaling policies; I’ve recently been spending time refining this idea and advocating for i...
"We're Not Ready: thoughts on "pausing" and responsible scaling policies" by Holden Karnofsky
30 Oct 2023
Contributed by Lukas
Views are my own, not Open Philanthropy’s. I am married to the President of Anthropic and have a financial interest in both Anthropic and OpenAI via...
"At 87, Pearl is still able to change his mind" by rotatingpaguro
30 Oct 2023
Contributed by Lukas
Judea Pearl is a famous researcher, known for Bayesian networks (the standard way of representing Bayesian models), and his statistical formalization ...
"Architects of Our Own Demise: We Should Stop Developing AI" by Roko
30 Oct 2023
Contributed by Lukas
Some brief thoughts at a difficult time in the AI risk debate. Imagine you go back in time to the year 1999 and tell people that in 24 years time, huma...
"AI as a science, and three obstacles to alignment strategies" by Nate Soares
30 Oct 2023
Contributed by Lukas
AI used to be a science. In the old days (back when AI didn't work very well), people were attempting to develop a working theory of cognition. Th...
"Announcing Timaeus" by Jesse Hoogland et al.
30 Oct 2023
Contributed by Lukas
Timaeus is a new AI safety research organization dedicated to making fundamental breakthroughs in technical AI alignment using deep ideas from mathema...
[HUMAN VOICE] "Alignment Implications of LLM Successes: a Debate in One Act" by Zack M Davis
23 Oct 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. Doomimir: Humanity has made no progress on the alignment problem. Not only ...
"Holly Elmore and Rob Miles dialogue on AI Safety Advocacy" by jacobjacob, Robert Miles & Holly_Elmore
23 Oct 2023
Contributed by Lukas
Holly is an independent AI Pause organizer, which includes organizing protests (like this upcoming one). Rob is an AI Safety YouTuber. I (jacobjacob) ...
"LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B" by Simon Lermen & Jeffrey Ladish.
23 Oct 2023
Contributed by Lukas
Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Jeffrey Ladish. TL;DR: LoRA fine-tunin...
"Labs should be explicit about why they are building AGI" by Peter Barnett
19 Oct 2023
Contributed by Lukas
Three of the big AI labs say that they care about alignment and that they think misaligned AI poses a potentially existential threat to humanity. Thes...
[HUMAN VOICE] "Sum-threshold attacks" by TsviBT
18 Oct 2023
Contributed by Lukas
Support ongoing human narrations of curated posts: www.patreon.com/LWCurated. How do you affect something far away, a lot, without anyone noticing? (Note:...
"Will no one rid me of this turbulent pest?" by Metacelsus
18 Oct 2023
Contributed by Lukas
Last year, I wrote about the promise of gene drives to wipe out mosquito species and end malaria. In the time since my previous writing, gene drives ha...
[HUMAN VOICE] "Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
15 Oct 2023
Contributed by Lukas
Patreon to support human narration. (Narrations will remain freely available on this feed, but you can optionally support them if you'd like me t...
"RSPs are pauses done right" by evhub
15 Oct 2023
Contributed by Lukas
COI: I am a research scientist at Anthropic, where I work on model organisms of misalignment; I was also involved in the drafting process for Anthropi...
"Comparing Anthropic's Dictionary Learning to Ours" by Robert_AIZI
15 Oct 2023
Contributed by Lukas
Readers may have noticed many similarities between Anthropic's recent publication Towards Monosemanticity: Decomposing Language Models With Dicti...
"Announcing MIRI’s new CEO and leadership team" by Gretta Duleba
15 Oct 2023
Contributed by Lukas
In 2023, MIRI has shifted focus in the direction of broad public communication—see, for example, our recent TED talk, our piece in TIME magazine “...
"Cohabitive Games so Far" by mako yass
15 Oct 2023
Contributed by Lukas
A cohabitive game[1] is a partially cooperative, partially competitive multiplayer game that provides an anarchic dojo for development in applied coop...
"Announcing Dialogues" by Ben Pace
09 Oct 2023
Contributed by Lukas
As of today, everyone is able to create a new type of content on LessWrong: Dialogues. In contrast with posts, which are for monologues, and comment se...
"Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn" by Zvi
09 Oct 2023
Contributed by Lukas
Response to: Evolution Provides No Evidence For the Sharp Left Turn, due to it winning first prize in The Open Philanthropy Worldviews contest. Quint...
"Evaluating the historical value misspecification argument" by Matthew Barnett
09 Oct 2023
Contributed by Lukas
ETA: I'm not saying that MIRI thought AIs wouldn't understand human values. If there's only one thing you take away from this post, ple...
"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds
09 Oct 2023
Contributed by Lukas
Neural networks are trained on data, not programmed to follow rules. We understand the math of the trained network exactly – each neuron in a neural...
"Thomas Kwa's MIRI research experience" by Thomas Kwa and others
06 Oct 2023
Contributed by Lukas
Moderator note: the following is a dialogue using LessWrong’s new dialogue feature. The exchange is not completed: new replies might be added contin...
"'Diamondoid bacteria' nanobots: deadly threat or dead-end? A nanotech investigation" by titotal
03 Oct 2023
Contributed by Lukas
A lot of people are highly concerned that a malevolent AI or insane human will, in the near future, set out to destroy humanity. If such an entity wan...
"The Lighthaven Campus is open for bookings" by Habryka
03 Oct 2023
Contributed by Lukas
Lightcone Infrastructure (the organization that grew from and houses the LessWrong team) has just finished renovating a 7-building physical campus tha...
"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al.
03 Oct 2023
Contributed by Lukas
Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrab...
"EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem" by Elizabeth
03 Oct 2023
Contributed by Lukas
Effective altruism prides itself on truthseeking. That pride is justified in the sense that EA is better at truthseeking than most members of its refe...
"The King and the Golem" by Richard Ngo
29 Sep 2023
Contributed by Lukas
This is a linkpost for https://narrativeark.substack.com/p/the-king-and-the-golem. Long ago there was a mighty king who had everything in the world that...
"Sparse Autoencoders Find Highly Interpretable Directions in Language Models" by Logan Riggs et al
27 Sep 2023
Contributed by Lukas
This is a linkpost for Sparse Autoencoders Find Highly Interpretable Directions in Language Models. We use a scalable and unsupervised method called Spa...
"Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
26 Sep 2023
Contributed by Lukas
Epistemic status: model which I find sometimes useful, and which emphasizes some true things about many parts of the world which common alternative mo...
"There should be more AI safety orgs" by Marius Hobbhahn
25 Sep 2023
Contributed by Lukas
I’m writing this in my own capacity. The views expressed are my own, and should not be taken to represent the views of Apollo Research or any other ...
"The Talk: a brief explanation of sexual dimorphism" by Malmesbury
22 Sep 2023
Contributed by Lukas
Cross-posted from substack. "Everything in the world is about sex, except sex. Sex is about clonal interference." – Oscar Wilde (kind of). As ...
"A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX" by jacobjacob
20 Sep 2023
Contributed by Lukas
Patrick Collison has a fantastic list of examples of people quickly accomplishing ambitious things together since the 19th Century. It does make you y...
"AI presidents discuss AI alignment agendas" by TurnTrout & Garrett Baker
19 Sep 2023
Contributed by Lukas
This is a linkpost for https://www.youtube.com/watch?v=02kbWY5mahQ. None of the presidents fully represent my (TurnTrout's) views. TurnTrout wrote t...
"UDT shows that decision theory is more puzzling than ever" by Wei Dai
18 Sep 2023
Contributed by Lukas
I feel like MIRI perhaps mispositioned FDT (their variant of UDT) as a clear advancement in decision theory, whereas maybe they could have attracted m...
"Sum-threshold attacks" by TsviBT
11 Sep 2023
Contributed by Lukas
How do you affect something far away, a lot, without anyone noticing? (Note: you can safely skip sections. It is also safe to skip the essay entirely, ...
"A list of core AI safety problems and how I hope to solve them" by Davidad
09 Sep 2023
Contributed by Lukas
Context: I sometimes find myself referring back to this tweet and wanted to give it a more permanent home. While I'm at it, I thought I would try...
"Report on Frontier Model Training" by Yafah Edelman
09 Sep 2023
Contributed by Lukas
This is a linkpost for https://docs.google.com/document/d/1TsYkDYtV6BKiCN9PAOirRAy3TrNDu2XncUZ5UZfaAKA/edit?usp=sharing. Understanding what drives the r...
"One Minute Every Moment" by abramdemski
08 Sep 2023
Contributed by Lukas
About how much information are we keeping in working memory at a given moment?"Miller's Law" dictates that the number of things humans ...
"Sharing Information About Nonlinear" by Ben Pace
08 Sep 2023
Contributed by Lukas
Added (11th Sept): Nonlinear have commented that they intend to write a response, have written a short follow-up, and claim that they dispute 85 claim...
"Defunding My Mistake" by ymeskhout
08 Sep 2023
Contributed by Lukas
Until about five years ago, I unironically parroted the slogan All Cops Are Bastards (ACAB) and earnestly advocated to abolish the police and prison s...
"What I would do if I wasn’t at ARC Evals" by LawrenceC
08 Sep 2023
Contributed by Lukas
In which: I list 9 projects that I would work on if I wasn’t busy working on safety standards at ARC Evals, and explain why they might be good to wo...
"Meta Questions about Metaphilosophy" by Wei Dai
04 Sep 2023
Contributed by Lukas
To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age th...
"The U.S. is becoming less stable" by lc
04 Sep 2023
Contributed by Lukas
We focus so much on arguing over who is at fault in this country that I think sometimes we fail to alert on what's actually happening. I would ju...