LessWrong (Curated & Popular)
Episodes
“Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq
10 Jan 2025
Contributed by Lukas
TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activ...
“What o3 Becomes by 2028” by Vladimir_Nesov
09 Jan 2025
Contributed by Lukas
Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces...
“What Indicators Should We Watch to Disambiguate AGI Timelines?” by snewman
09 Jan 2025
Contributed by Lukas
(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduc...
“How will we update about scheming?” by ryan_greenblatt
08 Jan 2025
Contributed by Lukas
I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, ...
“OpenAI #10: Reflections” by Zvi
08 Jan 2025
Contributed by Lukas
This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There's a bunch of good and interesting answers in the ...
“Maximizing Communication, not Traffic” by jefftk
07 Jan 2025
Contributed by Lukas
As someone who writes for fun, I don't need to get people onto my site: If I write a post and some people are able to get the core idea just from ...
“What’s the short timeline plan?” by Marius Hobbhahn
02 Jan 2025
Contributed by Lukas
This is a low-effort post. I mostly want to get other people's takes and express concern about the lack of detailed and publicly available plans ...
“Shallow review of technical AI safety, 2024” by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
30 Dec 2024
Contributed by Lukas
from aisafety.world The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense...
“By default, capital will matter more than ever after AGI” by L Rudolf L
29 Dec 2024
Contributed by Lukas
I've heard many people say something like "money won't matter post-AGI". This has always struck me as odd, and as most likely comp...
“Review: Planecrash” by L Rudolf L
28 Dec 2024
Contributed by Lukas
Take a stereotypical fantasy novel, a textbook on mathematical logic, and Fifty Shades of Grey. Mix them all together and add extra weirdness for spic...
“The Field of AI Alignment: A Postmortem, and What To Do About It” by johnswentworth
26 Dec 2024
Contributed by Lukas
A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look...
“When Is Insurance Worth It?” by kqr
23 Dec 2024
Contributed by Lukas
TL;DR: If you want to know whether getting insurance is worth it, use the Kelly Insurance Calculator. If you want to know why or how, read on. Note to ...
“Orienting to 3 year AGI timelines” by Nikola Jurkovic
23 Dec 2024
Contributed by Lukas
My median expectation is that AGI[1] will be created 3 years from now. This has implications on how to behave, and I will share some useful thoughts I...
“What Goes Without Saying” by sarahconstantin
21 Dec 2024
Contributed by Lukas
There are people I can talk to, where all of the following statements are obvious. They go without saying. We can just “be reasonable” together, w...
“o3” by Zach Stein-Perlman
21 Dec 2024
Contributed by Lukas
I'm editing this post. OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons). It gets 25% on FrontierMath, smashing th...
“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit
21 Dec 2024
Contributed by Lukas
I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead...
“AIs Will Increasingly Attempt Shenanigans” by Zvi
19 Dec 2024
Contributed by Lukas
Increasingly, we have seen papers eliciting in AI models various shenanigans. There are a wide variety of scheming behaviors. You’ve got your weight ...
“Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck
18 Dec 2024
Contributed by Lukas
What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper...
“Communications in Hard Mode (My new job at MIRI)” by tanagrabeast
15 Dec 2024
Contributed by Lukas
Six months ago, I was a high school English teacher. I wasn’t looking to change careers, even after nineteen sometimes-difficult years. I was good at...
“Biological risk from the mirror world” by jasoncrawford
13 Dec 2024
Contributed by Lukas
A new article in Science Policy Forum voices concern about a particular line of biological research which, if successful in the long term, could event...
“Subskills of ‘Listening to Wisdom’” by Raemon
13 Dec 2024
Contributed by Lukas
A fool learns from their own mistakes. The wise learn from the mistakes of others. – Otto von Bismarck. A problem as old as time: The youth won't ...
“Understanding Shapley Values with Venn Diagrams” by Carson L
13 Dec 2024
Contributed by Lukas
Someone I know, Carson Loughridge, wrote this very nice post explaining the core intuition around Shapley values (which play an important role in imp...
“LessWrong audio: help us choose the new voice” by PeterH
12 Dec 2024
Contributed by Lukas
We make AI narrations of LessWrong posts available via our audio player and podcast feeds. We’re thinking about changing our narrator's voice. Th...
“Understanding Shapley Values with Venn Diagrams” by agucova
11 Dec 2024
Contributed by Lukas
This is a link post. Someone I know wrote this very nice post explaining the core intuition around Shapley values (which play an important role in imp...
“o1: A Technical Primer” by Jesse Hoogland
11 Dec 2024
Contributed by Lukas
TL;DR: In September 2024, OpenAI released o1, its first "reasoning model". This model exhibits remarkable test-time scaling laws, which comp...
“Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout
09 Dec 2024
Contributed by Lukas
We present gradient routing, a way of controlling where learning happens in neural networks. Gradient routing applies masks to limit the flow of gradi...
“Frontier Models are Capable of In-context Scheming” by Marius Hobbhahn, AlexMeinke, Bronson Schoen
06 Dec 2024
Contributed by Lukas
This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We...
“(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka
30 Nov 2024
Contributed by Lukas
TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM or signal message (+1 510 944 3235) if you wa...
“Repeal the Jones Act of 1920” by Zvi
29 Nov 2024
Contributed by Lukas
Balsa Policy Institute chose as its first mission to lay groundwork for the potential repeal, or partial repeal, of section 27 of the Jones Act of 192...
“China Hawks are Manufacturing an AI Arms Race” by garrison
29 Nov 2024
Contributed by Lukas
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, ...
“Information vs Assurance” by johnswentworth
27 Nov 2024
Contributed by Lukas
In contract law, there's this thing called a “representation”. Example: as part of a contract to sell my house, I might “represent that” ...
“You are not too ‘irrational’ to know your preferences.” by DaystarEld
27 Nov 2024
Contributed by Lukas
Epistemic Status: 13 years working as a therapist for a wide variety of populations, 5 of them working with rationalists and EA clients. 7 years teach...
“‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi
25 Nov 2024
Contributed by Lukas
[Warning: This post is probably only worth reading if you already have opinions on the Solomonoff induction being malign, or at least heard of the con...
“‘It’s a 10% chance which I did 10 times, so it should be 100%’” by egor.timatkov
20 Nov 2024
Contributed by Lukas
Audio note: this article contains 33 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text i...
“OpenAI Email Archives” by habryka
19 Nov 2024
Contributed by Lukas
As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman...
“Ayn Rand’s model of ‘living money’; and an upside of burnout” by AnnaSalamon
18 Nov 2024
Contributed by Lukas
Epistemic status: Toy model. Oversimplified, but has been anecdotally useful to at least a couple people, and I like it as a metaphor. Introduction: I’...
“Neutrality” by sarahconstantin
17 Nov 2024
Contributed by Lukas
Midjourney, “infinite library”. I’ve had post-election thoughts percolating, and the sense that I wanted to synthesize something about this moment...
“Making a conservative case for alignment” by Cameron Berg, Judd Rosenblatt, phgubbins, AE Studio
16 Nov 2024
Contributed by Lukas
Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In th...
“OpenAI Email Archives (from Musk v. Altman)” by habryka
16 Nov 2024
Contributed by Lukas
As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman...
“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub
15 Nov 2024
Contributed by Lukas
Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback. Following up on our recent “Sabotage Evaluations...
“The Online Sports Gambling Experiment Has Failed” by Zvi
12 Nov 2024
Contributed by Lukas
Related: Book Review: On the Edge: The Gamblers. I have previously been heavily involved in sports betting. That world was very good to me. The times we...
“o1 is a bad idea” by abramdemski
12 Nov 2024
Contributed by Lukas
This post comes a bit late with respect to the news cycle, but I argued in a recent interview that o1 is an unfortunate twist on LLM technologies, mak...
“Current safety training techniques do not fully transfer to the agent setting” by Simon Lermen, Govind Pimpale
09 Nov 2024
Contributed by Lukas
TL;DR: I'm presenting three recent papers which all share a similar finding, i.e. the safety training techniques for chat models don’t transfer...
“Explore More: A Bag of Tricks to Keep Your Life on the Rails” by Shoshannah Tekofsky
04 Nov 2024
Contributed by Lukas
At least, if you happen to be near me in brain space. What advice would you give your younger self? That was the prompt for a class I taught at PAIR 202...
“Survival without dignity” by L Rudolf L
04 Nov 2024
Contributed by Lukas
I open my eyes and find myself lying on a bed in a hospital room. I blink. "Hello", says a middle-aged man with glasses, sitting on a chair b...
“The Median Researcher Problem” by johnswentworth
04 Nov 2024
Contributed by Lukas
Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median resear...
“The Compendium, A full argument about extinction risk from AGI” by adamShimi, Gabriel Alfour, Connor Leahy, Chris Scammell, Andrea_Miotti
01 Nov 2024
Contributed by Lukas
This is a link post. We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings toge...
“What TMS is like” by Sable
31 Oct 2024
Contributed by Lukas
There are two nuclear options for treating depression: Ketamine and TMS. This post is about the latter. TMS stands for Transcranial Magnetic Stimulatio...
“The hostile telepaths problem” by Valentine
28 Oct 2024
Contributed by Lukas
Epistemic status: model-building based on observation, with a few successful unusual predictions. Anecdotal evidence has so far been consistent with t...
“A bird’s eye view of ARC’s research” by Jacob_Hilton
27 Oct 2024
Contributed by Lukas
This post includes a "flattened version" of an interactive diagram that cannot be displayed on this site. I recommend reading the original v...
“A Rocket–Interpretability Analogy” by plex
25 Oct 2024
Contributed by Lukas
1. 4.4% of the US federal budget went into the space race at its peak. This was surprising to me, until a friend pointed out that landing rockets o...
“I got dysentery so you don’t have to” by eukaryote
24 Oct 2024
Contributed by Lukas
This summer, I participated in a human challenge trial at the University of Maryland. I spent the days just prior to my 30th birthday sick with shigel...
“Overcoming Bias Anthology” by Arjun Panickssery
23 Oct 2024
Contributed by Lukas
This is a link post. Part 1: Our Thinking Near and Far: 1. Abstract/Distant Future Bias; 2. Abstractly Ideal, Concretely Selfish; 3. We Add Near, Average Far; 4 ...
“Arithmetic is an underrated world-modeling technology” by dynomight
22 Oct 2024
Contributed by Lukas
Of all the cognitive tools our ancestors left us, what's best? Society seems to think pretty highly of arithmetic. It's one of the first thi...
“My theory of change for working in AI healthtech” by Andrew_Critch
15 Oct 2024
Contributed by Lukas
This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive ...
“Why I’m not a Bayesian” by Richard_Ngo
15 Oct 2024
Contributed by Lukas
This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then ...
“The AGI Entente Delusion” by Max Tegmark
14 Oct 2024
Contributed by Lukas
As humanity gets closer to Artificial General Intelligence (AGI), a new geopolitical strategy is gaining traction in US and allied circles, in the Nat...
“Momentum of Light in Glass” by Ben
14 Oct 2024
Contributed by Lukas
I think that most people underestimate how many scientific mysteries remain, even on questions that sound basic. My favourite candidate for "the m...
“Overview of strong human intelligence amplification methods” by TsviBT
09 Oct 2024
Contributed by Lukas
How can we make many humans who are very good at solving difficult problems? Summary (table of made-up numbers): I made up the made-up numbers in this t...
“Struggling like a Shadowmoth” by Raemon
03 Oct 2024
Contributed by Lukas
This post is probably hazardous for one type of person in one particular growth stage, and necessary for people in a different growth stage, and I don...
“Three Subtle Examples of Data Leakage” by abstractapplic
03 Oct 2024
Contributed by Lukas
This is a description of my work on some data science projects, lightly obfuscated and fictionalized to protect the confidentiality of the organizatio...
“the case for CoT unfaithfulness is overstated” by nostalgebraist
30 Sep 2024
Contributed by Lukas
[Meta note: quickly written, unpolished. Also, it's possible that there's some more convincing work on this topic that I'm unaware of –...
“Cryonics is free” by Mati_Roy
30 Sep 2024
Contributed by Lukas
I've been wanting to write a nice post for a few months, but should probably just write one sooner instead. This is a top-level post not becaus...
“Stanislav Petrov Quarterly Performance Review” by Ricki Heicklen
29 Sep 2024
Contributed by Lukas
Quarterly Performance Review, Autumn 1983. Colonel Yuri Kuznetsov looked out the window anxiously. The endless gray landscape did little to soothe his ...
“Laziness death spirals” by PatrickDFarley
29 Sep 2024
Contributed by Lukas
I’ve claimed that Willpower compounds and that small wins in the present make it easier to get bigger wins in the future. Unfortunately, procrastina...
“‘Slow’ takeoff is a terrible term for ‘maybe even faster takeoff, actually’” by Raemon
29 Sep 2024
Contributed by Lukas
For a long time, when I heard "slow takeoff", I assumed it meant "takeoff that takes longer calendar time than fast takeoff." (i.e...
“ASIs will not leave just a little sunlight for Earth” by Eliezer Yudkowsky
23 Sep 2024
Contributed by Lukas
A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just be...
“Skills from a year of Purposeful Rationality Practice” by Raemon
21 Sep 2024
Contributed by Lukas
A year ago, I started trying to deliberate practice skills that would "help people figure out the answers to confusing, important questions."...
“How I started believing religion might actually matter for rationality and moral philosophy” by zhukeepa
19 Sep 2024
Contributed by Lukas
After the release of Ben Pace's extended interview with me about my views on religion, I felt inspired to publish more of my thinking about relig...
“Did Christopher Hitchens change his mind about waterboarding?” by Isaac King
17 Sep 2024
Contributed by Lukas
There's a popular story that goes like this: Christopher Hitchens used to be in favor of the US waterboarding terrorists because he thought it...
“The Great Data Integration Schlep” by sarahconstantin
15 Sep 2024
Contributed by Lukas
Midjourney, “Fourth Industrial Revolution Digital Transformation”. This is a little rant I like to give, because it's something I learned on th...
“Contra papers claiming superhuman AI forecasting” by nikos, Peter Mühlbacher, Lawrence Phillips, dschwarz
14 Sep 2024
Contributed by Lukas
[Conflict of interest disclaimer: We are FutureSearch, a company working on AI-powered forecasting and other types of quantitative reasoning. If thin ...
“OpenAI o1” by Zach Stein-Perlman
13 Sep 2024
Contributed by Lukas
This is a link post. --- First published: September 12th, 2024 Source: https://www.lesswrong.com/posts/bhY5aE...
“The Best Lay Argument is not a Simple English Yud Essay” by J Bostock
11 Sep 2024
Contributed by Lukas
Epistemic status: these are my own opinions on AI risk communication, based primarily on my own instincts on the subject and discussions with people l...
“My Number 1 Epistemology Book Recommendation: Inventing Temperature” by adamShimi
10 Sep 2024
Contributed by Lukas
In my last post, I wrote that no resource out there exactly captured my model of epistemology, which is why I wanted to share a half-baked version of ...
“That Alien Message - The Animation” by Writer
09 Sep 2024
Contributed by Lukas
Our new video is an adaptation of That Alien Message, by @Eliezer Yudkowsky. This time, the text has been significantly adapted, so I include it below...
“Pay Risk Evaluators in Cash, Not Equity” by Adam Scholl
07 Sep 2024
Contributed by Lukas
Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute bas...
“Survey: How Do Elite Chinese Students Feel About the Risks of AI?” by Nick Corvino
07 Sep 2024
Contributed by Lukas
Intro: In April 2024, my colleague and I (both affiliated with Peking University) conducted a survey involving 510 students from Tsinghua University and...
“things that confuse me about the current AI market.” by DMMF
02 Sep 2024
Contributed by Lukas
Paging Gwern or anyone else who can shed light on the current state of the AI market—I have several questions. Since the release of ChatGPT, at least...
“Nursing doubts” by dynomight
01 Sep 2024
Contributed by Lukas
If you ask the internet if breastfeeding is good, you will soon learn that YOU MUST BREASTFEED because BREAST MILK = OPTIMAL FOOD FOR BABY. But if you...
“Principles for the AGI Race” by William_S
31 Aug 2024
Contributed by Lukas
Crossposted from https://williamrsaunders.substack.com/p/principles-for-the-agi-race Why form principles for the AGI Race? I worked at OpenAI for 3 yea...
“The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it” by Martín Soto
29 Aug 2024
Contributed by Lukas
Two new The Information articles with insider information on OpenAI's next models and moves. They are paywalled, but here are the new bits of info...
“What is it to solve the alignment problem?” by Joe Carlsmith
28 Aug 2024
Contributed by Lukas
People often talk about “solving the alignment problem.” But what is it to do such a thing? I wanted to clarify my thinking about this topic, so I...
“Limitations on Formal Verification for AI Safety” by Andrew Dickson
27 Aug 2024
Contributed by Lukas
In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of co...
“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck
27 Aug 2024
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I often talk to people who think that if frontier models were eg...
“Liability regimes for AI” by Ege Erdil
23 Aug 2024
Contributed by Lukas
For many products, we face a choice of who to hold liable for harms that would not have occurred if not for the existence of the product. For instance...
“AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work” by Rohin Shah, Seb Farquhar, Anca Dragan
21 Aug 2024
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We wanted to share a recap of our recent outputs with the AF com...
“Fields that I reference when thinking about AI takeover prevention” by Buck
15 Aug 2024
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. Is AI takeover like a nuclear meltdown? A co...
“WTH is Cerebrolysin, actually?” by gsfitzgerald, delton137
13 Aug 2024
Contributed by Lukas
[This article was originally published on Dan Elton's blog, More is Different.] Cerebrolysin is an unregulated medical product made from enzymatic...
“You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex
10 Aug 2024
Contributed by Lukas
This work was produced at Apollo Research, based on initial research done at MATS. LayerNorm is annoying for mechanistic interpretability research (“[...
“Leaving MIRI, Seeking Funding” by abramdemski
09 Aug 2024
Contributed by Lukas
This is slightly old news at this point, but: as part of MIRI's recent strategy pivot, they've eliminated the Agent Foundations research tea...
“How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage” by orthonormal
08 Aug 2024
Contributed by Lukas
This is a story about a flawed Manifold market, about how easy it is to buy significant objective-sounding publicity for your preferred politics, and ...
“This is already your second chance” by Malmesbury
07 Aug 2024
Contributed by Lukas
Cross-posted from Substack. 1. And the sky opened, and from the celestial firmament descended a cube of ivory the size of a skyscraper, lifted by ten t...
“0. CAST: Corrigibility as Singular Target” by Max Harms
07 Aug 2024
Contributed by Lukas
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. What the heck is up with “corrigibility”? For most of my car...
“Self-Other Overlap: A Neglected Approach to AI Alignment” by Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena
07 Aug 2024
Contributed by Lukas
Figure 1. Image generated by DALL-E 3 to represent the concept of self-other overlap. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, ...
“You don’t know how bad most things are nor precisely how they’re bad.” by Solenoid_Entity
07 Aug 2024
Contributed by Lukas
TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on ...
“Recommendation: reports on the search for missing hiker Bill Ewasko” by eukaryote
07 Aug 2024
Contributed by Lukas
This is a link post. Content warning: About an IRL death. Today's post isn’t so much an essay as a recommendation for two bodies of work on the s...
“The ‘strong’ feature hypothesis could be wrong” by lsgos
07 Aug 2024
Contributed by Lukas
NB. I am on the Google DeepMind language model interpretability team. But the arguments/views in this post are my own, and shouldn't be read as a...
“‘AI achieves silver-medal standard solving International Mathematical Olympiad problems’” by gjm
30 Jul 2024
Contributed by Lukas
This is a link post. Google DeepMind reports on a system for solving mathematical problems that allegedly is able to give complete solutions to four of...
“Decomposing Agency — capabilities without desires” by owencb, Raymond D
29 Jul 2024
Contributed by Lukas
This is a link post. What is an agent? It's a slippery concept with no commonly accepted formal definition, but informally the concept seems to be...