LessWrong (Curated & Popular)

“Cryonics is free” by Mati_Roy

30 Sep 2024

Contributed by Lukas

I've been wanting to write a nice post for a few months, but should probably just write one sooner instead. This is a top-level post not becaus...

“Stanislav Petrov Quarterly Performance Review” by Ricki Heicklen

29 Sep 2024

Contributed by Lukas

Quarterly Performance Review, Autumn 1983. Colonel Yuri Kuznetsov looked out the window anxiously. The endless gray landscape did little to soothe his ...

“Laziness death spirals” by PatrickDFarley

29 Sep 2024

Contributed by Lukas

I’ve claimed that Willpower compounds and that small wins in the present make it easier to get bigger wins in the future. Unfortunately, procrastina...

“‘Slow’ takeoff is a terrible term for ‘maybe even faster takeoff, actually’” by Raemon

29 Sep 2024

Contributed by Lukas

For a long time, when I heard "slow takeoff", I assumed it meant "takeoff that takes longer calendar time than fast takeoff." (i.e...

“ASIs will not leave just a little sunlight for Earth” by Eliezer Yudkowsky

23 Sep 2024

Contributed by Lukas

A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just be...

“Skills from a year of Purposeful Rationality Practice” by Raemon

21 Sep 2024

Contributed by Lukas

A year ago, I started trying to deliberate practice skills that would "help people figure out the answers to confusing, important questions."...

“How I started believing religion might actually matter for rationality and moral philosophy” by zhukeepa

19 Sep 2024

Contributed by Lukas

After the release of Ben Pace's extended interview with me about my views on religion, I felt inspired to publish more of my thinking about relig...

“Did Christopher Hitchens change his mind about waterboarding?” by Isaac King

17 Sep 2024

Contributed by Lukas

There's a popular story that goes like this: Christopher Hitchens used to be in favor of the US waterboarding terrorists because he thought it'...

“The Great Data Integration Schlep” by sarahconstantin

15 Sep 2024

Contributed by Lukas

Midjourney, “Fourth Industrial Revolution Digital Transformation”. This is a little rant I like to give, because it's something I learned on th...

“Contra papers claiming superhuman AI forecasting” by nikos, Peter Mühlbacher, Lawrence Phillips, dschwarz

14 Sep 2024

Contributed by Lukas

[Conflict of interest disclaimer: We are FutureSearch, a company working on AI-powered forecasting and other types of quantitative reasoning. If thin ...

“OpenAI o1” by Zach Stein-Perlman

13 Sep 2024

Contributed by Lukas

This is a link post. --- First published: September 12th, 2024 Source: https://www.lesswrong.com/posts/bhY5aE...

“The Best Lay Argument is not a Simple English Yud Essay” by J Bostock

11 Sep 2024

Contributed by Lukas

Epistemic status: these are my own opinions on AI risk communication, based primarily on my own instincts on the subject and discussions with people l...

“My Number 1 Epistemology Book Recommendation: Inventing Temperature” by adamShimi

10 Sep 2024

Contributed by Lukas

In my last post, I wrote that no resource out there exactly captured my model of epistemology, which is why I wanted to share a half-baked version of ...

“That Alien Message - The Animation” by Writer

09 Sep 2024

Contributed by Lukas

Our new video is an adaptation of That Alien Message, by @Eliezer Yudkowsky. This time, the text has been significantly adapted, so I include it below...

“Pay Risk Evaluators in Cash, Not Equity” by Adam Scholl

07 Sep 2024

Contributed by Lukas

Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute bas...

“Survey: How Do Elite Chinese Students Feel About the Risks of AI?” by Nick Corvino

07 Sep 2024

Contributed by Lukas

Intro. In April 2024, my colleague and I (both affiliated with Peking University) conducted a survey involving 510 students from Tsinghua University and...

“things that confuse me about the current AI market.” by DMMF

02 Sep 2024

Contributed by Lukas

Paging Gwern or anyone else who can shed light on the current state of the AI market—I have several questions. Since the release of ChatGPT, at least...

“Nursing doubts” by dynomight

01 Sep 2024

Contributed by Lukas

If you ask the internet if breastfeeding is good, you will soon learn that YOU MUST BREASTFEED because BREAST MILK = OPTIMAL FOOD FOR BABY. But if you...

“Principles for the AGI Race” by William_S

31 Aug 2024

Contributed by Lukas

Crossposted from https://williamrsaunders.substack.com/p/principles-for-the-agi-race Why form principles for the AGI Race? I worked at OpenAI for 3 yea...

“The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it” by Martín Soto

29 Aug 2024

Contributed by Lukas

Two new The Information articles with insider information on OpenAI's next models and moves. They are paywalled, but here are the new bits of info...

“What is it to solve the alignment problem?” by Joe Carlsmith

28 Aug 2024

Contributed by Lukas

People often talk about “solving the alignment problem.” But what is it to do such a thing? I wanted to clarify my thinking about this topic, so I...

“Limitations on Formal Verification for AI Safety” by Andrew Dickson

27 Aug 2024

Contributed by Lukas

In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of co...

“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck

27 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I often talk to people who think that if frontier models were eg...

“Liability regimes for AI” by Ege Erdil

23 Aug 2024

Contributed by Lukas

For many products, we face a choice of who to hold liable for harms that would not have occurred if not for the existence of the product. For instance...

“AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work” by Rohin Shah, Seb Farquhar, Anca Dragan

21 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We wanted to share a recap of our recent outputs with the AF com...

“Fields that I reference when thinking about AI takeover prevention” by Buck

15 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. Is AI takeover like a nuclear meltdown? A co...

“WTH is Cerebrolysin, actually?” by gsfitzgerald, delton137

13 Aug 2024

Contributed by Lukas

[This article was originally published on Dan Elton's blog, More is Different.] Cerebrolysin is an unregulated medical product made from enzymatic...

“You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex

10 Aug 2024

Contributed by Lukas

This work was produced at Apollo Research, based on initial research done at MATS. LayerNorm is annoying for mechanistic interpretability research (“[...

“Leaving MIRI, Seeking Funding” by abramdemski

09 Aug 2024

Contributed by Lukas

This is slightly old news at this point, but: as part of MIRI's recent strategy pivot, they've eliminated the Agent Foundations research tea...

“How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage” by orthonormal

08 Aug 2024

Contributed by Lukas

This is a story about a flawed Manifold market, about how easy it is to buy significant objective-sounding publicity for your preferred politics, and ...

“This is already your second chance” by Malmesbury

07 Aug 2024

Contributed by Lukas

Cross-posted from Substack. 1. And the sky opened, and from the celestial firmament descended a cube of ivory the size of a skyscraper, lifted by ten t...

“0. CAST: Corrigibility as Singular Target” by Max Harms

07 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. What the heck is up with “corrigibility”? For most of my car...

“Self-Other Overlap: A Neglected Approach to AI Alignment” by Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena

07 Aug 2024

Contributed by Lukas

Figure 1. Image generated by DALL-E 3 to represent the concept of self-other overlap. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, ...

“You don’t know how bad most things are nor precisely how they’re bad.” by Solenoid_Entity

07 Aug 2024

Contributed by Lukas

TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on ...

“Recommendation: reports on the search for missing hiker Bill Ewasko” by eukaryote

07 Aug 2024

Contributed by Lukas

This is a link post. Content warning: About an IRL death. Today's post isn’t so much an essay as a recommendation for two bodies of work on the s...

“The ‘strong’ feature hypothesis could be wrong” by lsgos

07 Aug 2024

Contributed by Lukas

NB. I am on the Google DeepMind language model interpretability team. But the arguments/views in this post are my own, and shouldn't be read as a...

“‘AI achieves silver-medal standard solving International Mathematical Olympiad problems’” by gjm

30 Jul 2024

Contributed by Lukas

This is a link post. Google DeepMind reports on a system for solving mathematical problems that allegedly is able to give complete solutions to four of...

“Decomposing Agency — capabilities without desires” by owencb, Raymond D

29 Jul 2024

Contributed by Lukas

This is a link post. What is an agent? It's a slippery concept with no commonly accepted formal definition, but informally the concept seems to be...

“Universal Basic Income and Poverty” by Eliezer Yudkowsky

27 Jul 2024

Contributed by Lukas

(Crossposted from Twitter) I'm skeptical that Universal Basic Income can get rid of grinding poverty, since somehow humanity's 100-fold produ...

“Optimistic Assumptions, Longterm Planning, and ‘Cope’” by Raemon

19 Jul 2024

Contributed by Lukas

Eliezer Yudkowsky periodically complains about people coming up with questionable plans with questionable assumptions to deal with AI, and then either...

“Superbabies: Putting The Pieces Together” by sarahconstantin

15 Jul 2024

Contributed by Lukas

This post was inspired by some talks at the recent LessOnline conference including one by LessWrong user “Gene Smith”. Let's say you want to h...

“Poker is a bad game for teaching epistemics. Figgie is a better one.” by rossry

12 Jul 2024

Contributed by Lukas

This is a link post. Editor's note: Somewhat after I posted this on my own blog, Max Chiswick cornered me at LessOnline / Manifest and gave me a w...

“Reliable Sources: The Story of David Gerard” by TracingWoodgrains

11 Jul 2024

Contributed by Lukas

This is a linkpost for https://www.tracingwoodgrains.com/p/reliable-sources-how-wikipedia-admin, posted in full here given its relevance to this commu...

“When is a mind me?” by Rob Bensinger

08 Jul 2024

Contributed by Lukas

xlr8harder writes: In general I don’t think an uploaded mind is you, but rather a copy. But one thought experiment makes me question this. A Ship of ...

“80,000 hours should remove OpenAI from the Job Board (and similar orgs should do similarly)” by Raemon

04 Jul 2024

Contributed by Lukas

I haven't shared this post with other relevant parties – my experience has been that private discussion of this sort of thing is more paralyzi...

[Linkpost] “introduction to cancer vaccines” by bhauth

02 Jul 2024

Contributed by Lukas

This is a linkpost for https://www.bhauth.com/blog/biology/cancer%20vaccines.html cancer neoantigens. For cells to become cancerous, they must have muta...

“Priors and Prejudice” by MathiasKB

02 Jul 2024

Contributed by Lukas

I. Imagine an alternate version of the Effective Altruism movement, whose early influences came from socialist intellectual communities such as the Fab...

“My experience using financial commitments to overcome akrasia” by William Howard

02 Jul 2024

Contributed by Lukas

About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forf...

“The Incredible Fentanyl-Detecting Machine” by sarahconstantin

01 Jul 2024

Contributed by Lukas

An NII machine in Nogales, AZ. (Image source) There's bound to be a lot of discussion of the Biden-Trump presidential debates last night, but I wa...

“AI catastrophes and rogue deployments” by Buck

01 Jul 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. [Thanks to Aryan Bhatt, Ansh Radhakrishnan, Adam Kaufman, Vivek ...

“Loving a world you don’t trust” by Joe Carlsmith

01 Jul 2024

Contributed by Lukas

(Cross-posted from my website. Audio version here, or search for "Joe Carlsmith Audio" on your podcast app.) This is the final essay in a ser...

“Formal verification, heuristic explanations and surprise accounting” by paulfchristiano

27 Jun 2024

Contributed by Lukas

ARC's current research focus can be thought of as trying to combine mechanistic interpretability and formal verification. If we had a deep unders...

“LLM Generality is a Timeline Crux” by eggsyntax

25 Jun 2024

Contributed by Lukas

Summary: LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. Longer summary: There...

“SAE feature geometry is outside the superposition hypothesis” by jake_mendel

25 Jun 2024

Contributed by Lukas

Summary: Superposition-based interpretations of neural network activation spaces are incomplete. The specific locations of feature vectors contain cru...

“Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data” by Johannes Treutlein, Owain_Evans

23 Jun 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. TL;DR: We published a new paper on out-of-co...

“Boycott OpenAI” by PeterMcCluskey

21 Jun 2024

Contributed by Lukas

This is a link post. I have canceled my OpenAI subscription in protest over OpenAI's lack of ethics. In particular, I object to: threats to confisca...

“Sycophancy to subterfuge: Investigating reward tampering in large language models” by evhub, Carson Denison

20 Jun 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. New Anthropic model organisms research paper...

“I would have shit in that alley, too” by Declan Molony

18 Jun 2024

Contributed by Lukas

After living in a suburb for most of my life, when I moved to a major U.S. city the first thing I noticed was the feces. At first I assumed it was dog...

“Getting 50% (SoTA) on ARC-AGI with GPT-4o” by ryan_greenblatt

18 Jun 2024

Contributed by Lukas

I recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a...

“Why I don’t believe in the placebo effect” by transhumanist_atom_understander

15 Jun 2024

Contributed by Lukas

Have you heard this before? In clinical trials, medicines have to be compared to a placebo to separate the effect of the medicine from the psychologic...

“Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)” by Andrew_Critch

14 Jun 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. As an AI researcher who wants to do technical work that helps hu...

“My AI Model Delta Compared To Christiano” by johnswentworth

13 Jun 2024

Contributed by Lukas

Preamble: Delta vs Crux. This section is redundant if you already read My AI Model Delta Compared To Yudkowsky. I don’t natively think in terms of cru...

“My AI Model Delta Compared To Yudkowsky” by johnswentworth

10 Jun 2024

Contributed by Lukas

Preamble: Delta vs Crux. I don’t natively think in terms of cruxes. But there's a similar concept which is more natural for me, which I’ll cal...

“Response to Aschenbrenner’s ‘Situational Awareness’” by Rob Bensinger

07 Jun 2024

Contributed by Lukas

(Cross-posted from Twitter.) My take on Leopold Aschenbrenner's new report: I think Leopold gets it right on a bunch of important counts. Three th...

“Humming is not a free $100 bill” by Elizabeth

07 Jun 2024

Contributed by Lukas

Last month I posted about humming as a cheap and convenient way to flood your nose with nitric oxide (NO), a known antiviral. Alas, the economists wer...

“Announcing ILIAD — Theoretical AI Alignment Conference” by Nora_Ammann, Alexander Gietelink Oldenziel

06 Jun 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We are pleased to announce ILIAD — a 5-day conference bringing...

“Non-Disparagement Canaries for OpenAI” by aysja, Adam Scholl

31 May 2024

Contributed by Lukas

Since at least 2017, OpenAI has asked departing employees to sign offboarding agreements which legally bind them to permanently—that is, for the res...

“MIRI 2024 Communications Strategy” by Gretta Duleba

30 May 2024

Contributed by Lukas

As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research...

“OpenAI: Fallout” by Zvi

28 May 2024

Contributed by Lukas

Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson. We have learned more since last week. It'...

[HUMAN VOICE] Update on human narration for this podcast

28 May 2024

Contributed by Lukas

Contact: patreon.com/lwcurated or [perrin dot j dot walker plus lesswrong fnord gmail]. All Solenoid's narration work found here.

“Maybe Anthropic’s Long-Term Benefit Trust is powerless” by Zach Stein-Perlman

28 May 2024

Contributed by Lukas

Crossposted from AI Lab Watch. Subscribe on Substack. Introduction. Anthropic has an unconventional governance mechanism: an independent "Long-...

“Notifications Received in 30 Minutes of Class” by tanagrabeast

27 May 2024

Contributed by Lukas

Introduction. If you are choosing to read this post, you've probably seen the image below depicting all the notifications students received on...

“AI companies aren’t really using external evaluators” by Zach Stein-Perlman

24 May 2024

Contributed by Lukas

New blog: AI Lab Watch. Subscribe on Substack. Many AI safety folks think that METR is close to the labs, with ongoing relationships that grant it acce...

“EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024” by scasper

24 May 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. Part 13 of 12 in the Engineer's Interpretability Sequence. ...

“What’s Going on With OpenAI’s Messaging?” by ozziegoen

22 May 2024

Contributed by Lukas

This is a quickly-written opinion piece, of what I understand about OpenAI. I first posted it to Facebook, where it had some discussion. Some arg...

“Language Models Model Us” by eggsyntax

21 May 2024

Contributed by Lukas

Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow. One-sentence summary: On a dataset of human-written essay...

Jaan Tallinn’s 2023 Philanthropy Overview

21 May 2024

Contributed by Lukas

This is a link post. to follow up my philanthropic pledge from 2020, i've updated my philanthropy page with 2023 results. in 2023 my donations funde...

“OpenAI: Exodus” by Zvi

21 May 2024

Contributed by Lukas

Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board ...

DeepMind’s “Frontier Safety Framework” is weak and unambitious

20 May 2024

Contributed by Lukas

FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's K...

Do you believe in hundred dollar bills lying on the ground? Consider humming

18 May 2024

Contributed by Lukas

Introduction. [Reminder: I am an internet weirdo with no medical credentials] A few months ago, I published some crude estimates of the power of nitri...

Deep Honesty

12 May 2024

Contributed by Lukas

Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are b...

On Not Pulling The Ladder Up Behind You

02 May 2024

Contributed by Lukas

Epistemic Status: Musing and speculation, but I think there's a real thing here. 1. When I was a kid, a friend of mine had a tree fort. If you'...

Mechanistically Eliciting Latent Behaviors in Language Models

02 May 2024

Contributed by Lukas

Produced as part of the MATS Winter 2024 program, under the mentorship of Alex Turner (TurnTrout). TL;DR: I introduce a method for eliciting latent beh...

Ironing Out the Squiggles

01 May 2024

Contributed by Lukas

Adversarial Examples: A Problem. The apparent successes of the deep learning revolution conceal a dark underbelly. It may seem that we now know how to ...

Introducing AI Lab Watch

01 May 2024

Contributed by Lukas

This is a linkpost for https://ailabwatch.org I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then eval...

Refusal in LLMs is mediated by a single direction

28 Apr 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This work was produced as part of Neel Nanda's stream in th...

Funny Anecdote of Eliezer From His Sister

24 Apr 2024

Contributed by Lukas

This comes from a podcast called 18Forty, whose main demographic is Orthodox Jews. Eliezer's sister (Hannah) came on and talked about her ...

Thoughts on seed oil

21 Apr 2024

Contributed by Lukas

This is a linkpost for https://dynomight.net/seed-oil/ A friend has spent the last three years hounding me about seed oils. Every time I thought I was ...

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

19 Apr 2024

Contributed by Lukas

Yesterday Adam Shai put up a cool post which… well, take a look at the visual: Yup, it sure looks like that fractal is very noisily embedded in the r...

Express interest in an “FHI of the West”

18 Apr 2024

Contributed by Lukas

TLDR: I am investigating whether to found a spiritual successor to FHI, housed under Lightcone Infrastructure, providing a rich cultural environment a...

Transformers Represent Belief State Geometry in their Residual Stream

17 Apr 2024

Contributed by Lukas

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. ...

Paul Christiano named as US AI Safety Institute Head of AI Safety

16 Apr 2024

Contributed by Lukas

This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety U.S. S...

[HUMAN VOICE] "On green" by Joe Carlsmith

12 Apr 2024

Contributed by Lukas

Cross-posted from my website. Podcast version here, or search for "Joe Carlsmith Audio" on your podcast app. This essay is part of a series t...

[HUMAN VOICE] "Toward a Broader Conception of Adverse Selection" by Ricki Heicklen

12 Apr 2024

Contributed by Lukas

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://bayesshammai.substack.com/p...

[HUMAN VOICE] "My PhD thesis: Algorithmic Bayesian Epistemology" by Eric Neyman

12 Apr 2024

Contributed by Lukas

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated In January, I defended my PhD thesis, which I called Algor...

[HUMAN VOICE] "How could I have thought that faster?" by mesaoptimizer

12 Apr 2024

Contributed by Lukas

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://twitter.com/ESYudkowsky/sta...

LLMs for Alignment Research: a safety priority?

06 Apr 2024

Contributed by Lukas

A recent short story by Gabriel Mukobi illustrates a near-term scenario where things go bad because new developments in LLMs allow LLMs to accelerate ...

[HUMAN VOICE] "Scale Was All We Needed, At First" by Gabriel Mukobi

05 Apr 2024

Contributed by Lukas

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/xLDwCemt5qvchzgHd/s...

[HUMAN VOICE] "Using axis lines for good or evil" by dynomight

05 Apr 2024

Contributed by Lukas

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/Yay8SbQiwErRyDKGb/u...

[HUMAN VOICE] "Social status part 1/2: negotiations over object-level preferences" by Steven Byrnes

05 Apr 2024

Contributed by Lukas

Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/SPBm67otKq5ET5CWP/s...
