
LessWrong (Curated & Popular)

Technology · Society & Culture

Episodes

Showing 401-500 of 805
Page 5 of 9

“Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq

10 Jan 2025

Contributed by Lukas

TL;DR: There may be a fundamental problem with interpretability work that attempts to understand neural networks by decomposing their individual activ...

“What o3 Becomes by 2028” by Vladimir_Nesov

09 Jan 2025

Contributed by Lukas

Funding for $150bn training systems just turned less speculative, with OpenAI o3 reaching 25% on FrontierMath, 70% on SWE-Verified, 2700 on Codeforces...

“What Indicators Should We Watch to Disambiguate AGI Timelines?” by snewman

09 Jan 2025

Contributed by Lukas

(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduc...

“How will we update about scheming?” by ryan_greenblatt

08 Jan 2025

Contributed by Lukas

I mostly work on risks from scheming (that is, misaligned, power-seeking AIs that plot against their creators such as by faking alignment). Recently, ...

“OpenAI #10: Reflections” by Zvi

08 Jan 2025

Contributed by Lukas

This week, Altman offers a post called Reflections, and he has an interview in Bloomberg. There's a bunch of good and interesting answers in the ...

“Maximizing Communication, not Traffic” by jefftk

07 Jan 2025

Contributed by Lukas

As someone who writes for fun, I don't need to get people onto my site: If I write a post and some people are able to get the core idea just from ...

“What’s the short timeline plan?” by Marius Hobbhahn

02 Jan 2025

Contributed by Lukas

This is a low-effort post. I mostly want to get other people's takes and express concern about the lack of detailed and publicly available plans ...

“Shallow review of technical AI safety, 2024” by technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers

30 Dec 2024

Contributed by Lukas

from aisafety.world. The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense...

“By default, capital will matter more than ever after AGI” by L Rudolf L

29 Dec 2024

Contributed by Lukas

I've heard many people say something like "money won't matter post-AGI". This has always struck me as odd, and as most likely comp...

“Review: Planecrash” by L Rudolf L

28 Dec 2024

Contributed by Lukas

Take a stereotypical fantasy novel, a textbook on mathematical logic, and Fifty Shades of Grey. Mix them all together and add extra weirdness for spic...

“The Field of AI Alignment: A Postmortem, and What To Do About It” by johnswentworth

26 Dec 2024

Contributed by Lukas

A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look...

“When Is Insurance Worth It?” by kqr

23 Dec 2024

Contributed by Lukas

TL;DR: If you want to know whether getting insurance is worth it, use the Kelly Insurance Calculator. If you want to know why or how, read on. Note to ...

“Orienting to 3 year AGI timelines” by Nikola Jurkovic

23 Dec 2024

Contributed by Lukas

My median expectation is that AGI[1] will be created 3 years from now. This has implications on how to behave, and I will share some useful thoughts I...

“What Goes Without Saying” by sarahconstantin

21 Dec 2024

Contributed by Lukas

There are people I can talk to, where all of the following statements are obvious. They go without saying. We can just “be reasonable” together, w...

“o3” by Zach Stein-Perlman

21 Dec 2024

Contributed by Lukas

I'm editing this post. OpenAI announced (but hasn't released) o3 (skipping o2 for trademark reasons). It gets 25% on FrontierMath, smashing th...

“‘Alignment Faking’ frame is somewhat fake” by Jan_Kulveit

21 Dec 2024

Contributed by Lukas

I like the research. I mostly trust the results. I dislike the 'Alignment Faking' name and frame, and I'm afraid it will stick and lead...

“AIs Will Increasingly Attempt Shenanigans” by Zvi

19 Dec 2024

Contributed by Lukas

Increasingly, we have seen papers eliciting in AI models various shenanigans. There are a wide variety of scheming behaviors. You’ve got your weight ...

“Alignment Faking in Large Language Models” by ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck

18 Dec 2024

Contributed by Lukas

What happens when you tell Claude it is being trained to do something it doesn't want to do? We (Anthropic and Redwood Research) have a new paper...

“Communications in Hard Mode (My new job at MIRI)” by tanagrabeast

15 Dec 2024

Contributed by Lukas

Six months ago, I was a high school English teacher. I wasn’t looking to change careers, even after nineteen sometimes-difficult years. I was good at...

“Biological risk from the mirror world” by jasoncrawford

13 Dec 2024

Contributed by Lukas

A new article in Science Policy Forum voices concern about a particular line of biological research which, if successful in the long term, could event...

“Subskills of ‘Listening to Wisdom’” by Raemon

13 Dec 2024

Contributed by Lukas

A fool learns from their own mistakes; the wise learn from the mistakes of others. – Otto von Bismarck. A problem as old as time: The youth won't ...

“Understanding Shapley Values with Venn Diagrams” by Carson L

13 Dec 2024

Contributed by Lukas

Someone I know, Carson Loughridge, wrote this very nice post explaining the core intuition around Shapley values (which play an important role in imp...

“LessWrong audio: help us choose the new voice” by PeterH

12 Dec 2024

Contributed by Lukas

We make AI narrations of LessWrong posts available via our audio player and podcast feeds. We’re thinking about changing our narrator's voice. Th...

“Understanding Shapley Values with Venn Diagrams” by agucova

11 Dec 2024

Contributed by Lukas

This is a link post. Someone I know wrote this very nice post explaining the core intuition around Shapley values (which play an important role in imp...

“o1: A Technical Primer” by Jesse Hoogland

11 Dec 2024

Contributed by Lukas

TL;DR: In September 2024, OpenAI released o1, its first "reasoning model". This model exhibits remarkable test-time scaling laws, which comp...

“Gradient Routing: Masking Gradients to Localize Computation in Neural Networks” by cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout

09 Dec 2024

Contributed by Lukas

We present gradient routing, a way of controlling where learning happens in neural networks. Gradient routing applies masks to limit the flow of gradi...

“Frontier Models are Capable of In-context Scheming” by Marius Hobbhahn, AlexMeinke, Bronson Schoen

06 Dec 2024

Contributed by Lukas

This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We...

“(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka

30 Nov 2024

Contributed by Lukas

TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM or signal message (+1 510 944 3235) if you wa...

“Repeal the Jones Act of 1920” by Zvi

29 Nov 2024

Contributed by Lukas

Balsa Policy Institute chose as its first mission to lay groundwork for the potential repeal, or partial repeal, of section 27 of the Jones Act of 192...

“China Hawks are Manufacturing an AI Arms Race” by garrison

29 Nov 2024

Contributed by Lukas

This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, ...

“Information vs Assurance” by johnswentworth

27 Nov 2024

Contributed by Lukas

In contract law, there's this thing called a “representation”. Example: as part of a contract to sell my house, I might “represent that” ...

“You are not too ‘irrational’ to know your preferences.” by DaystarEld

27 Nov 2024

Contributed by Lukas

Epistemic Status: 13 years working as a therapist for a wide variety of populations, 5 of them working with rationalists and EA clients. 7 years teach...

“‘The Solomonoff Prior is Malign’ is a special case of a simpler argument” by David Matolcsi

25 Nov 2024

Contributed by Lukas

[Warning: This post is probably only worth reading if you already have opinions on the Solomonoff induction being malign, or at least heard of the con...

“‘It’s a 10% chance which I did 10 times, so it should be 100%’” by egor.timatkov

20 Nov 2024

Contributed by Lukas

Audio note: this article contains 33 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text i...

“OpenAI Email Archives” by habryka

19 Nov 2024

Contributed by Lukas

As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman...

“Ayn Rand’s model of ‘living money’; and an upside of burnout” by AnnaSalamon

18 Nov 2024

Contributed by Lukas

Epistemic status: Toy model. Oversimplified, but has been anecdotally useful to at least a couple people, and I like it as a metaphor. Introduction: I’...

“Neutrality” by sarahconstantin

17 Nov 2024

Contributed by Lukas

Midjourney, “infinite library.” I’ve had post-election thoughts percolating, and the sense that I wanted to synthesize something about this moment...

“Making a conservative case for alignment” by Cameron Berg, Judd Rosenblatt, phgubbins, AE Studio

16 Nov 2024

Contributed by Lukas

Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In th...

“OpenAI Email Archives (from Musk v. Altman)” by habryka

16 Nov 2024

Contributed by Lukas

As part of the court case between Elon Musk and Sam Altman, a substantial number of emails between Elon, Sam Altman, Ilya Sutskever, and Greg Brockman...

“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub

15 Nov 2024

Contributed by Lukas

Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback. Following up on our recent “Sabotage Evaluations...

“The Online Sports Gambling Experiment Has Failed” by Zvi

12 Nov 2024

Contributed by Lukas

Related: Book Review: On the Edge: The Gamblers. I have previously been heavily involved in sports betting. That world was very good to me. The times we...

“o1 is a bad idea” by abramdemski

12 Nov 2024

Contributed by Lukas

This post comes a bit late with respect to the news cycle, but I argued in a recent interview that o1 is an unfortunate twist on LLM technologies, mak...

“Current safety training techniques do not fully transfer to the agent setting” by Simon Lermen, Govind Pimpale

09 Nov 2024

Contributed by Lukas

TL;DR: I'm presenting three recent papers which all share a similar finding, i.e. the safety training techniques for chat models don’t transfer...

“Explore More: A Bag of Tricks to Keep Your Life on the Rails” by Shoshannah Tekofsky

04 Nov 2024

Contributed by Lukas

At least, if you happen to be near me in brain space. What advice would you give your younger self? That was the prompt for a class I taught at PAIR 202...

“Survival without dignity” by L Rudolf L

04 Nov 2024

Contributed by Lukas

I open my eyes and find myself lying on a bed in a hospital room. I blink. "Hello", says a middle-aged man with glasses, sitting on a chair b...

“The Median Researcher Problem” by johnswentworth

04 Nov 2024

Contributed by Lukas

Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median resear...

“The Compendium, A full argument about extinction risk from AGI” by adamShimi, Gabriel Alfour, Connor Leahy, Chris Scammell, Andrea_Miotti

01 Nov 2024

Contributed by Lukas

This is a link post. We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings toge...

“What TMS is like” by Sable

31 Oct 2024

Contributed by Lukas

There are two nuclear options for treating depression: Ketamine and TMS; this post is about the latter. TMS stands for Transcranial Magnetic Stimulatio...

“The hostile telepaths problem” by Valentine

28 Oct 2024

Contributed by Lukas

Epistemic status: model-building based on observation, with a few successful unusual predictions. Anecdotal evidence has so far been consistent with t...

“A bird’s eye view of ARC’s research” by Jacob_Hilton

27 Oct 2024

Contributed by Lukas

This post includes a "flattened version" of an interactive diagram that cannot be displayed on this site. I recommend reading the original v...

“A Rocket–Interpretability Analogy” by plex

25 Oct 2024

Contributed by Lukas

1. 4.4% of the US federal budget went into the space race at its peak. This was surprising to me, until a friend pointed out that landing rockets o...

“I got dysentery so you don’t have to” by eukaryote

24 Oct 2024

Contributed by Lukas

This summer, I participated in a human challenge trial at the University of Maryland. I spent the days just prior to my 30th birthday sick with shigel...

“Overcoming Bias Anthology” by Arjun Panickssery

23 Oct 2024

Contributed by Lukas

This is a link post. Part 1 (Our Thinking Near and Far): 1. Abstract/Distant Future Bias; 2. Abstractly Ideal, Concretely Selfish; 3. We Add Near, Average Far; 4. ...

“Arithmetic is an underrated world-modeling technology” by dynomight

22 Oct 2024

Contributed by Lukas

Of all the cognitive tools our ancestors left us, what's best? Society seems to think pretty highly of arithmetic. It's one of the first thi...

“My theory of change for working in AI healthtech” by Andrew_Critch

15 Oct 2024

Contributed by Lukas

This post starts out pretty gloomy but ends up with some points that I feel pretty positive about. Day to day, I'm more focussed on the positive ...

“Why I’m not a Bayesian” by Richard_Ngo

15 Oct 2024

Contributed by Lukas

This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then ...

“The AGI Entente Delusion” by Max Tegmark

14 Oct 2024

Contributed by Lukas

As humanity gets closer to Artificial General Intelligence (AGI), a new geopolitical strategy is gaining traction in US and allied circles, in the Nat...

“Momentum of Light in Glass” by Ben

14 Oct 2024

Contributed by Lukas

I think that most people underestimate how many scientific mysteries remain, even on questions that sound basic. My favourite candidate for "the m...

“Overview of strong human intelligence amplification methods” by TsviBT

09 Oct 2024

Contributed by Lukas

How can we make many humans who are very good at solving difficult problems? Summary (table of made-up numbers): I made up the made-up numbers in this t...

“Struggling like a Shadowmoth” by Raemon

03 Oct 2024

Contributed by Lukas

This post is probably hazardous for one type of person in one particular growth stage, and necessary for people in a different growth stage, and I don...

“Three Subtle Examples of Data Leakage” by abstractapplic

03 Oct 2024

Contributed by Lukas

This is a description of my work on some data science projects, lightly obfuscated and fictionalized to protect the confidentiality of the organizatio...

“the case for CoT unfaithfulness is overstated” by nostalgebraist

30 Sep 2024

Contributed by Lukas

[Meta note: quickly written, unpolished. Also, it's possible that there's some more convincing work on this topic that I'm unaware of –...

“Cryonics is free” by Mati_Roy

30 Sep 2024

Contributed by Lukas

I've been wanting to write a nice post for a few months, but should probably just write one sooner instead. This is a top-level post not becaus...

“Stanislav Petrov Quarterly Performance Review” by Ricki Heicklen

29 Sep 2024

Contributed by Lukas

Quarterly Performance Review, Autumn 1983Colonel Yuri Kuznetsov looked out the window anxiously. The endless gray landscape did little to soothe his ...

“Laziness death spirals” by PatrickDFarley

29 Sep 2024

Contributed by Lukas

I’ve claimed that Willpower compounds and that small wins in the present make it easier to get bigger wins in the future. Unfortunately, procrastina...

“‘Slow’ takeoff is a terrible term for ‘maybe even faster takeoff, actually’” by Raemon

29 Sep 2024

Contributed by Lukas

For a long time, when I heard "slow takeoff", I assumed it meant "takeoff that takes longer calendar time than fast takeoff." (i.e...

“ASIs will not leave just a little sunlight for Earth” by Eliezer Yudkowsky

23 Sep 2024

Contributed by Lukas

A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just be...

“Skills from a year of Purposeful Rationality Practice” by Raemon

21 Sep 2024

Contributed by Lukas

A year ago, I started trying to deliberately practice skills that would "help people figure out the answers to confusing, important questions."...

“How I started believing religion might actually matter for rationality and moral philosophy” by zhukeepa

19 Sep 2024

Contributed by Lukas

After the release of Ben Pace's extended interview with me about my views on religion, I felt inspired to publish more of my thinking about relig...

“Did Christopher Hitchens change his mind about waterboarding?” by Isaac King

17 Sep 2024

Contributed by Lukas

There's a popular story that goes like this: Christopher Hitchens used to be in favor of the US waterboarding terrorists because he thought it...

“The Great Data Integration Schlep” by sarahconstantin

15 Sep 2024

Contributed by Lukas

Midjourney, “Fourth Industrial Revolution Digital Transformation.” This is a little rant I like to give, because it's something I learned on th...

“Contra papers claiming superhuman AI forecasting” by nikos, Peter Mühlbacher, Lawrence Phillips, dschwarz

14 Sep 2024

Contributed by Lukas

[Conflict of interest disclaimer: We are FutureSearch, a company working on AI-powered forecasting and other types of quantitative reasoning. If thin ...

“OpenAI o1” by Zach Stein-Perlman

13 Sep 2024

Contributed by Lukas

This is a link post. --- First published: September 12th, 2024 Source: https://www.lesswrong.com/posts/bhY5aE...

“The Best Lay Argument is not a Simple English Yud Essay” by J Bostock

11 Sep 2024

Contributed by Lukas

Epistemic status: these are my own opinions on AI risk communication, based primarily on my own instincts on the subject and discussions with people l...

“My Number 1 Epistemology Book Recommendation: Inventing Temperature” by adamShimi

10 Sep 2024

Contributed by Lukas

In my last post, I wrote that no resource out there exactly captured my model of epistemology, which is why I wanted to share a half-baked version of ...

“That Alien Message - The Animation” by Writer

09 Sep 2024

Contributed by Lukas

Our new video is an adaptation of That Alien Message, by @Eliezer Yudkowsky. This time, the text has been significantly adapted, so I include it below...

“Pay Risk Evaluators in Cash, Not Equity” by Adam Scholl

07 Sep 2024

Contributed by Lukas

Personally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute bas...

“Survey: How Do Elite Chinese Students Feel About the Risks of AI?” by Nick Corvino

07 Sep 2024

Contributed by Lukas

Intro: In April 2024, my colleague and I (both affiliated with Peking University) conducted a survey involving 510 students from Tsinghua University and...

“things that confuse me about the current AI market.” by DMMF

02 Sep 2024

Contributed by Lukas

Paging Gwern or anyone else who can shed light on the current state of the AI market—I have several questions. Since the release of ChatGPT, at least...

“Nursing doubts” by dynomight

01 Sep 2024

Contributed by Lukas

If you ask the internet if breastfeeding is good, you will soon learn that YOU MUST BREASTFEED because BREAST MILK = OPTIMAL FOOD FOR BABY. But if you...

“Principles for the AGI Race” by William_S

31 Aug 2024

Contributed by Lukas

Crossposted from https://williamrsaunders.substack.com/p/principles-for-the-agi-race Why form principles for the AGI Race? I worked at OpenAI for 3 yea...

“The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it” by Martín Soto

29 Aug 2024

Contributed by Lukas

Two new The Information articles with insider information on OpenAI's next models and moves. They are paywalled, but here are the new bits of info...

“What is it to solve the alignment problem?” by Joe Carlsmith

28 Aug 2024

Contributed by Lukas

People often talk about “solving the alignment problem.” But what is it to do such a thing? I wanted to clarify my thinking about this topic, so I...

“Limitations on Formal Verification for AI Safety” by Andrew Dickson

27 Aug 2024

Contributed by Lukas

In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of co...

“Would catching your AIs trying to escape convince AI developers to slow down or undeploy?” by Buck

27 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I often talk to people who think that if frontier models were eg...

“Liability regimes for AI” by Ege Erdil

23 Aug 2024

Contributed by Lukas

For many products, we face a choice of who to hold liable for harms that would not have occurred if not for the existence of the product. For instance...

“AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work” by Rohin Shah, Seb Farquhar, Anca Dragan

21 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. We wanted to share a recap of our recent outputs with the AF com...

“Fields that I reference when thinking about AI takeover prevention” by Buck

15 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a link post. Is AI takeover like a nuclear meltdown? A co...

“WTH is Cerebrolysin, actually?” by gsfitzgerald, delton137

13 Aug 2024

Contributed by Lukas

[This article was originally published on Dan Elton's blog, More is Different.] Cerebrolysin is an unregulated medical product made from enzymatic...

“You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex

10 Aug 2024

Contributed by Lukas

This work was produced at Apollo Research, based on initial research done at MATS. LayerNorm is annoying for mechanistic interpretability research (“[...

“Leaving MIRI, Seeking Funding” by abramdemski

09 Aug 2024

Contributed by Lukas

This is slightly old news at this point, but: as part of MIRI's recent strategy pivot, they've eliminated the Agent Foundations research tea...

“How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage” by orthonormal

08 Aug 2024

Contributed by Lukas

This is a story about a flawed Manifold market, about how easy it is to buy significant objective-sounding publicity for your preferred politics, and ...

“This is already your second chance” by Malmesbury

07 Aug 2024

Contributed by Lukas

Cross-posted from Substack. 1. And the sky opened, and from the celestial firmament descended a cube of ivory the size of a skyscraper, lifted by ten t...

“0. CAST: Corrigibility as Singular Target” by Max Harms

07 Aug 2024

Contributed by Lukas

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. What the heck is up with “corrigibility”? For most of my car...

“Self-Other Overlap: A Neglected Approach to AI Alignment” by Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena

07 Aug 2024

Contributed by Lukas

Figure 1. Image generated by DALL-E 3 to represent the concept of self-other overlap. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, ...

“You don’t know how bad most things are nor precisely how they’re bad.” by Solenoid_Entity

07 Aug 2024

Contributed by Lukas

TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on ...

“Recommendation: reports on the search for missing hiker Bill Ewasko” by eukaryote

07 Aug 2024

Contributed by Lukas

This is a link post. Content warning: About an IRL death. Today's post isn’t so much an essay as a recommendation for two bodies of work on the s...

“The ‘strong’ feature hypothesis could be wrong” by lsgos

07 Aug 2024

Contributed by Lukas

NB. I am on the Google Deepmind language model interpretability team. But the arguments/views in this post are my own, and shouldn't be read as a...

“‘AI achieves silver-medal standard solving International Mathematical Olympiad problems’” by gjm

30 Jul 2024

Contributed by Lukas

This is a link post.Google DeepMind reports on a system for solving mathematical problems that allegedly is able to give complete solutions to four of...

“Decomposing Agency — capabilities without desires” by owencb, Raymond D

29 Jul 2024

Contributed by Lukas

This is a link post.What is an agent? It's a slippery concept with no commonly accepted formal definition, but informally the concept seems to be...
