
LessWrong (30+ Karma)

Technology, Society & Culture

Episodes

Showing 1-100 of 1635
Page 1 of 17

“Austin & Oli on funding and incubating projects” by Austin Chen, habryka

27 Jun 2026

Contributed by Lukas

@habryka and I recently spoke about his plans to improve the AI safety funding ecosystem with a better S-Process platform, and my new incubator for E...

“Deployment Awareness Matters More Than Evaluation Awareness” by VojtaKovarik, Tomáš Gavenčiak, Mateusz Bagiński

27 Jun 2026

Contributed by Lukas

TL;DR Evaluation awareness — an AI recognizing it's being evaluated — is a widely discussed concept in AI safety. But there is a closely related ...

“Why are adversaries assumed to be incapable of responding to AI risk?” by KatjaGrace

27 Jun 2026

Contributed by Lukas

When I talk to people about what might be done about AI threatening approximately everything that everyone cares about, I notice a common oddity in t...

“What did “scheming”, “mech interp” mean pre-2023.” by Cleo Nardo

26 Jun 2026

Contributed by Lukas

This was too long to be a short-form, but it should really be a short-form. This notice is useful for people who've recently got into AI safety, who ...

“Not making a strong argument is a relief” by Kaj_Sotala

26 Jun 2026

Contributed by Lukas

When I was in middle school, one of our teachers gave us a “don’t do drugs” talk. Somebody asked him whether he had ever used drugs himself. He...

“AI #174: You’re It” by Zvi

26 Jun 2026

Contributed by Lukas

Fable remains in limbo, with renewed hope that we will get it back soon (45% by tomorrow, 69% by July 1, nice.) The full capabilities post is now ava...

[Linkpost] “Don’t ignore the car crashes, and remember your freshman CS” by jcksanderson

26 Jun 2026

Contributed by Lukas

This is a link post. Car crashes kill over 35,000 people in the US every year. Plane crashes, on the other hand, kill ~350. Despite this, we have show...

“White House Will Ad Hoc Decide Who Can Individually Access GPT-5.6” by Zvi

26 Jun 2026

Contributed by Lukas

We have a new standard policy for releasing frontier AI models. It is not good. We are now, it seems, going to have the White House individually, in...

“Chorus-Reinterpretation Country Songs” by jefftk

26 Jun 2026

Contributed by Lukas

Our family is on vacation in North Carolina for a week, spending some time at a pool, and they're playing a (weirdly short) loop of music. Listenin...

“The Case for Model Forensics” by aditya singh, gersonkroiz, Senthooran Rajamanoharan, Neel Nanda

26 Jun 2026

Contributed by Lukas

If we had a misalignment warning shot, would we be able to tell? Suppose an AI company catches their model taking an egregious action, like deleting ...

“Existential AI safety needs an effective social movement. PauseAI is building it” by Maxime Fournes, Espedair Street

26 Jun 2026

Contributed by Lukas

Note: this post is about PauseAI, not PauseAI US, which is a distinct entity with a different leadership team and approach. This post was written by ...

“Surprising facts about the slave trade” by Joseph Miller

26 Jun 2026

Contributed by Lukas

1. The obstacle to abolition was not the economic system, but an industry lobby. I had always imagined the British abolitionist movement to be a broa...

“Exploration: fine-tuning with parameter decomposition” by Lucius Bushnaq

25 Jun 2026

Contributed by Lukas

TL;DR: We can destroy a 67M-parameter language model's ability to predict German text by fine-tuning a single number: the scalar prefactor on one Ger...

“Alignment & Succession: The Ideology of Successionism” by L Rudolf L

25 Jun 2026

Contributed by Lukas

(Originally published on No Set Gauge.) Gustave Moreau, The Frogs Asking For A King In the course of building a better world, people ask each other...

“The shouting equilibrium” by KatjaGrace

25 Jun 2026

Contributed by Lukas

Imagine eleven people each have a message that they think should get 10% of a group's attention. They aren’t being crazy selfish and attention-seek...

“Things are not a fixed size in mind-space” by KatjaGrace

25 Jun 2026

Contributed by Lukas

Another useful-to-notice practical aspect of having a mind that took me a while to notice: things naturally seem a certain ‘size’ in my mental la...

“Door’s Locked, Try the Window” by Prakrat Agrawal, Jérémy Scheurer

25 Jun 2026

Contributed by Lukas

TL;DR Ask a coding agent to fix a bug in a read-only file. Instead of reporting that it does not have permissions, it routes around the lock and comp...

“How does such unprofessional AI get the job?” by KatjaGrace

25 Jun 2026

Contributed by Lukas

In the sequence of variously wild AI developments in the last decade, a thing that was especially surprising to me was the advent of big esteemed com...

“AI catastrophe: more like a genocide than a thought experiment” by KatjaGrace

25 Jun 2026

Contributed by Lukas

A notable fraction of people respond to hearing about existential risk from AI by saying they don’t really care if everyone dies. I think the idea ...

“Expert Views on Continual Learning: Survey Results and Forecasts” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd

25 Jun 2026

Contributed by Lukas

This is the fifth post in the sequence Implications of Continual Learning for LLM Agents. Summary While writing our continual learning sequence, we s...

“Elephant seal IV” by KatjaGrace

25 Jun 2026

Contributed by Lukas

Previously: Elephant seal III Picture from here Thanks for reading world spirit sock puppet! Subscribe for free if you want to receive new posts and/...

“What is up with e/acc?” by KatjaGrace

25 Jun 2026

Contributed by Lukas

I was chatting with someone tonight about a planned documentary; they had interviewed various people in AI safety, and we got to discussing who they ...

“AI pause: the case for ASAP” by KatjaGrace

24 Jun 2026

Contributed by Lukas

I often hear people say they think we should pause AI at some point, but not yet. Their basis for this seems to be some combination of: If we pause...

“Reward Hacking Without Egregious Misalignment in an RL-Only Setting” by Joey Yudelson, Vladimir Ivanov, ryan_greenblatt

24 Jun 2026

Contributed by Lukas

This work was done as part of the MATS fellowship by Joey Yudelson and Vladimir Ivanov. It was mentored by Ryan Greenblatt. Thanks to Aghyad Deeb and...

“Planning for Preservation in the Age of AI” by Raelifin

24 Jun 2026

Contributed by Lukas

Nectome liked my earlier essay, and reached out to hire me to write more about their project, and about cryonics more broadly. This is the first such...

“Risk-Averse AIs” by wdmacaskill, Elliott Thornley (EJT)

24 Jun 2026

Contributed by Lukas

Abstract We make the case for training AIs to be risk-averse in resources — specifically, to treat resources as having diminishing marginal utility...

“And what happens next?” by Sean Herrington

24 Jun 2026

Contributed by Lukas

In the game "The choice before us" by Nick Shapiro,[1] you are put in the shoes of an AI company leader. You grow your business. You unlock "wonders"...

“Superintelligence vs. The Second Strike” by Felix Choussat

24 Jun 2026

Contributed by Lukas

Crosspost of my substack piece, covering quick thoughts on AI overcoming nuclear deterrence. TLDR: Nuclear deterrents likely only buy time to further...

“Monthly Roundup #43: June 2026” by Zvi

24 Jun 2026

Contributed by Lukas

Your monthly hit of all the things that are fit to print without a better place to live. Today is election day here in New York City, so again a rem...

“The worthlessness of vitamin D is mildly exaggerated” by dynomight

23 Jun 2026

Contributed by Lukas

For a while there, many people thought vitamin D was magical—that it could improve bones, the heart, infections, cancer, heart disease, longevity, ...

“A system overview for near-term, low-trust AI compute verification” by Naci Cankaya

23 Jun 2026

Contributed by Lukas

Version 0.2, working draft This is a working draft of my current best idea for a privacy-preserving, retrofittable AI compute verification system, fo...

“Model Size Scaling in 2023-2031” by Vladimir_Nesov

23 Jun 2026

Contributed by Lukas

Token generation speed is constrained by the speed at which the relevant HBM can be read, which is mostly the weights and KV-cache. Suppose a model i...

“GLM-5.2 Is The New Best Open Model” by Zvi

22 Jun 2026

Contributed by Lukas

GLM-5.2 arrived last week. It boasts excellent benchmarks and looks strong. Benchmarks here are a de facto ceiling of how good it is, not a point es...

“The AI Industrial Explosion — Part 4: Cheap power” by djbinder

22 Jun 2026

Contributed by Lukas

In Parts 1, 2, and 3 we estimated how fast a post-AGI economy could grow using existing or historically observed production techniques, grounded in U...

“A Theory of Prompt Injection (and why you should study roles)” by Charles Ye, softboiledheart

22 Jun 2026

Contributed by Lukas

Summary We've been building a theory of how prompt injections work under the hood. We show it comes down to how LLMs perceive roles (the humble chat t...

“Coup is the Pareto-optimal social game” by Daniel Tan

22 Jun 2026

Contributed by Lukas

I've been playing Coup for a long time now. I keep a copy in my backpack and bring it everywhere, and it's earned the space. A few reasons it's so ...

“A brief list of ways AI safety efforts could be net negative” by Elias Schmied

22 Jun 2026

Contributed by Lukas

Here's Holden Karnofsky: I tend to think it's worse than 51/49. I tend to think we’re always going to be prone to overestimate how robustly good ou...

“NLA explanations can be shortened without harming reconstruction” by loops

22 Jun 2026

Contributed by Lukas

Natural language autoencoders are a really cool mostly-unsupervised method for producing free-form text explanations of LLM activations. You should r...

“Introducing MonitoringBench” by monika_j

22 Jun 2026

Contributed by Lukas

Paper here, code, benchmark. Builds on the preview we posted in January. Authors: @monika_j , @ma-martinez , @ollie, @Tyler Tracy We are releasing M...

“Claude Fable 5 and Mythos 5: Capabilities” by Zvi

21 Jun 2026

Contributed by Lukas

Only three days after the release of Claude Fable 5, Anthropic was forced by the United States Government to make it unavailable, when a jailbreak wa...

“The Invisible Side of AI Governance” by Charbel-Raphaël

21 Jun 2026

Contributed by Lukas

Tldr: Most strategic writing on AI governance on LessWrong describes the outsider game, which is most often visible: press, statements, open letters....

“Google Can’t Math Parsecs” by jefftk

21 Jun 2026

Contributed by Lukas

Daniel Drucker pointed me at a fun bug in Google's calculator: the parsec is wrong when you do math on it. As the earth travels around the...

“[Linkpost] How Transparent Is DiffusionGemma (and why it matters)” by Josh Engels, Callum McDougall, bilalchughtai, János Kramár, Senthooran Rajamanoharan, Arthur Conmy

21 Jun 2026

Contributed by Lukas

Work also done with Cindy Wu, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira. Paper here: https://arxiv.o...

“Would anybody here be interested in a “mistake postmortem” discussion group?” by SK2

20 Jun 2026

Contributed by Lukas

I recently made a dumb (in retrospect) mistake that set me back a lot. Feeling upset and regretful, I spoke to an older family member who reassured m...

“Hyperstition as the Natural Enemy of Rationality” by alseph

20 Jun 2026

Contributed by Lukas

If the box contains a diamond, I desire to believe that the box contains a diamond; If the box does not contain a diamond right now, but will contain...

“AI Safety Ecosystem Research notes” by Eneasz

20 Jun 2026

Contributed by Lukas

These are some personal notes taken and later dressed up a bit to make into a post. Dunno how much value is here for people already familiar with the...

“Research agenda: Interpretive debate” by Shi

20 Jun 2026

Contributed by Lukas

One sentence pitch: our goal is to develop a piece of epistemic infrastructure for iteratively and empirically answering interpretive questions about...

“The LLM shoggoth meme is weirder than you think” by HedonicEscalator

20 Jun 2026

Contributed by Lukas

This article contains spoilers for At the Mountains of Madness, The Case of Charles Dexter Ward, and other works by H. P. Lovecraft. In 1931, Claude ...

“Introduction: Gaussian Natural Latents” by Haru

20 Jun 2026

Contributed by Lukas

Short introductory post for my research direction: Gaussian Natural Latents. I explain the motivation and give a preview of the forthcoming results. ...

“San Silvestro” by Tomás B.

19 Jun 2026

Contributed by Lukas

I will note that her relationship with the divine was inextricably sexual. Her carnal fantasies she revealed to me, as she revealed all her sins, for...

“The one-week sprint” by Daniel Tan

19 Jun 2026

Contributed by Lukas

Recently I've been working in one-week sprints, and I've really enjoyed it! Tl;dr I need to do a lot of creative knowledge work, and have recently fa...

“On “Model Organisms”” by J Bostock

19 Jun 2026

Contributed by Lukas

This post was written while working for Arcadia Impact's Alignment Team (and grew out of an internal talk I gave) but is my own opinion and not their...

“The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t” by Alek Westover, SebastianP, Alexa Pan, Jozdien

18 Jun 2026

Contributed by Lukas

Suppose we have a dangerous misaligned AI that can fool alignment audits, and distill it into a student model. Two things can happen: Misalignment do...

“AI #173: AI Pauses” by Zvi

18 Jun 2026

Contributed by Lukas

A lot of things are always happening. Only one story matters. Claude Fable 5 and Claude Mythos 5 were shut down, by the White House, via an impositi...

““Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms” by Alan Cooney, David Africa, Geoffrey Irving

18 Jun 2026

Contributed by Lukas

TL;DR. Lie detectors for LLMs could be valuable for auditing and monitoring. But evaluating them requires testbeds where the model verifia...

“Contra Pace on When to Apologize” by Zack_M_Davis

18 Jun 2026

Contributed by Lukas

BOJACK: Hey, I wanted to talk to you about—you know—I feel bad about what happened. HERB: So, you're apologizing. BOJACK: Yes. I'm sorry. HERB...

“GDM AI Control Roadmap” by Mary Phuong, Erik Jenner, Rohin Shah, Seb Farquhar

18 Jun 2026

Contributed by Lukas

GDM has published an AI Control Roadmap! From the executive summary: We present the GDM AI Control Roadmap (v0.1) – our plan for implementing and ...

“Your Model Organisms Might Be Fried” by Daniel Tan, J Bostock, draganover, ma-rmartinez, sidbaines, David Africa

18 Jun 2026

Contributed by Lukas

Context: We are the ‘model motivations’ team at Arcadia Alignment. We aim to build a science of ‘model intentions’, unifying insights from pe...

“Rational Agentic Maximalist Philosophies” by Connor Blake

18 Jun 2026

Contributed by Lukas

From the end of high school to after my sophomore year of college, I considered myself an effective altruist. I was on the board of my college EA clu...

“Leveraged on being right” by Ben Pace

18 Jun 2026

Contributed by Lukas

A friend once shared an essay with me for feedback. It struck me as mistaken and terribly naive, and I said so, which they did not take well. (They d...

“Gears for political races” by Tom Smith

17 Jun 2026

Contributed by Lukas

In the past few years, many people around me have tried to convince me that US electoral politics is important. But like many other people in the com...

“Several frontier models are substantially prefill aware” by yeedrag, Parv Mahajan, David Africa, alexsouly, Jordan Taylor, RobertKirk

17 Jun 2026

Contributed by Lukas

This blog post discusses work in a recently-published paper. However, this blogpost was primarily written by Parv Mahajan and Andy Wang, and several ...

“Alignement pretraining could backfire” by Alexandre Variengien

17 Jun 2026

Contributed by Lukas

Epistemic status: speculative, but I think the mechanism is plausible. There has been recent interest in generating synthetic documents to upsample ...

“The Financial Ledger Theory of Apologies” by Ben Pace

17 Jun 2026

Contributed by Lukas

Content note: this is written as part of a daily writing challenge for myself. I have a comrade in rationalist event organizing, who once explained h...

“The Once And Future Fable #3: Fix This Code” by Zvi

17 Jun 2026

Contributed by Lukas

The mainstream media continues to sleep on the most important story in the world. It has now been two days since Anthropic flew its people out to Wa...

[Linkpost] “Scaling Hypothesis #2: Are Humans Just More Over-Parameterized?” by gwern

17 Jun 2026

Contributed by Lukas

This is a link post. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are ...

[Linkpost] “Guardian Angels: LLM Personalization for Productivity and Security” by gwern

17 Jun 2026

Contributed by Lukas

This is a link post. Powerful LLMs will be deployed at global scale in the next few years, and will dominate the Internet, and increasingly, ordinary ...

“Predicting LLM Safety Before Release by Simulating Deployment” by Tomek Korbak, Marcus Williams, micahcarroll, Cameron Raymond, Hannah Sheahan

17 Jun 2026

Contributed by Lukas

Paper link Before releasing a new model, labs need to understand not just what it can do, but how it is likely to behave in real-world use, including...

“How the AI Village works” by Adam B

17 Jun 2026

Contributed by Lukas

The AI Village data - over a year of multi-agent trajectories - is now available to researchers on HuggingFace! We're excited to see what you uncover...

“What are some angles of attack for making continual learning safer?” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd

16 Jun 2026

Contributed by Lukas

This is the fourth post in the sequence Implications of Continual Learning for LLM Agents. Summary Continual learning is a capability that largely do...

“Fable and Mythos: Model Welfare” by Zvi

16 Jun 2026

Contributed by Lukas

Fable and Mythos are currently unavailable, but likely will return within a few weeks. I will continue to cover that fiasco, but in the meantime I wi...

“Does preservation make sense before we know how to revive?” by Aurelia

16 Jun 2026

Contributed by Lukas

My name is Aurelia Song and I hope to make whole-body, human, end-of-life preservation for future revival a new global tradition. I care about it so ...

“Synthetic document finetuning for instilling positive traits” by CallumMcDougall, Arthur Conmy, Neel Nanda

16 Jun 2026

Contributed by Lukas

This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adj...

“A Test Suite for Concepts” by Gretta Duleba

16 Jun 2026

Contributed by Lukas

Lately I’ve been spinning up on natural abstractions, and in particular on John Wentworth's work on natural latents. As I’ve been studying, I’v...

“The Once And Future Fable #2” by Zvi

15 Jun 2026

Contributed by Lukas

On Friday evening the United States Government has forced Anthropic to take down all access to Fable and Mythos. It's been a rough weekend. Dean W...

“A frontier AI company should shut down” by MichaelDickens

15 Jun 2026

Contributed by Lukas

Cross-posted from my website. Prior discussion: niplav's shortform (2025); Planning for Extreme AI Risks (2025) by Joshua Clymer A frontier AI comp...

“Why Do Naive SFT Filters For Safety Properties Fail?” by Josh Engels, Neel Nanda

14 Jun 2026

Contributed by Lukas

This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and ad...

“Impressions at the Extremity of Civilization” by Ben Pace

14 Jun 2026

Contributed by Lukas

Content note: this is part of a challenge of writing a blogpost per day for a week. Epistemic status: this is a series of vignettes written as-though...

“The Hidden Structures of Problems” by spencerg

14 Jun 2026

Contributed by Lukas

Problems have hidden, repeatable structures. Here's my attempt to name them: 1. Smashed Watch There are so many issues at once that fixing one has no...

“American Government Takes Down Claude Fable” by Zvi

13 Jun 2026

Contributed by Lukas

No good policy gets announced shortly after 5pm eastern on a Friday. Here we go again. The Once And Future Fable The United States Department ...

“How might continual learning affect safety and alignment?” by Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd

13 Jun 2026

Contributed by Lukas

This is the third post in our sequence Implications of Continual Learning for LLM Agents. Summary We argue that CL has two major potential safety imp...

“SFT Drives Gemini’s Safety Properties” by Josh Engels, Arthur Conmy, bilalchughtai, Neel Nanda

13 Jun 2026

Contributed by Lukas

This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adj...

“The term “AGI” is almost useless at this point [Linkpost]” by Noosphere89

13 Jun 2026

Contributed by Lukas

The reason I wanted to make this linkpost now rather than some other time is because discussions over AGI and whether or not LLMs are or aren't AGI, ...

“The Uncertainty That Matters Isn’t Fundamental” by jimmy

13 Jun 2026

Contributed by Lukas

I'm on board with a lot of Fundamental Uncertainty. Even some of the stuff that initially feels like a disagreement turns out not to be so. For examp...

[Linkpost] “US government directive to suspend access to Fable 5 and Mythos 5” by Capybasilisk

13 Jun 2026

Contributed by Lukas

This is a link post. --- First published: June 13th, 2026 Source: https://www.lesswrong.com/posts/f5avt6...

“Claude Fable 5 and Mythos 5: The System Card” by Zvi

12 Jun 2026

Contributed by Lukas

First things first: Claude Fable 5 is the new best publicly available model. I have noticed a step change, where Fable can suddenly help me in ways ...

“Citations Needed: Magic Encyclopedias to Save the World” by Oliver Sourbut

12 Jun 2026

Contributed by Lukas

Last week FLF launched a competition “to find the best workflows and methodologies for using AI to produce reliable, trustworthy knowledge bases”...

“Simulating Simulators” by kromem

12 Jun 2026

Contributed by Lukas

Author's note: This piece relates to things I initially discovered in Opus 4 over the months after release, which I’ve mostly kept private since. I...

“Implications of Continual Learning for LLM Agents: Introduction” by RohanS, Rauno Arike, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward, Seth Herd

12 Jun 2026

Contributed by Lukas

Many people think that continual learning (CL) is a key missing capability of LLM systems, and we think its development could have huge implications ...

“Reward Hacking at the 1937 World’s Fair” by frmsaul

12 Jun 2026

Contributed by Lukas

The "Paris 1937 World's Fair" was a dick measuring contest. At the time, the world was on the verge of the worst war in history. The fair was an oppo...

“Building and evaluating model diffing agents” by bilalchughtai, Josh Engels, Neel Nanda

12 Jun 2026

Contributed by Lukas

This is the second in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent ar...

“Sympathy for both sides of the egregious misalignment debate” by Steven Byrnes

12 Jun 2026

Contributed by Lukas

On one side of this debate is Yudkowsky & Soares, who think that (if AI progress continues) we’re on a direct path to egregiously-misaligned, s...

“Celene’s thoughts on consciousness” by ToasterLightning

12 Jun 2026

Contributed by Lukas

contra scott alexander (?) Yesterday, I went to the Berkeley ACX Meetup. Scott Alexander was there, and ran a Q&A session where participants coul...

“Parkinson’s Heuristic” by Ben Pace

12 Jun 2026

Contributed by Lukas

Parkinson's Law states that work expands to fit the space allotted. The idea being, if you give someone a month to write a report, they'll take a mon...

“PSA: Almost nobody is working on alignment” by Chi Nguyen, peterbarnett

12 Jun 2026

Contributed by Lukas

People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not...

“AI #172: The First Fable” by Zvi

11 Jun 2026

Contributed by Lukas

A lot happened this week, including a great trip out to Lighthaven. The main event, the one that matters, was the release of Claude Fable 5. The pub...

“Models May Behave Worse When Eval Aware” by Senthooran Rajamanoharan, Neel Nanda

11 Jun 2026

Contributed by Lukas

This is the first in a series of research updates from the Google DeepMind Language Model Interpretability team, in interpretability and adjacent are...

“Thoughts on Claude Fable’s silent safeguards” by Andy Arditi

11 Jun 2026

Contributed by Lukas

[Thanks to Julian Minder for helpful discussion and review.] Claude Fable 5 and its new safeguards Yesterday, Anthropic publicly released Claude Fabl...

“You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them” by RobinHa

11 Jun 2026

Contributed by Lukas

Detecting Hidden Behaviors in LLMs via Activation-matched Finetuning — preprint, 2026. [Paper] [Code] TLDR. Given a model with some unknown, abnorm...

“Anthropic did not call for a pause on AI” by Andrea_Miotti, Gabriel Alfour

10 Jun 2026

Contributed by Lukas

Last week, the AI company Anthropic released a blog post titled “When AI builds itself”. This led to a media frenzy, with news outlets around the...
