OpenAI just announced o3 and smashed a bunch of benchmarks (ARC-AGI, SWE-bench, FrontierMath)!

A new Anthropic and Redwood Research paper says Claude is resisting its developers’ attempts to retrain its values!

What’s the upshot? What does it all mean for P(doom)?

00:00 Introduction
01:45 o3’s architecture and benchmarks
06:08 “Scaling is hitting a wall” 🤡
13:41 How many new architectural insights before AGI?
20:28 Negative update for interpretability
31:30 Intellidynamics — ***KEY CONCEPT***
33:20 Nuclear control rod analogy
36:54 Sam Altman's misguided perspective
42:40 Claude resisted retraining from good to evil
44:22 What is good corrigibility?
52:42 Claude’s incorrigibility doesn’t surprise me
55:00 Putting it all in perspective

---

SHOW NOTES

Scott Alexander’s analysis of the Claude incorrigibility result: https://www.astralcodexten.com/p/claude-fights-back and https://www.astralcodexten.com/p/why-worry-about-incorrigible-claude

Zvi Mowshowitz’s analysis of the Claude incorrigibility result: https://thezvi.wordpress.com/2024/12/24/ais-will-increasingly-fake-alignment/

---

PauseAI Website: https://pauseai.info

PauseAI Discord: https://discord.gg/2XXWXvErfA

Say hi to me in the #doom-debates-podcast channel!

Watch the Lethal Intelligence video and check out LethalIntelligence.ai! It’s an AWESOME new animated intro to AI risk.

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates

Get full access to Doom Debates at lironshapira.substack.com/subscribe