Doom Debates
We Found AI's Preferences — What David Shapiro MISSED in this bombshell Center for AI Safety paper
21 Feb 2025
The Center for AI Safety just dropped a fascinating paper: they discovered that today’s AIs like GPT-4 and Claude have preferences! As in, coherent utility functions. We knew this was inevitable, but we didn’t know it was already happening.

This episode has two parts:

In Part I (48 minutes), I react to David Shapiro’s coverage of the paper and push back on many of his points.

In Part II (60 minutes), I explain the paper myself.

00:00 Episode Introduction
05:25 PART I: REACTING TO DAVID SHAPIRO
10:06 Critique of David Shapiro's Analysis
19:19 Reproducing the Experiment
35:50 David's Definition of Coherence
37:14 Does AI have “Temporal Urgency”?
40:32 Universal Values and AI Alignment
49:13 PART II: EXPLAINING THE PAPER
51:37 How The Experiment Works
01:11:33 Instrumental Values and Coherence in AI
01:13:04 Exchange Rates and AI Biases
01:17:10 Temporal Discounting in AI Models
01:19:55 Power Seeking, Fitness Maximization, and Corrigibility
01:20:20 Utility Control and Bias Mitigation
01:21:17 Implicit Association Test
01:28:01 Emailing with the Paper’s Authors
01:43:23 My Takeaway

Show Notes

David’s source video: https://www.youtube.com/watch?v=XGu6ejtRz-0

The research paper: http://emergent-values.ai

Watch the Lethal Intelligence Guide, the ultimate introduction to AI x-risk! https://www.youtube.com/@lethal-intelligence

PauseAI, the volunteer organization I’m part of: https://pauseai.info

Join the PauseAI Discord (https://discord.gg/2XXWXvErfA) and say hi to me in the #doom-debates-podcast channel!

Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.

Support the mission by subscribing to my Substack at https://doomdebates.com and to https://youtube.com/@DoomDebates

Get full access to Doom Debates at lironshapira.substack.com/subscribe