Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing
Podcast Image

AXRP - the AI X-risk Research Podcast

38.5 - Adrià Garriga-Alonso on Detecting AI Scheming

20 Jan 2025

Description

Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/01/20/episode-38_5-adria-garriga-alonso-detecting-ai-scheming.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch The Alignment Workshop: https://www.alignment-workshop.com/   Topics we discuss, and timestamps: 01:04 - The Alignment Workshop 02:49 - How to detect scheming AIs 05:29 - Sokoban-solving networks taking time to think 12:18 - Model organisms of long-term planning 19:44 - How and why to study planning in networks   Links: Adrià's website: https://agarri.ga/ An investigation of model-free planning: https://arxiv.org/abs/1901.03559 Model-Free Planning: https://tuphs28.github.io/projects/interpplanning/ Planning in a recurrent neural network that plays Sokoban: https://arxiv.org/abs/2407.15421   Episode art by Hamish Doodles: hamishdoodles.com

Audio
Featured in this Episode

No persons identified in this episode.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes
🗳️ Sign in to Upvote

Popular episodes get transcribed faster

Comments

There are no comments yet.

Please log in to write the first comment.