Inside every AI is a bigger AI, trying to get out

You've probably heard AI is a "black box". No one knows how it works. Researchers simulate a weird type of pseudo-neural-tissue, "reward" it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want. But God only knows what goes on inside of it.

This is bad for safety. For safety, it would be nice to look inside the AI and see whether it's executing an algorithm like "do the thing" or more like "trick the humans into thinking I'm doing the thing". But we can't. Because we can't look inside an AI at all.

Until now! Towards Monosemanticity, recently out of big AI company/research lab Anthropic, claims to have gazed inside an AI and seen its soul. It looks like this:

https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand