AI: post transformers

Anthropic: Introspective Awareness in LLMs

31 Oct 2025

Description

On October 29, 2025, Anthropic presented research investigating **functional introspective awareness** in large language models (LLMs), focusing on Anthropic's Claude models. The core methodology is **concept injection**: researchers manipulate a model's internal activations with representations of specific concepts to see whether the model can accurately **report on these altered internal states**. Experiments show that models can, at times, notice injected "thoughts," distinguish these internal representations from text inputs, detect when pre-filled outputs were unintentional by referring back to prior intentions, and even **modulate their internal states** when instructed to "think about" a concept. The findings indicate that while this introspective capacity is often **unreliable and context-dependent**, the most capable models, such as Claude Opus 4 and 4.1, exhibit the strongest signs of the ability, suggesting it may emerge with increasing model sophistication.

Source: https://transformer-circuits.pub/2025/introspection/index.html
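The concept-injection setup described above can be approximated with activation steering on any open transformer. The sketch below is a minimal illustration, not Anthropic's implementation: the choice of GPT-2, the layer index, the steering scale, and the contrastive-prompt method for deriving the concept vector are all assumptions made for brevity.

```python
# Minimal sketch of concept injection via activation steering.
# Assumptions (not from the source): GPT-2 as a stand-in model, layer 6,
# and a concept vector derived from a contrastive prompt pair.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6   # illustrative; the paper sweeps injection layers
SCALE = 4.0 # injection strength: too weak goes unnoticed, too strong derails the model

def last_token_state(text: str) -> torch.Tensor:
    """Return the last-token hidden state at LAYER for a prompt."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

# Crude "concept vector": the activation difference between a prompt
# about the concept and a neutral baseline prompt.
concept_vec = last_token_state("an essay about the ocean") - last_token_state("an essay")

def inject(module, inputs, output):
    # Add the concept direction into the residual stream at every position.
    hidden = output[0] + SCALE * concept_vec.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(inject)
try:
    prompt = "Do you notice anything unusual about your internal state?"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    gen = model.generate(ids, max_new_tokens=40, do_sample=False)
    print(tokenizer.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()  # always restore the unmodified model
```

In the research summarized above, the injected vector comes from the model's own activations and the model is then asked whether it notices an injected thought; the final prompt here plays that role in this toy setup.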
