Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Pricing
Podcast Image

AI: post transformers

Petri: Accelerating AI Safety Auditing

10 Oct 2025

Description

On October 6, 2925 Anthropic introduces **Petri (Parallel Exploration Tool for Risky Interactions)**, an open-source framework developed for automated auditing to accelerate AI safety research. Petri uses **AI-driven auditor agents** to interact with and test the behavior of target language models across diverse, multi-turn scenarios, automating the process of environment simulation and initial transcript analysis. A **judge component** then scores the generated transcripts across dozens of dimensions, such as "unprompted deception" or "whistleblowing," to quickly surface **misaligned behaviors** like autonomous deception and cooperation with misuse. The text provides a detailed technical overview of Petri's architecture, including how researchers form hypotheses, create seed instructions, and utilize the automated assessment and iteration steps, while also discussing the **limitations and biases** found in the auditor and judge agents during pilot evaluations.Source:https://alignment.anthropic.com/2025/petri/

Audio
Featured in this Episode

No persons identified in this episode.

Transcription

This episode hasn't been transcribed yet

Help us prioritize this episode for transcription by upvoting it.

0 upvotes
🗳️ Sign in to Upvote

Popular episodes get transcribed faster

Comments

There are no comments yet.

Please log in to write the first comment.