
The Neuron: AI Explained

Panic or Progress? Reading Between the Lines of AI Safety Tests

26 Jun 2025

1h 16m duration
12748 words
3 speakers
Description

In Ep 2 we ask: "Panic or Progress? Reading Between the Lines of AI Safety Tests." We unpack the recent Claude Opus 4 "blackmail" test result, OpenAI's new transparency pledge, and why safety evaluations sometimes sound scarier than they are. Listeners will leave with a clear framework for interpreting headline-grabbing safety reports—and practical advice on when to worry, when to wait, and how to separate red flags from red herrings.

Transcription

Full Episode

0.031 - 36.211 Corey Knowles

Before you run off to build a bunker or something after the recent Claude/ChatGPT safety test stories, let's better understand what scary test results actually mean. All right, welcome, humans, to the second episode of the Neuron Podcast. I'm Corey Knowles, editor of The Neuron, and joining me is our resident wordslinger, Grant Harvey. How's it going today, Grant?


36.231 - 46.662 Grant Harvey

Good, good. It's a lovely Tuesday morning. Nothing at all weird is happening in the rest of the world, and we're just happy to be here talking about AI.


47.083 - 71.567 Corey Knowles

Just a totally normal day in a normal world. I agree. Well, today we're going to take a deep dive here into recent revelations of some wild stories that came out of Claude 4 Opus testing in particular. But it's not just Claude. It happens with other models as well. How nervous should that make us is kind of the gist of what we're trying to get at here.


71.547 - 89.084 Corey Knowles

Could instances like this actually be good news? So we're going to unpack the now somewhat infamous Claude 4 Opus blackmail safety test and OpenAI's pledge to publish its internal safety evaluations. There are a number of things at play in here, and it's a super interesting subject.


89.144 - 108.135 Corey Knowles

So by the end of this episode, we hope you'll have some practical advice on when to worry, when to wait, and how to tell the difference between genuine red flags and red herrings. So let's dive in. Grant, can you kind of explain to us what is AI safety testing?

108.318 - 132.483 Grant Harvey

Yeah, sure. As far as what AI safety is, I think a lot of people are familiar with Terminator. If you joke about robots and AI with ChatGPT, almost always there's a Skynet reference. Although these days, I don't even know if people, you know, Gen Z knows what Skynet is. I mean, I never watched any of the new Terminator movies, so I don't know if it's taken the zeitgeist by storm.

132.724 - 138.33 Corey Knowles

We're going to pause while you go watch T2 right now, Grant. I've seen T2.

138.35 - 158.305 Grant Harvey

I just haven't seen T3 or any of the other ones or the sequels or the requels, whatever you want to call it. T50, yeah. Generally speaking, AI safety is to make sure that there is never a Terminator situation or anything even remotely close to that, right? Yeah.

158.285 - 177.23 Grant Harvey

A more formal definition would be: AI safety testing is the systematic practice of pushing on artificial intelligence systems, pushing them toward undesirable behaviors before and after they go live, to determine how they will react or behave in various circumstances.
