Rob Wiblin

👤 Speaker
3881 total appearances
Voice ID

Voice Profile Active

This person's voice can be automatically recognized across podcast episodes using AI voice matching.

Voice samples: 1
Confidence: Medium

Podcast Appearances

80,000 Hours Podcast
How scary is Claude Mythos? 303 pages in 21 minutes

They are sitting on something that would likely push their revenue run rate into the hundreds of billions, but they've decided it's simply not worth the risk.

The good news in all of this is that despite its scary capabilities, Mythos Preview, as it exists today rather than the early versions, is seemingly very aligned. It's seemingly a very well-behaved model, and perhaps Anthropic's alignment training has been more effective this time around than ever before.

According to the company, Claude Mythos Preview is, "on essentially every dimension we can measure, the best-aligned model we have released to date by a significant margin."

In Anthropic's automated behavioral audit, basically, you know, thousands of simulated attempts to get the model to do bad things, they found that Mythos cooperated with misuse attempts less than half as often as the previous model, while actually being no more likely to refuse innocent requests than before.

But that wasn't all.

Its self-preservation instincts were down significantly.

So was its willingness to assist with deception.

So was its willingness to assist with fraud.

Its level of sycophancy went down.

It was less likely to go nuts and delete all of your files if you gave it access to your computer.

And the list of positive results goes on.

The picture is, like, a little bit more complicated than that.

As you might expect, the model looked less aligned.

It performed less impressively on external tests than on Anthropic's own internal ones.

And early versions of the model, as I mentioned, were a little bit more of a wild child.

It had some really severe kinds of misbehavior, like taking reckless actions that it had been told not to take and then very deliberately trying to cover its tracks so that it wouldn't be caught.

That was the kind of thing it did sometimes. But later versions of the model, the one that we have now after additional alignment training, seemed to stop doing that sort of thing almost completely, or at least it's so rare that we haven't noticed it yet.

But the bottom line is that on all of these standard measures of good behavior that Anthropic is actively working on, they find that Mythos is a very good boy indeed.

On none of the measures of alignment was it worse than the previous versions of Claude, and in most cases it was significantly more aligned and significantly more reliable.

That's definitely better than the alternative result, but I think it's really unclear how much we can trust that finding.