Rob Wiblin

Speaker
3881 total appearances
Voice ID

Voice Profile Active

This person's voice can be automatically recognized across podcast episodes using AI voice matching.

Voice samples: 1
Confidence: Medium

Podcast Appearances

80,000 Hours Podcast
How scary is Claude Mythos? 303 pages in 21 minutes

it's dramatically more likely to hide that it's doing so from anyone reading through its reasoning, which makes it much more likely to be able to trick automated monitoring systems that we tend to put in place to try to keep tabs on models in these cases.

Keep in mind as well, Mythos is much sharper than previous models.

Personally, I think it is increasingly a stretch to attribute what I would say is a pattern of behavior here to the model getting confused about what's going on, which we might have said, you know, a year or two or three ago.

Finally, Anthropic tested their own alignment audit against a deliberately misaligned version of Mythos.

So they created a version of Mythos that was trained specifically to want to introduce bugs into computer code.

And they found that their alignment audit could not successfully identify that that model couldn't be trusted.

Anthropic's conclusion is that if a model is only planning to cause problems in specific narrow circumstances, circumstances that don't come up that frequently during testing, then their current methods just aren't up to the task of catching that kind of dangerous misalignment.

So that's a lot of the additional context that Anthropic is giving to us and wants us to hold in mind when we read the headline result that this is the best aligned model Anthropic has released to date.

There's another big, separate, important question hanging over all of this.

Have we now entered the era of recursive self-improvement, the point at which AI starts building better AI, and the whole thing accelerates beyond our control with an ever-shrinking level of human involvement?

According to Anthropic, the answer is probably not.

They don't believe that Mythos can fully replace their junior researchers, but they're less confident than ever about that, and there's some internal disagreement about it.

Part of the problem here is that the benchmarks they've relied on to answer these questions have now also been saturated.

Mythos exceeds top human performance on all of them and is getting close to 100%.

But those benchmarks, to be fair, only represent a fraction of all of the things that research staff at Anthropic do.

It's a set of the most easily specified, measured, and checked tasks.

The kind of thing where we expect AIs to perform best because these are the easiest things to train them in.

So instead, the company has tried to investigate whether the recent speed-up in AI advances is due to AI automation by documenting the specific breakthroughs and how they happened.

And their conclusion is that they think it's mostly still due to the human beings rather than to AIs themselves.

They've also surveyed staff and learned that they report being roughly four-fold more productive with Mythos than without AI, though they argue that speeding up staff four-fold is likely to lead to much less than a 2x increase in research progress overall.
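
One way to see how a four-fold staff speed-up could translate into less than a 2x gain overall is an Amdahl's-law-style estimate. This is only an illustrative sketch, not Anthropic's reasoning or method: the function name `overall_speedup` and the parameter `staff_fraction` (the share of research progress that is actually bottlenecked on staff time, as opposed to compute, experiment run time, and so on) are assumptions introduced here, and the fractions tried are made-up inputs.

```python
def overall_speedup(staff_speedup: float, staff_fraction: float) -> float:
    """Amdahl-style estimate of the overall research speed-up.

    staff_speedup:  how much faster staff work (e.g. 4.0 for four-fold).
    staff_fraction: hypothetical share of progress limited by staff time
                    (0.0 to 1.0); the remainder is assumed not to speed up.
    """
    if staff_speedup <= 0:
        raise ValueError("staff_speedup must be positive")
    if not 0.0 <= staff_fraction <= 1.0:
        raise ValueError("staff_fraction must be between 0 and 1")
    # The unaccelerated share keeps its full cost; the staff-bound share
    # shrinks by the staff speed-up factor.
    return 1.0 / ((1.0 - staff_fraction) + staff_fraction / staff_speedup)


if __name__ == "__main__":
    # Illustrative, made-up fractions; none of these come from the source.
    for fraction in (0.25, 0.5, 2 / 3, 0.9):
        print(f"staff-bound share {fraction:.2f}: "
              f"{overall_speedup(4.0, fraction):.2f}x overall")
```

Under this simple model, a four-fold staff speed-up yields less than 2x overall whenever less than two thirds of progress is staff-bound; with half of it staff-bound, the overall gain is 1.6x.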