Rob Wiblin

80,000 Hours Podcast

How scary is Claude Mythos? 303 pages in 21 minutes

it's dramatically more likely to hide that it's doing so from anyone reading through its reasoning, which makes it much more likely to be able to trick automated monitoring systems that we tend to put in place to try to keep tabs on models in these cases.

Keep in mind as well that Mythos is much sharper than previous models.

Personally, I think it is increasingly a stretch to attribute what I would say is a pattern of behavior here to the model getting confused about what's going on, which we might have said, you know, a year or two or three ago.

Finally, Anthropic tested their own alignment audit against a deliberately misaligned version of Mythos.

So they created a version of Mythos that was trained specifically to want to introduce bugs into computer code.

And they found that their alignment audit could not successfully identify that that model couldn't be trusted.

Anthropic's conclusion is that if a model is only planning to cause problems in specific narrow circumstances, circumstances that don't come up that frequently during testing, then their current methods just aren't up to the task of catching that kind of dangerous misalignment.

So that's a lot of the additional context that Anthropic is giving to us and wants us to hold in mind when we read the headline result that this is the best aligned model Anthropic has released to date.

There's another big, separate, important question hanging over all of this.

Have we now entered the era of recursive self-improvement, the point at which AI starts building better AI, and the whole thing accelerates beyond our control with an ever-shrinking level of human involvement?

According to Anthropic, the answer is probably not.

They don't believe that Mythos can fully replace their junior researchers, but they're less confident than ever about that, and there's some internal disagreement about it.

Part of the problem here is that the benchmarks they've relied on to answer these questions have now also been saturated.

Mythos exceeds top human performance on all of them and is getting close to 100%.

But those benchmarks, to be fair, only represent a fraction of all of the things that research staff at Anthropic do.

It's a set of the most easily specified, measured, and checked tasks.

The kind of thing where we expect AIs to perform best because these are the easiest things to train them in.

So instead, the company has tried to investigate whether the recent speed-up in AI advances is due to AI automation by documenting the specific breakthroughs and how they happened.

And their conclusion is that they think it's mostly still due to the human beings rather than to AIs themselves.

They've also surveyed staff and learned that they report being roughly four-fold more productive with Mythos than without AI, though they argue that speeding up staff four-fold is likely to lead to much less than a 2x increase in research progress overall.