Rob Wiblin

It found a 27-year-old flaw in the world's most security-hardened operating system that would, in effect, let it crash all kinds of essential infrastructure.

21.838

80,000 Hours Podcast

How scary is Claude Mythos? 303 pages in 21 minutes

Engineers at the company who had no particular security training, they would ask the AI model to find vulnerabilities overnight and then wake up to working exploits of critical security flaws that could be used to cause real harm.

29.647

And it managed to figure out how to build web pages that, when visited by fully updated, fully patched computers, would allow it to write to the operating system kernel, which is the most important and protected layer of any computer.

41.183

We know all of this because Anthropic has released hundreds of pages of documentation about this model, which they've called Claude Mythos.

51.618

I'm going to take you on a tour of all of the crazy shit buried in these documents, and then I'm going to tell you what Anthropic says they plan to do to save us from their creation.

57.89

So how good is Mythos at hacking into computers?

67.177

Well, unfortunately, it saturates all existing ways of testing how good a model is at offensive cyber capabilities.

71.063

That is to say, it scores close to 100%.

76.391

So those tests, they can't effectively tell us how far its capabilities extend anymore.

78.214

So to test Mythos, Anthropic has instead just been setting it loose, telling it to find serious unknown exploits that would work on currently used, fully patched computer systems.

83.462

And the end result of that is that Nicholas Carlini, one of the world's leading security researchers who moved to Anthropic a year ago, he says that he has found more bugs in the last few weeks with Mythos than in the entire rest of his life combined.

93.417

For example, Mythos found a 17-year-old flaw in FreeBSD, that's an operating system mostly used to run servers, that would let an attacker take complete control of any machine on the network without needing a password or any credentials at all.

105.431

The model found the necessary flaw and then it went ahead and built a working exploit fully autonomously.

117.565

Mythos also found a 16-year-old vulnerability in FFmpeg.

122.471

That is a piece of software used by almost all devices to encode and decode video.

126.577
