Ryan Kidd

Speaker
958 total appearances

Podcast Appearances

Future of Life Institute Podcast
Can AI Do Our Alignment Homework? (with Ryan Kidd)

But that's another matter.

I guess, definitely, I would not feel confident that current or future-gen AI systems will avoid out-of-distribution failures, let alone in critical scenarios, let alone

scary things like deception, you know, when it really counts.

And I think that's a big deal.

And we should be tracking two things, right?

Like one, AI capabilities.

So are AIs situationally aware?

Like, do they have the necessary prerequisites to be able to even understand that they are this AI?

It seems like they do, right?

And they even know, kind of, their training date; they know some details; they can distinguish their own text from other AIs' text.

So AI is becoming increasingly situationally aware,

which is one of the necessary prerequisites for really dangerous deception.

Do they have the capabilities to hack themselves out of the box, right?

To steal money? Like, we had this MATS paper that came out and caused a stir recently.

It was this collaboration with Anthropic's AI red team where they found that, oh, you just put an AI system in a simulated environment with a bunch of real smart contracts, and it can find $4.6 million worth of exploits.

Well, that's a lot of money.

That's enough to set up your own server and run for quite a while.

And that was a relatively short project.

And it was pretty hands-off as well from the humans, though not entirely.

So it does seem like we're getting dangerous capabilities, increasingly so.