Ryan Kidd

Can AI Do Our Alignment Homework? (with Ryan Kidd)

But that's another matter.

Can AI Do Our Alignment Homework? (with Ryan Kidd)

I guess definitely AI systems, I would not feel confident in current or future gen or AI systems not doing out of distribution failures, let alone, like in critical scenarios, let alone

1188.774 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

scary things like deception, you know, when it really counts.

1203.674 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

And I think that's a big deal.

1207.841 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

And we should be tracking two things, right?

1209.664 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

Like one, AI capabilities.

1211.387 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

So are AI situationally aware?

1213.13 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

Like, do they have the necessary prerequisites to be able to even understand that they are this AI?

1215.915 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

It seems like they do, right?

1221.324 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

And even they even know kind of their training date, they know some details, they can detect their texts from other AI's texts.

1222.406 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

So AI is becoming increasingly situationally aware of

1227.115 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

which is one of the necessary prerequisites for really dangerous deception.

1229.678 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

Do they have the capabilities to hack themselves out of the box, right?

1233.229 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

To steal money, like we had this maths paper that came out and caused a stir recently.

1237.402 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

It was this collaboration with Anthropic's AI Red team where they found that, oh, AI system, let's just put it in a simulated environment with a bunch of real smart contracts, and it can find $4.6 million worth of exploits.

1242.267 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

Well, that's a lot of money.

1256.228 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

That's enough to set up your own server and run for quite a while.

1257.19 View full episode →

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

And that was a relatively short project.