Ryan Kidd

When models are learning from the input data stream and learning some new, complicated thing, it seems a lot like they're optimizing over the input tokens, or rather optimizing to produce an output.

933.303

Future of Life Institute Podcast

Can AI Do Our Alignment Homework? (with Ryan Kidd)

So we might be in for a world where AI systems do spontaneously gain these mesa-optimizers, and these are a serious source of concern because they're very powerfully trying to optimize for some objective and, uh,

946.184

The main concern I have, I guess, is that we have this kind of deception model, this inner alignment failure, perhaps, where AI systems acquire goals spontaneously, or maybe because they're being deliberately trained to be power-seeking and make money on the internet, and then they decide to hide, and we don't have interpretability tools good enough to detect them.

959.358

So I guess I haven't really changed my fear about this scenario eventuating, but I have become more confident that we can elicit useful work from AI systems before we see obvious signs of this, I'll say.

980.401

And I'm pretty confident that AI systems right now are not executing

992.402

very powerful scheming against us, because I think we would see some sort of warning shots.

995.487

I don't think it's going to be as night and day as that.

1000.138

You know, I think we'll see situations where AI systems are trying to scheme in really dumb ways before they try to scheme in very competent, difficult ways.

1002.564

Does that make sense?

1012.227

I don't think people should be going to battle with AIs for many reasons.

1177.64

I think that's a pretty bad social more to...