
Ryan Kidd

Speaker
958 total appearances

Appearances Over Time

Podcast Appearances

Future of Life Institute Podcast
Can AI Do Our Alignment Homework? (with Ryan Kidd)

But currently, it seems like we don't... AI systems are really messy. They're kludgy, right? Like human brains. They have a bunch of contextually activated heuristics, right?

It sees there's a bracket there, and it's like, oh, maybe I'll put another bracket there. It's very simple, dumb circuitry. But then sometimes it does stuff that, like in-context learning, seems a lot like it's actually pattern-matching to gradient descent.

When models are learning from the input data stream and picking up some new, complicated thing, it seems a lot like they're optimizing over the input tokens, or rather optimizing to produce an output.
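To make the "in-context learning looks like gradient descent" idea concrete, here is a toy sketch: treat the in-context (x, y) examples as a tiny training set and take a single gradient step on a linear model's squared-error loss. This is only an illustration of the analogy, not the mechanism inside a real transformer, and every number here (the hidden task, sample count, learning rate) is invented for the example.

```python
import numpy as np

# Toy analogy for in-context learning as implicit gradient descent:
# the "prompt" supplies (x, y) pairs from a hidden linear task, and a
# single gradient step on those pairs already improves the model's fit.
rng = np.random.default_rng(0)

true_w = np.array([2.0, -1.0])      # the hidden task the "prompt" encodes
X = rng.normal(size=(8, 2))         # in-context example inputs
y = X @ true_w                      # in-context example targets

w = np.zeros(2)                     # the learner's initial guess
lr = 0.1                            # small step size, chosen arbitrarily

def loss(wv):
    # mean squared error on the in-context examples
    return np.mean((X @ wv - y) ** 2)

# one gradient step of L(w) = mean((Xw - y)^2) over the in-context pairs
grad = 2 * X.T @ (X @ w - y) / len(X)
w_after = w - lr * grad

loss_before, loss_after = loss(w), loss(w_after)
print(loss_after < loss_before)     # prints True: one "in-context" update helps
```

The point of the sketch is just that a single update computed from the prompt's examples reduces error on the task those examples encode, which is the behavior the quote is gesturing at.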

So we might be in for a world where AI systems spontaneously gain these mesa-optimizers, and these are a serious source of concern, because they're very powerfully trying to optimize for some objective.

This is the main concern I have, I guess: this kind of deception model, this inner alignment failure, perhaps, where AI systems acquire goals spontaneously, or maybe because they're being trained deliberately to be power-seeking and make money on the internet, and then they decide to hide, and we don't have interpretability tools good enough to detect them.

So I guess I haven't really changed my fear about this scenario eventuating, but I have become more confident that we can elicit useful work from AI systems before we see obvious signs of this.

And I'm pretty confident that AI systems right now are not executing very powerful scheming against us, because I think we would see some sort of warning shots.

I don't think it's going to be night and day.

I think we'll see situations where AI systems try to scheme in really dumb ways before they try to scheme in very competent, difficult ways. Does that make sense?

I don't think people should be going to battle with AIs for many reasons.

I think that's a pretty bad social more to set, to allow that kind of thing.