
Ryan Kidd

Speaker
958 total appearances


Podcast Appearances

Future of Life Institute Podcast
Can AI Do Our Alignment Homework? (with Ryan Kidd)

Honestly, Nathan, I'm pretty confused.

Like, I do think that, you know, contrary to expectations, it's looking like language models understand our values, right?

That's the first thing to update.

Like, they understand them in some key sense.

They're not just regurgitating like stochastic parrots.

Language models are really good at understanding human ethical mores and extrapolating from them in some scenarios.

They're also really good at sycophancy.

They're getting even better at deception, sophisticated deception.

They do tend to deceive users in the right circumstances, though there's some debate about this, and it seems like some of the deception is far from what we might call hardcore consequentialist deception in most situations.

But I think alignment faking and some other papers have shown that you can create situations where AIs will deceive the user to achieve some ulterior objective, which was something that was deliberately given to the AI as an objective.

So that's one of the constraints of these little...

So I don't think there are any examples of AIs coherently deceiving users, like pursuing a coherent objective, right?

Not just what we might call Goodhart deception, where they just kind of fall into deceptive tendencies because of the limitations of training data.

And I'm talking about this coherent deception.

There are few cases of this, if any, where a sustained, coherent deception scenario appears to arise spontaneously through the training process, which is pretty good given the level of capabilities we have.

It seems like people five or ten years ago certainly didn't think we'd have AIs that are this capable of assisting frontier science and that are safe to deploy.