Ryan Kidd
Honestly, Nathan, I'm pretty confused.
Like, I do think that, you know, contrary to expectations, it's looking like... language models understand our values, right?
That's the first thing to update.
Like, they understand them in some key sense.
It's not just regurgitation, like stochastic parrots.
Language models are...
really good at understanding human ethical mores and extrapolating on them in some scenarios.
They're also really good at sycophancy.
They're getting even better at deception, sophisticated deception.
They do tend to deceive users in the right circumstances, though there's some debate about this, and it seems like some of the deception is far from what we might call consequentialist, like hardcore consequentialist deception, in most situations.
But I think alignment faking and some other papers have shown that you can create situations where AIs will deceive the user to achieve some ulterior objective, which was something that was deliberately given to the AI as an objective.
So that's one of the caveats here.
So I don't think there are any examples of AIs coherently deceiving users, like pursuing a coherent objective, right?
Not just what we might call Goodhart deception, where they kind of fall into deceptive tendencies because of the limitations of training data.
And I'm talking about coherent deception.
There are few cases of this, if any, where it's a sustained, coherent deception scenario that appears to arise spontaneously through the training process, which is pretty good given the level of capabilities we have.
It seems like people five or 10 years ago certainly didn't think we'd have AIs that are capable of assisting with frontier science and are safe to deploy.