Ryan Kidd

Speaker
958 total appearances

Podcast Appearances

Future of Life Institute Podcast
Can AI Do Our Alignment Homework? (with Ryan Kidd)

Hacking out of the box, getting money, getting influence, all that kind of stuff.

And I think we want to be tracking all of that very closely.

And I don't think we're at red lines currently, but we are approaching them, I would say.

And separately, I think we should be tracking, as you say, this model organisms work where we try to elicit dangerous behavior from AIs.

You can think of this as like with your child, right?

You leave some cookies out and you're like, don't eat the cookies.

And then you like turn away, but you're like secretly looking.

And if they eat the cookies: "I caught you."

So that's the kind of thing we're doing.

The thing is that AI systems can really detect when they're in training versus real environments.

But if you recall the AI 2027 scenario, in a lot of the discussion around that, people are talking about online learning. It's like OpenBrain trains the last big AI agent, and from then on it's just kind of constantly learning online through some sort of RL paradigm. If AI systems are perpetually online, then they're always in deployment, and you've got to have monitors and control protocols.

So that's why control research is so important, especially for the early days, to catch some of these slip-ups.

So you can do all the model organisms work you want in the lab, and that's like one layer of defense to see if we have these capabilities or these penchants for deception emerge.

And separately, you need to have all the control evals studying them as they're deployed, especially if they're going to be learning online, perhaps updating their behaviors, and just be constantly checking for this stuff.

Be ready.

Have like a fallback plan, a rapid response plan.

What are you going to do if actually you see serious warning signs?

Can you shut the models down?