Ryan Kidd
So from an interpretability perspective,
I don't think you need to be pushing the frontier that much, if at all. From the perspective of other types of research agendas, such as weak-to-strong generalization and other kinds of AI control and scalable oversight work,
I think you kind of do need more data points.
I'm not saying we've exhausted everything you can do with the current models.
Far, far from it.
But I think you are going to need more data points to build up consistent trends and to see some of these kinds of worrying behaviors emerge, where your weaker model can't actually supervise your stronger model in all situations. Which, by the way, is predicated on this idea that verification is easier than generation, P versus NP, blah, blah, blah, especially if you can see the other person's thoughts and they can't see yours.
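The verification-versus-generation asymmetry mentioned here can be made concrete with a toy SAT sketch (an illustration I'm adding, not something from the conversation; the formula and function names are hypothetical): checking a proposed satisfying assignment takes time linear in the formula size, while finding one by brute force can require checking exponentially many assignments.

```python
from itertools import product

# A CNF formula as a list of clauses; each literal is (var_index, is_positive).
# This hypothetical formula encodes: (x0 or not x1) and (x1 or x2) and (not x0 or x2)
formula = [[(0, True), (1, False)], [(1, True), (2, True)], [(0, False), (2, True)]]

def verify(formula, assignment):
    """Check an assignment in time linear in the formula size: the easy direction."""
    return all(any(assignment[v] == pos for v, pos in clause) for clause in formula)

def generate(formula, n_vars):
    """Find a satisfying assignment by brute force: up to 2^n verify calls."""
    for bits in product([False, True], repeat=n_vars):
        if verify(formula, list(bits)):
            return list(bits)
    return None

sol = generate(formula, 3)
print(sol, verify(formula, sol))
```

The point of the sketch is the cost gap: `verify` scans the formula once, while `generate` may call `verify` on all 2^n assignments, which is why a weaker supervisor checking a stronger model's work is plausibly easier than doing that work itself.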
So I think it does make sense to be at the frontier from that perspective, but I will say that I think the main reason
that the companies are doing this is obviously to make money.
And then, as a corollary of that, from a safety perspective, if you were trying to actually make a strong case for being at the frontier, it would be: our models are performance-competitive,
and they're close enough to the frontier, in a fast-follower kind of way, that you take a performance hit by using ours instead of a competitor's, but they're safer.
And currently no one wants to use anything that's worse than the frontier model.
Why would you?
That's the best model.
But if a model was, like, I don't know, 10% more likely to tell you to jump off a bridge or something, or, actually seriously, 10% more likely to hack your bank account and steal all your money, let alone, I don't know, escape and make a bioweapon,
I would like to think people would use the less good model.
And I would like to think that regulators and insurers
could adequately penalize the frontier companies into complying with those standards.
Because then you have existence proofs: like, oh, with my product, I'm actually trying, right?