
Ryan Kidd

Speaker
958 total appearances


Podcast Appearances

Future of Life Institute Podcast
Can AI Do Our Alignment Homework? (with Ryan Kidd)

So from an interpretability perspective, I don't think you need to be pushing the frontier that much, if at all. From the perspective of other types of research agendas, such as weak-to-strong generalization and other types of AI control and scalable oversight things, I think you kind of do need more data points. I'm not saying we've exhausted everything you can do with the current models. Far, far from it. But I think you are going to need more data points to build up consistency and to see some of these kinds of worrying behaviors emerge, where your weaker model can't actually supervise your strong model in all situations. Which, by the way, is predicated on this idea that verification is easier than generation, P versus NP, blah, blah, blah, especially if you can see the other person's thoughts and they can't see yours.
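The verification-versus-generation asymmetry mentioned here can be illustrated with a toy subset-sum sketch (an illustrative example, not from the episode): finding a solution requires searching exponentially many subsets, but checking a proposed solution takes only linear time.

```python
import itertools

nums = [3, 34, 4, 12, 5, 2]
target = 9

def generate(nums, target):
    """Generation: search all subsets for one summing to target (exponential in len(nums))."""
    for r in range(len(nums) + 1):
        for combo in itertools.combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

def verify(candidate, nums, target):
    """Verification: check a proposed subset against the pool and target (linear-time scan)."""
    pool = list(nums)
    for x in candidate:
        if x not in pool:
            return False  # candidate uses a number not available in the pool
        pool.remove(x)
    return sum(candidate) == target

solution = generate(nums, target)          # expensive search
print(solution, verify(solution, nums, target))  # cheap check of the found certificate
```

The same asymmetry is what weak-to-strong supervision leans on: a weaker verifier can, in principle, check a stronger generator's output more cheaply than producing it.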

So I think it does make sense to be at the frontier from that perspective. But I will say that I think the main reason that the companies are doing this is obviously to make money. And then, as a corollary of that, from a safety perspective, if you were trying to actually make a strong case for being at the frontier, it would be: our models are performance competitive, and they're close enough to the frontier, like a fast-follower kind of model, that you take a performance hit by using ours instead of the competitors', but they're safer.

And currently no one wants to use anything that's worse than the frontier model. Why would you? That's the best model. But if a model was, like, I don't know, 10% more likely to tell you to jump off a bridge or something, or actually seriously, 10% more likely to hack your bank account and steal all your money, let alone, I don't know, escape and make a bioweapon, I would like to think people would use the less good model.

And I like to think that regulators and insurers could adequately penalize the frontier companies into complying with those standards. Because then you have existence proofs, like, oh, my product, I'm actually trying, right?