Ryan Kidd
So from an interpretability perspective,
I don't think you need to be pushing the frontier that much, if at all. From the perspective of other types of research agendas, such as weak-to-strong generalization and other kinds of AI control and scalable oversight work,
I think you kind of do need more data points.
I'm not saying we've exhausted everything you can do with the current models.
Far, far from it.
But I think you are going to need more data points to build up consistent trends and to see some of these kinds of worrying behaviors emerge, where your weaker model can't actually supervise your stronger model in all situations. Which, by the way, is predicated on this idea that verification is easier than generation, P versus NP, blah, blah, blah, especially if you can see the other person's thoughts and they can't see yours.
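The verification-versus-generation asymmetry mentioned here can be made concrete with a toy SAT sketch (an illustration I'm adding, not something from the conversation; the formula and function names are hypothetical): checking a proposed satisfying assignment takes time linear in the formula size, while finding one by brute force can require checking exponentially many assignments.

```python
from itertools import product

# A CNF formula as a list of clauses; each literal is (var_index, is_positive).
# This hypothetical formula encodes: (x0 or not x1) and (x1 or x2) and (not x0 or x2)
formula = [[(0, True), (1, False)], [(1, True), (2, True)], [(0, False), (2, True)]]

def verify(formula, assignment):
    """Check an assignment in time linear in the formula size: the easy direction."""
    return all(any(assignment[v] == pos for v, pos in clause) for clause in formula)

def generate(formula, n_vars):
    """Find a satisfying assignment by brute force: up to 2^n verify calls."""
    for bits in product([False, True], repeat=n_vars):
        if verify(formula, list(bits)):
            return list(bits)
    return None

sol = generate(formula, 3)
print(sol, verify(formula, sol))
```

The point of the sketch is the cost gap: `verify` scans the formula once, while `generate` may call `verify` on all 2^n assignments, which is why a weaker supervisor checking a stronger model's work is plausibly easier than doing that work itself.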
So I think it does make sense to be at the frontier from that perspective, but I will say that I think the main reason
that the companies are doing this is obviously to make money.
And then, as a corollary of that, from a safety perspective, if you were trying to actually make a strong case for being at the frontier, it would be: our models are performance-competitive,
and they're close enough to the frontier, in a fast-follower kind of way, that you take a performance hit by using ours instead of a competitor's, but they're safer.
And currently no one wants to use anything that's worse than the frontier model.
Why would you?
That's the best model.
But if a model was, like, I don't know, 10% more likely to tell you to jump off a bridge or something, or, actually seriously, 10% more likely to hack your bank account and steal all your money, let alone, I don't know, escape and make a bioweapon,
I would like to think people would use the less good model.
And I would like to think that regulators and insurers
could adequately penalize the frontier companies into complying with those standards.
Because then you have existence proofs: like, oh, with my product, I'm actually trying, right?