So that's probably how they would be implemented in practice. If you have any questions on that, we can talk about it, and then I can talk about obfuscation after.
Oh, yeah.
So, I think there's good reason not to reveal the chain of thought to users.
There are two commonly cited reasons. The first is, and this is to your question of whether the user should be monitoring this themselves: if we showed the user the chain of thought, we would have to conform it the same way we conform the outputs to look nice, so that we don't say something offensive to a user or tell them how to do something illicit, whatever the product policies are for a given lab's product. We would have to apply all of that to the chain of thought, because now we're handing users this new artifact. If the chain of thought says something offensive, or actually reveals how to make the bomb or whatever the illicit thing is, but the model decides not to put that in the final answer, well, we've still just told the user, if we show them the chain of thought. So revealing it would require us to put these style pressures on the chain of thought, and that could then lead to obfuscation, which we can talk about in a sec.
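To make that pressure concrete, here is a minimal, hypothetical sketch of a reward function, not any lab's actual training code. All the names (Sample, task_reward, policy_flag_count) are illustrative stand-ins. The point is that once the chain of thought is user-visible and gets the same policy penalty as the final answer, the cheapest way for an optimized model to recover reward is often to stop verbalizing the flagged reasoning rather than to stop reasoning that way.

```python
from dataclasses import dataclass

# Hypothetical sketch only: illustrative names, not any lab's training code.

@dataclass
class Sample:
    chain_of_thought: str
    final_answer: str

def task_reward(answer: str) -> float:
    # Stand-in for a real task grader.
    return 1.0 if "42" in answer else 0.0

def policy_flag_count(text: str) -> int:
    # Stand-in for a trained policy/moderation classifier.
    banned = ["offensive", "how to make the bomb"]
    return sum(phrase in text.lower() for phrase in banned)

def reward(sample: Sample, cot_is_user_visible: bool) -> float:
    r = task_reward(sample.final_answer)
    r -= policy_flag_count(sample.final_answer)  # outputs always conform to policy
    if cot_is_user_visible:
        # Showing the CoT to users means penalizing the CoT too. Under
        # optimization, the cheapest fix is often to stop *verbalizing*
        # the flagged reasoning, not to stop reasoning that way, which
        # is exactly the obfuscation that breaks monitorability.
        r -= policy_flag_count(sample.chain_of_thought)
    return r
```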
And then the other reason is, I think, distillation risk. These chains of thought reveal a lot of a model's reasoning, and generally we don't try to just give our model away for other people to use. So, yeah.
So this connects to what you mentioned in the intro, that the chain of thought could be fragile. And I think that is a big worry in the community right now.