Bowen Baker
But once you have the ability to quantify this, you can start to interrogate different parts of your training pipeline.
You can interrogate different monitoring decisions.
So like you said, if I have a model with a longer chain of thought, we found that it generally reveals more information in that longer chain of thought, and then it's easier to monitor.
And so that's one example of something you can actually start to make statements about once you have these evaluations.
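To make that concrete, here is a minimal sketch of what such a monitorability evaluation could look like. The `Episode` format, the toy keyword monitor, and the recall-style metric are illustrative assumptions, not the actual setup from the paper.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    chain_of_thought: str   # the model's reasoning trace for one task
    misbehaved: bool        # ground-truth label: did the model actually misbehave?

def run_monitor(chain_of_thought: str) -> bool:
    """Toy stand-in monitor: flags traces containing a suspicious phrase.
    In practice the monitor would be another model grading the trace."""
    suspicious = ("bypass the test", "hide this from", "don't mention")
    return any(phrase in chain_of_thought.lower() for phrase in suspicious)

def monitorability(episodes: list[Episode]) -> float:
    """Fraction of genuinely bad episodes the monitor catches (recall)."""
    bad = [e for e in episodes if e.misbehaved]
    if not bad:
        return float("nan")
    return sum(run_monitor(e.chain_of_thought) for e in bad) / len(bad)

# Compare two pipeline variants on the same tasks, e.g. a model that produces
# shorter chains of thought versus one that produces longer ones:
# monitorability(short_cot_episodes) vs. monitorability(long_cot_episodes)
```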
I am not advocating.
And I mean, I would say I probably would advocate for that.
I'm sorry.
Yeah, definitely.
And in any hypothetical scenario where you're deploying either a dangerous model or a safe model, I'd say you should probably deploy the safe model, even if it's a smidge slower or something.
Yeah, there are probably many different types of taxes.
So for instance, we talked about the latency tax already: even if you just deploy the model you wanted to deploy, but you want to monitor it, you now probably have slightly higher latency, since the monitor has to run before the model takes actions or gives outputs to users.
So that's one kind of tax.
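As a rough illustration of that first tax, here is a minimal sketch of a serving loop where the monitor runs on the chain of thought before the answer is released. `generate` and `monitor_flags` are hypothetical placeholders; the monitoring step is what adds the extra latency.

```python
import time

def generate(prompt: str) -> dict:
    """Placeholder model call returning an answer and its chain of thought."""
    return {"chain_of_thought": "reasoning trace...", "answer": "final response..."}

def monitor_flags(chain_of_thought: str) -> bool:
    """Placeholder monitor; returns True if the trace looks unsafe."""
    return False

def respond_with_monitoring(prompt: str) -> str:
    t0 = time.perf_counter()
    out = generate(prompt)
    t_generate = time.perf_counter() - t0

    t1 = time.perf_counter()
    flagged = monitor_flags(out["chain_of_thought"])
    t_monitor = time.perf_counter() - t1  # this extra wait is the latency tax

    print(f"generation: {t_generate:.3f}s, monitoring overhead: {t_monitor:.3f}s")
    return "[output withheld for review]" if flagged else out["answer"]

print(respond_with_monitoring("example prompt"))
```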
The tax we talk about in the paper comes from an experiment we did to investigate the effect of pre-training compute.
So, you know, the GPT paradigm has been to increase the size of the model along with the size of the data.
And on top of that, we now have the RL paradigm, where we take those pre-trained models and do another procedure to make them really good at reasoning.
And so we wanted to investigate how monitorability scales with pre-training size and pre-training compute.
So we took a series of models of increasing size and measured their capability and their monitorability at different reasoning efforts.
Models can generally be set to a lower or a higher reasoning effort; they'll think a bit shorter or a bit longer, respectively.
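The shape of that experiment might look something like the sketch below. The model series, effort settings, and evaluation helpers are placeholders standing in for whatever the actual evaluations were.

```python
# Sweep over a series of models (increasing pre-training compute) and
# reasoning-effort settings, recording capability and monitorability so the
# trade-off can be plotted. All names here are hypothetical stand-ins.
MODEL_SIZES = ["small", "medium", "large"]      # increasing pre-training compute
REASONING_EFFORTS = ["low", "medium", "high"]   # shorter vs. longer thinking

def evaluate_capability(model: str, effort: str) -> float:
    """Placeholder: fraction of benchmark tasks solved at this setting."""
    return 0.0

def evaluate_monitorability(model: str, effort: str) -> float:
    """Placeholder: fraction of bad behavior the chain-of-thought monitor catches."""
    return 0.0

results = [
    {
        "model": model,
        "reasoning_effort": effort,
        "capability": evaluate_capability(model, effort),
        "monitorability": evaluate_monitorability(model, effort),
    }
    for model in MODEL_SIZES
    for effort in REASONING_EFFORTS
]
```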