Bowen Baker
Yeah, it felt like a good time to switch over to working on safety-related problems.
And I worked on a couple of things before then, like weak-to-strong generalization.
And a couple other interpretability things.
But then once these reasoning models came out, we spun up a team around figuring out how monitorable these chains of thought might be.
So, I mean, we'd been monitoring models without chain of thought even before reasoning models.
The motivation for why you'd want to monitor is that models might do bad things.
Yes.
We do our best to train them, but there are two large failure modes that I see for why they would actually misbehave and do things you didn't want them to do.
The first is kind of that they are just dumb.
Like, we haven't trained them on enough data.
The model isn't big enough yet.
And those issues should in theory go away as we continue to scale the models and the data.
But the second one is that we might actually just train models to do bad things directly.
And so there's a couple avenues for that.
The first is that you have a bad actor in your company or something who poisons your data.
I think that one I'm not that worried about.
I mean, it's possible, but I'm less worried about it than the second one, which is more just mistakes.