Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Bowen Baker

👤 Speaker
414 total appearances

Appearances Over Time

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I worked in reinforcement learning primarily before working on safety.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And this was just like a thing that if you were doing anything, you would constantly find like reward hacks.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You would find that you...

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

you like didn't code your environment entirely correctly.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And there was this like weird strategy that could happen like, you know, back in the day of doing like physics simulators and like training models and physics simulators, you'd find them like jumping through walls because they found the exact right way to like break the physics simulator and like pass through an object and stuff like that, that you were just always had to kind of like patch these holes.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so, yeah, the same thing is happening with reasoning models and large language models.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah, exactly.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I mean, I think someone had recently brought up a similar thing of like the kid, you know, like opening up the cookie jar when the parents not looking.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

It's like exactly the same kind of thing.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You've like set up a reward function or something you want them to do.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But then they realize that, oh, you know, like I'm not being observed in this case.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

So then I can do it then or, you know, anything like that.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

It is.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

It's a bit like parenting.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah, I mean, definitely we find that stronger monitors like do perform the monitoring task better.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

OK.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But that being said, weak monitors can still catch a lot of bad stuff that strong models do.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And that's because I think I mean, I don't have we don't have like hard evidence for this.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But if you remember, like the types of phrases I mentioned, like let's hack, let's fudge this type of thing.