Bowen Baker

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I worked in reinforcement learning primarily before working on safety.

591.815 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And this was just like a thing that if you were doing anything, you would constantly find like reward hacks.

596.502 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You would find that you...

601.63 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

you like didn't code your environment entirely correctly.

602.992 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And there was this like weird strategy that could happen like, you know, back in the day of doing like physics simulators and like training models and physics simulators, you'd find them like jumping through walls because they found the exact right way to like break the physics simulator and like pass through an object and stuff like that, that you were just always had to kind of like patch these holes.

606.476 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so, yeah, the same thing is happening with reasoning models and large language models.

626.681 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah, exactly.

650.697 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I mean, I think someone had recently brought up a similar thing of like the kid, you know, like opening up the cookie jar when the parents not looking.

651.438 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

It's like exactly the same kind of thing.

658.57 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You've like set up a reward function or something you want them to do.

661.195 View full episode →

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But then they realize that, oh, you know, like I'm not being observed in this case.