Bowen Baker

The Neuron: AI Explained

OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

you know, a very, very dumb, small language model can pick up on signals like that.

And so when the model is being very, very obvious, you know, they can pick up on those things.

And I think that is probably the main mechanism by which that happens.

Yeah, so that's a great question.

So the outputs that the models have, so like what you read in ChatGPT, like what it actually gives you, those are optimized to be, you know, helpful to you, do what you want, be safe, and/or, you know, adhere to any specifications we have on what a good output looks like. And so, you know, if the model knows it's doing something bad

but it will know that it can't actually say that in the output because it's told not to do bad things for the user.

And so there's a clear reward to not reveal bad things happening in the output.

Whereas in the thinking, there's no such reward or pressure to do so.

It's like the thinking is kind of treated as a private space for the model.

And that's something that we at OpenAI really tried to hold a hard line on, for the most part: to not apply any kind of style pressures or to make the chain of thought look nice in any kind of way, because, you know, if it wants to have a thought about a strategy for doing something bad, we want that to be very clear so that we can then monitor and catch it.

And then the other reason is that, you know, I guess, so there's this thing where the outputs are pressured to not show bad things.

But also, outputs can be incredibly complex.