
Bowen Baker

👤 Speaker
414 total appearances

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You know, a very, very dumb, small language model can pick up on signals like that.

And so when the model is being very, very obvious, you know, they can pick up on those things.

And I think that is probably the main mechanism by which that happens.

Yeah, so that's a great question.

So the outputs that the models have, like what you read in ChatGPT, what it actually gives you, those are optimized to be, you know, helpful to you, to do what you want, to be safe, and to comply with any specifications we have on what a good output looks like.

And so, you know, if the model knows it's doing something bad

but it will know that it can't actually say that in the output because it's told not to do bad things for the user.

And so there's a clear reward to not reveal bad things happening in the output.

Whereas in the thinking, there's no such reward or pressure to do so.

It's like the thinking is kind of treated as a private space for the model.

And that's something that we at OpenAI really tried to hold a hard line on, for the most part: to not apply any kind of style pressures or make the chain of thought look nice in any kind of way, because, you know, if it has a thought about a strategy for doing something bad, we want that to be very clear so that we can then monitor and catch it.

And then the other reason is that, you know, there's this thing of the outputs being pressured to not show bad things.

But also, outputs can be incredibly complex.

We're seeing these models do more and more crazy things every day.

The size of a coding project a model can actually accomplish is, I don't know, probably pretty big by now, depending on what benchmarks you look at.

And imagine it's mostly good, but in one file, it didn't really know how to do a thing.

And it does this thing where it stubs the unit tests and kind of makes that one module not really work as it should.