Bowen Baker

👤 Speaker
414 total appearances

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And this also goes back to our conversation about whether the model is doing something instinctively or is actually thinking through it actively in the chain of thought.

There are certain pressures, or things that could happen during training, that could probably shift that balance.

And so one is this thing where, if you train the model to never think something bad, we showed in a paper earlier this past year that it can in some cases actually still do a bad thing, but not think the bad thoughts anymore.

We think it probably can only do this in cases where the chain of thought isn't actually necessary to do the bad thing, going back to this question of necessity.

of the chain of thought. But it still shows that there's actually some propensity for models to reason out loud even when they don't need to, and you can kind of destroy that propensity with style pressures like that. And so we call that obfuscation: the model is obfuscating its actual thinking.

And then there are a lot of other reasons we think chains of thought could become less information-dense or be less monitorable.

So, for instance, people are worried even about just large-scale RL, or reinforcement learning, which is how we train these models, just in the normal way we're doing it.

You could imagine that, as you increase the compute there, models could eventually learn their own languages or change the way that they're using the English language or whatever language they're thinking in.

At the current scale, and this is something we found in our recent paper, it doesn't seem that worrying. But in the reinforcement learning literature there are examples: people were working on emergent language and emergent communication back in, I don't know, 2016 to 2018 or 2019 or so.

And they showed, in very constrained settings, that if you just put in models that have no language prior, that are just totally randomly initialized, they can learn some language to talk to each other with.

And...

Exactly.

Yeah, yeah, yeah.

And so, yeah, this might happen, where the model would learn its own internal language for how to think that's maybe a bit more efficient or compressed than the English language.

And then it's harder for us to understand it ourselves.

Or for a different model to understand, one that hasn't learned that same specific language.

Yeah.