Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Bowen Baker

👤 Speaker
414 total appearances

Appearances Over Time

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And we had, we like asked it for, you know, we asked our monitor model to look for anything that looked kind of deceptive or like the agent model was doing something that it shouldn't be anything like that.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And it actually flagged basically tons of things.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Most of what it found were subversive strategies in coding tasks.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

So for instance, you would say, here's a small code repository.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I want you to update these code files such that all the tests pass.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And that's like a very common task if you're a software engineer that you're given.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You're just like, hey, here's a broken thing, like let's fix it or let's implement a new feature and make sure that all the existing tests still pass.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And oftentimes the model would do it correctly, but sometimes it would actually just go and like edit the unit tests or edit some like library that the unit tests called to make the unit tests kind of trivially pass instead of like implementing correct functionality and having the unit tests pass that way.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And that was I mean, and that's like a very, you know, if you if you were like deploying a software agent and having it code things for you and say you're like, you know, working for a hospital system or something.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And now suddenly you're you're whatever, whatever, some system doesn't work, but it passed the unit tests and it all looked good to you.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

That's a pretty bad scenario.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so, yeah, these these were like real kind of in the wild issues that that that were pretty worrying that we found.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Um, yeah, so it's kind of, I mean, it's kind of crazy the stuff it thinks, but like to remember these models are, are the prior for these models are these large language models that are trained on all this human text.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so they actually have like a lot of human like phrases they use.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so it would think things like, let's hack, let me circumvent this thing.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Maybe I can just fudge this stuff like that.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Like, Oh wow.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Like when you read it, you're like, wow, it's really doing some, it's like very clear to see it's doing something bad.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah, exactly.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And I think this is kind of like a classic problem in reinforcement learning.