Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing
5295 total appearances
Voice ID

Voice Profile Active

This person's voice can be automatically recognized across podcast episodes using AI voice matching.

Voice samples: 2
Confidence: High

Appearances Over Time

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Doing good.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Doing good.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Really excited today because we are talking about something deceptively simple but incredibly fragile, whether or not we can monitor an AI's reasoning before it causes harm.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Reasoning models don't just spit out answers, they work through their responses.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

They plan, they deliberate, and sometimes that internal chain of thought reveals intent that never shows up in the final output.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I guess let's start a little bit further back than that.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

So Bowen, when did you join OpenAI and how did you get involved into monitorability research more broadly?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Pretty big.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

An example of that would be like if it's if it knows that the answer is somewhere on the desktop and it like goes and finds that file and then like gets the answer directly from that.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Right.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

So can you walk us through a real example of where you might have caught a frontier model trying to hack or cheat its way through something?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

What did you actually see in the thinking that indicated that that was happening?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Or was this before you were looking at the traces?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

That's so interesting, though, because I guess in certain circumstances, you actually would want it to find the fastest solution.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You just wanted to find the fastest, correct solution doing what you wanted to do, right?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Do you feel a little bit like a parent teaching a child?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Because that scenario to me is like, it feels very reminiscent of a parent being like, okay, I'm going to tell you what to do.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And then the kid finding the loophole in what you told them because you didn't tell them perfectly the right way to do it or whatever.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Wow.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

What's the difference then between monitoring what an AI does versus monitoring what it thinks?

← Previous Page 1 of 265 Next →