6774 total appearances
Voice ID

Voice Profile Active

This person's voice can be automatically recognized across podcast episodes using AI voice matching.

Voice samples: 2
Confidence: High

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But yeah, that's fascinating.

Well, you have a background working in robotics.

That makes me wonder if it is more like a muscle-twitch kind of thing where it's in the activations.

I don't really understand this at all, so perhaps you could break this down for us and the listeners.

But if you were to use a language model in a robotic scenario, how would this play out in a similar way?

Yeah.

Perhaps that's not the right way to phrase that question, but do you know what I'm saying?

Right.

Maybe.

That's where I would worry.

Like, let's say you're training a robot and the robot knows, "Oh, I need to go over there," but it doesn't know it can't flail its giant mechanical arms and smack everybody between here and there.

That's kind of what I was picturing.

Exactly.

Yeah.

Right.

That makes sense.

Well, how would this play out in practice if we deployed these systems at work?

For instance, you talked about obfuscated reward hacking.

So what happens when the models learn to hide their misbehavior?

You kind of gave an example in the healthcare scenario, but perhaps we could expand on that.