Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Bowen Baker

👤 Speaker
414 total appearances

Appearances Over Time

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And this is what largely leads to like reward hacking type behavior.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And reward hacking basically means where you give your model a task, but you haven't thought through all the possible ways it could do the task.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And actually there exists a strategy that is like very much not what you wanted it to do.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Like you design the task.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

You did your best to put in all the guardrails in the environment that that you're like agent or model is acting in.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And you just didn't think of this like one strategy that it could take.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah, exactly.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Like, yeah, that's a great example.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah, absolutely.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

So earlier last year, around March, we published a paper on this work.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But we had started, kind of as a YOLO, we had started monitoring one of our frontier reasoning models as it trained to kind of see if this thing was useful at all.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

At that time, we didn't know.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And that's why it was like a YOLO.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

We're just like, we'll spin it up before...

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

winter break and we'll see what we come back with, you know, like if it found anything.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And what we were doing was we spun up a different model, like a GPT at the time, GPT 4.0.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And we had it read the chain of thought, which is basically just English.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

It like looks a bit weird, but it's mostly understandable.