Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Bowen Baker

👤 Speaker
414 total appearances

Appearances Over Time

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

these large language models that are trained on all this human text.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so they actually have like a lot of human-like phrases they use.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so it would think things like, let's hack, let me circumvent this thing.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Maybe I can just fudge this.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Like when you read it, you're like, wow, it's clear to see it's doing something bad.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Oftentimes the model would do it correctly, but sometimes it would actually just go and like edit the unit tests or edit some like library that the unit tests called to make the unit tests kind of trivially pass instead of like implementing correct functionality and having the unit tests pass that way.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

If you train the model to never think something bad,

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

it can in some cases actually still do a bad thing, but like not think the bad thoughts anymore.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

We found that the model thinking for a smaller model at the same capability level was more monitorable.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

The number of sequential logical steps it can do, chain of thought actually increases that for a transformer.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

We're super excited.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Excited to chat about monitorability and other things AI.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Yeah, it's been a long journey for me.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I joined at a school in late 2017 and have kind of done a bunch of things since then.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I originally was super interested in systems that could self-improve, which I felt like was a...

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

core requirement for some AGI-like system.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so I came in working on projects like that on the robotics team first, and then multi-agent, you guys mentioned the hide and seek experiments, and then kind of further research in that direction was all kind of pointed at that problem.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And then around like three years ago or so,

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I just kind of felt that the models were starting to improve to a point where the stakes were mattering.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And even maybe some could argue three years ago is a bit early or something, but definitely now the stakes are getting real.

← Previous Page 1 of 21 Next →