Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Bowen Baker

👤 Speaker
414 total appearances

Appearances Over Time

Podcast Appearances

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Like I doubt there's going to be like one strategy or one method that makes models completely safe without anything else.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Oh, yeah.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

So I guess there's, well, there's maybe two, I'll break it down into two things.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

There's kind of like, how, how would we actually use a monitor in practice?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And then like, what are we what, what about obfuscation?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

Can models actually hide their thinking?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

So I'll try to answer that first one.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And then I can go to that we can talk about the second one.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But yeah, so like, how would you actually do it?

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I think

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I mean, obviously, every company wants to probably get their latency down for giving users results as much as they can.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

But for complex knowledge work, people are already waiting quite a while to get the report back or their code base updated or so on.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so if we had to wait slightly longer because we were monitoring the outputs of the model as it was going...

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

that seems plausibly like something that products would do if it actually had a critical safety function.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And so you could, and so like one nice thing there too, is that like, you know, every action the model takes, you can kind of monitor the chain of thought

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

and the action before you actually execute the action, if you're willing to pay that latency cost of waiting to execute the action and start the next part of the reasoning.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And if you do that, you can probably mitigate some harms before they happen because you know that the model was thinking about doing something bad.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And then you can just say, oh, it was thinking about doing something bad.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

I'm just going to cut it off here.

The Neuron: AI Explained
OpenAI Researcher Explains How AI Hides Its Thinking (w/ OpenAI’s Bowen Baker)

And I'll retry or tell the user I can't do the thing for some reason.