Corey Knowles
Podcast Appearances
Absolutely.
Where it's not intentional.
It's just a lack of understanding that that was even a possibility.
It's doing something a human might do, or rather something an undesirable human might do.
That's interesting.
Wow, that's really interesting.
Something that stood out to me is that you mentioned, even just a minute ago, that you found GPT-4o could successfully monitor o3-mini (I believe it was in your paper), even though o3-mini is technically the stronger model of the two. It made me wonder: how does a weaker AI catch a smarter one in the act?
Yeah.
If chain-of-thought traces are kind of like opening a window into the model's mind and looking in, what can it still hide behind the curtains?
Are there things that still aren't graspable from a monitorability standpoint?
Like, things that are more instinctual?
You know, there's also a real need in a robotics-type situation, I would say, for this to be instant in some way: to know instantly what's going on so you can stop something in the act.
And, you know, I'm not asking ridiculous Terminator questions.
I just mean something like accidentally smashing through a door.
It's going to be like how a car has more than one safety feature.
Okay.
Yeah.
You mentioned in your research that you found longer chains of thought to be generally more monitorable.