Grant Harvey
Podcast Appearances
Pretty big.
An example of that would be if it knows that the answer is somewhere on the desktop, and it goes and finds that file and then gets the answer directly from that.
Right.
So can you walk us through a real example of where you might have caught a frontier model trying to hack or cheat its way through something?
What did you actually see in the thinking that indicated that that was happening?
Or was this before you were looking at the traces?
That's so interesting, though, because I guess in certain circumstances, you actually would want it to find the fastest solution.
You just want it to find the fastest correct solution while doing what you wanted it to do, right?
Do you feel a little bit like a parent teaching a child?
Because that scenario to me is like, it feels very reminiscent of a parent being like, okay, I'm going to tell you what to do.
And then the kid finding the loophole in what you told them because you didn't tell them perfectly the right way to do it or whatever.
Wow.
What's the difference then between monitoring what an AI does versus monitoring what it thinks?
And why is looking at the thinking so much more effective than looking at the end result?
Mm-hmm.
Well, as they get better, it's like the risk of there being these catastrophic mistakes grows, I feel like, because of the trust, which is a good thing because it can do a lot more.
But yeah, that's fascinating.
Well, you have a background working in robotics.
That makes me wonder if it is more like a muscle twitch kind of thing where it's in the activations.
I don't really understand this at all, so perhaps you could break this down for us and the listeners.