Grant Harvey
Podcast Appearances
But yeah, that's fascinating.
Well, you have a background working in robotics.
That makes me wonder if it is more like a muscle twitch kind of thing where it's in the activations.
I don't really understand this at all, so perhaps you could break this down for us and the listeners.
But if you were to use a language model in a robotic scenario, how would this play out in a similar way?
Yeah.
Perhaps that's not the right way to phrase that question, but do you know what I'm saying?
Right.
Maybe.
That's where I would worry.
Like, let's say you're training a robot, and the robot knows, "Oh, I need to go over there," but it doesn't know it can't flail its giant mechanical arms and smack everybody between here and there.
That's kind of what I was picturing.
Exactly.
Yeah.
Right.
That makes sense.
Well, how would this play out in practice if we deployed these systems at work?
For instance, you talked about obfuscated reward hacking.
So what happens when the models learn to hide their misbehavior?
You kind of gave an example in the healthcare scenario, but perhaps we could expand on that.