Grant Harvey
Podcast Appearances
For instance, you talked about obfuscated reward hacking.
So what happens when the models learn to hide their misbehavior?
You kind of gave an example in the healthcare scenario, but perhaps we could expand on that.
I do have a question.
So do you see this being something that the end user would be monitoring or is it something that you would monitor, you know, on the OpenAI side?
Right.
Because there have maybe been historical accounts of other companies using that information and then releasing products with it.
That's right.
Yeah.
Yeah.
Did you want to talk about obfuscation or what?
Ideally, you would want that language to be understandable by humans.
Right.
You also mentioned a monitorability tax where sometimes you might accept slightly worse performance to keep the AI systems more transparent.
I think you're advocating for that, right?
Exactly.
Yeah.
And when would paying that tax actually be worth it?
I guess this is a question.
Along the trace.