Trenton Bricken
Podcast Appearances
Like, now we don't even talk about it.
And it'd be silly to think that it was a meaningful test.
MARK MANDELMANN: Yeah, yeah.
I don't know.
That seems okay.
Like, if we have AI oracles.
Yeah, that's what I'm saying.
That's good.
Yeah, exactly.
One nice example of this is just the ability or notion to backtrack.
You go down one solution path.
Oh, wait, let me try another one.
And this is something that you start to see emerge in the models through RL training on harder tasks.
And I think right now it's not generalizing incredibly well, at least with RL.
What are you learning?
I mean, it really depends on the timeline on which we get Claude 8 and the models hit ASL-4 capabilities.
Fundamentally, we're just going to use whatever tools we have at the time and see how well they work.
Ideally, we have this enumerative safety case where we can almost verify or prove that the model will behave in particular ways.
In the worst case, we use the current tools, like when we won the auditing game, of seeing what features are active when the assistant tag lights up.
Yeah, yeah, yeah.