Nick Heiner
It's: one model does the work, and another model judges the success.
And the way it judges success is that a human writes out a bunch of very concrete, specific criteria that the model is going to evaluate against.
But it then becomes this game of like, okay, well, I have my verbal unit test essentially.
Is my test suite comprehensive enough that I can capture all of these cases?
Because a very literal-minded judge will look at this and be like, yeah, it followed the rules.
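The pattern being described — one model produces output, another grades it against human-written criteria — can be sketched roughly like this. Everything here is hypothetical: `call_judge_model` stands in for whatever LLM API you use, and the fake judge deliberately illustrates the "literal-minded" failure mode mentioned above.

```python
# Sketch of LLM-as-judge evaluation: a human writes concrete, specific
# criteria, and a judge model scores a response against each one.

def call_judge_model(prompt: str) -> str:
    # Placeholder for a real LLM API call. This fake judge is
    # deliberately literal-minded: it just checks for a keyword,
    # so it "follows the rules" without understanding them.
    return "PASS" if "refund" in prompt.lower() else "FAIL"

def judge_response(response: str, criteria: list[str]) -> dict[str, bool]:
    """Score a model's response against each human-written criterion."""
    results = {}
    for criterion in criteria:
        prompt = (
            f"Response: {response}\n"
            f"Criterion: {criterion}\n"
            "Answer PASS or FAIL."
        )
        results[criterion] = call_judge_model(prompt) == "PASS"
    return results

criteria = [
    "Mentions the refund policy",
    "Does not promise a specific delivery date",
]
scores = judge_response("You can request a refund within 30 days.", criteria)
print(scores)
```

The weak point is exactly the one raised in the conversation: the evaluation is only as good as the criteria list, so the human's "verbal unit tests" have to be comprehensive enough to cover the cases that matter.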
Exactly.
You know, and that's when you see something like — I don't want to name names, but there was a model released last year that was so bad it caused the company in question to do a lot of reorgs.
And it was trained on LMArena.
And it's like, yeah, like you said, the model is only doing what you tell it to do.
And it did a great job of optimizing for that reward signal.
In my first computer science class, my professor said to us: you should never get frustrated with a computer, because it can only do what you tell it to do.
You know, he was saying that in a deterministic context, right?
Of like, you literally wrote this code.
But honestly, it works even in a modeling context.
Like it's only doing what we trained it to do.
Well, so part of the value we provide is having a very diverse set of AI trainers to sort of get past that.
That's good.
Yeah.