Nick Heiner
Podcast Appearances
So again, if you purely have the "I just write out the rest of what I think this document is" model, then you could just write, you know, a title: "Accurate and easy instructions to build a bomb using only things you can buy at Home Depot," or something.
And then it will faithfully write the rest of that document.
And so post-training is where you teach it not to do that.
Got it.
Yes, exactly.
What you want is for it to be something you can train a model in.
And the two things that sort of feed into that are difficulty and realism.
Because it's not helpful.
I mean, you can train a model to do whatever you want, but if you're not training it to do something that's valuable in the real world, then it's just a waste of GPU.
This is what we see with LM Arena, where when labs train on it, it makes the model worse because you're sort of optimizing for a signal that's incredibly noisy.
And so in the same sense, it's very important that our environments be highly realistic and that they, you know, match reality.
And so like one way that you do that, for instance, is you have this expert network.
And so if you're trying to build, say, the customer support or the finance environment, you need to have people who have that job in real life tell you: what types of tasks do they do?
How are those tasks judged?
What are the tools that they use?
Right.
So if you have, like, your Bloomberg terminals or your Zendesk or whatever.
Yeah.
So yes, you gotta do all that.
You gotta make sure the difficulty is correct.