Sam Altman
π€ SpeakerAppearances Over Time
Podcast Appearances
Maybe it's not even mostly an alignment capability.
It helps make a better system, a more usable system.
Yeah.
And this is actually something that I don't think people outside the field understand enough.
It's easy to talk about alignment and capability as orthogonal vectors.
They're very close.
Better alignment techniques lead to better capabilities and vice versa.
There's cases that are different and they're important cases.
But on the whole, I think things that you could say like RLHF or interpretability that sound like alignment issues also help you make much more capable models.
And the division is just much fuzzier than people think.
And so in some sense, the work we do to make GPT-4 safer and more aligned looks very similar to all the other work we do of solving the research and engineering problems associated with creating useful and powerful models.
And there's no one set of human values or there's no one set of right answers to human civilization.
So I think what's going to have to happen is we will need to agree on, as a society, on very broad bounds.
We'll only be able to agree on a very broad bounds of what these systems can do.
And then within those, maybe different countries have different RLHF tunes.
Certainly individual users have very different preferences.
We launched this thing with GPT-4 called the system message, which is not RLHF, but is a way to let users have a good degree of steerability over what they want.
And I think things like that will be important.
So the system message is a way to say, you know, hey, model, please pretend like you or please only answer this message as if you were Shakespeare doing thing X. Or please only respond with JSON no matter what was one of the examples from our blog post.
But you could also say any number of other things to that.