Sam Altman
We'll only be able to agree on very broad bounds of what these systems can do.
And then within those, maybe different countries have different RLHF tunes.
Certainly individual users have very different preferences.
We launched this thing with GPT-4 called the system message, which is not RLHF, but is a way to let users have a good degree of steerability over what they want.
And I think things like that will be important.
So the system message is a way to say, you know, hey model, please only answer this message as if you were Shakespeare doing thing X. Or please only respond with JSON no matter what, which was one of the examples from our blog post.
But you could also say any number of other things like that.
And then we tune GPT-4 in a way to really treat the system message with a lot of authority.
I'm sure there'll always be jailbreaks, not always, hopefully, but for a long time there'll be more jailbreaks, and we'll keep sort of learning about those.
But we program, we develop, whatever you want to call it, the model in such a way that it learns it's supposed to really use that system message.
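The system message described here can be sketched in code. This is a minimal illustration of the idea, assuming a chat-style API that takes a list of role-tagged messages (as the OpenAI chat completions endpoint does); the helper name `build_messages` and the prompts are hypothetical, not from the conversation or blog post.

```python
# Sketch: a system message is sent alongside user messages, with the
# model tuned to treat the system role's instruction as authoritative.
# build_messages and the example strings are illustrative assumptions.

def build_messages(system_instruction, user_prompt):
    """Compose a chat request: the system message sets the behavior,
    and user messages follow it."""
    return [
        {"role": "system", "content": system_instruction},
        {"role": "user", "content": user_prompt},
    ]

# The "only respond with JSON" style of steering, as a system message:
messages = build_messages(
    "Only respond with JSON, no matter what.",
    "List three primary colors.",
)
```

With the `openai` Python package, a list shaped like `messages` is what you would pass as the `messages` parameter of a chat completion request; the model is expected to keep honoring the system instruction even as the user turns vary.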
Yeah.
And the creativity, the kind of feel people develop, some of them almost treat it like debugging software.
But also, I've met people who spend like 12 hours a day for a month on end on this.
And they really get a feel for the model and a feel for how different parts of a prompt compose with each other.
Yeah, where you put the clause, when you modify something, what kind of word to do it with.
You could experiment.
Yeah, there's all these ways that the kind of analogies from humans to AIs break down: the parallelism, the sort of unlimited rollouts.
Today's Monday.
We launched the previous Tuesday, so it's been six days.