John Schulman
And that was one of the things we were excited about.
So yeah, we worked on that.
We worked on that for most of the year, and we had browsing as another feature in it, though we ended up de-emphasizing that later on because the model's internal knowledge was so good that the browsing wasn't the most interesting thing about it.
We had it out for beta testing with friends and family for a while, and we were thinking about doing a public release. But at that time, GPT-4 actually finished training, in August of that year.
Actually, the flagship RL effort at OpenAI was the instruction following effort, because those were the models being deployed into production.
So the first fine-tunes of GPT-4 used that whole stack.
And that was...
Yeah, those models were really good, and everyone got really excited about that after seeing the Instruct fine-tuned GPT-4s.
So they were really, really good.
They would occasionally give you amazing outputs, but the model was also clearly pretty unreliable.
It would sometimes hallucinate a lot, and it would sometimes give you pretty unhinged outputs.
So it was clearly not quite ready for prime time, but it was obviously very good.
So I guess people forgot about chat for a little while after that, because of this alternative branch. But then we pushed it further, and we ended up mixing together all the data sets, the instruct and the chat data, to try to get something that was the best of both worlds. And I think the chat models were clearly easier to use.
They sort of automatically had much more sensible behavior in terms of the model knowing its own limitations.
That was actually one of the things that I got excited about as we were developing it.
I realized that a lot of the things people thought were flaws in language models, like blatant hallucination, couldn't be completely fixed, but you could make a lot of progress with pretty straightforward methods.
Oh yeah, and the other thing about chat was that with the instruct models, the task of "complete this text, but in a nice way or in a helpful way" is a pretty poorly defined task.
So I think that task is both confusing for the model and for the human who's supposed to do the data labeling.
Whereas for chat, I think people had an intuitive sense of what a helpful robot should be like.