John Schulman

👤 Speaker
528 total appearances

Podcast Appearances

Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

And that was one of the things we were excited about.

So yeah, we worked on that.

We worked on that for most of the year, and we had browsing as another feature in it, though we ended up de-emphasizing that later on because the model's internal knowledge was so good that the browsing wasn't the most interesting thing about it.

We had it out for beta testing with friends and family for a while, and we were thinking about doing a public release. But at that time, GPT-4 had actually finished training, in August of that year.

Actually, the flagship RL effort at OpenAI was the instruction-following effort, because those were the models being deployed into production.

So the first fine-tunes of GPT-4 used that whole stack.

And that was...

Yeah, those models were really good, and everyone got really excited about that after seeing the Instruct fine-tuned GPT-4s.

So they were really, really good.

They would occasionally give you amazing outputs, but the model was also clearly pretty unreliable.

It would sometimes hallucinate a lot, and it would sometimes give you pretty unhinged outputs.

So it was clearly not quite ready for prime time, but it was obviously very good.

So I guess people forgot about chat for a little while after that, because of this alternative branch. But then we pushed it further, and we ended up mixing together all the datasets, the instruct and the chat data, to try to get something that was the best of both worlds. And I think, yeah, the chat models...

They were clearly easier to use.

It sort of automatically had much more sensible behavior in terms of the model knowing its own limitations.

That was actually one of the things that I got excited about as we were developing it,

I realized that a lot of the things people thought were flaws in language models, like blatant hallucination, couldn't be completely fixed, but you could make a lot of progress on them with pretty straightforward methods.

Oh yeah, and the other thing about chat was that with these instruct models, the task of "complete this text, but in a nice way or in a helpful way" is a pretty poorly defined task.

So I think that task is both confusing for the model and for the human who's supposed to do the data labeling.

Whereas for chat, I think people had an intuitive sense of what a helpful robot should be like.