John Schulman

👤 Speaker
528 total appearances

Podcast Appearances

Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

And that was one of the things we were excited about.

So yeah, we worked on that.

We worked on that for most of the year, and we had browsing as another feature in it, though we ended up de-emphasizing that later on because the model's internal knowledge was so good that the browsing wasn't the most interesting thing about it.

We had it out for beta testing with friends and family for a while, and we were thinking about doing a public release. But at that time, GPT-4 had actually finished training, in August of that year.

Actually, the flagship RL effort at OpenAI was the instruction-following effort, because those were the models being deployed into production.

So the first fine-tunes of GPT-4 used that whole stack.

And that was...

Yeah, those models were really good, and everyone got really excited about that after seeing the Instruct fine-tuned GPT-4s.

So they were really, really good.

They would occasionally give you amazing outputs, but the model was also clearly pretty unreliable.

It would sometimes hallucinate a lot, and it would sometimes give you pretty unhinged outputs.

So it was clearly not quite ready for prime time, but it was obviously very good.

So I guess people forgot about chat for a little while after that, because of this alternative branch. But then we pushed it further, and we ended up mixing together all the datasets, the instruct and the chat data, to try to get something that was the best of both worlds. And I think, yeah, the chat models...

They were clearly easier to use.

It sort of automatically had much more sensible behavior in terms of the model knowing its own limitations.

That was actually one of the things that I got excited about as we were developing it,

I realized that a lot of the things people thought were flaws in language models, like blatant hallucination, couldn't be completely fixed, but you could make a lot of progress on them with pretty straightforward methods.

Oh yeah, and the other thing about chat was that with these instruct models, the task of "complete this text, but in a nice way or in a helpful way" is a pretty poorly defined task.

So I think that task is both confusing for the model and for the human who's supposed to do the data labeling.

Whereas for chat, I think people had an intuitive sense of what a helpful robot should be like.