John Schulman

Dwarkesh Podcast

John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So you have to set up a very good prompt with some examples.

2438.524

So,

2441.687

So people at OpenAI were working on just taking the base models and making them easier to prompt so that if you just wrote a question, it would answer the question instead of giving you more questions or something.

2443.109

So we had these instruction following models, which were kind of like base models, but a little easier to use.

2456.887

And those are the original ones deployed in the API.

2463.875

Or, after GPT-3, those were the next generation of models.

2467.46

Then at the same time, there were definitely a lot of people thinking about chat.

2473.027

So Google had some papers, like they had LaMDA and, earlier, Meena.

2479.315

So they had these chatbots and it was more like,

2485.503

It was more like a base model that was really specialized to the task of chat, really good at chat.

2488.807

I think at least looking at the examples from the paper, it was more used for fun applications where the model would take on some persona and pretend to be that persona.

2496.219

It was not so functional, like "help me refactor my code."

2508.358

So yeah, there were definitely people thinking about chat.

2513.326

I had worked on a project before looking at chat called WebGPT, which was more about doing question answering with the help of web browsing and retrieval.

2517.333

Well, when you do question answering, it really wants to be in a chat because you always want to ask follow-up questions or sometimes the model should ask a clarifying question because the question's ambiguous.

2530.454

So it was kind of clear, after we did the first version of that, that the next version should be conversational.

2544.856

So anyway, we started working on a conversational chat assistant and

2550.184

This was built on top of GPT-3.5, which was done training at the beginning of 2022.