Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So I think it was just much easier to tell people... for people to get the idea of what the model was supposed to do.

So as a result, I think the model had a much more coherent personality and it was much easier to get pretty sensible behavior robustly.

Interesting.

Not exactly.

I mean, they could have... I don't remember the status of which models were available for fine-tuning.

Assuming we had 3.5 available for fine-tuning at the time, you could have made something pretty decently close, but I'm not sure you would have...

I don't think you would have been able to do just one iteration of fine-tuning where you have purely human-written data and you fine-tune on that.

I think you would want to do several iterations.

If you're not gonna do RL, which we did, you'd want to do some kind of iterative supervised fine-tuning, where you have humans edit the model-generated outputs. If you train on human-generated data, even if it's really high quality, it's just hard for a model to fit that data perfectly, because it might not be something the model is capable of outputting. So you need to do something iterative that looks a little bit more like RL.
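
For concreteness, a minimal sketch of that generate/edit/retrain loop might look like the following; the callables (`generate`, `human_edit`, `finetune`) are hypothetical stand-ins for whatever training infrastructure is available, not a real API:

```python
# Minimal sketch of iterative supervised fine-tuning on human-edited
# model outputs. All callables below are hypothetical placeholders.

from typing import Callable, List

def iterative_sft(
    model,
    prompts: List[str],
    generate: Callable,    # hypothetical: (model, prompt) -> model output
    human_edit: Callable,  # hypothetical: (prompt, output) -> edited target
    finetune: Callable,    # hypothetical: (model, pairs) -> fine-tuned model
    num_rounds: int = 3,
):
    """Fine-tune on human-edited model outputs for several rounds.

    Editing the model's own generations keeps the training targets close
    to what the model can actually produce, unlike purely human-written
    data, which the model may be unable to fit. The repeated
    generate/edit/fit loop is what makes this resemble RL.
    """
    for _ in range(num_rounds):
        outputs = [generate(model, p) for p in prompts]
        targets = [human_edit(p, o) for p, o in zip(prompts, outputs)]
        model = finetune(model, list(zip(prompts, targets)))
    return model
```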

So I think if you had done that, you could have gotten something pretty close, but that would have been kind of non-trivial.

But we also had another instruction-following model trained with RL that was released a little before ChatGPT.

So I think if you put a chat wrapper on that, you would get something decently close.
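
As a rough illustration of what such a wrapper involves, here is a minimal sketch that renders a dialogue into a text-completion prompt; the turn format and the `complete` interface are assumptions for illustration, not the actual setup that was used:

```python
# Rough sketch of a chat wrapper around a completion-style,
# instruction-following model. Turn format and the `complete`
# interface are illustrative assumptions.

from typing import Callable, List, Tuple

def chat_turn(
    complete: Callable[..., str],    # hypothetical: (prompt, stop) -> text
    history: List[Tuple[str, str]],  # (role, text) pairs so far
    user_message: str,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Render the dialogue as a prompt and have the model continue it."""
    history = history + [("User", user_message)]
    prompt = "\n".join(f"{role}: {text}" for role, text in history)
    prompt += "\nAssistant:"
    # Stop before the model starts generating the next user turn itself.
    reply = complete(prompt, stop=["\nUser:"]).strip()
    return reply, history + [("Assistant", reply)]
```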

But that model, if you just prompted it with chat... it had some differences in strengths.

That model was pretty good at writing and poetry and so forth, but it wasn't as good at knowing its limitations, or at factuality and so forth.

I would say faster than I would have expected since GPT-2.

I was pretty bought into scaling and pre-training and so forth being a good idea.

But when GPT-2 was done, I would say I wasn't completely sold on it revolutionizing everything.