Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So I think it was just much easier to tell people... for people to get the idea of what the model was supposed to do.

So as a result, I think the model had a much more coherent personality and it was much easier to get pretty sensible behavior robustly.

Interesting.

Not exactly.

I mean, they could have... I don't remember the status of which models were available for fine-tuning.

Assuming we had 3.5 available for fine-tuning at the time, you could have made something pretty decently close, but I'm not sure you would have...

I don't think you would have been able to do just one iteration of fine-tuning where you have purely human-written data and you fine-tune on that.

I think you would want to do several iterations.

If you're not gonna do RL, which we did, you'd want to do some kind of iterative supervised fine-tuning, where you have humans edit the model-generated outputs. If you train on human-generated data, even if it's really high quality, it's just hard for a model to fit that data perfectly, because it might not be something the model is capable of outputting. So you need to do something iterative that looks a little bit more like RL.
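
For concreteness, a minimal sketch of that generate/edit/retrain loop might look like the following; the callables (`generate`, `human_edit`, `finetune`) are hypothetical stand-ins for whatever training infrastructure is available, not a real API:

```python
# Minimal sketch of iterative supervised fine-tuning on human-edited
# model outputs. All callables below are hypothetical placeholders.

from typing import Callable, List

def iterative_sft(
    model,
    prompts: List[str],
    generate: Callable,    # hypothetical: (model, prompt) -> model output
    human_edit: Callable,  # hypothetical: (prompt, output) -> edited target
    finetune: Callable,    # hypothetical: (model, pairs) -> fine-tuned model
    num_rounds: int = 3,
):
    """Fine-tune on human-edited model outputs for several rounds.

    Editing the model's own generations keeps the training targets close
    to what the model can actually produce, unlike purely human-written
    data, which the model may be unable to fit. The repeated
    generate/edit/fit loop is what makes this resemble RL.
    """
    for _ in range(num_rounds):
        outputs = [generate(model, p) for p in prompts]
        targets = [human_edit(p, o) for p, o in zip(prompts, outputs)]
        model = finetune(model, list(zip(prompts, targets)))
    return model
```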

So I think if you had done that, you could have gotten something pretty close, but that would have been kind of non-trivial.

But we also had another instruction-following model trained with RL that was released a little before ChatGPT.

So I think if you put a chat wrapper on that, you would get something decently close.
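
As a rough illustration of what such a wrapper involves, here is a minimal sketch that renders a dialogue into a text-completion prompt; the turn format and the `complete` interface are assumptions for illustration, not the actual setup that was used:

```python
# Rough sketch of a chat wrapper around a completion-style,
# instruction-following model. Turn format and the `complete`
# interface are illustrative assumptions.

from typing import Callable, List, Tuple

def chat_turn(
    complete: Callable[..., str],    # hypothetical: (prompt, stop) -> text
    history: List[Tuple[str, str]],  # (role, text) pairs so far
    user_message: str,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Render the dialogue as a prompt and have the model continue it."""
    history = history + [("User", user_message)]
    prompt = "\n".join(f"{role}: {text}" for role, text in history)
    prompt += "\nAssistant:"
    # Stop before the model starts generating the next user turn itself.
    reply = complete(prompt, stop=["\nUser:"]).strip()
    return reply, history + [("Assistant", reply)]
```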

But that model, if you just prompted it with chat... it had some differences in strengths.

That model was pretty good at writing and poetry and so forth, but it wasn't as good at knowing its limitations, or at factuality and so forth.

I would say faster than I would have expected since GPT-2.

I was pretty bought into scaling and pre-training and so forth being a good idea.

But when GPT-2 was done, I would say I wasn't completely sold on it revolutionizing everything.