John Schulman

I mean, right now it's a pretty lopsided ratio, but you could argue that the output generated by the model is higher quality than most of what's on the web.

Dwarkesh Podcast

John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So it sort of makes more sense for the model to think by itself instead of just training to imitate what's on the web.

So I think there's a first principles argument for that.

I would say we found a lot of gains through post-training.

So I'm not sure.

So I would expect us to keep pushing this methodology and probably increasing the amount of compute we put into it.

Yeah, I would say that most of that is post-training.

Interesting.

So there are a lot of different separate axes for improvement.

We think about data quality, data quantity, just doing more iterations of the whole process of deploying and collecting new data and changing what you're...

what kind of annotations you're collecting.

So there are a lot of things that stack up, and together they give you a pretty good effective compute increase.

I'd say I just have a decent amount of experience at this point from the different parts of the stack, from RL algorithms, obviously, since I've worked on those since grad school, to the data collection, the annotation process, to...
