John Schulman
Yeah, so I guess you have these old arguments about instrumental convergence: the model is going to want to take over the world so it can produce this awesome piece of code at the end.
Like, if you ask it to write you a Flask app, it'll say, oh yeah, first I need to take over the world.
But at a certain point, it's a little hard to imagine why, for a fairly well-specified task like that, you would want to first take over the world.
But of course, if you had a task like make money, then maybe that would lead to some nefarious behavior as an instrumental goal.
Yeah.
Yeah, I would say there are probably some analogies with a drive or a goal in humans, in that you're trying to steer toward a certain set of states rather than some other states.
And I would think that our concept of a drive or a goal has other elements, like the feeling of satisfaction you get from achieving it, and those might have more to do with the learning algorithm than with what the model does at runtime, when you just have a fixed model.
So I would say there are probably some analogies, though I don't know exactly how close they are.
But to some extent, the models do have drives and goals in some meaningful way.
And in the case of RLHF, where you're trying to maximize human approval as measured by a reward model, the model is just trying to produce something that people will like and judge as correct.
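To make the RLHF objective described here concrete, here is a minimal sketch of a single policy-gradient step that maximizes reward-model scores. The `policy` and `reward_model` interfaces are hypothetical stand-ins, not any real library API, and the loss is a bare REINFORCE-style objective rather than the full method used in practice.

```python
# Minimal sketch of the RLHF objective: update the policy to maximize
# scores from a learned reward model. `policy.sample` and
# `reward_model.score` are hypothetical stand-ins, not a real API.
import torch

def rlhf_step(policy, reward_model, prompts, optimizer):
    # Sample a response per prompt and keep the log-probabilities the
    # current policy assigns to them (hypothetical interface).
    responses, log_probs = policy.sample(prompts)

    # Score each (prompt, response) pair with the reward model, which
    # stands in for human approval.
    rewards = reward_model.score(prompts, responses)

    # REINFORCE-style loss: raise the log-prob of high-reward responses.
    # Real systems (e.g. PPO) also add a KL penalty to a reference
    # policy so the model doesn't drift too far; omitted here.
    loss = -(rewards.detach() * log_probs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```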
Well, I would say you could define reasoning as tasks that require some kind of computation at test time, or maybe some kind of deduction.
So by definition, reasoning would be tasks that require step-by-step computation at test time.
On the other hand, I would also expect to gain a lot from some kind of training-time computation, or practice at training time.
So I would think you get the best results by combining these two things.
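One way to picture the combination being described is: search over step-by-step solutions at test time, then fine-tune on the verified ones so the model needs less search next time. This is a hedged sketch under that reading; `model.generate`, `model.finetune`, and `check_answer` are hypothetical stand-ins.

```python
# Hedged sketch of combining test-time and training-time computation.
# `model.generate`, `model.finetune`, and `check_answer` are
# hypothetical stand-ins, not a real API.

def solve_with_test_time_compute(model, problem, n_samples=16):
    """Test-time computation: sample several step-by-step solutions
    and return one that passes a correctness check."""
    candidates = [model.generate(problem, temperature=0.8)
                  for _ in range(n_samples)]
    good = [c for c in candidates if check_answer(problem, c)]
    return good[0] if good else candidates[0]

def practice_at_training_time(model, problems):
    """Training-time computation: generate solutions, keep the verified
    ones, and fine-tune on them, amortizing the test-time search."""
    solved = []
    for p in problems:
        attempt = solve_with_test_time_compute(model, p)
        if check_answer(p, attempt):
            solved.append((p, attempt))
    model.finetune(solved)
```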
Yeah, so do you mean models having some kind of medium-term memory: too much to fit in context, but at a much smaller scale than pre-training?
I'm not sure if it's memory; it might be memory.
I'm curious.
I see.
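One concrete reading of the "medium-term memory" raised in this exchange is an external store of past text, searched by embedding similarity and spliced back into the prompt. This is a hedged illustration of that idea, not anything the speakers endorse here; `embed` is a hypothetical function mapping text to a vector.

```python
# Hedged sketch of a "medium-term memory": an external store of past
# text, searched by embedding similarity and recalled into the prompt.
# `embed` is a hypothetical text-to-vector function.
import numpy as np

class MediumTermMemory:
    def __init__(self):
        self.texts, self.vecs = [], []

    def add(self, text):
        # Store the raw text alongside its embedding.
        self.texts.append(text)
        self.vecs.append(embed(text))

    def recall(self, query, k=3):
        # Cosine similarity between the query and every stored entry.
        q = embed(query)
        sims = [np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
                for v in self.vecs]
        # Indices of the k most similar entries, best first.
        top = np.argsort(sims)[-k:][::-1]
        return [self.texts[i] for i in top]

# The recalled snippets would be prepended to the model's context,
# giving it access to more history than fits in the window alone.
```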