
John Schulman

👤 Person
528 total appearances

Appearances Over Time (chart)

Podcast Appearances

Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So, I mean, if you have...

Yeah, if you collect a diverse data set, you're going to get a little bit of everything in it.

And if you have models that generalize really well, even if there's just a couple examples of getting back on track.

I see.

Okay, interesting.

Or even, like, maybe in the pre-training, there's examples of getting back on track.

Then, like, the model will be able to generalize from those other things it's seen to the current situation.

So I think if you have models that are weaker, you might be able to get them to do almost anything with enough data, but you might have to put a lot of effort into a particular domain or skill, whereas for a stronger model, it might just do the right thing without any training data or any effort.

Yeah, I would say at a high level, I would agree that longer horizon tasks are going to require more model intelligence to do well and are going to be more expensive to train for.

I'm not sure I would expect there to be a really clean scaling law unless you set it up in a very careful way or design the experiment in a certain way.

Because I would say there might end up being some phase transitions where, once you get to a certain level, you can deal with much longer tasks.
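
One way to "set it up in a very careful way", sketched below purely as an illustration (the protocol, threshold, and numbers are assumptions, not anything described in the episode): fix a success-rate threshold, record the longest task horizon each model scale still clears, and fit log-horizon against log-scale. A clean scaling law would show up as a straight line; the phase transitions mentioned here would instead show up as kinks or jumps.

    import numpy as np

    def max_horizon(success_by_horizon, threshold=0.5):
        """Longest horizon (in steps) whose success rate still meets the threshold."""
        cleared = [h for h, p in sorted(success_by_horizon.items()) if p >= threshold]
        return cleared[-1] if cleared else 0

    # Hypothetical measurements: model scale (params) -> {horizon in steps: success rate}.
    measurements = {
        1e9:  {10: 0.90, 30: 0.60, 100: 0.20},
        1e10: {10: 0.95, 30: 0.80, 100: 0.55, 300: 0.30},
        1e11: {10: 0.98, 30: 0.90, 100: 0.75, 300: 0.60, 1000: 0.40},
    }

    scales = np.array(sorted(measurements))
    horizons = np.array([max_horizon(measurements[s]) for s in scales], dtype=float)

    # A clean scaling law would make this log-log relationship linear;
    # systematic curvature or a jump would suggest a phase transition instead.
    slope, intercept = np.polyfit(np.log(scales), np.log(horizons), 1)
    print(f"fitted exponent ~= {slope:.2f}")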

So, for example, I think when people do planning at different timescales, I'm not sure they use completely different mechanisms.

So we probably use the same mental machinery if we're thinking about one month from now, one year from now, or like a hundred years from now.

So we're not actually doing some kind of reinforcement learning where we need to worry about a discount factor that covers that timescale and so forth.
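
For background on the discount-factor remark (a standard reinforcement-learning rule of thumb, not something worked through in the episode): with a discount factor gamma, rewards more than roughly 1/(1 - gamma) steps away contribute almost nothing to the objective, so covering year- or decade-scale horizons that way would require gamma impractically close to 1.

    def effective_horizon(gamma):
        """Approximate number of steps before discounted rewards become negligible (~1/(1-gamma))."""
        return 1.0 / (1.0 - gamma)

    for gamma in (0.9, 0.99, 0.999, 0.9999):
        print(f"gamma={gamma}: effective horizon ~ {effective_horizon(gamma):,.0f} steps")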

So I think using language, you can describe all of these different timescales, and then you can do things like plan.

In the moment, you can try to make progress towards your goal, whether it's a month away or 10 years away.

So I might expect the same out of models where...