
John Schulman

👤 Person
528 total appearances

Appearances Over Time (chart)

Podcast Appearances

Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So, I mean, if you have...

Yeah, if you collect a diverse data set, you're going to get a little bit of everything in it.

And if you have models that generalize really well, even if there's just a couple examples of getting back on track.

I see.

Okay, interesting.

Or even, like, maybe in the pre-training, there's examples of getting back on track.

Then, like, the model will be able to generalize from those other things it's seen to the current situation.

So I think if you have models that are weaker, you might be able to get them to do almost anything with enough data, but you might have to put a lot of effort into a particular domain or skill, whereas for a stronger model, it might just do the right thing without any training data or any effort.

Yeah, I would say at a high level, I would agree that longer horizon tasks are going to require more model intelligence to do well and are going to be more expensive to train for.

I'm not sure I would expect there to be a really clean scaling law unless you set it up in a very careful way or design the experiment in a certain way.

Because I would say there might end up being some phase transitions where, once you get to a certain level, you can deal with much longer tasks.
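
One way to "set it up in a very careful way", sketched below purely as an illustration (the protocol, threshold, and numbers are assumptions, not anything described in the episode): fix a success-rate threshold, record the longest task horizon each model scale still clears, and fit log-horizon against log-scale. A clean scaling law would show up as a straight line; the phase transitions mentioned here would instead show up as kinks or jumps.

    import numpy as np

    def max_horizon(success_by_horizon, threshold=0.5):
        """Longest horizon (in steps) whose success rate still meets the threshold."""
        cleared = [h for h, p in sorted(success_by_horizon.items()) if p >= threshold]
        return cleared[-1] if cleared else 0

    # Hypothetical measurements: model scale (params) -> {horizon in steps: success rate}.
    measurements = {
        1e9:  {10: 0.90, 30: 0.60, 100: 0.20},
        1e10: {10: 0.95, 30: 0.80, 100: 0.55, 300: 0.30},
        1e11: {10: 0.98, 30: 0.90, 100: 0.75, 300: 0.60, 1000: 0.40},
    }

    scales = np.array(sorted(measurements))
    horizons = np.array([max_horizon(measurements[s]) for s in scales], dtype=float)

    # A clean scaling law would make this log-log relationship linear;
    # systematic curvature or a jump would suggest a phase transition instead.
    slope, intercept = np.polyfit(np.log(scales), np.log(horizons), 1)
    print(f"fitted exponent ~= {slope:.2f}")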

So, for example, I think when people do planning at different timescales, I'm not sure they use completely different mechanisms.

So we probably use the same mental machinery if we're thinking about one month from now, one year from now, or like a hundred years from now.

So we're not actually doing some kind of reinforcement learning where we need to worry about a discount factor that covers that timescale and so forth.
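
For background on the discount-factor remark (a standard reinforcement-learning rule of thumb, not something worked through in the episode): with a discount factor gamma, rewards more than roughly 1/(1 - gamma) steps away contribute almost nothing to the objective, so covering year- or decade-scale horizons that way would require gamma impractically close to 1.

    def effective_horizon(gamma):
        """Approximate number of steps before discounted rewards become negligible (~1/(1-gamma))."""
        return 1.0 / (1.0 - gamma)

    for gamma in (0.9, 0.99, 0.999, 0.9999):
        print(f"gamma={gamma}: effective horizon ~ {effective_horizon(gamma):,.0f} steps")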

So I think using language, you can describe all of these different timescales, and then you can do things like plan.

In the moment, you can try to make progress towards your goal, whether it's a month away or 10 years away.

So I might expect the same out of models where...