
John Schulman

Person · 528 total appearances


Podcast Appearances

Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So, for example, right now you could imagine having the models carry out a whole coding project, instead of just giving you one suggestion on how to write a function.

You could imagine giving the model high-level instructions on what to code up, and it'll go write many files, test them, look at the output, and iterate on that a bit.


So just much more complex tasks.
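A minimal sketch of the loop described here, assuming a hypothetical call_model client, a pytest-based test suite, and a made-up reply format for the files the model writes:

```python
import subprocess

MAX_ITERATIONS = 5

def call_model(prompt: str) -> str:
    """Hypothetical LLM client; wire up a real model API here."""
    raise NotImplementedError

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite and capture its output."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def coding_agent(instructions: str) -> bool:
    """Give the model high-level instructions, let it write many files,
    run the tests, and iterate on the output until they pass."""
    prompt = instructions
    for _ in range(MAX_ITERATIONS):
        # Assumed reply format: "path --- code" blocks joined by "===".
        for block in call_model(prompt).split("==="):
            path, _, code = block.partition("---")
            with open(path.strip(), "w") as f:
                f.write(code)
        passed, output = run_tests()
        if passed:
            return True
        # Feed the failing test output back so the model can iterate.
        prompt = f"{instructions}\n\nTests failed:\n{output}\nFix the code."
    return False
```

The key design point is the feedback edge: the failing test output goes back into the prompt, so the model can look at the output and iterate rather than having to get the whole project right in one shot.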


Yeah, I would say this will come from some combination of just training the models to do harder tasks like this.

So I'd say the models aren't... most of the training data is more like doing single steps at a time.


And I would expect us to do more to train the models to carry out these longer projects.

So I'd say any kind of training at carrying out these long projects, like doing RL to learn how to do these tasks, however you do it, whether you're supervising the final output or supervising each step, is going to make the models a lot better.
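The two supervision styles mentioned here, grading only the final output versus grading each step, can be sketched as reward functions over a trajectory; final_check and step_check are hypothetical graders, not any particular lab's setup:

```python
from typing import Callable

# A trajectory: the sequence of steps the model took on a long task.
Trajectory = list[str]

def outcome_rewards(traj: Trajectory,
                    final_check: Callable[[str], bool]) -> list[float]:
    """Supervise the final output only: one sparse reward at the end,
    nothing for intermediate steps."""
    return [0.0] * (len(traj) - 1) + [1.0 if final_check(traj[-1]) else 0.0]

def process_rewards(traj: Trajectory,
                    step_check: Callable[[str], bool]) -> list[float]:
    """Supervise each step: a grader scores every action, giving denser
    feedback over a long project."""
    return [1.0 if step_check(step) else 0.0 for step in traj]
```

Either reward stream could feed a standard policy-gradient update; outcome supervision is cheaper to label but gives sparse credit assignment, while per-step supervision is denser but needs a grader for every step.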

And since the whole area is pretty new, I'd say there's just a lot of low-hanging fruit in doing this kind of training.

So I'd say that's one thing.

Also, I would expect that as the models get better, they're just better at recovering from errors and dealing with edge cases; when things go wrong, they know how to recover.


The models will also be more sample-efficient, so you don't have to collect a ton of data to teach them how to get back on track.

Just a little bit of data, or generalization from their other abilities, will allow them to get back on track, whereas current models might just get stuck and get lost.
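To make the sample-efficiency point concrete, here is a hypothetical sketch of turning a few logged failures into (prompt, completion) pairs for supervised fine-tuning, the "little bit of data" that teaches recovery; the field names and example content are invented:

```python
# Hypothetical recovery examples: the state when things went wrong,
# and the corrective action we want the model to generalize from.
recovery_examples = [
    {
        "context": "Test output:\nModuleNotFoundError: No module named 'requests'",
        "target": "Add 'requests' to requirements.txt, reinstall, and re-run the tests.",
    },
]

def to_prompt_completion(example: dict) -> tuple[str, str]:
    """Format one recovery example as a (prompt, completion) pair
    for supervised fine-tuning."""
    prompt = f"Something went wrong:\n{example['context']}\nHow do you recover?"
    return prompt, example["target"]

pairs = [to_prompt_completion(e) for e in recovery_examples]
```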

Right.

They're not directly connected.


So I would say you usually have a little bit of data that does everything.