John Schulman
So, for example, right now, you could imagine having models carry out a whole coding project instead of giving you one suggestion on how to write a function.
You could imagine giving the model high-level instructions on what to code up, and it'll go and write many files, test them, look at the output, and iterate on that a bit.
So, just much more complex tasks.
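As a rough illustration, here is a minimal sketch of that kind of loop in Python, assuming a pytest-style test suite. `propose_edits` is a hypothetical stand-in for the model call, not a real API:

```python
# Minimal sketch of the agentic coding loop described above: the model
# gets a high-level instruction, writes files, runs the tests, reads the
# output, and iterates until the tests pass.
import subprocess
from typing import Callable, Dict, List

def run_tests() -> tuple[bool, str]:
    """Run the test suite and capture its output for the model to read."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agent_loop(
    instruction: str,
    # Hypothetical model call: maps the conversation so far to file edits.
    propose_edits: Callable[[List[str]], Dict[str, str]],
    max_iters: int = 5,
) -> bool:
    """Give the model a high-level instruction and let it iterate."""
    history = [instruction]
    for _ in range(max_iters):
        # The model proposes edits as {path: new file contents}.
        for path, contents in propose_edits(history).items():
            with open(path, "w") as f:
                f.write(contents)
        passed, output = run_tests()
        if passed:
            return True
        # Feed the failing test output back so the model can iterate on it.
        history.append(output)
    return False
```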
Yeah, I would say this will come from some combination of training the models to do harder tasks like this.
Right now, most of the training data involves doing single steps at a time, and I would expect us to do more training of models to carry out these longer projects.
So I'd say any kind of training at carrying out these long projects is going to make the models a lot better, whether you're doing RL and supervising only the final output or supervising each step along the way.
And since the whole area is pretty new, I'd say there's just a lot of low-hanging fruit in doing this kind of training.
Interesting.
So I'd say that's one thing.
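To make the supervision distinction concrete, here's a toy sketch in Python: outcome supervision rewards only the final result, while per-step (process) supervision scores every intermediate step. `final_check` and `grade_step` are hypothetical graders, not real APIs:

```python
# Toy illustration of the two supervision schemes for RL on long tasks.
# The returned lists are per-step rewards that could feed an RL update
# such as a policy-gradient step. Assumes `steps` is non-empty.
from typing import Callable, List

def outcome_rewards(
    steps: List[str], final_check: Callable[[str], bool]
) -> List[float]:
    # Outcome supervision: reward arrives only at the end of the episode,
    # based on whether the final output passes the check.
    return [0.0] * (len(steps) - 1) + [1.0 if final_check(steps[-1]) else 0.0]

def process_rewards(
    steps: List[str], grade_step: Callable[[str], float]
) -> List[float]:
    # Process supervision: each intermediate step gets its own score.
    return [grade_step(step) for step in steps]
```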
Also, I would expect that as the models get better, they'll be better at recovering from errors and dealing with edge cases; when things go wrong, they'll know how to recover.
The models will also be more sample efficient, so you won't have to collect a ton of data to teach them how to get back on track.
Just a little bit of data, or generalization from their other abilities, will allow them to get back on track, whereas current models might just get stuck and get lost.
Right.
They're not directly connected.
So I would say you usually have a little bit of data that does everything.