John Schulman

👤 Speaker

528 total appearances

Podcast Appearances

Dwarkesh Podcast

John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

So I think that same kind of ability could be used for something like active learning.

2265.983

Well, I think policy gradient algorithms are not the most sample efficient algorithms.

2303.928

So that's probably not what you want to do at test time if you want to learn really fast.

2308.635

But who knows?

2313.241

I mean, maybe it's not that bad.

2314.302

So I think

2316.926

Something like motor learning in animals is probably something like a policy gradient algorithm.

2319.449

For example, you're learning how to shoot baskets.

2327.14

That takes maybe thousands of tries to get more accurate.

2331.126

There's probably something that's like a policy gradient algorithm underneath.

2337.596

But that's not going to be the fastest way to learn if you have a model trying to do a project or some kind of task.

2343.344

So I would think we would want to rely more on in-context learning, where

2353.766

you effectively have a learned algorithm.

2360.962

Like you've learned how to explore, like you've learned how to try all the possibilities exhaustively.

2363.325

And instead of doing the same thing over and over again, making the same mistake.

2368.711

So yeah, I would say we'll be able to do things that look more like learned search algorithms.

2375.018

And that'll be the kind of thing that gets used in a particular task.

2380.064

Interesting.

2385.717

Yeah, so early on, before ChatGPT, OpenAI had these instruction-following models. The idea there was we had base models, and people can prompt them in elaborate ways, but they're also kind of hard to prompt.

2415.097

You had to, they basically do auto-complete.

2435.48
