John Schulman

👤 Speaker
528 total appearances

Podcast Appearances

Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

It's not just memory, but it's also somewhat like specializing to a certain task, or putting a lot of effort into some particular project.

I see, so it's not just about, I don't know, training on a bunch of relevant sources, or fine-tuning on some special domain.

It's also about, like, developing some knowledge through your own reasoning, and also using some sort of introspection and self-knowledge to figure out what you need to learn.

Yeah.

Yeah, I would say that does feel like something that's missing from today's systems.

I mean, I would say people haven't really pushed too hard on this middle ground between large-scale training, where you produce the snapshot model that's supposed to do everything (a deployed model), and, on the other hand, in-context learning.

And I think part of that is that we've just been increasing context length so much that there hasn't been an incentive for it.

So if you can go to, like, 100,000 or a million tokens of context, then that's actually quite a lot.

And it's not actually the bottleneck in a lot of cases.

But I agree that you'd probably also wanna supplement that with some kind of fine-tuning; the capabilities you get from fine-tuning and in-context learning are probably somewhat complementary.
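To make that contrast concrete, here is a minimal sketch of the two complementary mechanisms, using the Hugging Face transformers API with GPT-2 as an illustrative stand-in for any causal language model. The arithmetic prompts and the tiny training loop are assumptions for demonstration, not anything described in the episode.

```python
# Sketch: the two complementary mechanisms discussed above.
# GPT-2 is an illustrative stand-in; the task data is made up.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 1) In-context learning: the task lives in the prompt; weights are untouched.
prompt = (
    "Q: 2 + 2?\nA: 4\n"
    "Q: 3 + 5?\nA: 8\n"
    "Q: 7 + 6?\nA:"
)
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0][ids.shape[1]:]))

# 2) Fine-tuning: the same task knowledge is baked into the weights instead.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tok("Q: 7 + 6?\nA: 13", return_tensors="pt").input_ids
model.train()
for _ in range(3):  # a few gradient steps on task-specific data
    loss = model(batch, labels=batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```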

So I would expect us to wanna build systems that do some kind of online learning and also have some of these cognitive skills, like introspecting on their own knowledge and seeking out new knowledge that fills in the holes.

Well, you're learning while you do the task, right?

So the only way to do something that involves a lot of steps is to have learning and memory that gets updated during the task.

So there's a continuum between short-term and long-term memory.
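As one way to picture that continuum, here is a toy sketch in plain Python: a bounded "short-term" context window whose overflow gets consolidated into a persistent "long-term" store. The class, methods, and compression rule are all hypothetical, not any real system's design.

```python
# Toy sketch of the short-term/long-term memory continuum mentioned above.
# All names are illustrative assumptions.
from collections import deque

class TaskMemory:
    def __init__(self, context_limit: int = 4):
        self.short_term = deque(maxlen=context_limit)  # recent steps, in full
        self.long_term: dict[str, str] = {}            # consolidated facts

    def observe(self, key: str, detail: str) -> None:
        if len(self.short_term) == self.short_term.maxlen:
            # Oldest observation is about to fall out of context:
            # keep a compressed note of it in the long-term store.
            old_key, old_detail = self.short_term[0]
            self.long_term[old_key] = old_detail[:40]  # crude "summary"
        self.short_term.append((key, detail))

    def recall(self, key: str) -> str | None:
        # Check the context window first, then the long-term store.
        for k, d in self.short_term:
            if k == key:
                return d
        return self.long_term.get(key)

mem = TaskMemory(context_limit=2)
for i in range(4):
    mem.observe(f"step{i}", f"full detail of step {i}")
print(mem.recall("step0"))  # served from long-term (compressed)
print(mem.recall("step3"))  # served from short-term (full detail)
```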

So I would say, yeah, I would expect the need for this capability to start to become clear when we start to look at long-horizon tasks more. To some extent, just putting a lot of stuff into context will take you pretty far, because we have really long contexts now, but you probably also want things like fine-tuning.

And as for, like, introspection and the ability to do active learning, that might automatically fall out of the models' ability to know what they know, because models have some calibration regarding what they know.

And that's why models don't hallucinate that badly, because they have some understanding of their own limitations.
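One hedged way to illustrate that calibration idea: treat the model's own next-token probability as a rough confidence signal and abstain below a threshold. The model, threshold, and abstention rule below are illustrative assumptions, not a description of how any production system works.

```python
# Sketch: next-token probability as a rough "do I know this?" signal.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def answer_or_abstain(prompt: str, threshold: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # distribution over the next token
    probs = torch.softmax(logits, dim=-1)
    conf, token_id = probs.max(dim=-1)
    if conf.item() < threshold:            # low confidence: say so instead
        return "[abstain: not confident]"
    return tok.decode([int(token_id)])

print(answer_or_abstain("The capital of France is"))
```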