John Schulman

👤 Speaker
528 total appearances

Podcast Appearances

Dwarkesh Podcast
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI

It's not just memory, but it's also somewhat like specializing to a certain task, or putting a lot of effort into some particular project.

I see, so it's not just about, I don't know, training on a bunch of relevant sources, or fine-tuning on some special domain.

It's also about, like, developing some knowledge through your own reasoning, and also using some sort of introspection and self-knowledge to figure out what you need to learn.

Yeah.

Yeah, I would say that does feel like something that's missing from today's systems.

I mean, I would say people haven't really pushed too hard on this middle ground between large-scale training, where you produce the snapshot model that's supposed to do everything (a deployed model), and, on the other hand, in-context learning.

And I think part of that is that we've just been increasing context length so much that there hasn't been an incentive for it.

So if you can go to, like, 100,000 or a million tokens of context, then that's actually quite a lot.

And it's not actually the bottleneck in a lot of cases.

But I agree that you'd probably also wanna supplement that with some kind of fine-tuning; the capabilities you get from fine-tuning and in-context learning are probably somewhat complementary.
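To make that contrast concrete, here is a minimal sketch of the two complementary mechanisms, using the Hugging Face transformers API with GPT-2 as an illustrative stand-in for any causal language model. The arithmetic prompts and the tiny training loop are assumptions for demonstration, not anything described in the episode.

```python
# Sketch: the two complementary mechanisms discussed above.
# GPT-2 is an illustrative stand-in; the task data is made up.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# 1) In-context learning: the task lives in the prompt; weights are untouched.
prompt = (
    "Q: 2 + 2?\nA: 4\n"
    "Q: 3 + 5?\nA: 8\n"
    "Q: 7 + 6?\nA:"
)
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0][ids.shape[1]:]))

# 2) Fine-tuning: the same task knowledge is baked into the weights instead.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = tok("Q: 7 + 6?\nA: 13", return_tensors="pt").input_ids
model.train()
for _ in range(3):  # a few gradient steps on task-specific data
    loss = model(batch, labels=batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```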

So I would expect us to wanna build systems that do some kind of online learning and also have some of these cognitive skills, like introspecting on their own knowledge and seeking out new knowledge that fills in the holes.

Well, you're learning while you do the task, right?

So the only way to do something that involves a lot of steps is to have learning and memory that gets updated during the task.

So there's a continuum between short-term and long-term memory.
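As one way to picture that continuum, here is a toy sketch in plain Python: a bounded "short-term" context window whose overflow gets consolidated into a persistent "long-term" store. The class, methods, and compression rule are all hypothetical, not any real system's design.

```python
# Toy sketch of the short-term/long-term memory continuum mentioned above.
# All names are illustrative assumptions.
from collections import deque

class TaskMemory:
    def __init__(self, context_limit: int = 4):
        self.short_term = deque(maxlen=context_limit)  # recent steps, in full
        self.long_term: dict[str, str] = {}            # consolidated facts

    def observe(self, key: str, detail: str) -> None:
        if len(self.short_term) == self.short_term.maxlen:
            # Oldest observation is about to fall out of context:
            # keep a compressed note of it in the long-term store.
            old_key, old_detail = self.short_term[0]
            self.long_term[old_key] = old_detail[:40]  # crude "summary"
        self.short_term.append((key, detail))

    def recall(self, key: str) -> str | None:
        # Check the context window first, then the long-term store.
        for k, d in self.short_term:
            if k == key:
                return d
        return self.long_term.get(key)

mem = TaskMemory(context_limit=2)
for i in range(4):
    mem.observe(f"step{i}", f"full detail of step {i}")
print(mem.recall("step0"))  # served from long-term (compressed)
print(mem.recall("step3"))  # served from short-term (full detail)
```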

So I would say, yeah, I would expect the need for this capability to start to become clear when we start to look at long-horizon tasks more. To some extent, just putting a lot of stuff into context will take you pretty far, because we have really long contexts now, but you probably also want things like fine-tuning.

And as for, like, introspection and the ability to do active learning, that might automatically fall out of the models' ability to know what they know, because models have some calibration regarding what they know.

And that's why models don't hallucinate that badly, because they have some understanding of their own limitations.
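One hedged way to illustrate that calibration idea: treat the model's own next-token probability as a rough confidence signal and abstain below a threshold. The model, threshold, and abstention rule below are illustrative assumptions, not a description of how any production system works.

```python
# Sketch: next-token probability as a rough "do I know this?" signal.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def answer_or_abstain(prompt: str, threshold: float = 0.5) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # distribution over the next token
    probs = torch.softmax(logits, dim=-1)
    conf, token_id = probs.max(dim=-1)
    if conf.item() < threshold:            # low confidence: say so instead
        return "[abstain: not confident]"
    return tok.decode([int(token_id)])

print(answer_or_abstain("The capital of France is"))
```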