John Schulman
I don't think anyone has a good explanation of the scaling law with parameter count.
I mean, there are some. I don't even know what the best sort of mental model is for this. Clearly you have more capacity if you have a bigger model, so you should eventually be able to get lower loss. But why are bigger models more sample efficient? I can give you some very sketchy explanations.
You could say that the model is sort of an ensemble of a bunch of different circuits that do the computation.
You could imagine that it has a bunch of computations that it's doing in parallel, and the output is a weighted combination of them.
If you have more width of the model, or really just more of it in general... I mean, actually, width is somewhat similar to depth, because with residual networks, depth can do something similar to width in terms of updating what's in the residual stream.
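(A minimal numpy sketch of that width-vs-depth point; the dimensions, block count, and block functions are toy choices of mine, not anything from the conversation. When each residual block's contribution is small, applying blocks in sequence adds terms to the residual stream much like summing the same blocks in parallel would.)

```python
# Toy illustration: for residual blocks with small contributions, depth
# (sequential updates to the residual stream) behaves much like width
# (parallel contributions summed into the stream).
import numpy as np

rng = np.random.default_rng(0)
d = 16          # residual stream width (hypothetical)
n_blocks = 8    # hypothetical number of residual blocks
scale = 0.01    # keep each block's contribution small

Ws = [scale * rng.standard_normal((d, d)) for _ in range(n_blocks)]

def block(W, x):
    # A small random "circuit": linear map plus nonlinearity.
    return np.tanh(x @ W)

x0 = rng.standard_normal(d)

# Depth: apply blocks sequentially, each adding into the residual stream.
x_deep = x0.copy()
for W in Ws:
    x_deep = x_deep + block(W, x_deep)

# Width: apply all blocks in parallel to the original input and sum.
x_wide = x0 + sum(block(W, x0) for W in Ws)

# Relative gap between the two is small when contributions are small.
print(np.linalg.norm(x_deep - x_wide) / np.linalg.norm(x_deep - x0))
```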
Anyway, you could argue that you're learning all these different computations in parallel, and you just have more of them with the bigger model.
So you have more chances that one of them is lucky, that it ends up guessing correctly a lot and getting up-weighted.
That's kind of like... there are some algorithms that work this way, like some kind of mixture model or a multiplicative weight update algorithm. There are algorithms that basically work like this, where you have a weighted combination of experts with some learned gating. I don't want to say mixture of experts, because that means something different, and I may be putting it slightly wrong, but you could imagine something like that, and just having a bigger model gives you more chances to get the right function.
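(Here's a toy multiplicative-weights sketch of that "more chances to get the right function" point. The random linear "experts", dimensions, and learning rate are hypothetical choices of mine, not anything Schulman describes; it just shows that a larger pool of random circuits is more likely to contain one close to the target, so the weighted mixture reaches low error sooner.)

```python
# Multiplicative weights over a pool of fixed random "circuits":
# up-weight experts that predict the target well, and compare how
# quickly small vs. large pools reach low error.
import numpy as np

rng = np.random.default_rng(0)
dim, steps, eta = 10, 200, 0.5
w_true = rng.standard_normal(dim)        # hypothetical target function

def run(n_experts):
    experts = rng.standard_normal((n_experts, dim))   # fixed random circuits
    weights = np.ones(n_experts) / n_experts
    losses = []
    for _ in range(steps):
        x = rng.standard_normal(dim)
        y = np.sign(w_true @ x)                       # binary label
        preds = np.sign(experts @ x)
        mixture_pred = np.sign(weights @ preds)       # weighted vote
        losses.append(float(mixture_pred != y))
        # Multiplicative weight update: shrink experts that were wrong.
        weights *= np.exp(-eta * (preds != y))
        weights /= weights.sum()
    return np.mean(losses[-50:])                      # late-training error

# A bigger pool is more likely to contain a lucky circuit that matches
# the target, so it ends up with lower error for the same data budget.
for n in (10, 100, 10_000):
    print(n, run(n))
```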
And then of course, it's not just that you have a bunch of totally disjoint functions that you're taking a linear combination of.
It's more like a library where you might chain the functions together in some way.
So there's some composability.
So I would just say the bigger model has a bigger library of different computations, including lots of stuff that's kind of dormant and only being used some of the time.
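(A toy sketch of that library picture, with hypothetical primitives I chose purely for illustration: because the pieces compose, the number of reachable computations grows quickly with library size, and any single task uses only a few chains while the rest stay dormant.)

```python
# A "library" of small reusable computations that can be chained together.
# Bigger libraries reach many more distinct functions via composition.
from itertools import product

primitives = {                     # hypothetical reusable "circuits"
    "inc": lambda x: x + 1,
    "dbl": lambda x: 2 * x,
    "neg": lambda x: -x,
    "sq":  lambda x: x * x,
}

def compose(names):
    """Chain the named primitives left to right into one function."""
    def f(x):
        for name in names:
            x = primitives[name](x)
        return x
    return f

def distinct_behaviours(library_names, depth):
    """Count distinct input->output maps reachable with chains of `depth`."""
    probe = range(-3, 4)           # small probe set to fingerprint functions
    return len({tuple(compose(chain)(x) for x in probe)
                for chain in product(library_names, repeat=depth)})

for k in (2, 3, 4):                # growing library sizes
    names = list(primitives)[:k]
    print(k, "primitives ->", distinct_behaviours(names, depth=3), "functions")
```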