Dwarkesh Patel

Speaker
15267 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

Let me make the question more concrete.

How much more than Chinchilla-optimal are models overtrained?

And has that changed as a result of RL generation?

Which is the fact that you're not training on all your rollouts.

Okay, so if you're doing a backward pass on every single generation in RL, it would be 6ND.
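
For readers following the arithmetic, the "6ND" figure in this excerpt can be written out as below. This is a sketch under assumptions: the excerpt does not define its symbols, so N (parameter count), D (number of tokens processed) and C (total compute in FLOPs) follow the usual convention for this rule of thumb rather than anything stated on this page.

```latex
% Assumed notation (not defined in the excerpt):
%   N = number of model parameters
%   D = number of tokens processed
%   C = total compute in FLOPs
% A forward plus backward pass on every token costs about 6 FLOPs
% per parameter per token:
C_{\text{forward+backward}} \approx 6\,N\,D
```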

Yeah, so this could be a smaller number, right?

I think the way I said it was super garbled.

Just for the audience, maybe.

Forward plus backwards per parameter is six.

Forward alone is two.

That's why RL, where you're definitely going to generate all the trajectories but might or might not train on all of them, is two to six.
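
The "two to six" range can be illustrated with a small calculation. This is a minimal sketch, not the speakers' own model: it assumes the cost scales linearly with the fraction of rollouts that receive a backward pass, and that the forward pass from generation is the only forward cost. The function and parameter names (`rl_flops_per_param_token`, `trained_fraction`) are hypothetical.

```python
# Figures quoted in the excerpts: FLOPs per parameter per token.
FORWARD_FLOPS = 2.0            # forward pass alone
FORWARD_BACKWARD_FLOPS = 6.0   # forward plus backward pass


def rl_flops_per_param_token(trained_fraction: float) -> float:
    """FLOPs per parameter per token when only some rollouts are trained on.

    Assumption (not from the source): every rollout pays the forward cost,
    and a fraction `trained_fraction` additionally pays the backward cost.
    """
    if not 0.0 <= trained_fraction <= 1.0:
        raise ValueError("trained_fraction must be between 0 and 1")
    backward_only = FORWARD_BACKWARD_FLOPS - FORWARD_FLOPS  # 4 FLOPs
    return FORWARD_FLOPS + backward_only * trained_fraction


def rl_total_flops(n_params: float, n_tokens: float, trained_fraction: float) -> float:
    """Total FLOPs for n_tokens generated by a model with n_params parameters."""
    return rl_flops_per_param_token(trained_fraction) * n_params * n_tokens


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, rl_flops_per_param_token(fraction))
```

Under these assumptions, training on none of the rollouts gives the forward-only figure of two, and training on all of them gives the full figure of six.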

Yes.

Yeah.

And inference would be 50%.

If both of them are 1 in 10, that kind of implies that there's never a backward pass on RL?

So this is like 1.5 and this is one. Billions of dollars of the compute just flowed the other direction.

Right.

But then, so it looks... Sorry, I'm making a basic algebra mistake.

It seems like there should be fewer RL tokens than pre-training tokens?

This is quite interesting.