Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

So data on pre-training.

This is not well-cited, but... You want me to not remove that?

No, it's fine.

And I think the number of active params could often be in the range of, like, 100 billion, something like that.

Yeah.

Maybe a bit larger.

So I'm assuming active params are about 100 billion.

And so multiply by 20 to get the Chinchilla-optimal token count.

So the Chinchilla-optimal count would be around 2 trillion.

And yeah, we see we're at, like, 100 times larger than that.

Actually, what does "the Chinchilla count" actually mean?

Like the token count for pre-training that the Chinchilla scaling law would recommend, I guess.

Got it.

So, yeah, the ratio of this 200 trillion or 100 trillion tokens over the, like, potential optimum of 2 trillion, that's the amount it's overtrained by, which is like a factor of 100 overtrained, perhaps.
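
To make the back-of-envelope arithmetic above concrete, here is a minimal sketch (not from the episode) assuming roughly 100 billion active parameters, the ~20-tokens-per-parameter Chinchilla rule of thumb, and a training run on the order of 200 trillion tokens:

```python
# Back-of-envelope Chinchilla arithmetic, as described in the conversation.
# All numbers are rough assumptions, not measured values.

active_params = 100e9     # assumed ~100 billion active parameters
tokens_per_param = 20     # Chinchilla rule of thumb: ~20 tokens per parameter

chinchilla_tokens = tokens_per_param * active_params  # ~2 trillion tokens
actual_tokens = 200e12                                # assumed ~200 trillion training tokens

overtraining_factor = actual_tokens / chinchilla_tokens  # ~100x overtrained

print(f"Chinchilla-optimal tokens: {chinchilla_tokens:.0e}")     # 2e+12
print(f"Overtraining factor:       {overtraining_factor:.0f}x")  # 100x
```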

I mean, this is why you should just approximate everywhere, because there are such big error bars on this.

But yeah, it's kind of empowering to just set A equal to B and figure it out.

Yeah, yeah.

That's super cool.

Yeah.

So, I mean, why specifically 50%?