So data on pre-training.
This is not well-cited, but... You want me to not remove that?
No, it's fine.
And I think the number of active params could often be in the range of, like, 100 billion, something like that.
Yeah.
Maybe a bit larger.
So I'm assuming active params are about 100 billion.
And so multiply by 20 to get the Chinchilla token count.
So the Chinchilla-optimal count would be around 2 trillion tokens.
And yeah, we see we're at something like 100 times larger than that.
Actually, what does "the Chinchilla token count" actually mean?
Like, the token count for pre-training that the Chinchilla scaling law would recommend, I guess.
Got it.
So, yeah, the ratio of this 200 trillion or 100 trillion tokens over the potential optimal of 2 trillion, that's the amount of overtraining, which is like a factor of 100, perhaps.
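A minimal back-of-envelope sketch in Python of the arithmetic above; the 100-billion active-parameter count, the 20-tokens-per-parameter Chinchilla rule of thumb, and the ~200-trillion-token pre-training figure are all rough assumptions from the conversation, with wide error bars, not measured values:

```python
# Back-of-envelope sketch of the overtraining estimate discussed above.
# All inputs are rough assumptions from the conversation, not measured values.

active_params = 100e9    # assumed ~100 billion active parameters
tokens_per_param = 20    # Chinchilla rule of thumb: ~20 tokens per parameter

chinchilla_tokens = tokens_per_param * active_params
print(f"Chinchilla-optimal token count: {chinchilla_tokens:.0e}")  # ~2e+12 (2 trillion)

actual_tokens = 200e12   # assumed ~200 trillion pre-training tokens
overtraining_factor = actual_tokens / chinchilla_tokens
print(f"Overtraining factor: {overtraining_factor:.0f}x")          # ~100x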
I mean, this is why you should just approximate everywhere, because there are such big error bars on this.
But yeah, it's kind of empowering to just set A equal to B and figure it out.
Yeah, yeah.
That's super cool.
Yeah.
So, I mean, why specifically 50%?