
Reiner Pope

👤 Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

I mean, I think starting with equalizing in cost is right, but depending on how you model the cost, this comes close to equalizing in data.

Which way are people going to err?

If you think that people's power of prediction is not perfect, and also you run the risk that you make a model that is not a frontier model, and then you just throw it away, then that changes the cost trade-off because there's some probability that applies to the inference, and you should derate the inference tokens by some amount.

Yeah, so I think we just have to make some real-world assumptions here in order to do that.

So the inference tokens we should totally be able to catch, right?

So let's say a few hundred million, I don't know, maybe it's like 500 million tokens a second now, I don't really know.

500 million tokens a second times a model is deployed for two months before it becomes obsolete?

I don't really know.

I can't do this in my head.

Can you type it into a computer?

Uh, 2.6 times 10 to the 15th. Okay, 2.6 × 10^15. This number is probably too large, because this is going to be multiple models in a family. So let's make it, like, five times smaller, or ten times smaller, or something like that.

Okay, so we're estimating maybe 50 million tokens per second per specific model.

The model is live for two months, and so this comes out to around 200 trillion tokens.
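The back-of-envelope arithmetic in this exchange can be sketched as follows. All the inputs are the speakers' rough guesses from the conversation (a ~500 million tokens/second family-wide serving rate, a ~10x derating for one specific model, a two-month deployment window), not measured figures:

```python
# Back-of-envelope check of the numbers quoted in the conversation.
# All figures below are the speakers' rough assumptions, not measurements.
family_rate = 500e6                    # tokens/s across the whole model family
per_model_rate = family_rate / 10      # derated ~10x for one specific model
seconds = 2 * 30 * 24 * 3600           # two months ~ 5.18 million seconds

family_total = family_rate * seconds       # the "2.6 x 10^15" figure
per_model_total = per_model_rate * seconds # the "around 200 trillion" figure

print(f"family:    {family_total:.1e} tokens")     # ~2.6e+15
print(f"per model: {per_model_total:.1e} tokens")  # ~2.6e+14, i.e. ~260 trillion
```

The per-model total comes out to roughly 260 trillion tokens, which is the "around 200 trillion" order of magnitude quoted above.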

And then we want to compare that to active parameters on a frontier model.

I don't actually know the latest rumors, but some

Do you know who for?

Trained on 150 trillion tokens.

Interesting.

Which is similar.

Yeah, that's actually similar.