Dwarkesh Patel

👤 Speaker
15267 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

And if you want to do pipelining in training, in order to avoid that bubble, you need to... Should we draw the training diagram?

It may be worth clarifying, the reason there is that hard stop is because you want to do a whole batch at once for the backward step.

And then there is an optimal size for how big that batch should be.

but there's this harder trade-off during training.

Last week, Horace was kind enough to give me and my friends a great lecture on large-scale pre-training systems.

And there were some concepts that I wanted to animate for a write-up on my blog, like how weight shards and gradients flow depending on the parallelism that you're using.

So I gave Cursor my lecture notes and a sketch that I'd made during the lecture, and I asked it to visualize a specific hierarchical collective that Horace had explained.

The first version was already pretty good, and then I was able to use design mode to select and tweak any specific components from there.

I was able to do all of this without a clear end state in mind.

Cursor's Composer 2 fast model was quick enough that I was able to iterate almost instantaneously.

I could try an idea, test the results in the built-in browser, and immediately make any changes.

I went through 10 different versions in under 20 minutes.

If you want to check out this animation, I published it along with the lecture notes in a blog post.

The link is in the description.

And if you want to try out this kind of iterative design flow for yourself, go to cursor.com slash lorecash to get started.

So, macro question.

Everybody's talking about the memory wall right now.

Memory's getting super expensive.

There's not enough memory.

Smartphone volume will go down 30% because there's not enough memory.