Dwarkesh Patel

👤 Speaker

15267 total appearances

Podcast Appearances

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And if you want to do pipelining in training, in order to avoid that bubble, you need to... Should we draw the training diagram?

3527.154 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

It may be worth clarifying, the reason there is that hard stop is because you want to do a whole batch at once for the backward step.

3583.175 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And then there is an optimal size for how big that batch should be.

3589.95 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

but there's this harder trade-off during training.

3702.305 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Last week, Horace was kind enough to give me and my friends a great lecture on large-scale pre-training systems.

3760.068 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And there were some concepts that I wanted to animate for a write-up on my blog, like how weight shards and gradients flow depending on the parallelism that you're using.

3766.278 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

So I gave Cursor my lecture notes and a sketch that I'd made during the lecture, and I asked it to visualize a specific hierarchical collective that Horace had explained.

3774.811 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

The first version was already pretty good, and then I was able to use design mode to select and tweak any specific components from there.

3784.655 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

I was able to do all of this without a clear end state in mind.

3790.828 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Cursor's Composer 2 fast model was quick enough that I was able to iterate almost instantaneously.

3793.393 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

I could try an idea, test the results in the built-in browser, and immediately make any changes.

3798.002 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

I went through 10 different versions in under 20 minutes.

3802.732 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

If you want to check out this animation, I published it along with the lecture notes in a blog post.

3805.057 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

The link is in the description.

3809.624 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

And if you want to try out this kind of iterative design flow for yourself, go to cursor.com/dwarkesh to get started.

3810.986 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

So, macro question.

3818.036 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Everybody's talking about the memory wall right now.

3820.94 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Memory's getting super expensive.

3823.384 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

There's not enough memory.

3824.605 View full episode →

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

Smartphone volume will go down 30% because there's not enough memory.

3826.007 View full episode →
