Reiner Pope

Podcast Appearances

Dwarkesh Podcast

Reiner Pope – The math behind how LLMs are trained and served

If I think of a forward pass of training, so I will, let's say I have four layers.

7935.513

I run them in the 0, 1, 2, 3 order.

7940.238

I have to write all of the activations to HBM.

7943.081

And so I get an HBM footprint here that is kind of like linear in number of layers.

7948.827

Yep.

7957.337

So this actually can be the largest memory footprint during training.
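
To make the "linear in number of layers" point concrete, here is a minimal sketch of the arithmetic. It assumes, as a simplification, that one activation tensor of shape (batch, sequence length, model width) is stored per layer; real training stores several tensors per layer. The function name and all sizes are hypothetical, chosen only for illustration.

```python
def stored_activation_bytes(num_layers, batch, seq_len, d_model, bytes_per_value=2):
    """Bytes of HBM needed to keep one activation tensor per layer
    until the backward pass reads it back (simplified model)."""
    per_layer = batch * seq_len * d_model * bytes_per_value
    return num_layers * per_layer


# Hypothetical sizes: 2-byte values, batch 8, 2048 tokens, width 4096.
four_layers = stored_activation_bytes(num_layers=4, batch=8, seq_len=2048, d_model=4096)
eight_layers = stored_activation_bytes(num_layers=8, batch=8, seq_len=2048, d_model=4096)

print(four_layers / 2**20, "MiB stored for 4 layers")
print(eight_layers / 2**20, "MiB stored for 8 layers")
```

Doubling the layer count doubles the stored bytes under this model, which is the linear footprint being described.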

7959.513

And so this is normal training.

7964.326

And then I run the backwards pass and I read it kind of in reverse.

7966.232

Like, I run them, sort of, forward pass goes forward, backward pass goes backwards.

7969.04

And I have to read them back out.

7972.971

The idea of this RevNets paper is that, because it's invertible, I don't need to store this at all.

7974.65

I can completely rematerialize it when I'm running my backwards pass.

7982.501

So I run my forwards pass, and then when I'm running my backwards pass, I'm simultaneously in lockstep undoing all of the forwards pass steps that I did in order to have the activations that I need here.

7985.765

So this ends up being a memory saving, which is a nice idea.
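
The "undo the forward pass in lockstep" idea can be sketched as follows. This assumes the additive coupling used by reversible residual networks (split the activations into two halves, update each half from the other); the functions `f` and `g` are toy stand-ins for real sublayers, and all names here are hypothetical. The sketch only shows that each layer's inputs can be recomputed from its outputs. A real backward pass would also compute gradients at each step.

```python
import math


def f(v):
    # Toy residual function; a stand-in for a real sublayer.
    return [math.tanh(2.0 * x) for x in v]


def g(v):
    # Second toy residual function. Neither f nor g needs to be invertible.
    return [0.5 * x * x for x in v]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def rev_forward(x1, x2):
    # Additive coupling: each half is updated using only the other half.
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def rev_inverse(y1, y2):
    # Recompute the layer's inputs from its outputs, so nothing was stored.
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


def run_stack(x1, x2, num_layers=4):
    # Forward pass in the 0, 1, 2, 3 order, keeping only the final output.
    for _ in range(num_layers):
        x1, x2 = rev_forward(x1, x2)
    return x1, x2


def undo_stack(y1, y2, num_layers=4):
    # Walk the layers in reverse, rematerializing each layer's input in turn.
    recovered = []
    for _ in range(num_layers):
        y1, y2 = rev_inverse(y1, y2)
        recovered.append((y1, y2))
    return recovered


inputs = ([0.1, -0.2], [0.3, 0.4])
outputs = run_stack(*inputs)
layer_inputs = undo_stack(*outputs)
print(layer_inputs[-1])  # approximately the original inputs, up to rounding
```

Only the final output is held in memory here; every intermediate activation is reconstructed on the way back, at the cost of extra compute.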

7998.122

Yeah.

8013.572

Spending more memory to save compute is generally profitable given where hardware is at.

8014.695

Yeah, interesting.

8019.244