Reiner Pope

Speaker
1157 total appearances

Podcast Appearances

Dwarkesh Podcast
Reiner Pope – The math behind how LLMs are trained and served

It's one of the main mechanisms of constructing ciphers.

Often you want ciphers to be invertible. In particular, you want the layers of a cipher to be invertible, because that gives better cryptographic properties.
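As a rough illustration of an invertible cipher layer, here is a minimal sketch of a swap-and-mix round in the style usually called a Feistel round. That this is the mechanism being referred to is an assumption, and `round_fn`, the key values, and the 16-bit half size are all hypothetical choices made for the example. The point it shows is that the mixing function itself does not need to be invertible for the whole round to be.

```python
def round_fn(half: int, key: int) -> int:
    # Hypothetical mixing function. It throws bits away, so it is NOT invertible.
    return (((half * 2654435761) ^ key) >> 3) & 0xFFFF


def feistel_round(left: int, right: int, key: int) -> tuple[int, int]:
    # One round: the right half passes through unchanged into the left slot,
    # and the left half is mixed with round_fn(right).
    return right, left ^ round_fn(right, key)


def feistel_round_inverse(new_left: int, new_right: int, key: int) -> tuple[int, int]:
    # new_left is the old right half, so round_fn can be recomputed and XORed off.
    right = new_left
    left = new_right ^ round_fn(right, key)
    return left, right


def encrypt(left: int, right: int, keys: list[int]) -> tuple[int, int]:
    for key in keys:
        left, right = feistel_round(left, right, key)
    return left, right


def decrypt(left: int, right: int, keys: list[int]) -> tuple[int, int]:
    # Undo the rounds in reverse order.
    for key in reversed(keys):
        left, right = feistel_round_inverse(left, right, key)
    return left, right
```

Running `decrypt` on the output of `encrypt` with the same key list should give back the original pair of halves. This toy is only meant to show invertibility and is not a secure cipher.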

This has actually been ported over into neural nets.

There's a 2017 or 2018 paper called RevNets, reversible networks.

And what it does is, you can apply it to any network, like a transformer network.

I do a forward pass, but then I can actually run the entire pass backwards as well.

So the whole neural network is invertible, with exactly this construction.

So in this paper, reversible networks, applied to some layer, a transformer layer for example, we've got this function F, which is our transformer layer.

Now, normally we would have just an input and then a residual connection coming out, and it gets added like this, over here.

But now the variation of this is that we've got two inputs, x and y.

x goes through the function, gets added to y. And then this becomes the new x, the output x.

And then this x becomes the output y. So really what this is doing, if you think of two layers back, is the thing you mentioned before.

It's actually doing the residual connection from two layers back.

This y came from the previous layer and was the residual connection there.

But because of this construction, the whole thing is invertible.
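The two-input construction described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `f` is a hypothetical stand-in for a transformer layer (an elementwise `tanh` with a made-up `scale` parameter), and plain Python lists stand in for activations. The forward step sends x through `f`, adds the result to y to get the new x, and passes the old x along as the new y. The inverse step recomputes `f` from the passed-along value and subtracts it.

```python
import math


def f(x: list[float], scale: float) -> list[float]:
    # Hypothetical stand-in for a transformer layer. It need not be invertible.
    return [math.tanh(scale * v) for v in x]


def rev_forward(x, y, scale):
    # new x = y + F(x); new y = old x
    new_x = [yi + fi for yi, fi in zip(y, f(x, scale))]
    new_y = x
    return new_x, new_y


def rev_inverse(new_x, new_y, scale):
    # old x is carried along as new y, so F(x) can be recomputed and subtracted.
    x = new_y
    y = [nxi - fi for nxi, fi in zip(new_x, f(x, scale))]
    return x, y


def forward_stack(x, y, scales):
    # Forward pass through several reversible layers.
    for scale in scales:
        x, y = rev_forward(x, y, scale)
    return x, y


def inverse_stack(x, y, scales):
    # Run the entire pass backwards, last layer first.
    for scale in reversed(scales):
        x, y = rev_inverse(x, y, scale)
    return x, y
```

Calling `inverse_stack` on the output of `forward_stack` with the same `scales` should recover the original x and y, up to floating-point rounding in the add and subtract.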

Why do I care?

What does invertibility matter for?

The big thing it can be interesting for is training.