Noam Shazeer
And then we have a bunch of work from even the early Brain days, when we were using CPU machines and they were really slow.
So we needed to do asynchronous training to help scale: each copy of the model would kind of do some local computation and then send gradient updates to a centralized system, which would apply them asynchronously, while another copy of the model was doing the same thing.
You know, it makes your model parameters kind of wiggle around a bit and it makes people uncomfortable with the theoretical guarantees, but it actually seems to work in practice.
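To make that concrete, here is a minimal sketch of the asynchronous pattern being described, in JAX with Python threads standing in for model replicas. This is an illustration, not the actual Brain/DistBelief code; the toy linear model, `LR`, and `worker` are all assumed for the example. Each replica computes a gradient against a possibly stale snapshot of the shared parameters and applies it without waiting for the others:

```python
import threading
import jax
import jax.numpy as jnp

LR = 0.1                          # assumed learning rate for the toy example
params = jnp.zeros(4)             # shared parameters (the "centralized system")
lock = threading.Lock()           # serializes only the apply step

def loss(p, x, y):
    return jnp.mean((x @ p - y) ** 2)

grad_fn = jax.grad(loss)

def worker(batches):
    """One model replica: compute locally, then send updates asynchronously."""
    global params
    for x, y in batches:
        snapshot = params                # may be stale: no global barrier
        g = grad_fn(snapshot, x, y)      # local computation on this replica
        with lock:
            params = params - LR * g     # applied whenever it arrives

key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (8, 16, 4))
ys = xs @ jnp.array([1.0, -2.0, 0.5, 3.0])
batches = list(zip(xs, ys))

# Two replicas training concurrently on disjoint batch streams.
threads = [threading.Thread(target=worker, args=(batches[i::2],)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The staleness is the "wiggle": by the time a replica's gradient lands, the parameters it was computed against may already have moved.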
So one way to do that is you effectively record the sequence of operations: which gradient update happened when, and on which batch of data.
You don't necessarily record the actual gradient values in the log, but you could replay that log of operations so that you get repeatability.
Then I think you'd be happier.
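Here is a minimal sketch of that replay idea, under the same toy-model assumptions as above: log only the order in which updates were applied (here, batch ids), not the gradients themselves, and re-run the log to get identical parameters:

```python
import random
import jax
import jax.numpy as jnp

LR = 0.1

def loss(p, x, y):
    return jnp.mean((x @ p - y) ** 2)

grad_fn = jax.grad(loss)

def run(params, batches, schedule, log):
    # Apply one gradient update per schedule entry, recording the order.
    for batch_id in schedule:
        x, y = batches[batch_id]
        params = params - LR * grad_fn(params, x, y)
        log.append(batch_id)              # the "log of operations"
    return params

key = jax.random.PRNGKey(0)
xs = jax.random.normal(key, (6, 8, 4))
ys = xs @ jnp.array([1.0, -2.0, 0.5, 3.0])
batches = list(zip(xs, ys))

# Original run: a shuffle stands in for the nondeterministic order in which
# an async system happened to apply the updates; we record it as it happens.
order = list(range(len(batches)))
random.shuffle(order)
log = []
p1 = run(jnp.zeros(4), batches, order, log)

# Replay the recorded log: same order, so identical parameters.
p2 = run(jnp.zeros(4), batches, log, [])
assert jnp.allclose(p1, p2)
```

This assumes each individual update is itself deterministic; the log only pins down the interleaving.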
I mean, the thing that let us go from asynchronous training on CPUs to fully synchronous training is the fact that we have these super-fast TPU hardware chips, and then pods, which have incredible amounts of bandwidth between the chips in a pod.
And then, scaling beyond that, we have really good data center networks and even cross-metro-area networks that enable us to scale to, you know, many, many pods in multiple metro areas for our largest training runs.
And we can do that fully synchronously, as Noam said, as long as the gradient accumulation and communication of the parameters across metro areas happens, you know, fast enough relative to the step time, you're golden.
You don't really care.
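Here is what that fully synchronous pattern looks like as a minimal JAX sketch, using `jax.pmap` with a `jax.lax.pmean` all-reduce. This illustrates the lockstep gradient averaging, not Google's actual training stack, and the model and `LR` are again toy assumptions:

```python
from functools import partial
import jax
import jax.numpy as jnp

LR = 0.1

def loss(p, x, y):
    return jnp.mean((x @ p - y) ** 2)

@partial(jax.pmap, axis_name='replicas')   # one lockstep program per device
def sync_step(params, x, y):
    g = jax.grad(loss)(params, x, y)
    # Synchronous all-reduce: average gradients across every replica before
    # anyone steps, so all copies of the parameters stay bit-identical.
    g = jax.lax.pmean(g, axis_name='replicas')
    return params - LR * g

n = jax.local_device_count()
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (n, 8, 4))       # one data shard per device
y = x @ jnp.array([1.0, -2.0, 0.5, 3.0])
params = jnp.zeros((n, 4))                  # parameters replicated per device
params = sync_step(params, x, y)
```

The `pmean` is the step that has to cross chips, pods, and metro areas; as long as that all-reduce stays small relative to the step time, the synchrony is essentially free.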
But I think as you scale up, there may be a push to have a bit more asynchrony in our system than we have now.
Because, like, we can make it work.
You know, our ML researchers have been really happy with how far we've been able to push synchronous training, because it's an easier mental model to understand.
You just have your algorithm sort of fighting you, rather than the asynchrony and the algorithm both battling you.
Maybe it's your adversarial machine MUQQ17 that is, like, setting the seventh bit of the exponent in all your gradients or something.