Menu
Sign In Search Podcasts Charts People & Topics Add Podcast API Blog Pricing

Noam Shazeer

๐Ÿ‘ค Speaker
See mentions of this person in podcasts
692 total appearances

Appearances Over Time

Podcast Appearances

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

How does that flow work when you kind of

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

need a bit more information and then you want to put it back in the background for it to continue doing you know finding the hotels in brilliant or whatever um i think it's going to be pretty interesting uh and inference will be useful difference will be useful i mean there's also a compute efficiency thing in inference that you don't have in training and that

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

Yeah, like as a good example of an algorithmic improvement is like the use of drafter models.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

So you have like a really small language model that you do one token at a time when you're decoding and predict like four tokens.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

And then you give that to the big model and you say, okay, here's the four tokens the little model came up with.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

check which ones you agree with and if you agree with the first three then you just advance and then you've basically been able to do a four four token with parallel computation instead of a one token with thing in the big model um and so those are the kinds of things that people are looking at to improve inference efficiency

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

So you don't have this single token decode bottleneck.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

Hello, how are you?

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

That sounds great to me.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

I'm going to advance past that.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

I mean, we're already doing it.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

So we're pro multi-data center training.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

I think in the Gemini 1.5 tech report, we said we used multiple metro areas and trained with some of the compute in each place and then a pretty โ€“

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

long latency, but high bandwidth connection between those data centers.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

And that works fine.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

It's great.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

Actually, training is kind of interesting because each step in a training process is, you know, usually for a large model is a few seconds or something at least.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

So the latency of it being, you know, 50 milliseconds away doesn't matter that much.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

Just the bandwidth.

Dwarkesh Podcast
Jeff Dean & Noam Shazeer โ€“ 25 years at Google: from PageRank to AGI

Yeah, just bandwidth.