Jeff Dean
But I mean, it's not that you would need that human understanding to figure out how to work the thing at runtime, because you just have some sort of learned router that's doing that.
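(A minimal sketch of what such a learned router might look like, assuming a standard top-k gating setup; the shapes, expert count, and function names are illustrative, not from the conversation.)

```python
import numpy as np

def topk_router(x, w_router, k=2):
    """Toy learned router: score each expert, keep the top-k per token.

    x:        (batch, d_model) token activations
    w_router: (d_model, n_experts) learned routing weights
    Returns expert indices and normalized gate weights per token.
    """
    logits = x @ w_router                                      # (batch, n_experts)
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]             # k highest-scoring experts
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    gates = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)                 # softmax over the chosen k
    return topk_idx, gates

# Illustrative shapes only: 8 tokens, d_model = 64, 4 experts.
rng = np.random.default_rng(0)
idx, gates = topk_router(rng.standard_normal((8, 64)),
                         rng.standard_normal((64, 4)), k=2)
```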
I mean, even for any sort of existing mixture of experts, you want the whole thing in memory.
I mean, basically, I guess there's kind of this misconception running around with mixture of experts that, okay, the benefit is that you don't even have to go through those weights in the model if some expert is unused. But that doesn't mean you don't have to retrieve that memory.
Because really, in order to be efficient, you're serving at very large batch sizes.
Of independent requests.
Right, of independent requests.
So it's not really the case that, OK, at this step, you're either looking at this expert or you're not looking at this expert.
Because if that were the case, then when you did look at the expert, you would be running it at batch size one, which is massively
Inefficient.
Like, you've got modern hardware where the operational intensities are, whatever, in the hundreds.
So that's not what's happening.
It's that you are looking at all the experts, but you only have to send a small fraction of the batch through each one.
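(A rough sketch of that serving pattern, assuming a toy setup with one dense matrix per expert and a router that has already assigned one expert per token; everything here is illustrative rather than a description of any production system.)

```python
import numpy as np

def moe_forward(x, expert_weights, expert_ids):
    """Run a whole batch through an MoE layer, one expert per token.

    x:              (batch, d_model) activations for the full batch
    expert_weights: list of (d_model, d_ff) matrices, one per expert
    expert_ids:     (batch,) index of the expert chosen for each token

    Every expert's weights are read, but each expert only multiplies
    the slice of the batch that was routed to it.
    """
    out = np.zeros((x.shape[0], expert_weights[0].shape[1]))
    for e, w in enumerate(expert_weights):
        mask = expert_ids == e
        if mask.any():
            out[mask] = x[mask] @ w   # sub-batch through expert e
    return out

# Illustrative usage: a big batch of independent requests, 4 experts.
rng = np.random.default_rng(0)
x = rng.standard_normal((256, 64))
experts = [rng.standard_normal((64, 128)) for _ in range(4)]
ids = rng.integers(0, 4, size=256)    # pretend this came from the learned router
y = moe_forward(x, experts, ids)      # each expert sees ~64 tokens, not 1
```

Roughly, pushing a sub-batch of b tokens through a d_model by d_ff expert takes about 2 x b x d_model x d_ff FLOPs while reading d_model x d_ff weights, so the operational intensity grows with the sub-batch size; at b = 1 you get on the order of 2 FLOPs per weight read, far below the hundreds that modern accelerators need to stay compute-bound.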
You definitely want to have at least enough HBM to fit your whole model.
So depending on the size of your model, most likely that's how much...
That's how much HBM you'd want to have at a minimum.
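(As a back-of-the-envelope sketch; the parameter count and precision below are made-up examples, not figures from the conversation.)

```python
def min_hbm_gb(n_params, bytes_per_param=2):
    """Rough floor on the HBM needed just to hold the model weights.

    Ignores KV cache, activations, and any replication across chips,
    all of which push the real requirement higher.
    """
    return n_params * bytes_per_param / 1e9

# Hypothetical example: a 500B-parameter MoE served in bf16 (2 bytes/param).
print(min_hbm_gb(500e9))  # -> 1000.0 GB, spread across however many chips serve the model
```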
I mean, yeah.
I mean, even the data-control modularity stuff seems really cool, because then you could have, like, a piece of the model that's just trained for me.
So it knows all my private data.
We're going to need like a million automated researchers to invent all of this stuff.