Jeff Dean

Speaker
See mentions of this person in podcasts
374 total appearances

Podcast Appearances

Dwarkesh Podcast
Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

But I mean, not that you would need that human understanding to figure out how to work the thing at runtime because you just have some sort of learned router that's

I mean, for any sort of even existing mixtures of experts, you want the whole thing in memory.

I mean, basically, I guess there's kind of this misconception running around with mixture of experts that, okay, the benefit is that you don't even have to go through those weights in the model. If some expert is unused, it doesn't mean that you don't have to retrieve that memory.

Because really, in order to be efficient, you're serving at very large batch sizes.

Of independent requests.

Right, of independent requests.

So it's not really the case that, OK, at this step, you're either looking at this expert or you're not looking at this expert.

Because if that were the case, then when you did look at the expert, you would be running it at batch size one, which is massively inefficient.

Like you've got modern hardware, the operational intensities are whatever, hundreds.
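
Editor's sketch: the batch-size point can be made concrete with a rough operational intensity estimate (FLOPs per byte moved) for a single dense layer. The layer size of 8192 x 8192 and the 2 bytes per value are assumptions chosen for illustration, not figures from the conversation.

```python
def matmul_intensity(batch_size: int, d_in: int, d_out: int,
                     bytes_per_value: int = 2) -> float:
    """Approximate FLOPs per byte moved for y = x @ W.

    Counts one read of the weights plus one read of the inputs and one
    write of the outputs. Layer sizes and precision are illustrative.
    """
    flops = 2 * batch_size * d_in * d_out  # one multiply and one add per weight per row
    weight_bytes = d_in * d_out * bytes_per_value
    activation_bytes = batch_size * (d_in + d_out) * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


# Hypothetical 8192 x 8192 layer: intensity grows roughly linearly with batch size
# until activation traffic starts to matter.
for batch in (1, 16, 256, 4096):
    print(batch, round(matmul_intensity(batch, 8192, 8192), 1))
```

Under these assumptions, batch size one gives about one FLOP per byte, far below the "hundreds" the hardware is said to want, which is why running an expert on a single request would waste most of the compute.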

So that's not what's happening.

It's that you are looking at all the experts, but you only have to send a small fraction of the batch through each one.
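
Editor's sketch: a toy simulation of the claim that, across a large batch of independent requests, every expert gets used while each one sees only a small slice of the batch. The uniform random assignment is a stand-in for a learned router, and the batch size of 4096 and the 64 experts are hypothetical numbers, not details from the conversation.

```python
import random
from collections import Counter


def route_batch(num_tokens: int, num_experts: int, seed: int = 0) -> Counter:
    """Assign each token to one expert uniformly at random.

    This is a placeholder for a learned top-1 router; real routers are
    not uniform, but the batching argument is the same.
    """
    rng = random.Random(seed)
    return Counter(rng.randrange(num_experts) for _ in range(num_tokens))


num_tokens, num_experts = 4096, 64  # hypothetical serving batch and expert count
counts = route_batch(num_tokens, num_experts)
experts_used = len(counts)
largest_share = max(counts.values()) / num_tokens

print(f"experts touched: {experts_used} of {num_experts}")
print(f"largest share of the batch seen by one expert: {largest_share:.1%}")
```

Since every expert receives some tokens at each step, all expert weights have to be read, which is the point about needing the whole model in memory.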

You definitely want to have at least enough HBM to put your whole model.

So depending on the size of your model, most likely that's how much –

That's how much HBM you'd want to have at a minimum.
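
Editor's sketch: the "at least enough HBM to put your whole model" rule of thumb amounts to parameter count times bytes per parameter. The model shape below (64 experts of 2e9 parameters plus 1e10 shared parameters, stored at 2 bytes each) is entirely hypothetical, and the estimate ignores activations and any caches.

```python
def min_hbm_gib(total_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GiB (no activations or caches)."""
    return total_params * bytes_per_param / 2**30


# Hypothetical mixture-of-experts model: 64 experts of 2e9 parameters each,
# plus 1e10 shared (non-expert) parameters.
total = 64 * 2e9 + 1e10

print(f"{min_hbm_gib(total):.0f} GiB to keep every expert resident")
print(f"{min_hbm_gib(2e9 + 1e10):.0f} GiB if only one expert had to be resident")
```

The gap between the two numbers is the saving the "misconception" would promise; the discussion above argues that the larger figure is the one to budget for when serving large batches.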

I mean, yeah.

I mean that even the data control modularity stuff seems like really cool because then you could have like the piece of the model that's just trained for me.

So it knows all my private data.

We're going to need like a million automated researchers to invent all of this stuff.