Noam Shazeer

👤 Speaker
See mentions of this person in podcasts
692 total appearances

Podcast Appearances

Dwarkesh Podcast
Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

And that's not to say we should exactly mimic that, because silicon and, you know, wetware have very different characteristics and strengths.

But I do think one thing we could draw inspiration, more inspiration from is this notion of having different specialized portions, sort of areas of a model of a brain that are good at different things.

So we have a little bit of that in mixture of experts models, but it's still very kind of structured.
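
For readers unfamiliar with the term, a rough sketch of the "structured" routing in a mixture-of-experts layer might look like the following. It assumes a common top-k gating scheme; the toy experts, the gate scores, and the function names are all hypothetical and are not taken from the conversation.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_scores, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum of the selected experts' outputs; other experts are skipped."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_scores, k))

# Hypothetical toy experts: each one is just a scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 1.5]  # made-up router scores for one input

routes = top_k_route(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
```

The number of experts and the value of k are fixed up front here, which is one sense in which this kind of routing is "structured".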

And I feel like this kind of more organic growth of expertise, and when you want more expertise of that, you kind of add some more capacity to the model there and let it learn a bit more on that kind of thing.

And also this notion of, like, adapting the connectivity of the model to the connectivity of the hardware is a good one.

So I think you want incredibly dense connections between artificial neurons in sort of the same chip and the same HBM, because that doesn't cost you that much.

But then you want a smaller number of connections to nearby neurons.

So like a chip away, you should have some amount of connections.

And then like many, many chips away, you should have a smaller number of connections where you send over a very limited kind of bottlenecky thing, the most important things that this part of the model is learning for other parts of the model to make use of.

And even across multiple TPU pods, you'd like to send even less information, but the most salient kind of representations.

And then across metro areas, you'd like to send even less.
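
The hierarchy described so far (dense on-chip, sparser with each step outward) can be illustrated with a small sketch. This is only an illustration under invented assumptions: the tier names paraphrase the quotes, while the starting count and the tenfold falloff are made-up numbers, since no figures are given in the conversation.

```python
# Tiers ordered from closest to farthest, paraphrasing the quotes above.
TIERS = [
    "same chip / same HBM",
    "a chip away",
    "many chips away",
    "across TPU pods",
    "across metro areas",
]

def connection_budget(local_connections, falloff):
    """Give each tier a connection count that shrinks by `falloff` per step outward.

    Both arguments are hypothetical knobs chosen for illustration.
    """
    budget = {}
    count = float(local_connections)
    for tier in TIERS:
        budget[tier] = max(1, int(count))  # keep at least one link per tier
        count /= falloff
    return budget

# Made-up numbers: one million local connections, ten times fewer per tier.
budget = connection_budget(local_connections=1_000_000, falloff=10)
for tier, count in budget.items():
    print(f"{tier:>22}: {count:>9,} connections")
```

Fixing the proportions by hand like this is only one option; any real ratio would depend on the hardware.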

Yeah, I'd like that to emerge organically.

Like, you could hand-specify these characteristics, but I think you don't know exactly what the right proportions of these kinds of connections are.

And so you should just let the hardware dictate things a little bit.

Like if you're communicating over here and this data always shows up really early, you should add some more connections.

Then it'll make it take longer and show up at just the right time.

I think there's a notion of like how much compute do you want to spend on this particular inference.