Noam Shazeer

Speaker
692 total appearances

Podcast Appearances

Dwarkesh Podcast
Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

One thing I remember was we'd been, in the early days of the brain team, we were focused on, let's see if we can build some infrastructure that lets us train really, really big neural nets.

And at that time, we didn't have GPUs in our data centers.

We just had CPUs.

But we know how to make lots of CPUs work together.

So we built a system that enabled us to train pretty large neural nets through both model and data parallelism.
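
A minimal sketch of the data-parallel half of that idea, under stated assumptions: the one-parameter model, the toy data, and the names `gradient` and `data_parallel_step` are hypothetical, and the synchronous gradient averaging shown here is a simplification rather than a description of the system discussed in the episode. Each shard stands in for one machine's slice of the training data. Model parallelism, which would split the model's parameters across machines, is not shown.

```python
def gradient(w, shard):
    """Mean-squared-error gradient for the 1-D model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr=0.01):
    """One synchronous step: every worker computes a gradient on its own
    shard, the gradients are averaged, and the shared weight is updated."""
    grads = [gradient(w, shard) for shard in shards]  # one entry per worker
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad


# Hypothetical toy data generated from y = 3 * x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

# Split into 4 equal-sized shards, one per simulated worker.
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)

print(round(w, 6))
```

Because the shards are the same size, averaging the per-shard gradients gives the same value as computing the gradient over the whole dataset, so the work is divided without changing the update.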

So we had a system for unsupervised learning on actually 10 million randomly selected YouTube frames.

And it was kind of a spatially local representation.

So it would build up unsupervised representations based on trying to reconstruct the thing from the high-level representations.

And so we got that working and training on 2,000 computers using 16,000 cores.

And after a little while, that model was actually able to build a representation at the highest level where one neuron would get excited by, you know, images of cats. It had never been told what a cat was, but it had seen enough examples of head-on facial views of cats in the training data that that neuron would turn on for that and not for much else.

Similarly, you'd have other ones for human faces and, you know, backs of pedestrians and this kind of thing.

And so that was kind of cool because it's sort of from unsupervised learning principles building up these really high-level representations.

And then we were able to get, you know, very good results on the supervised ImageNet 20,000 category challenge that, like, advanced the state of the art by, like, 60% relative improvement, which was quite good at the time.
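
To make "relative improvement" concrete, here is a small sketch of the arithmetic. The baseline accuracy below is a hypothetical number chosen only for illustration; it is not the figure from the work being described.

```python
def relative_improvement(old, new):
    """Fractional gain of `new` over `old`, measured relative to `old`."""
    return (new - old) / old


baseline = 0.10                    # hypothetical prior accuracy (10%)
improved = baseline * (1 + 0.60)   # a 60% relative improvement gives about 16%

print(f"{improved:.2%}")
print(f"{relative_improvement(baseline, improved):.0%}")
```

Under that assumed baseline, a 60% relative improvement is a 6-point absolute gain, not a jump to 70% accuracy.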

Yeah.

And that neural net was probably 50x bigger than one that had been trained previously, and it got good results.

So that sort of said to me, hey, actually scaling up neural nets seems like a – I thought it would be a good idea, and it seems to be, so we should keep pushing on that.

I think we're an "organize the world's information" company, and that's broader than information retrieval, right?

That's maybe...