Noam Shazeer

👤 Person
692 total appearances

Podcast Appearances

Dwarkesh Podcast
Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

But we sort of had some early evidence that seemed like it might be possible.

So we're like, great, let's build the whole chip around that.

And then over time, I think you've seen people able to use much lower precision for training as well.

But also the inference precision has, you know, gone.

People are now using INT4 or FP4, which sounded like if you said to someone, like, we're going to use FP4 to like a supercomputing floating point person 20 years ago, they'd be like, what?

That's crazy.

We like 64 bits in our floats.

Or even below that, some people are quantizing models to two bits or one bit.

And I think that's a trend to definitely pay attention to.

Yeah, just a zero or one.

And then you have like a sign bit for a group of bits or something.

Then you're like, yes, quantization is irritating, but your model is going to be three times faster, so you're going to have to deal.
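As a rough illustration of the low-precision idea discussed above, here is a minimal sketch of symmetric per-group quantization to signed 4-bit integer codes. The function names, the group size, and the choice of one shared scale per group are assumptions made for this example; they are not taken from the conversation or from any particular library.

```python
def quantize_int4(weights, group_size=4):
    """Symmetric per-group quantization to integer codes in [-7, 7].

    Hypothetical helper: each group of `group_size` weights shares one
    floating-point scale, and each weight is stored as a small integer.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group onto the code 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-7, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=4):
    """Recover approximate weights from the codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.7]
codes, scales = quantize_int4(weights)
restored = dequantize_int4(codes, scales)
```

Each restored weight differs from the original by at most half of its group's scale, which is the accuracy given up in exchange for storing 4 bits per weight instead of 16 or 32.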

Yeah, so, I mean, let me start with the undergrad thesis.

So I kind of got introduced to neural nets in one section of one class on parallel computing that I was taking in my senior year.

And I needed to do a thesis to graduate, like an honors thesis.

And so I approached the professor and I said, oh, it would be really fun to do something around neural nets.

So he and I decided I would sort of implement a couple of different ways of parallelizing backpropagation training for neural nets in 1990.

And I called them something funny in my thesis, like pattern partitioning or something.

But really I implemented model parallelism and data parallelism on a 32-processor hypercube machine.
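To make the two terms concrete, here is a small sketch that simulates both on a single machine. It is not a reconstruction of the thesis code: the toy model (`y = w * x`), the helper names, and the assumption that the data and weights divide evenly across workers are all chosen for illustration. In data parallelism each worker holds the full model and a shard of the examples; in model parallelism each worker holds a slice of the model and sees the whole input.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, n_workers):
    """Data parallelism: shard the examples, average the per-worker gradients.

    Assumes len(xs) is divisible by n_workers so all shards are equal-sized.
    """
    shard = len(xs) // n_workers
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(n_workers)
    ]
    return sum(grads) / n_workers


def model_parallel_dot(weights, inputs, n_workers):
    """Model parallelism: each worker owns a slice of the weights and
    computes a partial dot product; the partial results are then summed.

    Assumes len(weights) is divisible by n_workers.
    """
    chunk = len(weights) // n_workers
    partials = [
        sum(w * x for w, x in zip(weights[i * chunk:(i + 1) * chunk],
                                  inputs[i * chunk:(i + 1) * chunk]))
        for i in range(n_workers)
    ]
    return sum(partials)


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
full = grad_mse(1.5, xs, ys)
sharded = data_parallel_grad(1.5, xs, ys, n_workers=2)
```

With equal-sized shards, the averaged data-parallel gradient matches the full-batch gradient, and the summed partial dot products match the unsplit dot product. On real hardware the list comprehensions would run on separate processors and the final `sum` would be a communication step.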