Noam Shazeer
But we sort of had some early evidence that seemed like it might be possible.
So we're like, great, let's build the whole chip around that.
And then over time, I think you've seen people able to use much lower precision for training as well.
But also the inference precision has, you know, gone down.
People are now using INT4 or FP4, which, if you had told a supercomputing floating-point person 20 years ago, "we're going to use FP4," they'd be like, what?
That's crazy.
We like 64 bits in our floats.
Or even below that, some people are quantizing models to two bits or one bit.
And I think that's a trend to definitely pay attention to.
Yeah, just a zero or one.
And then you have like a sign bit for a group of bits or something.
Then you're like, yes, quantization is irritating, but your model is going to be three times faster, so you're going to have to deal.
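For readers following along, here's a minimal sketch of the kind of group-wise low-bit quantization being described: symmetric INT4 with one shared scale per group of weights. The group size, rounding scheme, and function names are illustrative assumptions, not any particular production recipe.

```python
# Minimal sketch: group-wise symmetric INT4 quantization (illustrative only).
import numpy as np

def quantize_int4_groupwise(weights: np.ndarray, group_size: int = 32):
    """Quantize a 1-D weight vector to INT4 codes with one scale per group."""
    assert weights.size % group_size == 0
    groups = weights.reshape(-1, group_size)
    # One scale per group: map the largest magnitude in the group to 7 (INT4 max).
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate weights from INT4 codes and per-group scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int4_groupwise(w)
w_hat = dequantize(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

The rough trade-off is the one described above: the codes take a quarter of the memory of FP16 weights (plus a small per-group scale), at the cost of some reconstruction error.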
Yeah, so, I mean, let me start with the undergrad thesis.
So I kind of got introduced to neural nets in one section of one class on parallel computing that I was taking in my senior year.
And I needed to do a thesis to graduate, like an honors thesis.
And so I approached the professor and I said, oh, it would be really fun to do something around neural nets.
So he and I decided I would sort of implement a couple of different ways of parallelizing backpropagation training for neural nets in 1990.
And I called it something funny in my thesis, like pattern partitioning or something.
But really I implemented model parallelism and data parallelism on a 32-processor hypercube machine.
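As a rough illustration of the two partitionings mentioned here, the sketch below contrasts data parallelism (each worker holds a full copy of the weights and a slice of the batch) with model parallelism (each worker holds the full batch and a slice of the weights) for a single linear layer. The NumPy loop merely simulates workers; it is not a reconstruction of the original 1990 hypercube implementation.

```python
# Minimal sketch: data parallelism vs. model parallelism for y = x @ W,
# with 4 simulated workers standing in for real processors (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))   # batch of 8 examples, 16 features
W = rng.standard_normal((16, 4))   # weight matrix, 4 outputs
workers = 4

# Data parallelism: split the batch across workers, replicate W.
y_data = np.concatenate(
    [x_shard @ W for x_shard in np.split(x, workers, axis=0)], axis=0)

# Model parallelism: split W's output columns across workers, replicate the batch.
y_model = np.concatenate(
    [x @ W_shard for W_shard in np.split(W, workers, axis=1)], axis=1)

# Both partitionings reproduce the unpartitioned computation.
y_full = x @ W
print(np.allclose(y_data, y_full), np.allclose(y_model, y_full))
```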