
Dwarkesh Podcast

Jeff Dean & Noam Shazeer – 25 years at Google: from PageRank to AGI

12 Feb 2025

2h 14m duration
22986 words
4 speakers
Description

This week I welcome on the show two of the most important technologists ever, in any field.

Jeff Dean is Google's Chief Scientist, and through 25 years at the company, has worked on basically the most transformative systems in modern computing: from MapReduce, BigTable, TensorFlow, AlphaChip, to Gemini.

Noam Shazeer invented or co-invented all the main architectures and techniques that are used for modern LLMs: from the Transformer itself, to Mixture of Experts, to Mesh TensorFlow, to Gemini and many other things.

We talk about their 25 years at Google, going from PageRank to MapReduce to the Transformer to MoEs to AlphaChip – and maybe soon to ASI.

My favorite part was Jeff's vision for Pathways, Google's grand plan for a mutually-reinforcing loop of hardware and algorithmic design and for going past autoregression. That culminates in us imagining *all* of Google-the-company going through one huge MoE model.

And Noam just bites every bullet: 100x world GDP soon; let's get a million automated researchers running in the Google datacenter; living to see the year 3000.

Watch on YouTube; listen on Apple Podcasts or Spotify.

Sponsors

Scale partners with major AI labs like Meta, Google DeepMind, and OpenAI. Through Scale's Data Foundry, labs get access to high-quality data to fuel post-training, including advanced reasoning capabilities. If you're an AI researcher or engineer, learn about how Scale's Data Foundry and research lab, SEAL, can help you go beyond the current frontier at scale.com/dwarkesh

Curious how Jane Street teaches their new traders? They use Figgie, a rapid-fire card game that simulates the most exciting parts of markets and trading. It's become so popular that Jane Street hosts an inter-office Figgie championship every year. Download from the app store or play on your desktop at figgie.com

Meter wants to radically improve the digital world we take for granted. They're developing a foundation model that automates network management end-to-end. To do this, they just announced a long-term partnership with Microsoft for tens of thousands of GPUs, and they're recruiting a world-class AI research team. To learn more, go to meter.com/dwarkesh

To sponsor a future episode, visit dwarkeshpatel.com/p/advertise

Timestamps

00:00:00 - Intro
00:02:44 - Joining Google in 1999
00:05:36 - Future of Moore's Law
00:10:21 - Future TPUs
00:13:13 - Jeff's undergrad thesis: parallel backprop
00:15:10 - LLMs in 2007
00:23:07 - "Holy s**t" moments
00:29:46 - AI fulfills Google's original mission
00:34:19 - Doing Search in-context
00:38:32 - The internal coding model
00:39:49 - What will 2027 models do?
00:46:00 - A new architecture every day?
00:49:21 - Automated chip design and intelligence explosion
00:57:31 - Future of inference scaling
01:03:56 - Already doing multi-datacenter runs
01:22:33 - Debugging at scale
01:26:05 - Fast takeoff and superalignment
01:34:40 - A million evil Jeff Deans
01:38:16 - Fun times at Google
01:41:50 - World compute demand in 2030
01:48:21 - Getting back to modularity
01:59:13 - Keeping a giga-MoE in-memory
02:04:09 - All of Google in one model
02:12:43 - What's missing from distillation
02:18:03 - Open research, pros and cons
02:24:54 - Going the distance

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe

Transcription

Chapter 1: What transformative systems did Jeff Dean contribute to at Google?

0.031 - 26.956 Dwarkesh Patel

Today, I have the honor of chatting with Jeff Dean and Noam Shazeer. Jeff is Google's chief scientist. And through his 25 years at the company, he has worked on basically the most transformative systems in modern computing, from MapReduce, Bigtable, TensorFlow, AlphaChip. Genuinely, the list doesn't end. Gemini now. And Noam is the single person most responsible for the current AI revolution.


26.976 - 48.047 Dwarkesh Patel

He has been the inventor or the co-inventor of all the main architectures and techniques that are used for modern LLMs, from the transformer itself, to mixture of experts, to mesh TensorFlow, to many other things. And they are two of the three co-leads of Gemini at Google DeepMind. Awesome. Thanks so much for coming on.


48.428 - 50.652 Noam Shazeer

Thanks for having us. Super excited to be here.


50.672 - 65.98 Dwarkesh Patel

OK. First question. Both of you have been at Google for 25 or close to 25 years. At some point early on in the company, you probably understood how everything worked. When did that stop being the case? Do you feel like there was a clear moment when that happened?


66.301 - 74.915 Noam Shazeer

I mean, I know when I joined, and at that point this was like end of 2000, they had this thing where everybody gets a mentor.

Chapter 2: How did Noam Shazeer's work influence modern LLM architectures?

75.576 - 91.544 Noam Shazeer

And so I knew nothing. I would just ask my mentor everything. And my mentor knew everything. It turned out my mentor was Jeff. And it was not the case that everyone at Google knew everything. It was just the case that Jeff knew everything because he had basically written everything.


91.764 - 107.612 Jeff Dean

You're very kind. I mean, I think as companies grow, you kind of go through these phases. When I joined, we were 25 people, 26 people, something like that. And so you eventually learned everyone's name. And even though we were growing, you kept track of all the people who were joining.


107.592 - 124.852 Jeff Dean

At some point, then you lose track of everyone's name in the company, but you still know everyone working on software engineering things. Then you lose track of all the names of people in the software engineering group, but you at least know all the different projects that everyone's working on.


Chapter 3: What is the future of Moore's Law and its impact on AI?

125.533 - 133.802 Jeff Dean

And then at some point, the company gets big enough that you get an email that Project Platypus is launching on Friday, and you're like, what the heck is Project Platypus? So I think...


133.782 - 141.533 Noam Shazeer

Usually it's a very good surprise. You're like, wow, Project Platypus. I had no idea we were doing that. And it turns out to be brilliant.


141.553 - 160.06 Jeff Dean

It is good to keep track of what's going on in the company, even at a very high level, even if you don't know every last detail. And it's good to know lots of people throughout the company so that you can go ask someone for more details or figure out who to talk to. I think with one level of indirection, you can usually find the right person in the company if you have a good...


160.04 - 168.731 Jeff Dean

network of people that you built up over time. How did Google recruit you, by the way? I kind of reached out to them, actually.


168.771 - 170.413 Dwarkesh Patel

And Noam, how did you get recruited?

170.433 - 176.501 Noam Shazeer

What was it that you did? I actually saw Google at a job fair in 1999.

Chapter 4: How does Google plan to scale AI models and inference?

177.623 - 196.11 Noam Shazeer

And I assumed that it was already this huge company that there was no point in joining, because everyone I knew used Google. I guess that was because I was a grad student at Berkeley. I guess I've dropped out of grad programs a few times. But it turns out that actually it wasn't really that large.


196.611 - 218.506 Noam Shazeer

So it turns out I did not apply in 1999, but just kind of sent them a resume on a whim in 2000, because I figured it was my favorite search engine and figured I should apply to multiple places for a job. But then, yeah, it turned out to be really, really fun. It looked like a bunch of smart people doing good stuff.


218.606 - 228.281 Noam Shazeer

And they had this really nice crayon chart on the wall of the daily number of search queries that somebody had just been maintaining.


Chapter 5: What are the implications of automated chip design for AI?

228.501 - 246.343 Noam Shazeer

And yeah, it looked very exponential. I was like, these guys are going to be very successful. And it looks like they have a lot of good problems to work on. So I was like, OK, maybe I'll go work there for a little while and then have enough money to just go work on AI for as long as I want after that.


246.363 - 248.185 Dwarkesh Patel

In a way, you did that, right?


248.905 - 273.903 Noam Shazeer

Yeah, it totally worked out exactly according to my... Sorry, you were thinking about AI in 1999? Yeah, this was like 2000. I remember in grad school, a friend of mine at the time had told me that his New Year's resolution for 2000 was to live to see the year 3000 and that he was going to achieve this by inventing AI.


274.063 - 298.479 Noam Shazeer

I was like, that sounds like a good idea. Then You know, I didn't get the idea at the time that, oh, like, you could go do it at a big company. But, you know, I figured, hey, you know, a bunch of people seem to be making a ton of money at startups. Maybe I'll just make some money and then I'll have, you know, enough to live on and just work on AI research for a long time.


300.402 - 304.368 Noam Shazeer

But, yeah, it actually turned out that Google was a terrific place to work on AI.

304.528 - 304.628

Yeah.

304.608 - 323.967 Jeff Dean

I mean, one of the things I like about Google is our ambition has always been sort of something that would kind of require pretty advanced AI. You know, organizing the world's information and making it universally accessible and useful. Like, actually, there's a really broad mandate in there. So it's not like the company was going to do this one little thing and stay doing that.

324.467 - 332.775 Jeff Dean

And also, you could see that what we were doing initially was in that direction, but you could do so much more in that direction.

332.975 - 347.946 Dwarkesh Patel

How has Moore's Law over the last two, three decades changed the kinds of considerations you have to take on board when you design new systems, when you figure out what projects are feasible? What are still the limitations? What are things you can now do that you obviously couldn't do before?

Chapter 6: How do Jeff and Noam envision the future of AGI?

424.162 - 454.949 Jeff Dean

Basically, what's happened is that at this point, arithmetic is very, very cheap, and moving data around is comparatively much more expensive. Pretty much all of deep learning has taken off roughly because of that, because you can build it out of matrix multiplications, which are n cubed operations on n squared bytes of data communication, basically.
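A minimal sketch of the arithmetic-intensity point being made here (my illustration, not from the episode): multiplying two n-by-n matrices takes roughly 2n³ floating-point operations but only touches about 3n² values, so the ratio of compute to data movement grows linearly with n. Assuming 2-byte values (e.g. bf16):

```python
# Arithmetic intensity of a square matmul C = A @ B with A, B, C all n x n.
# FLOPs: ~2 * n^3 (an n-term multiply-add per output element, n^2 outputs).
# Data:  ~3 * n^2 values read or written (A, B, and C), times bytes per value.

def matmul_arithmetic_intensity(n: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for an n x n matrix multiply."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n ** 2 * bytes_per_value
    return flops / bytes_moved

for n in (256, 2048, 16384):
    print(f"n={n:6d}  ~{matmul_arithmetic_intensity(n):8.0f} FLOPs per byte")

# The ratio scales with n, which is why hardware packed with multipliers but
# comparatively starved for bandwidth can still run large matmuls efficiently.
```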


455.37 - 478.801 Noam Shazeer

Well, I would say that the pivot to hardware oriented around that was an important transition. Because before that, we had CPUs and GPUs that were not, you know, especially well suited for deep learning. And then, you know, we started to build, say, TPUs at Google that were really just reduced precision linear algebra machines. And then once you have that, then you want to... Right.


478.821 - 499.5 Jeff Dean

You have to see the insight. It seems like it's all about kind of identifying opportunity costs. This is something Larry Page, I think, used to always say: our second biggest cost is taxes, and our biggest cost is opportunity costs. And if he didn't say that, then I've been misquoting him for years. But...


499.48 - 525.423 Jeff Dean

But basically, it's like, what is the opportunity that you have that you're missing out on? And in this case, I guess it was that, OK, you've got all of this chip area, and you're putting a very small number of arithmetic units on it. If you fill the thing up with arithmetic units, you could have orders of magnitude more arithmetic getting done. Now what else has to change?


525.523 - 528.309 Jeff Dean

Okay, the algorithms and the data flow and everything else.

528.329 - 533.942 Noam Shazeer

You know, by the way, the arithmetic can be really low precision, so then you can squeeze even more multiplier units in.

534.293 - 549.556 Dwarkesh Patel

I want to follow up on what you said, that the algorithms have been following the hardware. Imagine a counterfactual world where the cost of memory had declined more than the cost of arithmetic, rather than the dynamic you actually saw over the last decade.

549.576 - 554.584 Jeff Dean

Okay, data flow is extremely cheap and arithmetic is not cheap.

Chapter 7: What challenges do AI researchers face in ensuring safety and alignment?

554.604 - 557.208 Jeff Dean

What would AI look like today? That's interesting.


557.228 - 559.812 Noam Shazeer

You'd have a lot more lookups into very large memories. Yeah.
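To make this counterfactual concrete, here is a toy contrast (my sketch, not from the episode): a gather from a large table moves a lot of bytes while doing essentially no arithmetic per byte, which is the kind of workload cheap memory favors, whereas a matmul does a great deal of arithmetic per byte it touches.

```python
import numpy as np

# Memory-bound "lookup" workload vs. a compute-bound matmul, side by side.

vocab, dim = 100_000, 256
table = np.random.randn(vocab, dim).astype(np.float32)  # ~100 MB "large memory"

ids = np.random.randint(0, vocab, size=4096)
gathered = table[ids]   # pure lookups: ~4 MB moved, essentially zero FLOPs

x = np.random.randn(4096, dim).astype(np.float32)
w = np.random.randn(dim, dim).astype(np.float32)
y = x @ w               # matmul: ~5e8 FLOPs over a similar number of bytes, so compute-bound
```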


559.792 - 591.563 Noam Shazeer

Yeah, I mean, I think it might look more like AI looked like 20 years ago, but in the opposite direction. I'm not sure. I guess I joined Google Brain in 2012. I'd left Google for a few years, happened to go back for lunch to visit my wife. And we happened to sit down next to Jeff and the early Google Brain team. And I thought, wow, that's a smart group of people doing something.


591.583 - 611.727 Jeff Dean

I think I said, you should think about Brain next, because we're making some pretty good progress here. That sounds fun. So OK, so I jumped back in to join Jeff. That was like 2012. I seem to join Google every 12 years. I rejoined Google in 2000, 2012, and 2024. What's going to happen in 2036?


611.767 - 615.111 Dwarkesh Patel

I don't know.

Chapter 8: What insights do Jeff and Noam share about their careers at Google?

615.547 - 625.82 Dwarkesh Patel

We shall see. What trade-offs are you considering for future versions of the TPU? How are you thinking about algorithms differently?


626.321 - 649.771 Noam Shazeer

I mean, I think one thing, one general trend is we're getting better at quantizing or having much more reduced precision models. You know, we started with TPU v1. We weren't even quite sure we could quantize a model for serving with 8-bit integers. But we sort of had some early evidence that seemed like it might be possible. So we're like, great, let's build the whole chip around that.
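A minimal sketch of the kind of 8-bit integer quantization described here (my illustration, not Google's serving code): store weights as int8 plus a scale factor, and dequantize on the fly.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, with q in [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize_int8(q, s))))
```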


649.791 - 674.745 Noam Shazeer

And then over time, I think you've seen people able to use much lower precision for training as well. But also the inference precision has, you know, gone down. People are now using INT4 or FP4. If you had said "we're going to use FP4" to a supercomputing floating-point person 20 years ago, they'd be like, what? That's crazy. We like 64 bits in our floats.


674.765 - 682.613 Noam Shazeer

Or even below that, some people are quantizing models to two bits or one bit. And I think that's a trend to definitely pay attention to.


682.633 - 683.794 Dwarkesh Patel

One bit? Just like a zero or one?

683.814 - 690.161 Noam Shazeer

Yeah, just a zero or one. And then you have like a scale for a group of bits or something.
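A minimal sketch of 1-bit weights with a shared per-group scale, along the lines being described (my illustration, assuming a group size of 32, not any particular production scheme): each weight keeps only its sign, and a group of weights shares one scale.

```python
import numpy as np

def quantize_1bit(weights: np.ndarray, group_size: int = 32):
    """Binary quantization: w ~ scale_g * sign(w), one shared scale per group of weights."""
    w = weights.reshape(-1, group_size)
    signs = np.where(w >= 0, 1.0, -1.0)                  # one bit of information per weight
    scales = np.mean(np.abs(w), axis=1, keepdims=True)   # shared scale for each group
    return signs, scales

def dequantize_1bit(signs: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    return (signs * scales).reshape(shape)

w = np.random.randn(4, 64).astype(np.float32)
signs, scales = quantize_1bit(w)
print("mean abs error:", np.mean(np.abs(w - dequantize_1bit(signs, scales, w.shape))))
```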

690.181 - 722.78 Jeff Dean

It really has to be a co-design thing, because if the algorithm designer doesn't realize that he can get greatly improved throughput with the lower precision, of course the algorithm designer is going to say, I don't want low precision. That introduces risk, and it's an irritation. And then if you ask the chip designer, OK, what do you want to build?

722.88 - 744.232 Jeff Dean

And then they'll ask the person who's writing the algorithms today, who's going to say, no, I don't like quantization. It's irritating. So you actually need to basically see the whole picture and figure out, oh, wait a minute, we can increase our throughput to cost ratio by a lot by quantizing.

744.252 - 749.34 Noam Shazeer

Then you're like, yes, quantization is irritating, but your model is going to be three times faster, so you're going to have to deal.
