Jeff Dean
relatively cheap compared to training the system from scratch, then it could actually make, yeah, that could actually make research much, much cheaper and faster.
Yeah.
So, okay.
Okay, let's figure it out and do that next.
Yeah, you've been building the infrastructure.
Maybe.
There have been times like the way the TPU pods were set up.
I don't know who did that, but they did a pretty brilliant job.
The low-level software stack and the hardware stack: OK, you've got your nice, regular, high-performance hardware.
You've got these great torus-shaped interconnects.
And then you've got the right low-level collectives, the all-reduces, et cetera, which I guess came from supercomputing, but it turned out to be kind of just the right thing to build distributed deep learning on top of.
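The all-reduce mentioned here is the collective where every worker starts with its own values (say, local gradients) and all workers end with the sum. A toy single-process sketch of the classic ring algorithm from supercomputing, not actual TPU code, might look like this (the function name and chunking scheme are illustrative):

```python
def ring_all_reduce(vectors):
    """Simulate a ring all-reduce: every worker ends with the elementwise sum.

    vectors: list of n equal-length lists, one per simulated worker.
    Toy restriction: vector length must be divisible by the worker count.
    """
    n = len(vectors)
    size = len(vectors[0])
    assert size % n == 0, "toy version: length divisible by worker count"
    chunk = size // n
    # Each worker's vector, viewed as n chunks it can pass around the ring.
    data = [[list(v[c * chunk:(c + 1) * chunk]) for c in range(n)]
            for v in vectors]

    # Phase 1: reduce-scatter. After n-1 steps, worker i holds the fully
    # summed chunk (i + 1) % n. Sends are snapshotted to mimic the
    # simultaneous exchanges of a real ring.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, list(data[i][(i - step) % n]))
                 for i in range(n)]
        for i, c, payload in sends:
            dst = (i + 1) % n
            for k in range(chunk):
                data[dst][c][k] += payload[k]

    # Phase 2: all-gather. Each worker circulates its fully summed chunk
    # until everyone has every chunk.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, list(data[i][(i + 1 - step) % n]))
                 for i in range(n)]
        for i, c, payload in sends:
            data[(i + 1) % n][c] = payload

    # Flatten each worker's chunks back into one vector.
    return [[x for c in range(n) for x in data[i][c]] for i in range(n)]
```

Each of the 2(n-1) steps moves only one chunk per worker, which is why the pattern maps so well onto the nearest-neighbor links of a torus interconnect: bandwidth use is balanced and no worker ever talks to a distant node.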
Actually, in the past, I found experts to be relatively easy to understand.
I mean, I don't know.
The first mixture of experts paper, you could just, like, look at the expert.
Yeah.
Yeah, you could just see: okay, we did 1,000, 2,000 experts, and this expert was getting all of the words referring to cylindrical objects.
Yeah.
Another one was talking about time.
It was actually pretty easy to do.
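The routing being described, where a learned gate sends each token to a few of many experts so you can later inspect which words each expert receives, can be sketched minimally. This is a hedged illustration with random placeholder weights, not the actual model from the paper; the names and dimensions are invented for the example:

```python
import math
import random

random.seed(0)

D, E, K = 4, 8, 2  # token dim, number of experts, experts used per token

# Placeholder gating weights; in a real MoE these are learned.
gate_w = [[random.gauss(0, 1) for _ in range(E)] for _ in range(D)]

def route(token):
    """Return the top-K expert indices for a token and their mixture weights."""
    # One gating logit per expert: a simple linear layer over the token.
    logits = [sum(token[d] * gate_w[d][e] for d in range(D)) for e in range(E)]
    top = sorted(range(E), key=lambda e: logits[e], reverse=True)[:K]
    # Softmax over only the selected experts, as in top-k gating.
    mx = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - mx) for e in top]
    z = sum(exps)
    return top, [x / z for x in exps]

experts, weights = route([0.5, -1.0, 0.3, 0.8])
```

Interpreting experts the way described above amounts to running `route` over a corpus and grouping tokens by which expert index they land on, then eyeballing each group.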