Noam Shazeer
One thing I remember was, in the early days of the Brain team, we were focused on, let's see if we can build some infrastructure that lets us train really, really big neural nets.
And at that time, we didn't have GPUs in our data centers.
We just had CPUs.
But we knew how to make lots of CPUs work together.
So we built a system that enabled us to train pretty large neural nets through both model and data parallelism.
So we had a system for unsupervised learning on actually 10 million randomly selected YouTube frames.
And it was kind of a spatially local representation.
So it would build up unsupervised representations based on trying to reconstruct the thing from the high-level representations.
And so we got that working and training on 2,000 computers using 16,000 cores.
And after a little while, that model was actually able to build a representation at the highest level where one neuron would get excited by, you know, images of cats. It had never been told what a cat was, but it had seen enough examples of head-on facial views of cats in the training data that that neuron would turn on for that and not for much else.
Similarly, you'd have other ones for human faces and, you know, backs of pedestrians and this kind of thing.
And so that was kind of cool because it's sort of from unsupervised learning principles building up these really high-level representations.
And then we were able to get, you know, very good results on the supervised ImageNet 20,000 category challenge that, like, advanced the state of the art by, like, 60% relative improvement, which was quite good at the time.
Yeah.
And that neural net was probably 50x bigger than one that had been trained previously, and it got good results.
So that sort of said to me, hey, actually scaling up neural nets seems like a... I thought it would be a good idea, and it seems to be, so we should keep pushing on that.
I think we're an "organize the world's information" company, and that's broader than information retrieval, right?
That's maybe...