Noam Shazeer
In one, you split all the examples into different batches, and every CPU has a copy of the model.
And in the other one, you kind of pipeline a bunch of examples along to processors that have different parts of the model.
And I compared and contrasted them.
And it was interesting.
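As a rough illustration of the two schemes he describes (not the original C code), here's a minimal numpy sketch; the toy loss, the number of workers, and names like `stage_weights` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # a batch of 8 examples, 4 features each
W = rng.normal(size=(4, 4))   # the model: a single weight matrix

# --- Data parallelism: every worker holds a full copy of the model and
# --- processes a different slice of the batch; gradients are averaged.
def grad(W, x_slice):
    # gradient of a toy loss 0.5 * ||xW||^2 with respect to W
    return x_slice.T @ (x_slice @ W)

shards = np.array_split(X, 4)                     # one shard per "CPU"
avg_grad = sum(grad(W, s) for s in shards) / len(shards)

# --- Model (pipeline) parallelism: each worker holds a *different* layer,
# --- and activations are handed along from one stage to the next.
stage_weights = [rng.normal(size=(4, 4)) for _ in range(4)]  # one layer per worker
acts = X
for W_stage in stage_weights:
    acts = np.maximum(acts @ W_stage, 0.0)        # each stage applies its own layer

print(avg_grad.shape, acts.shape)
```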
I was really excited about it because it felt like neural nets were the right abstraction: they could solve tiny toy problems that no other approach could solve at the time.
And I thought, naive me, "Oh, with 32 processors we'll be able to train really awesome neural nets." But it turned out we needed about a million times more compute before they really started to work for real problems. Then, starting in the late 2008, 2009, 2010 timeframe, we started to have enough compute, thanks to Moore's Law, to actually make neural nets work for real things.
And that was kind of when I re-entered and started looking at neural nets again.
But prior to that, in 2007... So actually, can I ask about this?
Yeah, it's four pages and then, like, 30 pages of C code.
Oh, yeah.
So at that point, we had a machine translation research team at Google led by Franz Och, who had joined Google maybe a year before, along with a bunch of other people.
And every year they competed in a – I guess it's a DARPA contest on translating a couple of different languages to English.
Chinese to English and Arabic to English, I think.
And the Google team had submitted an entry, and the way this works is you get, I don't know, like 500 sentences on Monday and you have to submit the answers on Friday.
And so I saw the results of this, and we'd won the contest by a pretty substantial margin measured in BLEU score, which is a measure of translation quality.
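For context, BLEU scores a system translation by modified n-gram precision against human references, with a brevity penalty. A minimal sketch, assuming NLTK is available; the example sentences are made up, not from the actual evaluation data.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]   # human reference translation(s)
hypothesis = ["the", "cat", "is", "on", "the", "mat"]     # system output

# BLEU combines 1- through 4-gram precision (by default) with a brevity
# penalty; smoothing avoids zero scores on short sentences.
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```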
And so I reached out to Franz, the head of this winning team.