Noam Shazeer
That's actually a bit of a problem with neural nets: they're so tolerant of noise.
You can have things set up kind of wrong in a lot of ways, and they just kind of figure out ways to work around it or learn anyway.
Maybe an experiment takes an hour to run or two hours or something, not two weeks.
It's great.
How do they interact?
Right.
You think maybe they're independent, but actually maybe there's some funny interaction between, you know, improving the way in which we handle video data input and the way in which we, you know, update the model parameters or something.
And that interacts more for video data than some other thing.
You know, there's all kinds of interactions that can happen that you maybe don't anticipate.
And so you want to run these experiments where you're then putting a bunch of things together and then periodically making sure that all the things you think are good are good together.
And if not, understanding why they're not playing nicely.
Yeah, I mean, I think most things you don't even try to stack, because the initial experiment didn't work that well, or it showed results that aren't that promising relative to the baseline.
And then you sort of take those things and you try to scale them up individually.
And then you're like, oh, yeah, these ones seem really promising.
So I'm going to now include them in something that I'm going to now bundle together and try to advance and combine with other things that seem promising.
And then you run the experiments and then you're like, oh, well, they didn't really work that well.
Let's try to debug why.
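The workflow described here can be sketched in miniature: test candidate techniques individually against a baseline, bundle the promising ones, then check whether the combined gain matches the sum of individual gains. Everything below (function names, techniques, loss numbers, the interaction penalty) is invented purely for illustration, not a real training setup:

```python
def run_experiment(techniques):
    # Stand-in for a real training run: each hypothetical technique
    # reduces the loss, but one pair interacts badly when combined.
    base_loss = 3.0
    gains = {"video_input": 0.2, "optimizer_tweak": 0.3, "new_attention": 0.1}
    loss = base_loss - sum(gains[t] for t in techniques)
    if {"video_input", "optimizer_tweak"} <= set(techniques):
        loss += 0.25  # hypothetical bad interaction between two techniques
    return loss

baseline = run_experiment([])
candidates = ["video_input", "optimizer_tweak", "new_attention"]

# Step 1: run each technique alone; keep the ones that beat the baseline.
promising = [t for t in candidates if run_experiment([t]) < baseline]

# Step 2: bundle the promising techniques and compare the combined result
# against what the individual gains would predict if they composed cleanly.
combined = run_experiment(promising)
expected = baseline - sum(baseline - run_experiment([t]) for t in promising)
if combined > expected + 1e-9:
    print("gains don't compose; go debug the interaction")
```

The point of the last comparison is exactly the situation described above: each piece looks good on its own, but the bundle underperforms the naive prediction, which is the signal to go understand why they're not playing nicely.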
Yeah, I mean, I think it's super important to understand what's going on and what the trends are.