Noam Shazeer
You distort the image or you hide parts of it and try to make the model guess that it's a bird from this upper corner of the image or the lower left corner.
And that makes the task harder.
And I feel like there's an analog for more textual or coding-related data, where you want to force the model to work harder, and you'll get more interesting observations from it.
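The occlusion idea described above can be sketched minimally: hide a random corner of an image so a classifier has to guess the label from partial evidence. This is an illustrative toy, not anything from the conversation's actual codebase; the function name and the 50% corner size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def occlude(image, frac=0.5):
    """Zero out one random corner region of the image, making the
    recognition task harder: the model must guess from what remains."""
    h, w = image.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    # Choose the top-left or bottom-right offset for each axis at random.
    top = int(rng.integers(0, 2)) * (h - ch)
    left = int(rng.integers(0, 2)) * (w - cw)
    out = image.copy()
    out[top:top + ch, left:left + cw] = 0
    return out

img = np.ones((8, 8))
masked = occlude(img)  # one 4x4 corner is now zeros
```

The textual analog would be hiding spans of a document and asking the model to reconstruct or reason about them, rather than always showing the full prefix.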
I mean, Dropout was invented on images, but we're mostly not using it for text.
That's one way you could get a lot more learning into a large-scale model without overfitting: just make something like 100 epochs over the world's text data and use Dropout.
But that's pretty computationally expensive.
But it does mean we won't run out.
Like even though people are saying, oh, no, we're almost out of like textual data.
I don't really believe that because I think we can get a lot more capable models out of the text data that does exist.
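A minimal sketch of the mechanism being described: inverted dropout randomly zeroes activations and rescales the rest, so each of the many epochs over the same text effectively trains a different subnetwork. This is standard dropout, not any specific system discussed here; the toy shapes and rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each activation with probability p and
    rescale survivors by 1/(1-p), so expected activations are unchanged."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Toy loop: repeated passes over the same fixed "data" still see
# different random subnetworks each epoch, which resists overfitting.
acts = np.ones((4, 8))
for epoch in range(100):
    h = dropout(acts, p=0.5)
```

At inference time (`training=False`) the input passes through unchanged, which is what makes the rescaling during training necessary.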
And they're pretty good at a lot of stuff.
Maybe not.
It's an interesting data point.
Yeah, I mean, I think we should consider changing the training objective a little bit.
Just predicting the next token from the previous ones you've seen seems like it's not how people learn.
Right.
It's a little bit related to how people learn, I think, but not entirely.
Like a person might, you know, read a whole chapter of a book and then try to answer questions at the back.
And that's a different kind of task.
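One way to make the contrast concrete is to look at which token positions contribute to the loss. Next-token prediction scores every position; a chapter-then-questions style objective would read the whole context first and score only the answer tokens. This is an illustrative sketch of the loss-masking difference, with made-up names, not a description of any actual training setup mentioned here.

```python
def loss_positions(tokens, answer_start):
    """Return the positions scored under two objectives.

    next_token: every position predicts its successor from the prefix.
    qa_style:   the full 'chapter' is read as context; only tokens
                from answer_start onward are scored.
    """
    n = len(tokens)
    next_token = list(range(n - 1))              # all positions predict t+1
    qa_style = list(range(answer_start, n - 1))  # only the answer is scored
    return next_token, qa_style

# Toy sequence: a chapter, a question, then the answer to be predicted.
seq = ["chapter...", "Q:", "who?", "A:", "Ada"]
nt, qa = loss_positions(seq, answer_start=3)
```

Under the QA-style mask the model is never penalized for failing to predict the chapter itself, only for failing to answer the question after reading it.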
I also think we're not learning from visual data very much.