Andrew Ilyas
So can you try selecting the worst examples?
Can you try selecting the best examples?
All of these are highly non-random data sets.
And what we found is that across the board, these things worked pretty well.
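To make the kind of check being described concrete, here is a minimal sketch of a counterfactual evaluation under a linear datamodel; the weight values, the subset size, and the retraining helper are my own placeholders, not details from the conversation.

```python
import numpy as np

# Given an estimated linear datamodel for one test example (one weight
# per training point), predict the model's output when trained on a
# deliberately non-random subset, then compare to actually retraining.

rng = np.random.default_rng(0)
n_train = 50_000                     # e.g. a CIFAR-10-sized training set

# Hypothetical estimated datamodel: weights[i] is the predicted
# contribution of training example i to the test example's output.
weights = rng.normal(scale=0.01, size=n_train)
bias = 0.0

def predict_output(subset_mask: np.ndarray) -> float:
    """Datamodel prediction: linear in the subset indicator vector."""
    return bias + weights @ subset_mask

# A highly non-random subset: drop the k most helpful ("best") examples.
k = 500
best = np.argsort(weights)[-k:]
mask = np.ones(n_train)
mask[best] = 0.0

predicted = predict_output(mask)
# actual = train_and_evaluate(subset=mask)   # retrain to get the real value
# The evaluation is how well `predicted` tracks `actual` across many
# such handpicked subsets.
```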
And I would say follow-up work that we've done shows that it works not just for CIFAR-10, but across a bunch of other data sets, although how well it works has varied a little bit from data set to data set.
As far as we can tell, it seems to work on more complex models as well.
We've tried scaling up the models pretty intensely.
And actually, there's this folklore intuition we have, which is completely unverified.
So take this with a grain of salt.
But as deep neural networks grow in width and in overparameterization, a linear approximation in their parameter space tends to work better and better.
They look more and more linear in their parameter space, and there's an interesting connection between linearity in parameter space and linearity in data space.
We sort of know that this linear approximation in data space works really well for very over-parameterized linear models.
And so, again, the folklore intuition is that as your deep neural network approaches a very over-parameterized linear model, this linear approximation in data space actually works even better.
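One rough way to write down the two approximations being contrasted here (the notation is mine, not from the conversation, but it matches the standard lazy-training and datamodel setups):

```latex
% Linearity in parameter space: a first-order expansion around the
% initial parameters \theta_0, which tends to become more accurate as
% width and overparameterization grow.
f(x;\theta) \approx f(x;\theta_0)
    + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0)

% Linearity in data space: the trained model's output on a test point x
% is approximated as an additive function of which training examples
% appear in the training subset S.
f(x;S) \approx \beta_0(x) + \sum_{i \in S} \beta_i(x)
```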
Yeah, it's a great question that we thought a lot about. I think in an ideal world, you could come up with a really smart adaptive sampling algorithm for this problem, because you know where you need information and where you don't.
You can think of each subset that you sample as giving you some extra information about the value of those data points or something like that.
Unfortunately, the way our compute is structured is that we have tons of GPUs available at once, and so we like to saturate them all.
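As I understand the point being made, the estimation procedure is embarrassingly parallel because every training subset can be drawn up front, whereas an adaptive scheme has to wait on earlier runs before choosing the next subset. A rough sketch, with a placeholder training function:

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

rng = np.random.default_rng(0)
n_train = 50_000
n_subsets = 1_000       # number of retraining runs
subset_frac = 0.5       # each model sees roughly 50% of the training set

# Non-adaptive: all subsets are drawn up front, so every training run is
# independent and can saturate however many GPUs are available at once.
masks = rng.random((n_subsets, n_train)) < subset_frac

def train_on_subset(mask: np.ndarray) -> float:
    # Placeholder for "train a model on this subset and record its
    # output on the test example(s) of interest".
    ...

# with ProcessPoolExecutor() as pool:          # one job per GPU/worker
#     outputs = list(pool.map(train_on_subset, masks))

# An adaptive scheme would instead pick subset t+1 based on outputs
# 0..t, which serializes the work and makes it harder to keep all the
# GPUs busy.
```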