Andrew Ilyas
But I think the core challenge there is that we have a very poor handle on features in natural images.
I mean, if you think about what the space of features for an image is, it's basically infinite.
And we don't really have a great way of removing a non-robust feature or adding a non-robust feature or anything like that.
Fine-tuning or robustifying networks post hoc is really interesting and has been a big source of study.
I think that there has been some work trying to do this via randomized smoothing at test time, for example, or by fine-tuning the network using a robust objective or something like that.
But I would say we've had less progress on that than algorithm-focused approaches.
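To make the randomized-smoothing idea mentioned above concrete, here is a minimal sketch of test-time smoothing: classify by majority vote over Gaussian-perturbed copies of the input. The `model`, `sigma`, and sample count are illustrative assumptions, not specifics from the conversation.

```python
# Minimal sketch of randomized-smoothing-style prediction at test time.
# Assumes a pretrained classifier `model` and a single input tensor `x` of shape (C, H, W).
import torch

def smoothed_predict(model, x, sigma=0.25, n_samples=100):
    """Classify x by majority vote over Gaussian-perturbed copies of the input."""
    model.eval()
    with torch.no_grad():
        # Draw n_samples noisy copies of x: shape (n_samples, C, H, W)
        noise = torch.randn(n_samples, *x.shape) * sigma
        logits = model(x.unsqueeze(0) + noise)   # broadcast x over the noisy batch
        votes = logits.argmax(dim=1)             # predicted class for each noisy copy
    return torch.mode(votes).values.item()       # most frequent class wins
```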
Yeah, it's a fascinating question.
I think there are a lot of people studying this inductive or implicit bias as well, which deals with a very similar thing.
And I think all of these are getting at this core problem that the space of features that neural networks can learn is unfathomably large because of how over-parametrized they are and the fact that they can, for example, memorize the entire training set with random labels.
And so they have almost infinite features to choose from.
And so there's this interesting question of like, okay, if we just leave things alone and we just run SGD with this architecture, what features will this network actually converge to?
And I think that's almost a more relevant question than what features they can represent, because the answer to the latter question is almost always "any feature."
And so I think the work you were talking about, the masked autoencoders, you can view as asking: what knobs do we have to change neural networks' inductive biases?
And similarly, the paper by Robert Geirhos, the texture-versus-shape-bias one.
They were similarly trying to figure out what knobs we have to play with to change this texture-versus-shape bias, whether that's the addition of training data that has different styles with the same class labels.
And I think a really natural perspective on adversarial training, coming from robust optimization,
is exactly saying: okay, the inductive bias or implicit bias of our neural networks is leading us toward features that are great, they just happen not to be adversarially robust.
And so you can view adversarial training as basically trying to change those inductive biases so that they lead us toward features that are robust.
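As a hedged sketch of that robust-optimization view: adversarial training minimizes, over the network weights, the loss under a worst-case small perturbation of each input, with the inner maximization approximated by a few PGD steps. The function names, epsilon, step size, and step count below are illustrative assumptions, not specifics from the conversation.

```python
# Sketch of adversarial training as min-max robust optimization:
#   min_theta  E[ max_{||delta|| <= eps}  loss(model(x + delta), y) ]
# with the inner max approximated by projected gradient descent (PGD).
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Approximate the inner maximization with a few l_inf PGD steps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascend the loss
            delta.clamp_(-eps, eps)              # project back into the l_inf ball
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One outer-minimization step on adversarially perturbed inputs."""
    delta = pgd_perturb(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```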