Andrew Ilyas
You take the difference.
You divide by 2 epsilon.
And that's the partial derivative with respect to that pixel.
You do that for every pixel.
You have a gradient.
And then you can use those gradients to attack a production system with only query access.
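To make that concrete, here is a minimal numpy sketch of the central-difference estimator he's describing; `query_model` is a hypothetical stand-in for a production API that returns a scalar score or loss for an image. The loop makes the cost obvious: two queries per pixel, which is why the naive baseline burns through so many API calls on even a modestly sized image.

```python
import numpy as np

def finite_difference_gradient(query_model, x, eps=1e-3):
    """Central-difference gradient estimate using only model queries.

    query_model(image) -> scalar score/loss is a hypothetical query-only API;
    x is a numpy array holding the image.
    """
    x_flat = x.reshape(-1).astype(float)
    grad = np.zeros_like(x_flat)
    for i in range(x_flat.size):
        e = np.zeros_like(x_flat)
        e[i] = eps
        # Nudge one pixel up and down by eps, take the difference,
        # and divide by 2*eps to get that pixel's partial derivative.
        up = query_model((x_flat + e).reshape(x.shape))
        down = query_model((x_flat - e).reshape(x.shape))
        grad[i] = (up - down) / (2 * eps)
    return grad.reshape(x.shape)
```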
And so our first black-box adversarial attack paper was basically about speeding that up, making it more query-efficient so you didn't have to spend tens of thousands of dollars in API credits.
And also introducing the threat model where, you know, often it's not like you upload your image and then someone replies with a bunch of logits.
Instead, you usually just get sort of one prediction.
Like, I get to upload an image and then the thing tells me, this is a cat or this is a dog, and that's all I have.
And so we basically adapted these techniques to be used in the setting where you only have hard labels.
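As an illustration of how hard labels can still give a usable signal, one simple trick (sketched here with hypothetical names, not necessarily the exact construction from the paper) is to turn the top-1 label into a smoother Monte-Carlo score: how often the label you care about survives small random perturbations of the image.

```python
import numpy as np

def label_survival_score(query_label, x, target_label, sigma=0.001, n_samples=50):
    """Monte-Carlo proxy score built from hard labels only (illustrative).

    query_label(image) -> predicted class is a hypothetical query-only API.
    The returned fraction varies more smoothly than the raw 0/1 label,
    so it can be fed to a gradient estimator.
    """
    hits = 0
    for _ in range(n_samples):
        noisy = x + sigma * np.random.randn(*x.shape)
        if query_label(noisy) == target_label:
            hits += 1
    return hits / n_samples
```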
That was 2018.
Yeah, so in the first paper we did, which was during my undergrad, we were just focused on these two settings: what we called the query-limited setting, where you don't want to spend a bunch of money on APIs, and what we called the hard-label setting, where you have sparser information.
But we weren't super focused on optimizing the actual estimator very much.
We just used an off-the-shelf estimator, this natural evolution strategies or spherical gradient estimator, it has a bunch of names.
It was just a very standard sort of zeroth-order optimization algorithm.
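A minimal sketch of that kind of estimator, assuming a hypothetical `query_loss(image) -> scalar` API: sample random Gaussian directions, take an antithetic finite difference along each, and average. The point is that each gradient estimate costs a fixed number of queries (two per random direction) instead of two per pixel.

```python
import numpy as np

def nes_gradient(query_loss, x, sigma=0.01, n_samples=50):
    """NES-style (spherical) gradient estimate from queries only.

    query_loss(image) -> scalar loss is a hypothetical query-only API.
    Cost is 2 * n_samples queries per estimate, independent of how many
    pixels the image has.
    """
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape)
        # Antithetic finite difference along a random direction,
        # accumulated into the estimate weighted by that direction.
        delta = query_loss(x + sigma * u) - query_loss(x - sigma * u)
        grad += delta * u
    return grad / (2 * sigma * n_samples)
```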
In this follow-up paper that I did during my PhD, which was joint with Logan Engstrom and our advisor, we basically looked at the algorithm itself for doing these black-box attacks.
And we subbed in this class of algorithms from zeroth-order optimization called bandit algorithms.
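Very roughly, the bandit framing means carrying a running gradient "prior" across attack steps and spending each round's two queries refining it, rather than re-estimating the gradient from scratch every time. A heavily simplified sketch under that assumption (hypothetical names and update rules, not the paper's exact algorithm):

```python
import numpy as np

def bandit_attack_step(query_loss, x, prior, fd_eps=0.1, exp_eps=0.01,
                       prior_lr=0.1, image_lr=0.5):
    """One round of a bandit-style black-box attack (simplified sketch).

    query_loss(image) -> scalar loss is a hypothetical query-only API;
    `prior` is the running gradient estimate carried across rounds.
    """
    # Explore: perturb the prior along a random direction and use two
    # queries to see which perturbation points more uphill in the loss.
    u = np.random.randn(*x.shape)
    delta = (query_loss(x + fd_eps * (prior + exp_eps * u))
             - query_loss(x + fd_eps * (prior - exp_eps * u)))
    # Exploit: nudge the prior toward the better direction, then take a
    # signed step on the image using the updated prior. A real attack
    # would also project back into the allowed perturbation set and
    # clip to the valid pixel range.
    prior = prior + prior_lr * delta * u
    x = x + image_lr * np.sign(prior)
    return x, prior
```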