
Andrew Ilyas

Machine Learning Street Talk (MLST)
Adversarial Examples and Data Modelling - Andrew Ilyas (MIT)

And what we really wanted to do was test this out. And so we designed this really simple experiment. We took a standard image classification data set, took its training set and a fixed model, and made an adversarial example out of every single sample in the training set.

So what that means is, let's say for simplicity, that you just have a dogs-versus-cats training set. Every cat you adversarially perturb to become a dog, and every dog you adversarially perturb to become a cat. And so the result is this data set of cats that have been slightly perturbed to be classified as dogs, and dogs that have been slightly perturbed to be classified as cats.
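
To make the construction concrete, here is a minimal sketch of the perturbation step in PyTorch, assuming a fixed pretrained classifier `model`, images scaled to [0, 1], and an L2-bounded targeted PGD attack. The function name and all hyperparameters are illustrative assumptions, not the exact setup of the underlying paper.

    import torch
    import torch.nn.functional as F

    def targeted_pgd(model, x, target, eps=0.5, step=0.1, iters=20):
        """Nudge x toward being classified as `target` (L2-bounded PGD)."""
        x_adv = x.clone().detach()
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), target)
            grad, = torch.autograd.grad(loss, x_adv)
            # Targeted attack: descend the loss toward the target class.
            g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            x_adv = x_adv.detach() - step * grad / g_norm
            # Project back into the L2 ball of radius eps around the original x.
            delta = x_adv - x
            d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            x_adv = (x + delta * (eps / d_norm).clamp(max=1.0)).clamp(0, 1)
        return x_adv

For the two-class example, the target for every cat is "dog" and vice versa, so a batch with binary labels y would use target = 1 - y.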

Now, once you've done that, what we're going to do is relabel the data according to the adversarial class. So the resulting data set looks completely mislabeled to a human. You just have a bunch of cats labeled as dogs and dogs labeled as cats. Of course, these are all slightly perturbed images.
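
In code, the relabeling is a one-line twist: the stored label for each perturbed image is the adversarial target, not the true class. A sketch, continuing the hypothetical names from the snippet above (`model`, `targeted_pgd`, and a `train_loader` over the original cat/dog data):

    import torch
    from torch.utils.data import TensorDataset

    adv_images, adv_labels = [], []
    for x, y in train_loader:          # y: 0 = cat, 1 = dog (assumed encoding)
        target = 1 - y                 # flip every label
        x_adv = targeted_pgd(model, x, target)
        adv_images.append(x_adv.detach().cpu())
        adv_labels.append(target.cpu())    # keep the human-wrong label

    # To a human, this data set is entirely mislabeled: perturbed cats
    # tagged "dog" and perturbed dogs tagged "cat".
    relabeled_train = TensorDataset(torch.cat(adv_images), torch.cat(adv_labels))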

We then throw away the original model and the original data set, and we just train a new model on this data set of mislabeled dogs and mislabeled cats. And the question is, what's going to happen with this new model if we test it on the original data distribution? So just clean pictures of cats labeled as cats and dogs labeled as dogs.
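
A sketch of that final step, again with hypothetical names: `make_model()` builds a fresh, randomly initialized network, and `test_loader` iterates over the clean, correctly labeled test set. Nothing from the original model or original data set is reused.

    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader

    new_model = make_model()                       # fresh weights, no reuse
    opt = torch.optim.SGD(new_model.parameters(), lr=0.01, momentum=0.9)
    loader = DataLoader(relabeled_train, batch_size=128, shuffle=True)

    for epoch in range(30):                        # illustrative schedule
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(new_model(x), y).backward()
            opt.step()

    # Test on the original distribution: clean cats labeled as cats,
    # clean dogs labeled as dogs.
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            correct += (new_model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    print(f"clean test accuracy: {correct / total:.1%}")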

And if you think back to this useful versus useless features dichotomy, you should basically expect only two possible outcomes. Either the classifier will get 0%, because you've trained it on a bunch of cats labeled as dogs and dogs labeled as cats, and you're testing it on correctly labeled images. Or maybe you're slightly more skeptical and you think, well, actually, it could get 50%. Because it's possible that by making these adversarial examples in the training set, you're actually introducing some variation along the useless features. And then the model might just cling to that useless variation and get 50% on the clean data.

The interesting thing, though, is that when you train a classifier on this entirely mislabeled data set, that classifier gets 90% accuracy.