Andrew Ilyas
If we forget for a second about the data collection process and just assume that you have a data set, clearly changing that data set is going to change model predictions in some way.
And so what we were asking is: without actually thinking about the mechanistic details of the learning algorithm itself, can we sort of black-box that away and think of machine learning as just a map directly from training data set to prediction?
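A minimal illustrative sketch of that framing, not the speaker's actual implementation: treat the whole learning pipeline as a black-box function of which training points are included, retrain on random subsets, and fit a simple surrogate from inclusion patterns to the resulting prediction. All data and names here are hypothetical toy stand-ins.

```python
# Sketch (toy data): machine learning as a map from "which training points
# are included" to "the prediction on one target input".
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)

def train_and_predict(X_train, y_train, x_target):
    """Stand-in for the full learning pipeline: train, then score one target input."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model.decision_function(x_target.reshape(1, -1))[0]

# Hypothetical toy data: n training points, one target input we care about.
n, d = 200, 10
X, y = rng.normal(size=(n, d)), rng.integers(0, 2, size=n)
x_target = rng.normal(size=d)

# Sample random subsets of the training set and record the prediction each yields.
masks, outputs = [], []
for _ in range(300):
    mask = rng.random(n) < 0.5            # include each training point with prob. 0.5
    masks.append(mask.astype(float))
    outputs.append(train_and_predict(X[mask], y[mask], x_target))

# Fit a linear surrogate: prediction ~ which training points were included.
# Its coefficients estimate each training point's influence on this one prediction.
surrogate = Ridge(alpha=1.0).fit(np.array(masks), np.array(outputs))
influences = surrogate.coef_
```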
It's an honor to be here.
Yeah, so my name is Andrew.
I'm a sixth year PhD student at MIT.
I'm advised by Aleksander Madry and Costis Daskalakis, hopefully graduating soon.
I work a lot on robustness and reliability, with a focus on looking at the entire machine learning pipeline, from how we collect data, to how we make it into data sets, to what learning algorithms we use, and really trying to take a step back and look at that whole pipeline to answer questions about robustness and reliability.
Yeah, absolutely.
I think a big goal of my work is what I'd call predictability in machine learning systems.
So really whether we can understand the principles behind why they work well enough that when we put them into production, we understand both when they're going to work and when they're not going to work, and also ideally why.
Yeah, so I started at MIT in 2015.
Towards the end of my undergrad, I got really interested in this phenomenon of adversarial examples, just doing undergrad research with a couple of my friends.
We worked on a couple of papers together and just got really excited about the field.
And into my PhD work I continued that interest for a while, working on both developing attacks and trying to start building an understanding of why these things even arise. I can explain the whole path, but gradually that brought me to the conclusion that we really need to understand the interaction between
training data and models, and basically to get at some core of why machine learning works the way it does.
Yeah, so an adversarial example is just a very small perturbation to a natural input such that a machine learning model, when run on that input, misbehaves.
And so in the context of images, that could be changing a couple of pixels so that a classifier misclassifies the image.
There's been recent work on this in the context of language, where now you're trying to append a very small suffix or prefix to your prompt so that the language model produces some unintended behavior.
But broadly speaking, it's about slightly changing inputs to make machine learning models misbehave.
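As a concrete sketch of the image case, here is a minimal FGSM-style perturbation, one standard attack among many; the conversation is about the general phenomenon, not this particular method, and the `model`, `x`, and `label` in the usage note are hypothetical placeholders.

```python
# Minimal sketch: nudge each pixel slightly in the direction that increases the
# classifier's loss, so the image looks unchanged but the prediction flips.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, eps=8 / 255):
    """Return x plus a small perturbation (bounded by eps per pixel) that raises the loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the sign of the gradient, then keep pixel values in [0, 1].
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Hypothetical usage: `model` is any image classifier, `x` a batch of images in
# [0, 1], `label` the true labels. The perturbed images are often misclassified
# even though they are visually indistinguishable from the originals.
# x_adv = fgsm_perturb(model, x, label)
```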
Yeah, so I think about it in at least four or five different steps.