Andrew Ilyas
So the thing that inspired them, and the right way to interpret them, is as coefficients of a model that tries to predict these data counterfactuals.
And so we were like, OK, if what these things are doing is trying to predict data counterfactuals, then the way we should evaluate them is basically to see how good they are at predicting data counterfactuals.
So the LDS, the linear datamodeling score, is exactly that: generating a bunch of data counterfactuals, seeing what these different methods predict will happen under those data counterfactuals, and then looking at the correlation between predictions and reality.
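To make that concrete, here is a minimal Python sketch of that evaluation loop, not anything from the episode: `tau` is assumed to be a vector of attribution scores for one test example, `subsets` is a list of random training-set subsets, and `retrain_and_eval` is a hypothetical helper that retrains on a given subset and returns the model's output on that test example.

```python
import numpy as np
from scipy.stats import spearmanr

def lds_sketch(tau, subsets, retrain_and_eval):
    """Rough LDS-style evaluation: correlate predicted vs. actual
    counterfactual outputs across random training subsets.

    tau              : (n_train,) attribution scores for one test example
    subsets          : list of index arrays, each a random training subset
    retrain_and_eval : callable(subset) -> model output on the test example
                       after retraining on that subset (assumed helper)
    """
    # Linear data model's prediction: sum the scores of the kept examples.
    predicted = np.array([tau[s].sum() for s in subsets])
    # Ground truth: actually retrain on each subset and measure the output.
    actual = np.array([retrain_and_eval(s) for s in subsets])
    # The score is the rank correlation between predictions and reality.
    return spearmanr(predicted, actual).correlation
```

In practice this kind of score would be averaged over many test examples, but the core measurement is just that correlation.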
Yeah, of course.
So like I was saying, the key idea behind TRAK is really this approximation of your neural network as a linear model in parameter space.
And so if you take a step back and you think about, what if we were just doing logistic regression?
What if there were no neural network or anything?
It turns out that in this case, like we were saying earlier, the influence function is a really good data model.
It's a very good approximation for getting data counterfactuals.
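As a concrete reference point, here is a small sketch of the classical influence-function estimate in the binary logistic regression case, written from the standard Koh and Liang-style formula rather than taken from the conversation; sign and 1/n scaling conventions vary across papers, and the damping term is an assumption for numerical stability.

```python
import numpy as np

def influence_on_test_loss(X, y, theta_hat, x_test, y_test, damping=1e-3):
    """Classical influence-function estimate for binary logistic regression:
    approximate effect of each training point on the loss at one test point.

    X, y           : training features (n, d) and labels in {0, 1}
    theta_hat      : fitted parameters (d,)
    x_test, y_test : a single test example
    damping        : small ridge term so the Hessian solve is well-behaved
    """
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-X @ theta_hat))        # predicted probabilities
    grads = (p - y)[:, None] * X                    # per-example loss gradients
    # Hessian of the average logistic loss at theta_hat (plus damping).
    H = (X.T * (p * (1 - p))) @ X / n + damping * np.eye(d)
    p_test = 1.0 / (1.0 + np.exp(-x_test @ theta_hat))
    g_test = (p_test - y_test) * x_test             # test-loss gradient
    # Influence of upweighting each training point: -g_i^T H^{-1} g_test.
    # (The effect of *removing* a point is this, rescaled by roughly -1/n.)
    return -grads @ np.linalg.solve(H, g_test)
```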
The problem is that we're not doing logistic regression.
We have this crazy big neural network or whatever.
So let's start by thinking about a two-class neural network.
So this is still a binary classification problem, but now we have a neural network instead of logistic regression.
What we can do is we can train that neural network, or we can apply our learning algorithm once, and we get a neural network.
Once we do that, that neural network has a corresponding parameter vector theta, which is just all of the weights of the network.
What we're going to do then is treat the output of that neural network as a linear function in theta by doing a Taylor approximation around those final parameters.
So what that looks like is: if your normal neural network output was f of theta, you're going to make a new model, f hat of theta, which is now linear in theta: the output at theta star, plus theta minus theta star times your gradient evaluated at theta star.
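Spelled out, the approximation being described is the first-order Taylor expansion of the network output around the final parameters (a standard formula, not quoted from the episode):

$$\hat f(x;\theta) \;=\; f(x;\theta^\star) \;+\; \nabla_\theta f(x;\theta^\star)^{\top}\,(\theta - \theta^\star),$$

so the new model is linear in theta, with the gradient at theta star acting as a fixed feature vector.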
And so this is a very classical trick, especially in deep learning theory.
It's called the empirical neural tangent kernel.
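For reference, the kernel that gives the trick its name is just the inner product of those gradient features at the final parameters (standard definition, not from the episode):

$$K(x, x') \;=\; \nabla_\theta f(x;\theta^\star)^{\top}\,\nabla_\theta f(x';\theta^\star).$$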