Andrew Ilyas
All of the data attribution methods just performed really poorly according to their metric. They basically measured the overlap between the set of training examples that your data attribution method claims are the most important and the set of training examples that actually logically entail the test fact.
And they found that everything performed really poorly by that measure.
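To make that concrete, here is a minimal sketch of one way to compute that kind of overlap score, not necessarily the exact metric from their paper; the function and argument names here are hypothetical, and it assumes you already have one attribution score per training example plus the indices of the ground-truth entailing examples:

```python
import numpy as np

def overlap_at_k(attribution_scores, entailing_idx, k=100):
    """Fraction of the top-k attributed training examples that are also
    in the ground-truth set of logically entailing examples.

    attribution_scores: array of shape (n_train,), one score per training example
    entailing_idx:      set of indices of training examples that logically
                        entail the test fact
    """
    top_k = np.argsort(attribution_scores)[::-1][:k]  # highest-scoring examples first
    hits = sum(1 for i in top_k if i in entailing_idx)
    return hits / k

# Usage sketch: compare an attribution method against an IR baseline,
# given hypothetical score arrays scores_method and scores_ir.
# print(overlap_at_k(scores_method, entailing_idx, k=100))
# print(overlap_at_k(scores_ir, entailing_idx, k=100))
```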
And so we saw this before writing the TRAK paper, and we were like, this is a perfect test bed for trying out TRAK.
And so we tried it out, and we found that TRAK had a higher overlap than any other data attribution method, but it was still getting beaten by this information retrieval baseline.
And so we were like, okay, what's going on here? Because it shook our faith in our method a little bit.
We were like, maybe we're not actually finding anything meaningful.
And so we took their exact dataset and designed a slightly different evaluation metric,
where instead of just measuring the overlap with the ground truth facts, we try removing the examples that are flagged by our method, then retraining the model and seeing how many of the test facts that model gets wrong.
So if you actually identify the right things, then when you delete them, your model should get the corresponding test examples wrong.
And so you can try this, for example, for the top 100 TRAK-identified training examples.
And then you can try it again for the top 100 information retrieval training examples.
And then as a baseline, you can try dropping all of the ground truth entailing examples from the training set.
And for each of those counterfactuals, you can study sort of what the model's behavior on the test example looks like.
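A rough sketch of that counterfactual evaluation loop is below. This is only an illustration under stated assumptions: the training routine, the fact-checking callable, and all names are hypothetical stand-ins, not the actual pipeline from the paper.

```python
def counterfactual_effect(train_set, test_facts, flagged_idx,
                          train_model, gets_fact_right):
    """Retrain without the flagged examples and measure how many test
    facts the retrained model now gets wrong.

    train_set:        list of training examples
    test_facts:       list of test facts to probe
    flagged_idx:      indices of training examples to remove (e.g. the
                      top-100 examples identified by an attribution method,
                      by an IR baseline, or the ground-truth entailing set)
    train_model:      callable that trains a model on a list of examples
    gets_fact_right:  callable (model, fact) -> bool
    """
    flagged = set(flagged_idx)
    reduced = [ex for i, ex in enumerate(train_set) if i not in flagged]
    model = train_model(reduced)  # retrain from scratch on the reduced set
    wrong = sum(1 for fact in test_facts if not gets_fact_right(model, fact))
    return wrong / len(test_facts)

# Compare the three counterfactuals described above (hypothetical index sets):
#   counterfactual_effect(train_set, test_facts, top100_trak, train_model, gets_fact_right)
#   counterfactual_effect(train_set, test_facts, top100_ir, train_model, gets_fact_right)
#   counterfactual_effect(train_set, test_facts, ground_truth_idx, train_model, gets_fact_right)
```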
Yeah, yeah, yeah.
So it's very similar.
Basically, the evaluation mirrors the question: can we scrub certain test examples?
And so once we'd done this counterfactual evaluation, what we found is that the TRAK-identified training examples actually had a larger counterfactual effect on the outputs than any of the other data attribution methods, than the information retrieval method, and even than the ground truth facts.
And so this was really confusing.
There are a bunch of explanations for this in our paper; for one, we think there are a lot of syntactic relationships in the dataset that are not actually entailment relationships.