Andrew Ilyas
Yeah, that's right.
So at least in our paper, we tried models with up to 300 million parameters and got some pretty promising results.
I think we would like to do a lot more work on trying to understand where the limits of this are.
The TRAK estimator for datamodels is definitely more efficient, in the sense that if you train with 1,000x fewer models, it'll work about as well. But its limits are not going to be as good as what you could get with the regression-based estimator.
And so I think trying to understand the limits, and also when and where it does or doesn't work, is a great direction for future work.
Yeah, so I think part of the nice thing that we did in the TRAK paper is crystallize some of our thoughts from the original datamodels paper about how we should actually go about evaluating this broad class of data attribution methods.
And so I think one broad problem with data attribution in general is that if I come to you with some method for assigning value to data points, and you come to me with some other method, there's not really a great way of deciding whose is better.
And so one thing that we tried to put forward in the TRAK paper, and this is taken almost directly from the original datamodels paper, is that the way we should evaluate these things is by using this correlation that I was talking about earlier between predicted model outputs and true model outputs.
And once you do this sort of quantitative evaluation of data attribution methods, it lets you see a trade-off that already existed in the literature: between fast methods that were not very predictive of model behavior, and extremely slow methods, like the regression-based estimator I was talking about, that were very predictive of model behavior.
And so you can now view the goal of data attribution as trying to trace out better and better Pareto frontiers of this trade-off between efficacy and speed.
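To make that evaluation concrete, here is a minimal sketch of what the correlation check might look like in code. The function name, array shapes, and the choice of Spearman rank correlation are illustrative assumptions, not the papers' exact recipe (the TRAK paper evaluates methods with a metric in this spirit, the linear datamodeling score):

```python
import numpy as np
from scipy.stats import spearmanr

def lds(scores, masks, outputs):
    """Rank correlation between datamodel-predicted and true model outputs.

    scores:  (n_train,)           attribution scores for one test example
    masks:   (n_subsets, n_train) 0/1 mask of which training points each
                                  retrained model actually saw
    outputs: (n_subsets,)         each retrained model's output on that
                                  same test example
    """
    # A linear datamodel predicts a model's output as the sum of the
    # scores of the training points it was trained on.
    predicted = masks @ scores
    return spearmanr(predicted, outputs).statistic

# Toy usage with synthetic data, just to show the shapes involved.
rng = np.random.default_rng(0)
n_train, n_subsets = 500, 100
scores = rng.standard_normal(n_train)
masks = rng.integers(0, 2, size=(n_subsets, n_train))
outputs = masks @ scores + 0.1 * rng.standard_normal(n_subsets)
print(lds(scores, masks, outputs))  # near 1.0 when scores predict behavior well
```

A method's score on an evaluation like this, plotted against its compute cost, is what traces out the efficacy-versus-speed Pareto frontier described above.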
Yeah, so that's exactly this correlation that I was talking about.
So what that looks like is the thing you're evaluating is some function that maps from a test data point to a vector of scores, one for each training data point.
So this is, broadly, what was recognized at the time as a data attribution method.
And there are a bunch of these.
Like you were saying, there's the Shapley value.
There's the influence function.
There's a whole bunch of these.
And the neat thing about all of these is that they all have an interpretation as a linear data model.
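As a sketch of that shared interface and its linear-datamodel reading, consider the following. Everything here is hypothetical: the constant scorer just stands in for any real method, whether influence functions, Shapley values, or TRAK.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 1_000

# Hypothetical stand-in for any data attribution method: map a test
# example to one score per training example. A real implementation
# (influence functions, Shapley values, TRAK, ...) would go here.
def attribute(test_example):
    return rng.standard_normal(n_train)  # placeholder scores

# The shared linear-datamodel interpretation: for a training subset S,
# encoded as a 0/1 inclusion mask, the predicted model output on the
# test example is the sum of the scores of the points in S.
scores = attribute("some test input")
mask = rng.integers(0, 2, size=n_train)
predicted_output = float(scores @ mask)
```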