Andrew Ilyas
It's not directly clear how to apply it. I think there are midpoints between supervised learning and reinforcement learning where it makes slightly more sense. Offline RL, for example, looks much more like supervised learning. But applying it to online RL is difficult, because you need to figure out what it means to sample a subset of the space.
One broad class of ideas that I haven't explored at all, but am really excited about, is generalizing this framework: instead of predicting as a function of exactly which data points you train on, you take some parametric representation of how you're going to collect data, and you learn the map from that representation to predictions. Then it might be possible to apply this beyond the supervised learning case. The knobs you have to tune in RL, for example, aren't quite "include or don't include this data point," but there are still knobs you can tune.
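One way to picture that generalization is with a toy sketch. Everything here is illustrative, not the speaker's actual method: `collect_and_train` is a hypothetical stand-in for "collect data according to some parameters, train, and evaluate on a fixed target," and we fit a simple surrogate from the collection parameters to the outcome.

```python
# Hypothetical sketch: instead of a binary include/exclude mask over a
# fixed dataset, describe data collection with a small parameter vector
# (e.g. a sampling temperature, a mix of data sources) and learn a map
# from those parameters to the trained model's prediction.
import numpy as np

rng = np.random.default_rng(0)

def collect_and_train(params: np.ndarray) -> float:
    # Stand-in for "collect data according to `params`, train a model,
    # and evaluate it on a fixed target". Here: a smooth toy function.
    return float(np.sin(params[0]) + 0.5 * params[1] + rng.normal(scale=0.05))

# Sample data-collection configurations and record the outcomes.
configs = rng.uniform(-1, 1, size=(500, 2))
outcomes = np.array([collect_and_train(p) for p in configs])

# Fit a simple surrogate from collection parameters to prediction --
# the "map" in question (here linear in hand-picked features).
features = np.column_stack([np.sin(configs[:, 0]), configs[:, 1]])
coef, *_ = np.linalg.lstsq(features, outcomes, rcond=None)
print(coef)
```

The point of the sketch is just the shape of the pipeline: sample configurations, train under each one, and regress outcomes on the configuration rather than on a per-example inclusion mask.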
Yeah, of course.
So as I mentioned, training and fitting these datamodels is quite expensive. In their initial instantiation, you need to train tens of thousands of machine learning models on random subsets of the data.
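To make the cost concrete, here is a minimal sketch of the regression being described, with a toy `train_model_and_eval` standing in for actual model training (names and scale are illustrative, not the authors' code):

```python
# Sketch of the datamodels regression: train many models on random
# subsets of the data, then fit a linear map from a subset's inclusion
# mask to the resulting model's output on one fixed target example.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_subsets = 50, 2000

def train_model_and_eval(mask: np.ndarray) -> float:
    # Stand-in for "train a model on the masked data, evaluate it on
    # one target example". Here: a fixed linear signal plus noise.
    weights = np.linspace(-1.0, 1.0, n_train)
    return float(mask @ weights + rng.normal(scale=0.1))

# Step 1: sample random subsets and record (mask, output) pairs.
masks = (rng.random((n_subsets, n_train)) < 0.5).astype(float)
outputs = np.array([train_model_and_eval(m) for m in masks])

# Step 2: fit the datamodel -- least squares from mask to output.
theta, *_ = np.linalg.lstsq(masks, outputs, rcond=None)

# theta[i] estimates how much including example i moves the output.
print(theta[:3])
```

The expense is in step 1: each row of `masks` costs a full training run, which is why the section goes on to describe a cheaper estimator.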
And so together with Sam Park, Kristian Georgiev, Guillaume Leclerc, and our advisor Aleksander Madry at MIT, we were thinking about how we could make this more efficient.
And the key is really to go back to the influence function work you were talking about earlier, which works really well for classical statistical models like linear regression or logistic regression. But the assumptions it relies on don't really hold in the deep learning setting.
And so our main idea was a two-step process. First, we approximate whatever neural network, or class of neural networks, we're interested in studying as an overparameterized logistic regression. This is a well-known approximation called the empirical neural tangent kernel.
The idea is that you look at where your network ended up, those final parameters, and you do a Taylor expansion in parameter space to "linearize" the neural network, so you treat it as a linear function in parameter space. And now that we're back in linear-function land, we can go back and try to apply influence functions.
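The linearization step can be sketched numerically. This is a minimal illustration of the Taylor-expansion idea with a tiny toy network, not the authors' implementation; the network, its size, and the finite-difference gradient are all assumptions made for the sake of a runnable example.

```python
# Sketch of linearizing a network at its final parameters: Taylor-expand
# the output around theta*, so the model becomes linear in parameter
# space with the per-parameter gradient as its feature vector.
import numpy as np

rng = np.random.default_rng(0)

def net(params: np.ndarray, x: np.ndarray) -> float:
    # Toy one-hidden-layer network: params = [W (3x2), v (3)] flattened.
    W, v = params[:6].reshape(3, 2), params[6:]
    return float(v @ np.tanh(W @ x))

theta_star = rng.normal(size=9)   # stand-in for the trained parameters
x = np.array([0.3, -0.7])

# Gradient of the output w.r.t. parameters (finite differences).
eps = 1e-6
grad = np.array([
    (net(theta_star + eps * np.eye(9)[i], x) - net(theta_star, x)) / eps
    for i in range(9)
])

# Linearized model: f(theta) ~ f(theta*) + grad @ (theta - theta*).
theta_new = theta_star + 0.01 * rng.normal(size=9)
linear_pred = net(theta_star, x) + grad @ (theta_new - theta_star)
exact_pred = net(theta_new, x)
print(abs(linear_pred - exact_pred))  # small near theta*
```

Once the model is treated as linear in its parameters, the closed-form influence-function machinery for (logistic) regression becomes applicable again, which is what the next step exploits.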
And so this, plus a couple of other very important tricks and analyses, lets us build an estimator that, in the relevant regimes, works roughly as well as a regression-based datamodel at 100x or even 1000x less computation.