Andrew Ilyas
So yeah, that was just one of the findings.
But we basically did this study into what kinds of biases can creep in as a function of how you collect data, not just the data itself.
And then on the theoretical side, probably my favorite work so far has been this work on understanding what's called self-selection bias.
So this is, again, very far from the deep learning regime.
We're just back to doing linear regression or something like that.
And I think the best way to illustrate what self-selection bias looks like is with an example.
The canonical example, from the 50s, is that you go to a village and everyone in the village is either a hunter or a fisher.
Those are the only two jobs available in the village.
And you're interested in sort of understanding how do people's features, like their height and their weight and how fast they are, translate into the money that they make from hunting or from fishing.
And so a very natural approach to this would be you go to the village, you survey people, you record their features, then you run a linear regression for hunting, and you run a linear regression for fishing, and then you're done.
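One way to write down that natural approach, as a sketch only: the notation here (features $x_i$, weight vectors $w_H$ and $w_F$, noise terms $\varepsilon_i$) is assumed for illustration and is not taken from the conversation.

```latex
% Assumed notation: x_i is villager i's feature vector (height, weight, speed, ...),
% y_i^{(H)} and y_i^{(F)} are the earnings villager i would make from hunting and fishing.
% Requires amsmath.
\begin{align*}
y_i^{(H)} &= w_H^\top x_i + \varepsilon_i^{(H)}, \\
y_i^{(F)} &= w_F^\top x_i + \varepsilon_i^{(F)}, \\
\hat{w}_H &= \arg\min_{w} \sum_{i \in \text{hunters}} \bigl(y_i^{(H)} - w^\top x_i\bigr)^2, \\
\hat{w}_F &= \arg\min_{w} \sum_{i \in \text{fishers}} \bigl(y_i^{(F)} - w^\top x_i\bigr)^2 .
\end{align*}
```

Each regression only uses the villagers who ended up in that job, which is what the survey can actually observe.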
What people have known for a really long time, though, is that this is a super biased approach.
It won't actually tell you how important the different features are for your hunting revenue or your fishing revenue.
And the reason is that all of these villagers had the choice of whether they wanted to do hunting or whether they wanted to do fishing.
And assuming they're rational people, they chose the thing that they were better at.
So you don't get to see how good this hunter is at fishing, and you don't get to see how good the fisher is at hunting.
And the way they partitioned into those two groups is extremely non-random.
It's actually based on whichever one would have made them more money.
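To make the bias concrete, here is a minimal simulation sketch of the village. Everything in it is an assumption made for illustration: the three features, the weight vectors `W_HUNT` and `W_FISH`, the Gaussian noise, and the helper names `fit_ols` and `simulate_village`. It only demonstrates the problem with the naive regression; it is not the algorithm discussed in this conversation.

```python
import numpy as np

# Hypothetical ground-truth weights for three features (illustrative values only).
W_HUNT = np.array([1.0, 0.5, 0.0])
W_FISH = np.array([0.0, 0.5, 1.0])


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns only the feature weights."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]


def simulate_village(n=20000, seed=0):
    """Return (naive, oracle) estimates of the hunting weights."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))

    # Potential earnings in both jobs; in reality only one is ever observed.
    y_hunt = X @ W_HUNT + rng.normal(size=n)
    y_fish = X @ W_FISH + rng.normal(size=n)

    # Self-selection: each villager picks the job that pays them more.
    is_hunter = y_hunt > y_fish

    # Naive approach: regress hunting earnings on features, hunters only.
    naive = fit_ols(X[is_hunter], y_hunt[is_hunter])

    # Oracle baseline: uses everyone's hunting earnings, which a real
    # survey could never record for the fishers.
    oracle = fit_ols(X, y_hunt)
    return naive, oracle


if __name__ == "__main__":
    naive, oracle = simulate_village()
    print("true  :", W_HUNT)
    print("naive :", naive)
    print("oracle:", oracle)
```

In this setup the oracle fit lands close to `W_HUNT`, while the hunters-only fit is pulled away from it: the weight on the first feature comes out too small and the weight on the third comes out above zero, even though the third feature has no effect on hunting earnings here.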
And so there's been a ton of work throughout econometrics, economics, and stats on dealing with the self-selection problem.
And so, with some collaborators at MIT and Berkeley and my advisor Costas, we basically devised an efficient algorithm that could recover from self-selection bias, as we called it, in high dimensions.