Sandra Matz
Podcast Appearances
go and get a matcha latte at Starbucks on 72nd Street in New York at 7:20 a.m., then you have lunch in a certain place, and maybe you take a cab downtown at night, there's at some point only so many people who have exactly that same signature. So you can almost think of it as a fingerprint that is made up of your data.
Yeah, so that's in a way that the most interesting part of this entire field of research is like, yeah, we can identify you as a person, we can know that it's Shankar based on your data. But for me, the more interesting part is actually that we can dive into your psychology. So we can take a look at what's going on inside your mind.
And so the study that we did when we tried to predict someone's income was essentially relying on their Facebook data. So what is it that people talk about and post on social media? And I think there were some really interesting, sometimes quite uncomfortable truths that we discovered.
But overall, the bottom line was that just by looking at what you talk about on Facebook, we can have a pretty good sense of your socioeconomic status.
Yeah. So when you start opening the black box, what you see is that some of the cues are relatively obvious. So you can imagine that people with a lot of money, they talk about the vacations that they're going to take. They talk about expensive luxury brands a lot more often than people who are struggling to make ends meet.
But there's also these more subtle cues that I found even more interesting, which is, for example, that lower-income people talk more about themselves and they talk more about the present than higher-income people. And in the beginning, you might be wondering, why might this be the case?
And I think it's just that it's really damn hard to think of anything else other than how you make the present work if you're struggling to make enough money to put food on the table. So those are all these little, I think, secrets about what's going on inside our mind that we can uncover in the data.
Yeah. And I think that's the distinction between identity claims and behavioral residue that I think is so interesting, right? So again, you might post about this luxury vacation, and it's a very clear signal to the world that you're having a great time and you can afford going on this vacation.
But then all of these more subtle ones where you talk about yourself, you're more focused on the present, that's certainly something that we don't necessarily intend to reveal.
Yeah, so behavioral residue are all of the traces that we essentially inadvertently leave as we go about our life. In the offline context, you could imagine, again, that's like the bin overflowing, that's your socks not being organized, that's the bed not being made. And in the digital world, it's all of the traces that we generate without really thinking about it.
So that could be your smartphone, for example, captures your GPS records pretty much continuously 24-7. And you're not intentionally sitting down to create a record of where you went and what you did there. But still, those traces exist.
Yeah, that was really, so the research by Youyou Wu, I would say was one of the pivotal studies in this field because it showed just how accurate the predictions that we can make about someone's psychology really are based on relatively little data. So she was studying the Facebook pages that people follow. So let's say CNN has a Facebook page, you can like it.
And what she showed is that just by looking at your Facebook pages, an algorithm can actually predict your personality more accurately than our co-workers could, than our friends could, than our family members could. And mind you, those are people who know you pretty well, right? Those are your parents, those are your siblings, those are your kids.
They've spent a substantial amount of time with you. And it was slightly inferior to the judgments and the predictions of your significant other. Now, this was a study that was done in 2015. It was only based on Facebook likes. So you could imagine that if we get access to all of your digital traces and apply slightly more sophisticated machine learning that we could probably outperform even your significant other.
Yeah. And I think what is astonishing to me, and I think a point that is important, is that those models aren't perfect, right? So I think any prediction always has a certain amount of error. And what we're talking about are averages. So on average, these models are really accurate, as you just said, with a comparison. However, we still make mistakes at the individual level.
So one of the things when we kind of make these comparisons and predictions that I want to highlight is that don't take it as a truth, right? It's a prediction. It's a probability. It's pretty damn accurate on average, but we're still going to make mistakes at the individual level.
Yeah, I think of it as like this puzzle that we're putting together of a person. So you get a piece here that's their social media and then you get another piece that's their credit card spending and another piece that's their smartphone sensing data. And gradually you kind of see this person behind the data emerge. And what I think is fascinating about this combining data sources is essentially
A lot of people always say when I talk about social media that, well, isn't it just like this curated identity of who we are? It's just like who we want to be. We all like matcha lattes and amazing vacations. We're never sad. So it's just like the self-idealized version of who we really are. That's true for some of these identity claims, right? Social media.
But if you wanted to, let's say you wanted to pretend that you're more organized and conscientious than you really are, maybe you can do this on Facebook for a couple of weeks. It's really, really difficult to do this across all data sources and across like months and months and months.