Powering location intelligence with geo social data

April 09, 2019 19 min read

Listen here or subscribe to "The MapScaping Podcast" wherever you get your podcasts.

Lyden:
... and so the big insight for us is that through social media, and especially since we have mobile phones, we're getting all of this geolocated data. People are expressing their interest, basically giving a journal of their life, and we're capturing and categorizing all that data, and using it to help cities and retailers to find the right spots in cities.

Daniel:
Hello. Welcome to another episode of the MapScaping podcast. My name is Daniel, and this is a podcast where I interview people that are doing really amazing things in the field of geospatial. Today, I'll be talking to Lyden Foust about geosocial data, how it's collected, and what you can do with it. I hope you enjoy the show.

Daniel:
Today on the show I've got Lyden Foust, and he's gonna talk to us about geosocial data.

Daniel:
Hi, Lyden.

Lyden:
How's it going, Daniel?

Daniel:
Good, thanks. Maybe before we jump into this extremely interesting subject, geosocial data, which I almost know nothing about, maybe you could tell us a little bit about your background.

Lyden:
Sure. Yeah. I think in the geo space, I started originally with Lord of the Rings. Have you read Lord of the Rings?

Daniel:
Yeah, I have. Several times.

Lyden:
Okay. So, do you remember the maps in Lord of the Rings in the beginning?

Daniel:
Yeah. Sure do.

Lyden:
That totally fascinated me. From there, I really got into SimCity as a kid, SimCity 2000, but grew up, went to school, and my background was mainly ethnography. So, my job was to study cultures of areas for a period of time, and it was mainly market research; it wasn't real deep. It was for consumer brands trying to figure out what types of products they wanted, what type of communication would work.

Daniel:
So, as someone who's working with geosocial data that's got some sort of location attached to it, you don't have the real strong background in geography or cartography or geoscience?

Lyden:
No. I would say it's more in the social sciences is where I got my start.

Daniel:
Maybe we can just dive into it and you can tell us a little bit about what is geosocial data.

Lyden:
Sure. I'll give you the spiel. We started because we saw that companies, and cities in particular, were making critical decisions using mainly census-generated demographic data, and they were missing out on all of the attitudes and the mindsets of communities in making their decisions. So, the big insight for us is that through social media, and especially since we have mobile phones, we're getting all of this geolocated data where people were expressing their interest, basically giving a journal of their life, and we're capturing and categorizing all that data and using it to help cities and retailers find the right spots in cities.

Daniel:
How are you doing that? Obviously through social networks, but you're not scraping pages or anything. How are you capturing this data?

Lyden:
We either have good paid partnerships with companies like Twitter, or there are open APIs. Essentially, what we're doing is capturing all the geo data and then categorizing conversations using a machine-clustering algorithm where we essentially allow the machine to do unsupervised learning and connect common themes, and then we'll go ahead and label those themes. So, what it looks like for people that are using this is you can see an index of how many people are talking about nightlife in this area, or active moms, or romantic-type stuff.
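
As a rough illustration of the "cluster first, label afterwards" idea Lyden describes, here is a minimal sketch. It is not Spatial's actual pipeline: the posts, the stopword list, the similarity threshold, and the choice of a simple single-pass ("leader") clustering over word counts are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical geotagged posts; the real inputs and features are not described in the interview.
POSTS = [
    "great cocktails and live music downtown tonight",
    "stroller walk then playground with the kids",
    "live music and cocktails at the rooftop bar",
    "kids loved the playground after our stroller jog",
]
STOPWORDS = {"a", "after", "and", "at", "our", "the", "then", "with"}


def tokenize(text):
    """Turn a post into a bag-of-words vector (word -> count)."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def leader_cluster(texts, threshold=0.3):
    """Single-pass clustering: join the most similar cluster, or start a new one."""
    centroids = []    # one summed word-count vector per cluster
    assignments = []  # cluster index for each input text
    for text in texts:
        vec = tokenize(text)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is None or sims[best] < threshold:
            centroids.append(Counter(vec))
            assignments.append(len(centroids) - 1)
        else:
            centroids[best].update(vec)
            assignments.append(best)
    return assignments, centroids


assignments, centroids = leader_cluster(POSTS)
# The most frequent terms in each cluster are what a person would look at
# when deciding on a label such as "nightlife" or "active moms".
top_terms = [[word for word, _ in c.most_common(3)] for c in centroids]
```

No labels are supplied up front. The groups come out of the text itself, and a person names each theme afterwards by looking at its dominant terms.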

Daniel:
Okay. So, you're categorizing the data, which sounds like a difficult job. I can imagine categorizing unstructured data in this way would be tough, but you're using machine learning for that so you've got a fair bit of muscle behind there. Is this something that's happening on local servers, or where is the machine learning taking place?

Lyden:
It's a good question. So, we build it on local servers, but all of our work happens in the Cloud, all the way from the data coming in to the data going out to clients. We talked about this recently, how we're just so lucky with the complementary technologies that have come out: one, the GPUs that we use to process the data; and two, all this Cloud infrastructure that makes this possible. It really wasn't even possible even a couple years ago to do what we're doing.

Daniel:
How many categories of data do you end up with?

Lyden:
You know, we can split it any way. We can even split it a thousand different ways if we wanted to, but probably our most usable is we bring it down to about 100 different types of conversations happening on social media. Then, there's another piece of this in that quite a few categories are not really meaningful or useful to retailers.

Daniel:
Yeah. I can imagine you'll quickly get to a point where it would almost get too fine, that maybe you couldn't really trust it.

Lyden:
Right. You can even test that in predictive models, right? So, if you can break it down into a thousand different parts, and then run that against the predictive model to see what's best for sales forecasting, it's not gonna work as well as if you have a broader category. That drives how many categories we have, too; what's the most effective in actual use.

Daniel:
Yeah. I guess if you can't tie it back to something, some sort of existing data in order to enrich it either, that must be a bit of something you think about.

Lyden:
Yeah. I think one thing that has really helped us is ... You know, when you're taking all this social information some people will look at that and say, "My grandson uses social media. That's not valuable to me," and when you're categorizing and indexing things like active moms or romantic-type behavior, you could argue that's subjective. So, what's really helpful is when you place these into predictive models, and then you can see the bottom-line impact, that if you place your store next to deal-seeker type activity in conversation, there is a bottom-line lift on revenue. That's when these things get really meaningful, is when you're quantifying what people have considered in the past these unquantifiable emotional things.

Daniel:
I guess the really important thing here to stress is that you're only really interested in data that has a location attached to it. Is that correct?

Lyden:
Yeah. You got it.

Daniel:
In a pre-interview talk, you talked about there were some areas that you didn't see a lot of activity, not a lot of geolocated data. What kind of areas would they be?

Lyden:
Oh, yeah. So, on cornfields. There's not a lot of people talking on social media in cornfields. We generally, because people analyze it on block groups, we generally share our data by block group, and anywhere where there's people will generally have enough data to be able to make some sort of meaningful decisions, but it's kind of ... You know, retailers aren't trying to make, and cities aren't trying to make, huge decisions in areas where they're mainly rural and there's no people out there.

Daniel:
No, that's true, but I'm thinking in terms of people do have strong feelings about something like a national park, for example. There may not be a lot of people out there geotagging their pics or Tweets saying, "Hey," a particular national park, but people do have a strong connection to these places and they do mean something for people. So, I could imagine that even though your data collection there is quite sparse, that it's still an area where people get quite concerned if there was development going to happen there.

Lyden:
You know, we have one category that's outdoor challenges, and then we have another category that is essentially people taking pictures and photography of nature. So, we actually do have a fair amount of especially national park data, and what's interesting is typically this is the most valuable at the half-mile radius in predictiveness; but those two in particular, that's really valuable to analyze up to a three-mile radius, because you want to locate near, but obviously not in, the national parks. Also, there's a correlation with how much beer people drink.

Daniel:
Interesting. What's the most interesting thing you found by looking at this data? Or do you guys just package it up and send it off? Do you do any analysis yourselves?

Lyden:
We package it up and send it off, but obviously we're talking with the retailers. Let me think of some interesting ones. I think a lot of them are obvious, but it's interesting to see them quantified. People that talk about books a lot, so areas that are indexed high for our bookish category, tend to have degrees in humanities and tend to be well-educated. So, that's one of those obvious ones, but it's really cool to see quantified.

Lyden:
I think one of the most interesting ones, and this is public so we did this at Payless and we did a report case study, is Payless was looking at two locations: one location was killing it, and the other location was a total bog and did not do well. They're the exact same on demographics, traffic data for traditional data sets that they use, but one location is doing well and one did terribly. So, what's the difference between those locations? We found that the one location that did really well had a high index of active moms, which demographically are a certain level, they tend to be deal seekers. The other location had a high index of urban fashion, they're very similar demographically to active moms; but if they have any money, they're gonna spend it on a pair of Jordans because that's culturally important for them. It's those kinds of insights that you realize you can't treat all communities as the same; they are really unique.

Daniel:
That's really interesting. I'm sure it provides some really interesting analysis for people. Any time, doing this kind of work, you can put on another layer of data, it must be interesting. It must be attractive for people to understand more.

Lyden:
Yeah. Absolutely.

Daniel:
Is there something this data couldn't be used for? Could you give me an example of a situation where you just, "Forget it. This is not the tool for the job"?

Lyden:
Yeah. There's some companies, like gas stations. What matters in gas stations is, "Are people driving by this?" You know, your attitude and personality has nothing to do with the gas station that you choose in general. If there's not even a slight emotional reason why you might choose to go one place or another, then it's probably not gonna be nearly as valuable.

Daniel:
Yeah. That must be really horrible for the people that spend millions and millions and millions of dollars marketing for different kinds of gas stations, only to discover it makes no difference at all. You just have to be there.

Lyden:
Yeah. You just have to show up in the right place.

Daniel:
Don't worry about the marketing. Just show up. Just be there.

Daniel:
So, you're collecting a fair bit of data, I'm imagining. How many platforms are you collecting from, and what kind of size of data? What kind of amount of data are we talking?

Lyden:
Anywhere where there's text data that is geolocated, which essentially means conversations, we're collecting from those platforms. So, you can think of the obvious ones like Facebook and Twitter, but the level of data coming in is really about eight billion data points or eight billion conversations per year, and that's actually on an upward trend.

Daniel:
That doesn't surprise me that people are using social media more. It maybe surprises me a little bit that people are geotagging things more.

Lyden:
Yeah. I think the value, too, of geotagging is you geotag for a purpose. It's one of those things you have to click the button to say, "I wanna geotag this," generally. I think that it makes it even more meaningful because someone's trying to express a behavior or something that's true about themselves in an area.

Daniel:
Yeah. "At this point and at this time, this is important for me."

Lyden:
Uh-huh (affirmative).

Daniel:
So, you're collecting a lot of data and you're using what we would call in the GIS world, or remote sensing world, an unsupervised classification, so machine learning where you're just saying, "Show me all the bits that fit into these hundred different categories, 150 different categories. What have you got there?" Is this just for the US or you're doing this on a global scale?

Lyden:
It's just for the US right now, but we're collecting data on a global scale except for China and North Korea; we don't have any data there. But, yeah, we just haven't really launched outside of the US right now and we're still a startup, but we do have plans for it.

Daniel:
What's holding you back in terms of that?

Lyden:
All of our customers are in the US. So, as soon as we [inaudible 00:14:38] says, "Hey, can we turn this on for London?" we can totally do that because we have all the raw information; we just have to categorize and index it.

Daniel:
Okay. I'm sitting in Denmark here, this sounds like a great idea, and I need to find out where I'm gonna put my whatever, and I hear about you, "This is a great idea." How are you gonna send this data to me?

Lyden:
It's super simple. It's just a CSV and whatever. If you're using block groups or blocks to do the analysis, we can just send them that way, because all of our data is organized by latitude and longitude when it comes in.

Daniel:
Okay. So, you aggregate these categories. Does that mean that each block would have the dominant category? Or how would that work?

Lyden:
We send every single one of our 100 categories across, and there's just [inaudible 00:15:29], so we give it on percentiles based on that nation. So, ours obviously is the United States; and if people were talking about this 40% higher than average, then it's going to be at the 90th percentile. So, it comes across like that.
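
A minimal sketch of how a per-category percentile index by block group could be computed. The block group IDs, the counts, and the exact percentile definition (the share of block groups with a strictly lower value) are assumptions for illustration. The interview does not say how Spatial computes its percentiles.

```python
from bisect import bisect_left

# Hypothetical data: block group -> (conversations in one category, all conversations).
NIGHTLIFE = {
    "BG-A": (5, 100),
    "BG-B": (20, 100),
    "BG-C": (10, 200),
    "BG-D": (40, 100),
    "BG-E": (1, 50),
}


def category_shares(counts):
    """Fraction of each block group's conversations that fall in the category."""
    return {bg: hits / total for bg, (hits, total) in counts.items() if total}


def percentile_ranks(shares):
    """Percent of block groups whose share is strictly below each block group's share."""
    ordered = sorted(shares.values())
    n = len(ordered)
    return {bg: 100.0 * bisect_left(ordered, s) / n for bg, s in shares.items()}


ranks = percentile_ranks(category_shares(NIGHTLIFE))
for bg, pct in sorted(ranks.items()):
    print(f"{bg},nightlife,{pct:.0f}")
```

Ranking against all block groups nationally keeps categories comparable even when their raw volumes differ. How far above average a block group has to be to reach a given percentile depends on how the shares are distributed.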

Daniel:
What amount of value can people expect? Does it depend on the situation, or the problem they're trying to solve, or the analysis they're doing? Or can you say across the board, "Hey, if you use this in your analysis, you will increase the likelihood of getting the correct answer by x percent"?

Lyden:
Sure. I mean I can give you a range here. We've seen on average people reduce their error by 25%, and their models were such big ... especially over a big scale. What excites me about that is that's saying people's attitude and mindsets really make a difference. Now, that 25% you might already have really impressive predictive models, so it's not really increasing the percent a ton.

Lyden:
I can give you an example, too, of when we didn't really ... and that actually goes back to your other question ... when we really didn't move the needle quite as much as I thought we would. Some companies, like grocery store companies, will have ridiculous loyalty data, incredible loyalty data; and if you're already capturing that much of your community that makes the decisions, this won't bump the model near as much. So, we've seen that happen before when the customer data is already spectacular. In other words, if your predictive model, if you can predict what you're gonna make at a location within 99.5%, you wouldn't expect nearly as high of a lift on your model from our data, but that is a real rarity.
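
To make the arithmetic concrete, here is a small sketch of why the same relative improvement matters less for a model that is already very accurate. The 25% reduction and the 99.5% figure come from the interview. The 20% starting error and the assumption that the same relative reduction applies to both models are illustrative only.

```python
def apply_relative_reduction(error_pct, reduction=0.25):
    """Return (new error, absolute gain in percentage points) after a relative error cut."""
    new_error = error_pct * (1 - reduction)
    return new_error, error_pct - new_error


# A forecast that is off by 20% on average (illustrative starting point).
typical = apply_relative_reduction(20.0)

# A forecast already accurate to within 99.5%, i.e. 0.5% error.
excellent = apply_relative_reduction(0.5)

print(typical)    # new error and gain for the typical model
print(excellent)  # new error and gain for the already-excellent model
```

A 25% cut takes the first model from 20% error to 15%, a gain of five percentage points. The second model gains only 0.125 points, and Lyden notes that in practice the lift for such models is smaller still.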

Daniel:
I guess it depends on the size of your project and the size of your investment. If you're talking about tearing down the main street and putting a road through it, you wanna be as close to getting the right answer as possible; and if you can lift it by 1%, 2% and it doesn't ... you know. Because I'm thinking some of these models don't cost a great deal to run; they cost maybe more time and energy, in that respect. But if you're talking about an investment of billions and billions of dollars, then I think it would be a good thing to do. You're wanting to get it right.

Lyden:
Yeah. And right now when you're talking about building a road through somewhere, you can quantify the traffic data, you can quantify the economic benefit of getting someone from point A to B, but what you can't quantify is how a community is gonna react, and that's really unfortunate because there's lots of examples of building a road through a community that just crushes that community economically because you just built it through their cultural center; people aren't really considering that because you can't quantify it and put it into a model, and I think that's where Spatial really helps give people a voice.

Daniel:
How long has your business been going? How long have you been collecting this kind of data?

Lyden:
We've been around for three years; our data goes seven years back.

Daniel:
Can you look back in time and find some examples where people have made huge blunders maybe, you know what I mean? If we go back to the idea of the road, that's something most people relate to, "Okay, we're gonna bulldoze the main street here and we're gonna increase the road size and really rip this town in half." Can you go back in your data, or would this be a possibility, to see what it was like before, what people's attitudes and what the categories looked like before and after?

Lyden:
Yeah. That's what our city and mobility partners, like Ford, are using this data for. So, they will toggle the time and look back on emblematic cities; and they use it for the positive things, too, like where do you put investment in a city that has the best effect on the city and reverberates the most, your 80 for your 20? Some of that is subjective, too. That kind of urban design is not as much of a hard science, but it's super useful to be able to see the culture change based on what you do in a city.

Daniel:
Absolutely. Yeah. I think one of the really interesting things about your data is that change over time that you could see. I imagine it would take some work to process that amount of data to see that, to visualize it; but I think that'll be incredibly interesting, especially when you got down to the block level.

Lyden:
Yeah. And I think cities are moving faster than they ever have, so it is getting really interesting to watch cities change.

Daniel:
Do you have any more really good examples of what people are using this for? I think we've done the road thing to death now ... and obviously site selection, for if I'm gonna place a petrol station here or build some kind of restaurant here, I can definitely see that ... but are there any less obvious examples you've seen your work used for?

Lyden:
Yeah. What I wouldn't have expected is consumer packaged goods companies; if you're selling diapers or if you're selling soaps, that kind of stuff. I didn't expect them to find as much value in there, but still, do you supply an area with the expensive diapers or do you put the cheap ones in there? Do you put organic products in a place, or do you put the traditional products? So, that's a surprising use case that I've seen.

Lyden:
There's also a marketing use case, too. So, if you know the social categories that correlate with certain brands, you can intelligently use Facebook, or whatever, and drop ads into those areas because those types of people are already resonating with what you're doing. So, I've seen that, too.

Daniel:
That's an interesting idea, using your data in Facebook. I'm guessing you're using your data in order to get a position or find a region, "Hey, this region here, this area here, I want to target with these Facebook ads." Because Facebook has an incredible amount of data, as I'm sure you know, themselves.

Lyden:
Yeah.

Daniel:
But you're kind of skimming the surface a little bit. But compared to what those guys are collecting, and Twitter and Amazon, I mean that must be a whole different game.

Lyden:
Honestly, I haven't really made Facebook ads myself. We don't really market to individual people; more companies. But if I remember, I think they have really non-subjective-type categories that you would click, where ours are obviously psychographic in nature.

Daniel:
Would you see them as a possible threat to this business model? Because we're seeing Amazon branch out in the past and say, "Okay, now we do Cloud computing," for example. What would be stopping them from coming on and saying, "Hey, now here's an API. Now, you can drag this data out of our system," for a cost, of course?

Lyden:
Yeah, they might make something like that for advertising, so that could happen. Now, it would be very surprising to me if they got into retail site selection, and they would have a bit of a problem because they would be biased towards only Facebook data because they couldn't use other data sources; we're more of a neutral third-party analysis company. But, honestly, if more companies do this, I think it's a good thing because right now cities are being designed with not nearly the data they should be; and if we can start sparking other companies that are considering this, and getting cities to consider the social makeup as being really a serious part of the city, I think we've done our job as a company.

Daniel:
Yeah. I couldn't agree more with you there. I think we need to take a serious look at how we're designing cities. With more and more people moving into urban areas, we need to think about making these places more livable, and how are we gonna do this in the future? The rate of development, at least that I can see here in Denmark and in Europe in general, is too slow. We can't keep up. We're just putting Band-Aids on the patient instead of actually coming up with a cure.

Lyden:
Yeah. I think that happens because, when it comes to some of the social and psychographic stuff, is it's too hard to quantify, and you can't show a PowerPoint that proves why you should consider the communities.

Daniel:
We talked a little bit about where this is going. But when I think of a future in terms of social media, in terms of location, I think these things are only gonna get more important. I think they are gonna be bigger and richer sources of data. When you think about geosocial as a data source, what do you think is gonna happen in the future?

Lyden:
I think you're gonna see more companies start using geosocial, whether it's ours or more companies that will start up using geosocial. I think it'll be not just one company doing it; I think it'll be an industry in the future. I think, beyond that, the statistic is that 80% of the world's data is unstructured data, which means it's images or text; and right now we're making all of our decisions on the 20% that's easy to understand, structured data. So, I think you're gonna see a massive amount of innovation in this area where the technology has caught up with the theory and the value of the data. So, I think that's really gonna change the way our cities are built; it's gonna change a lot of things here in the next 10 years.

Daniel:
Yeah. Do you see social media as being a big driver of this unstructured data? I mean we're definitely creating a lot of data when we use social media; there's no question about that. But is that gonna be the biggest source of this kind of data, or do you see other sources out there that you could hook up to?

Lyden:
I don't totally know. I think we're at the forefront of satellite imagery and computer vision, so I think that'll be pretty big as well. But, yeah, I guess I'm not totally sure what other data sources people will start using.

Daniel:
Yeah. I'm thinking about the Internet of Things. You know, when we all have an Alexa or Google Home, or something like that, and the fridge talks to the whatever house, I was thinking that kind of data as well.

Lyden:
Yeah, voice data, so all the questions that people ask. Yeah, it'll be helpful data then which is ... because people love that.

Daniel:
Yeah, whether that's sociable or not. That brings me to another question. Now, we've come right down to that very personal level; we're in someone's home, and we're collecting data in there maybe, maybe that's the future. Like I said earlier, I come from New Zealand, but I live in Denmark, and we've had a whole bunch of privacy issues recently, that's been a really big thing in the media over here; because of that, we've passed certain laws in Europe that have really changed the way we think about privacy and data collection in general. Is that something that you see happening in the US anytime soon?

Lyden:
Yeah, totally. I think that any company, especially if they're academic research related, any company that is using data coming from people ... even if it's like us and we can't track it down to who said what, but we can just track it down to what was said where ... I think those companies really need to work to lobby or put new laws in place for privacy. I think those companies need to be involved with it because, one, it's totally unfair when your privacy is invaded as a consumer; and if people and if companies take advantage of that, that data is going away, and what scares me is a world where we don't get to use that data to design our cities, right? Because people obviously got mad that their data is being used and they weren't asked. So, I think we need to be really, really careful of that, and it's definitely a passion point for me to ...

Lyden:
I can see it going either way. I could see a world where the data goes dark and we don't use any of that information, and I could see a world where our privacy is like 1984, totally invaded, and that's not good either. There's something in between that seems very right.

Daniel:
You talked a little bit before about lobbyists, and people need to get a hold of these lawmakers and explain and show up and say, "Hey, we're doing this," and be maybe more open about what they're doing and how they're doing it. I see this as being a really big problem in the future. Like we said before, we're creating more and more of this data, and it's actually really interesting talking to someone like you who wants to, to use a cliché, use it for good instead of evil. Most of the people I think about using this data are really interested in taking advantage of someone, "How can I get in front of someone, get their attention, for cheaper?" and you're talking about doing something good with it. You're talking about using this as a source, "I'm gonna use this as a way of doing good and changing the cities and making better decisions." But do you see any other real danger around here? Because if it did go dark, you're right; we stand to lose a lot.

Lyden:
Yeah. The dangers I was talking about, I could really see either way.

Lyden:
There's one thing that I find really interesting and I'd point you to. There's this guy called Alex Pentland; I don't know if you've heard of him, but he's part of the MIT Media Lab. He's got the social physics lab, and he's doing a lot of work around data privatization while also retaining anonymity of users, so that the data still can be collected, can be used for good, but is definitely more safe, and you know what's being shared about you.

Daniel:
That's kind of what you're doing already, isn't it? You're aggregating it to such a level. I'm not 100% sure what a block is for you in terms of area, but I'm thinking just the fact that you're aggregating it out, you'll know information about that area, but it's not that detailed. You don't have, "Okay, this is Mrs. Jones," kind of detail.

Lyden:
Yeah. What it looks like from our side is we can never see that Daniel said this. What it looks like is this block group, and here's all the conversations that have been had by this block group, and then they're again categorized so there's another level of abstraction to where they're usable.

Daniel:
Is there any other major data source you use to enrich this data or you see your data being used with?

Lyden:
Yeah. I'll just go through a couple of them. I don't have the list up right now; it would be good if I did. We use demographic data; sometimes we'll put this up against Yelp data to see how expensive the restaurants are, for example, in the area as an indicator of wealth or using wealth. Education is another thing that we're always interested in. Age, education, ethnicity, gender, household occupancy, income, urbanicity, religion. Voting is another one that we're doing now. So, any hard data to put this up against is always really interesting, and the correlations always make a whole lot of sense.

Daniel:
Hey, well, I know we're running out of time here. You've probably got a million things to do. We're recording this just before Christmas, so I can imagine that you're very busy right at the moment. But before you go, I just wanna say thank you so much for taking the time to do this interview with me; I really appreciate it. Is there somewhere we can go to learn more about what you're up to and follow along?

Lyden:
Yeah. You can go to spatial.ai. S-P-A-T-I-A-L dot A-I.

Daniel:
Awesome. Thanks so much.

Lyden:
Cool. Good talking, Daniel.

Daniel:
That's the end of another episode of the MapScaping podcast. I hope you enjoyed the show. As always, full transcripts of these podcasts are available at MapScaping.com. If you're interested in reaching out to us, you can do so on MapScaping Facebook, Instagram and Twitter.

Daniel:
Talk to you soon. Bye.