All Of The Places In The World

About The Guest

Kyle Fowler is the senior director of engineering at Foursquare. Before joining Foursquare in 2011, he was a third-party user of Foursquare developer tools. Ever since, he has worked on different projects, in varying roles, to solve a variety of location problems.

What Is Foursquare?

Foursquare is a cloud-based location technology platform that supports building solutions based on a deep understanding of location. For the general public, Foursquare is a location-based social network that facilitates meeting up with friends and discovering new places. Users can check in to places and show their network where they are and what they are doing. On the other hand, commercial companies and organizations use Foursquare APIs and advertising enablement products to gain insights from the physical world.

Foursquare Swarm

Swarm is a life-logging app that keeps a history of a user’s interactions with the world. It works as a diary of the places you have visited, with photos and information about the friends you were with and the experiences you had. Places can be logged passively in the background or actively through the app. Active logging means checking in at each place you visit. Passive loggers can still curate their history by periodically reviewing their history feed and confirming the places they actually visited.

How to Download Your Swarm Data

One way to download your Swarm data is to request it through your profile. The export includes all your check-ins and any other information you logged. The other way is through the Foursquare API, after signing up as a Foursquare developer (anyone can sign up). The API is the better route if you want to run your own analysis on the data or build your own features.
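
As a rough illustration of the API route, the sketch below pages through a user's check-in history in Python. The endpoint path, the parameter names (`oauth_token`, `v`, `limit`, `offset`) and the response layout are assumptions based on the Foursquare v2 API and should be checked against the current developer documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current Foursquare developer docs.
CHECKINS_URL = "https://api.foursquare.com/v2/users/self/checkins"


def build_checkins_url(oauth_token, offset=0, limit=250, version="20240101"):
    """Build the request URL for one page of the authenticated user's check-ins."""
    query = urlencode({
        "oauth_token": oauth_token,  # token obtained through the OAuth flow
        "v": version,                # API version date (assumed YYYYMMDD format)
        "limit": limit,
        "offset": offset,
    })
    return f"{CHECKINS_URL}?{query}"


def fetch_page(url):
    """Fetch one page over the network and return its list of check-in items."""
    with urlopen(url) as resp:
        payload = json.load(resp)
    # Assumed response layout: {"response": {"checkins": {"items": [...]}}}
    return payload["response"]["checkins"]["items"]


def fetch_all_checkins(oauth_token, fetch=fetch_page, limit=250):
    """Request pages until a page shorter than `limit` signals the end."""
    items, offset = [], 0
    while True:
        page = fetch(build_checkins_url(oauth_token, offset=offset, limit=limit))
        items.extend(page)
        if len(page) < limit:
            return items
        offset += limit
```

The `fetch` parameter exists so the paging logic can be exercised with a stub instead of a live request.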

How Is Foursquare Mapping All Of The Places In The World?

Crowdsourcing is a huge contributor to Foursquare location data, but Foursquare may not have users in every corner of the world. To get good coverage, it also draws on several other sources, including machine-generated approaches and data from other companies. To fill regional gaps, Foursquare licenses data from trusted companies that have a particular interest in a specific area.

How Does Foursquare Control the Quality of Location Data?

All the places submitted to Foursquare go through a quality audit process to ensure that each one is real. The initial screening removes low-accuracy points that could skew a place’s location.

Machine learning models are also used to review the input sources and assign a confidence score of whether a place exists. Further controls also involve sending samples of places to human moderators for review.

Foursquare understands that the negative outcomes of being wrong about a place are worse than the place not being listed at all. If people using that location information actually go there and find that it does not exist, it results in a bad user experience and a negative perception of the app.

Auditing the quality of the data helps to remove bad sources and ensures that the recorded locations are an accurate reflection of the real world.
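
The screening and scoring steps in this section could be sketched roughly as follows. This is not Foursquare's pipeline: the field names, the thresholds, and the choice to route the uncertain middle band to human review are all assumptions made for illustration (the source says only that samples of places are sent to moderators). The high acceptance threshold reflects the point above that listing a non-existent place is worse than missing a real one.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    place_id: str
    lat: float
    lng: float
    accuracy_m: float  # reported accuracy radius of the device fix, in metres
    confidence: float  # hypothetical model score in [0, 1] that the place exists


# Illustrative thresholds, not values from the source.
MAX_ACCURACY_M = 100.0
ACCEPT_AT = 0.9
REJECT_BELOW = 0.3


def triage(sub):
    """Return a routing decision for one submitted place."""
    if sub.accuracy_m > MAX_ACCURACY_M:
        return "drop_low_accuracy"  # initial screening of imprecise points
    if sub.confidence >= ACCEPT_AT:
        return "accept"
    if sub.confidence < REJECT_BELOW:
        return "reject"
    return "human_review"  # uncertain cases go to a moderator
```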

Maximizing Accuracy with the Geosummarizer

Foursquare’s Geosummarizer is a model that analyzes the inputs for a certain POI (Point of Interest) and selects the right coordinates.

The Geosummarizer helps to solve different challenges, such as when people record slightly different coordinates for the same place they check into. This is more likely to happen in dense urban environments, especially when the user is inside a building.

To find the right coordinate, the Geosummarizer compares each input coordinate to the geocode of the place’s address and infers whether the coordinate could plausibly fall within that place. It then picks the coordinate with the most features corroborating that address.
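
A very simplified stand-in for this idea is sketched below. It is not the Geosummarizer itself, which is a model using richer features. The sketch simply discards inputs that lie implausibly far from the address geocode and takes the median of the rest. The function names and the 150 m radius are assumptions.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lng) pairs in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def summarize(inputs, address_geocode, max_dist_m=150.0):
    """Pick one coordinate from noisy inputs, using the address geocode as a sanity check."""
    plausible = [p for p in inputs if haversine_m(p, address_geocode) <= max_dist_m]
    if not plausible:
        # Nothing corroborates the address, so fall back to the geocode itself.
        return address_geocode
    # The component-wise median is robust to the remaining outliers.
    return (median(p[0] for p in plausible), median(p[1] for p in plausible))
```

The median is used here instead of a mean so that a few stray fixes inside the radius do not drag the result.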

If there are multiple clusters of coordinates for a given location, the Geosummarizer tries to assign them to different geocodes. A place can have multiple geocodes, and consumers of the data pick the one that fits their use case. For instance, a ride-hailing app may want a POI’s pickup and drop-off location. The Geosummarizer also describes several levels of venue hierarchy to capture places within places, such as a gate inside an airport terminal.
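
One way to picture multiple geocodes and the venue hierarchy is a small data structure like the one below. The class, the role names (`main`, `drop_off`) and the fallback rule are hypothetical and only illustrate the shape of the data, not Foursquare's schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    name: str
    geocodes: dict = field(default_factory=dict)  # role -> (lat, lng)
    parent: Optional["Place"] = None              # containing venue, if any

    def geocode_for(self, role, fallback="main"):
        """Return the geocode for a use case, or the fallback role if it is missing."""
        return self.geocodes.get(role, self.geocodes.get(fallback))

    def ancestry(self):
        """List this place and every venue that contains it, innermost first."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


airport = Place("Airport", {"main": (1.0, 2.0), "drop_off": (1.1, 2.1)})
terminal = Place("Terminal 1", parent=airport)
gate = Place("Gate A1", {"main": (1.2, 2.2)}, parent=terminal)
```

With these example objects, `gate.ancestry()` walks from the gate up to the airport, and `airport.geocode_for("drop_off")` returns the coordinate a ride-hailing app would ask for.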

Mobile places such as food trucks are treated separately from other categories. When generating a new coordinate for such a venue, the summarization process weights the latest data much more heavily than it would for a brick-and-mortar store that is not expected to move.

Mobile POIs can get updated in real-time – on every single check-in – as long as the coordinates are believed to be legitimate.
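
The effect of weighting recent data more heavily can be sketched with an exponentially decayed average. This is an assumed technique chosen for illustration. The source does not say how the recency weighting is actually computed, and the half-life values in the usage note are made up.

```python
def recency_weighted_coordinate(observations, now, half_life_days):
    """Average (lat, lng, day) observations, halving a point's weight every half-life.

    Plain averaging of latitude and longitude is only reasonable when the
    points are close together, which is the assumption in this sketch.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    total = lat_sum = lng_sum = 0.0
    for lat, lng, day in observations:
        weight = 0.5 ** ((now - day) / half_life_days)
        total += weight
        lat_sum += weight * lat
        lng_sum += weight * lng
    return lat_sum / total, lng_sum / total


# One check-in ten days ago at (0, 0) and one today at (1, 1).
history = [(0.0, 0.0, 0.0), (1.0, 1.0, 10.0)]
food_truck = recency_weighted_coordinate(history, now=10.0, half_life_days=1.0)
storefront = recency_weighted_coordinate(history, now=10.0, half_life_days=3650.0)
```

With a one-day half-life the food truck's coordinate lands almost exactly on today's check-in, while the ten-year half-life leaves the storefront near the midpoint of the two observations.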

Foursquare’s Taxonomy of Places

Foursquare has over 1,200 categories for classifying places across the world. To make places feel locally familiar, some of those categories are only visible in certain countries. For instance, one would not expect a Chinese Food category to show up in China, or an Italian Restaurant category to show up in Italy.

Foursquare tries to adapt the category system to reflect the depth of expressiveness appropriate for a particular country. Categories are re-evaluated monthly to check that they are showing up appropriately and to decide whether new categories should be added.
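
Country-specific visibility can be pictured as a simple filter over the taxonomy. The category records, the `hidden_in` and `only_in` fields, and the specific country rules below are invented for illustration and do not describe Foursquare's actual data model.

```python
# Hypothetical slice of a category taxonomy with per-country visibility rules.
CATEGORIES = [
    {"name": "Chinese Restaurant", "hidden_in": {"CN"}},
    {"name": "Sichuan Restaurant", "only_in": {"CN"}},
    {"name": "Burger Joint"},
]


def visible_categories(country_code, categories=CATEGORIES):
    """Return the names of the categories that should be shown in a country."""
    names = []
    for cat in categories:
        if country_code in cat.get("hidden_in", set()):
            continue  # too generic to be useful locally
        only_in = cat.get("only_in")
        if only_in is not None and country_code not in only_in:
            continue  # a finer-grained category shown only where it is familiar
        names.append(cat["name"])
    return names
```

Under these made-up rules, a user in China would see the Sichuan category but not the generic Chinese one, and a user in the United States would see the reverse.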

Mapping all the places in the world and making them locally familiar is a challenging task. However, Foursquare’s knowledge of the world keeps growing. As more locations are added and its models continue to improve, Foursquare is steadily moving towards this goal.

This is the first in a series of episodes published in partnership with Foursquare, and the idea is to use it as a reference for later episodes about privacy and location data, knowledge graphs, AI, location-based marketing, and big geospatial data in the browser.


In Conversation

The Original Location-Based Social Network

Daniel: Kyle, welcome to the podcast. The first time we talked you described Foursquare as the original location-based social network. Could you introduce yourself?

Kyle: I’m Kyle Fowler, director of engineering at Foursquare. I joined in 2011, so I’ve been here quite a while. I was originally a third-party developer using the Foursquare developer tools, really interested in the problem space of checking in and tracking my history. I went to a developer event one night, and someone suggested I apply for a job. I’ve been here ever since, starting as a mobile developer on the BlackBerry platform.

Daniel: Help the listeners understand what Foursquare is.

Kyle: Our mission has always been the same, though what it looks like to the outside has shifted — making the real world more usable. It started as a social network facilitating meeting up with friends and discovering new places, and that’s evolved into helping other companies get as much value out of the real world as possible, whether through places data, our API products, or our advertising enablement products.

Swarm and Life-Logging

Daniel: Tell me more about the Swarm app.

Kyle: Swarm is our check-in app — the term we use is life-logging. It’s built around tracking your history of interaction with the world. Since I’ve been an employee so long, I can go back in my own profile and search everywhere I’ve been since 2010 with high accuracy. It’s my diary. We support both active logging — checking in at places you care about — and our in-app Pilgrim technology, which uses background location to suggest places you’ve been. You can go in once a week, see your history feed of suggested places, and confirm or deny them.

Daniel: Can I download the data I’m producing through Swarm?

Kyle: Yes, two ways. Through a GDPR-type access request on your profile, which gives you a large dump of all your check-ins. Or through our API — anyone can sign up to be a Foursquare developer, authenticate with OAuth, and pull all your check-ins to run analysis or build your own features. As a side project I built a visual scrapbook out of my wife’s Foursquare data.

Knowing All the Places in the World

Daniel: Why is this not as simple as it sounds?

Kyle: The first problem, core to Foursquare’s business, is: what are the places in the world? Just because you open your phone doesn’t mean an app knows what’s around you. Knowing the corpus of places — and then, for a given set of coordinates, which places are most relevant for a user to say they’re at — is a significant data problem, especially in cities with GPS drift and signals bouncing off tall buildings. To get good coverage you pull from everywhere you can: crowdsourcing is a huge contributor, but you also use machine-generated approaches, web crawls, and licensed regional data from trusted companies. Then the problem becomes how to take a thousand different sources and synthesize a single output places dataset.

Controlling Data Quality

Daniel: How do you know a place is real and not just made up?

Kyle: The best approach is real humans verifying that something exists. We have a quality audit process that sends samples of places to human moderators who can say yes, this is real, or no, it’s not. Behind that labeled training data, we have machine learning models that give confidence scores on whether a place from a source is real, and identify bad sources we need to down-weight or discard. We have a model called our reality score with a lot of features feeding into it — photos added, tips left, check-ins, clicks in the app while searching. Foursquare understands that being wrong about a place is worse than not having it: if we send a user to a restaurant that doesn’t exist, that’s a really bad outcome.

The Geosummarizer and Multiple Coordinates

Daniel: Everyone checking into a place gives a slightly different set of coordinates. How do you find the right one?

Kyle: That’s where machine learning comes in — we have a process called the Geosummarizer. It uses building polygons and roads: if the input coordinates from a source say a place is in the middle of a forest, not where it should be according to the roads and buildings of that address, we can discard those, or cluster and average them. In dense urban environments the geocode of an address may conflict with what a device sees inside a building, so we have what we call the “phone’s-eye view” of the world and the “map coordinate view,” and apply them depending on context.

Daniel: So you end up with multiple coordinates for one place — like a stadium with many entrances?

Kyle: Yes — a given place can have multiple geocodes: a rooftop, a popular location, a drop-off location, a main entrance. We notice multiple clusters and assign them to different geocodes. A ride-hailing app would want the drop-off location, which may be different from the check-in location where a device usually sees a user inside a building.

Places Within Places, and Mobile POIs

Kyle: Something somewhat unique to Foursquare is the hierarchy of places — places are not independent of other places. A mall is a venue, but there’s also a contained shape for the Foot Locker inside the mall. We can have up to six levels of venue hierarchy — an airport gate inside a terminal inside a concourse inside the whole airport.

Daniel: What about things like food trucks and pop-up stores?

Kyle: Those are the most challenging. Pop-ups may only exist for two days, so you need real-time data. For food trucks, based on category we treat them separately — our summarization process takes recency into account much more heavily, so a food truck can get updated on every single check-in. That’s good for the app use case, but when we deliver data nightly or monthly to commercial customers, those locations aren’t all that useful. We want our product to be flexible to meet the needs of all our use cases — the API product is the most real-time, while commercial data products have various cuts for quality, category, or chain.

Making Places Feel Local

Daniel: How do you make these things feel local — for example, they don’t call it “Chinese food” in China?

Kyle: Our category taxonomy has a little over 1,200 categories, and some are only visible in certain countries. In the China market we’d likely not show a “Chinese food” category, but we’d have Sichuan or Hong Kong–style restaurant categories. We re-evaluate categories monthly, and we have translations for place names, categories, and chains in 12 languages. The hardest places to get data for are regional chains — there might be a small ten-store chain near me, and it would be great if Foursquare knew it was a chain with a specific website, but there are thousands and thousands of those all over the world. The places we have the most data for are broad chains with store locators on their websites, and the places people go the most — airports, train stations, and Disney World.

Validating Data vs Getting New Data

Daniel: Is it more important to validate the data you already have, or to get new data?

Kyle: I’d take the easy way out and say both, but the negative outcomes of being wrong about a place we have are worse than not having a place at all. If someone wants to get burgers and we send them to a place that doesn’t exist, that’s a really bad outcome for the user and for their perception of us. If we don’t have a restaurant that opened a week ago, there’s more acceptance — this problem is hard, and we haven’t gotten it yet. So making sure the places we have are an accurate reflection of the real world is probably the most important thing to us.

About the Author
I'm Daniel O'Donohue, the voice and creator behind The MapScaping Podcast (a podcast for the geospatial community). With a professional background as a geospatial specialist, I've spent years harnessing the power of spatial to unravel the complexities of our world, one layer at a time.