Jennings Anderson is a geoinformation research scientist who’s spent the last seven years looking into the evolution of OpenStreetMap (OSM).
A while ago, OSM passed a significant milestone — the 100 millionth changeset was uploaded to it.
This milestone represents a collective contribution of nearly 1 billion features globally over the past 16-plus years by a diverse community of over 1.5 million mappers.
I was first introduced to OSM in 2014, at the beginning of graduate school at the University of Colorado Boulder.
I was looking at humanitarian mapping in OSM. This was when HOT (the Humanitarian OpenStreetMap Team) introduced v2 of its Tasking Manager, which helps people coordinate humanitarian mapping in OSM.
Humanitarian mapping is when hundreds, or thousands, of mappers converge on an area of the map and create data to support humanitarian relief. At the time, I dove headfirst into analyzing it, for instance, looking at the mapping response to the 2010 Haiti earthquake and to Typhoon Haiyan in the Philippines in 2013.
In doing so, I learned about several tools for working with OSM data to make a map. But for data analysis, especially analysis of OSM’s complete historical data set, there were far fewer tools.
It was then that I discovered OSM is so much more than a map and a database — it’s a global community of mappers.
How OSM developed goes much deeper than just looking at the features on the map, such as the roads or the buildings — it’s about the community. I take a contributor-centric approach to my OSM analysis, as opposed to a data-centric approach.
This means asking how many mappers work on the map, when they do it, who they are, and what types of changes they make, rather than asking how much data does or does not exist in a region.
With OSM data quality, the question isn’t necessarily whether the map of an area is good enough, but when it will become good enough for whatever use case one may have.
With thousands of mappers making millions of edits every day, the map is only moving in one direction over time — improving.
I’ve also had the luck of good timing: many great people worldwide have been asking the same questions about OSM data quality that I was.
They developed approaches to these questions in parallel and eventually met at conferences to discuss the challenges and successes.
Six years of research, collaboration, and community efforts resulted in my Ph.D. on “Contributor-centric analytics for OpenStreetMap approaches to full-stack metadata-driven analysis infrastructure for an open geospatial data platform.”
I’ve been fortunate to have collaborated and continue to collaborate with companies like Mapbox, Development Seed, Facebook, the YouthMappers organization supporting thousands of student mappers all around the globe, researchers at the Heidelberg Institute for Geoinformation Technology, as well as other academic researchers in this growing domain of OSM data analysis.
OSM has been around since 2004.
Steve Coast, then a student at University College London, started it as a substitute for official spatial data that was hard to get. The project took off. People came online and contributed their local knowledge and any other spatial data they had or knew about.
Over 1.6 million contributors have edited the map to date, which is an awe-inspiring number.
OSM is more than just an online community.
It’s a community of communities, as Patricia Solis describes it in her work.
There are the humanitarian mappers, people who contribute data from a humanitarian perspective. There are the hobbyists who’ve been around for years contributing particular niche knowledge here and there. And now there are the corporate actors who are contributing data concerning their corporate interest and their data use cases.
Plus, there are the local communities all around the world, the local chapters.
It’s impressive how OSM puts a bit of everyone on the map.
In OSM, there are three types of elements:
Nodes are points, such as a drinking fountain or a statue.
Ways are lines, ordered collections of nodes, such as a road or the outline of a building.
Relations are collections of nodes, ways, and other relations, such as all the ways that make up a country’s boundary.
Any change to one of these elements is a change to the database, and it’s called an edit.
A changeset is a logical grouping of edits that are uploaded together.
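To make the element model concrete, here is a minimal Python sketch of nodes, ways, relations, and a changeset that groups edits. The class names, fields, and `add_edit` helper are hypothetical simplifications for illustration, not the OSM API schema; real elements also carry versions, timestamps, and changeset IDs.

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical, simplified stand-ins for OSM's three element types.

@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)

@dataclass
class Way:
    id: int
    node_ids: list  # ordered references to Node ids
    tags: dict = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A way whose first and last node match forms a ring,
        # which is how an area such as a building outline is drawn.
        return len(self.node_ids) > 3 and self.node_ids[0] == self.node_ids[-1]

@dataclass
class Relation:
    id: int
    members: list  # (element_type, ref, role) tuples: nodes, ways, or other relations
    tags: dict = field(default_factory=dict)

Element = Union[Node, Way, Relation]

@dataclass
class Changeset:
    id: int
    user: str
    edits: list = field(default_factory=list)  # (action, element) pairs saved together

    def add_edit(self, action: str, element: Element) -> None:
        # Hypothetical helper: record one edit in this changeset.
        if action not in ("create", "modify", "delete"):
            raise ValueError(f"unknown action: {action}")
        self.edits.append((action, element))

# Usage: a mapper traces a small triangular building and presses save once.
corners = [Node(1, 40.0, -105.0), Node(2, 40.0, -104.999), Node(3, 40.001, -104.999)]
building = Way(10, [1, 2, 3, 1], {"building": "yes"})
cs = Changeset(100, "example_mapper")
for n in corners:
    cs.add_edit("create", n)
cs.add_edit("create", building)
print(len(cs.edits), building.is_closed())  # 4 True
```

The point of the sketch is the containment: ways reference nodes, relations reference any element type, and the changeset is just the envelope around a batch of edits.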
Several tools are available for editing, the most popular among newcomers being the main website’s iD editor.
When someone edits OSM elements and presses save, they submit a changeset. The changeset carries metadata: the time it was created, the user who made the changes, plus hashtags and comments to let other mappers know why the edit was made or what imagery was used to make it.
That helps analysts further down the road know what was happening in the map when that edit occurred.
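Changeset metadata is what makes the contributor-centric questions raised earlier answerable. The sketch below uses Python’s standard-library XML parser to read a few changesets in the general shape of the OSM API 0.6 changeset XML and lists which mappers worked under each hashtag. The IDs, users, timestamps, and hashtags are invented for illustration, and treating the `hashtags` tag as semicolon-separated is an assumption about how an editor such as iD records it; check real data before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative changeset metadata; all values are made up for this example.
SAMPLE = """
<osm version="0.6">
  <changeset id="101" user="alice" uid="1" created_at="2013-11-10T08:15:00Z">
    <tag k="comment" v="Tracing roads #hotosm-project-0000"/>
    <tag k="hashtags" v="#hotosm-project-0000"/>
    <tag k="imagery_used" v="Bing aerial imagery"/>
  </changeset>
  <changeset id="102" user="bob" uid="2" created_at="2013-11-10T09:40:00Z">
    <tag k="comment" v="Adding buildings #hotosm-project-0000"/>
    <tag k="hashtags" v="#hotosm-project-0000"/>
  </changeset>
  <changeset id="103" user="alice" uid="1" created_at="2013-11-11T14:05:00Z">
    <tag k="comment" v="Fixing a road name"/>
  </changeset>
</osm>
"""

def parse_changesets(xml_text):
    """Return one dict of metadata per <changeset> element."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for cs in root.iter("changeset"):
        tags = {t.get("k"): t.get("v") for t in cs.findall("tag")}
        # Assumption: multiple hashtags are separated by semicolons.
        hashtags = [h for h in tags.get("hashtags", "").split(";") if h]
        result.append({
            "id": int(cs.get("id")),
            "user": cs.get("user"),
            "created_at": cs.get("created_at"),
            "comment": tags.get("comment", ""),
            "hashtags": hashtags,
        })
    return result

def mappers_per_hashtag(changesets):
    """Contributor-centric view: the distinct users who worked under each hashtag."""
    users = defaultdict(set)
    for cs in changesets:
        for tag in cs["hashtags"]:
            users[tag].add(cs["user"])
    return {tag: sorted(names) for tag, names in users.items()}

changesets = parse_changesets(SAMPLE)
print(mappers_per_hashtag(changesets))
# {'#hotosm-project-0000': ['alice', 'bob']}
```

Grouping by user rather than by feature is the design choice here: the same parsed records could just as easily answer "when do these mappers edit?" by bucketing `created_at`.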
According to best practices, you should map what exists in the physical world.
In the best case, you map something and name it so people can go to that place in the real world and see that something is mapped in the right place, and the name and information are accurate.
It’s perfectly fine to update information that someone else added to the map so that it reflects reality, or because something has changed.
Reaching consensus isn’t always smooth, though. When people disagree on how something should be mapped, they can get caught in back-and-forth “edit wars.”
Those cases are escalated to the Data Working Group, which oversees such conflicts and finds a resolution.
Millions of edits happen every day with minimal conflict over what the truth is. One area where conflict does arise is disputed borders:
A: I’m in India, and I see this border the way I understand it.
B: I’m in China, and I see the same border differently.
Such issues are hashed out and decided on via discussions, a mailing list, and a wiki. A recent conversation was on borders in Ukraine and Crimea. The community came to a consensus, based on those discussions.
If an issue remains problematic, the Data Working Group makes the final call. Its members are chosen by the community, so it’s not a single person’s decision.
OSM is free and openly editable. The data is licensed under the ODbL (Open Database License), so anyone can take it and use it for any purpose as long as they provide attribution to the contributors and follow the license’s share-alike terms.
Corporations look for spatial data of the world to put into their products and their maps. To them, OSM is a fantastic data set to use without paying a commercial provider.
More than that, in many parts of the world, OSM has the highest-quality data because it’s the only source of digital spatial information.
A number of companies have begun consuming OSM data and putting it into their products. They also employ teams of editors to ensure quality. When those editors come across errors in the map, they don’t just fix their own copy; they make the changes upstream in the main database, too.
Everyone benefits from that.
Companies like Apple, Facebook, and Microsoft use some of this data. If each of them has a team adding a building, the community gets three buildings on the map, instead of each company working on its own.
Facebook is now consuming OSM data to power the maps in their products. Their goal with OSM is to make sure that OSM is the highest quality, most complete map data source they can pull to give their users the most complete experience.
The same goes for Microsoft, Uber, Lyft, or Apple. They’re choosing to use OSM instead of paying a company like Google for the data.
All these companies collaborate and create something that rivals their biggest competitor.
I can see the network effect taking over.
Corporate editing picked up between 2015 and 2018, so research was done in 2019 to examine the concerns this raised.
Some concerns were valid, and there were anecdotes here and there to support them, but ultimately the database is vast and these corporate teams work all over the globe, so overall they aren’t a major concern.
By introducing RapiD Editor, Facebook incorporated machine learning to help turbocharge mapping efforts. That was heavily used to map most of the road networks of Thailand and Indonesia.
They collaborated with local mappers and addressed the conflicts arising from their corporate presence there.
A lot of discussions are happening about whether this is a corporate takeover.
The OSM survey that just went out asked people if they’re concerned about these types of issues. People are skeptical and aware, with good reason.
But I don’t know that the evidence supports that narrative yet.
It’s important to continue to be aware and study what’s happening to understand the impact.
That’s what we’re trying to do.
The Amazon Logistics Team edits primarily in North America and the UK.
They map driveways for use in their delivery network so that deliveries can be more efficient.
Companies like that focus on a specific type of data.
A company called Digital Egypt is creating an incredible amount of data in OSM for Egypt.
They map building addresses. Last I looked, Cairo might be one of the most completely mapped cities for building addresses — making that data valuable and usable there.
Different companies focus on different pieces, and how they interact with the local community varies.
There wasn’t anything on the map for rural areas in Thailand before Facebook mapped the road infrastructure.
Now that the road network is there, other people can show up and continue the mapping and fill in the buildings.
There are nuances to how the map fills in under this corporate influence.
People have raised their concerns about AI-driven mapping.
Facebook’s RapiD Editor is a fork of the popular iD editor that lets users, as they look at aerial imagery, see potential roads detected through AI and machine vision.
Instead of the mapper drawing the road, the computer draws it in for them. The mapper can click to confirm that it is a road or reject it because it’s not correct.
The geometry comes in automatically, but there’s still a human in charge who clicks submit.
RapiD now also includes a full layer of Microsoft’s building footprints. When you look at an area, you can choose to show the Microsoft buildings.
When you map a town, you can choose to see all the buildings AI thought were there, accept or reject what you find, and submit a changeset.
This speeds up the mapping process because mappers don’t have to trace each building by hand. But it’s not automatic ingestion of that data or an import. Imports have their own guidelines and best practices.
That’s what a tool like RapiD Editor does.
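The human-in-the-loop idea behind this kind of AI-assisted editing can be sketched in a few lines. This is a hypothetical illustration of the concept only, not RapiD’s code or data model: the `Suggestion` type, its fields, and the `review` function are invented names, and the `decide` callback stands in for a mapper clicking accept or reject.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    """A hypothetical AI-detected feature awaiting human review."""
    feature_id: str
    kind: str          # e.g. "road" or "building"
    confidence: float  # model score; the human decision still governs

def review(suggestions, decide):
    """Return only the suggestions a mapper explicitly accepted.

    Nothing is kept unless `decide` returns True, so the model's output
    never reaches the changeset without a human approving it.
    """
    return [s for s in suggestions if decide(s)]

suggested = [
    Suggestion("r1", "road", 0.93),
    Suggestion("b1", "building", 0.41),
    Suggestion("b2", "building", 0.88),
]
# Hypothetical reviewer: rejects b1 after comparing it against the imagery.
accepted = review(suggested, lambda s: s.feature_id != "b1")
print([s.feature_id for s in accepted])  # ['r1', 'b2']
```

Note that the confidence score is deliberately not used as a filter: the sketch keeps the decision with the person, which is the distinction the text draws between assisted mapping and an import.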
Speaking of tools, Mapbox has been building tools for years around OSM.
A popular one is OSMCha (OSM Changeset Analyzer), which allows users to see what’s been happening and tag things as incorrect or potential vandalism.
Communities, corporations, and organizations have been contributing tools to the map for a long time, as well as data and imagery.
I don’t know if we’d even have OSM in its current form if companies like Yahoo and Bing hadn’t made aerial imagery openly available as early as 2007 so that people had something to digitize and to grow that map.
There have been some high-profile cases that made the news. Those have been very unfortunate.
Overall, though, explicit vandalism is far less common than accidental vandalism or poor quality.
One example is related to Pokémon Go. It turned out that Niantic used OSM data to determine where some of the Pokémon that people collected would appear.
Users went into OSM and drew ponds in their backyards so that more water-type Pokémon would spawn there for them to catch.
Obviously, that’s incorrect: there wasn’t a water feature in their backyards. Making the map inaccurate this way amounted to vandalism.
But the community figured that out and quickly said, “Don’t do this. That’s not what this project is for.”
Some egregious cases have occurred, for example, when someone changed the name of New York to an ethnic slur. That was picked up by some major media outlets and didn’t do the project any favors.
But most importantly, the edit was reverted, changed, and fixed by someone in the community within hours of it happening.
That speaks volumes about the quality of the community and the response.
Yes, there are a lot of feel-good aspects to the project.
But there are challenges: gender and racial diversity, and representation in the map, are tremendous problems. The Board needs to address them, and it has already taken the first steps on some of these representation issues.
OSM is definitely a Eurocentric project, in that Europe is where it started and where it took root.
The largest community is in Germany, and then the United States.
If Europeans dominate the community’s makeup, is that reflected in what gets mapped?
I don’t think that’s necessarily true, but, obviously, creating a more diverse mapping community around the world will create better opportunities for local data, and it’s indeed happening.
Look at the State of the Map conferences. In Africa, for example, there is now a conference focused on the mapping that local communities and local chapters are doing on the African continent.
This is great to see, but to say there’s an equal representation within the community is inaccurate.
We need to be constantly striving to get closer to equality.
At the moment, the biggest competitor for data sources would be Google Maps.
But OSM’s growth rate, combined with the fact that it’s free, means it will always be able to rival that.
I don’t know of another open geographic data project that would threaten it. If we change the way we use and consume spatial data, that could change things, but I see nothing on the horizon around that.
We just need spatial data for the entire world.
Free can mean a lot of different things: here today and gone tomorrow, unreliable, or not a standard product.
For the project’s longevity, the Board’s most important role is to secure the infrastructure and come up with forward-thinking next steps to keep the project alive.
I’m not worried about the here-today, gone-tomorrow aspect. There are a lot of great people working to make sure that OSM stays alive.
As for trustworthiness, many projects, such as OSMCha, work on validating the data.
The map represents what people put into it, so it can be vulnerable to significant quality issues. But overall, if it’s a choice between having no data and having OSM data, having it always wins.
OSM is not a product by design.
It’s a collection of data intending to be the most truthful, complete, and accurate collection of open geospatial data out there for anyone to consume and build a product on top of.
It’s a map of the world, made by the world.
It’s a high-quality, open, and free source of geographic information that the Board needs to protect.
This project’s success lies in its number of contributors, which is growing daily, so people can do something revolutionary and exciting with it.