Paul Ramsey is the co-founder of the PostGIS extension — a spatial extension for the PostgreSQL database.
Paul is an open-source software developer — he works on software that is free to download, free to use, and free to remix with other software.
In this article, he enlightens us on GDAL, the Geospatial Data Abstraction Library: what it is, and the business model behind it.
That’s a more profound question than you’d think.
GDAL stands for Geospatial Data Abstraction Library.
You will find people pronouncing it variously as gee-dal or goo-dle. I come back to goo-dle frequently because it trips off the tongue a little more easily than gee-dal.
Goo-dle was the original pronunciation. The founder of the GDAL project pronounced it goo-dle for many years.
When he stopped being the maintainer of the project, he took a job with Google and found it difficult to say he worked at Google and was the founder of the GDAL project.
GDAL is data plumbing, a bit like an international electrical plug set for traveling — it’s got multiple different shaped plugs.
Electricity is “just” electrons moving around. But they can move around as DC, AC, 120 volts or 240 volts. Plus, there are all these different ways you can plug and join electrical things.
At the core, electricity is electrons vibrating, but it can be quite complex to get your hair dryer spinning.
In the same way, raster is a bunch of pixels. Each pixel has a value. But there are hundreds of ways to arrange that data on a disk, in files for computers, and to manipulate those pixels to work with as a GIS professional.
Library is a software term of art.
When programmers write functionality that’s likely to be usable in different programs, they extract it from one program and put it into what’s called a library. Multiple programs can use that same library to take advantage of that functionality.
GDAL is built primarily as a library and ships with several command-line tools, which allow you, the user, to directly use that library.
Because it’s a library, it also shows up in other places.
The idea of data abstraction extends to the other most prominent form of GIS data: vector data. There is a subsection of GDAL called OGR (pronounced oh-gee-ar, or as one word, ogr).
It’s used like GDAL is used on the raster side, but for plugging together different vector formats, and doing transforms on that data.
Most of the links on the download page are to the source code. The expectation is that GDAL is going to be consumed by end-users indirectly.
They will download a third-party build that someone has made for their platform of choice; the Windows build link, for instance, points to a Conda Windows distribution.
Or they’ll get it through an operating system package manager. On macOS, the easiest way to get GDAL is Homebrew or MacPorts.
For Linux, the easiest way to get it is to pull it down from your Linux package manager — not to pull the source code and build it directly.
If you do get it by pulling down the Windows, macOS, or Linux build, you’ll get the library itself, which you won’t use directly, along with an enormous pile of command-line tools that you will.
Out of those tools, the ones you’re most likely to use are gdal_translate, a program for the format translation problem, and gdalwarp, a tool for raster re-projection.
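A minimal sketch of those two tools, assuming the GDAL command-line utilities are installed and using placeholder filenames:

```shell
# Format translation: read a MrSID file, write a GeoTIFF
gdal_translate -of GTiff input.sid output.tif

# Re-projection: warp a raster to WGS84
# (-t_srs names the target spatial reference system as an EPSG code)
gdalwarp -t_srs EPSG:4326 input.tif reprojected.tif
```

By default gdalwarp uses nearest-neighbour re-sampling; the -r flag selects another method such as bilinear or cubic.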
Other tools would be a contouring tool that takes Digital Elevation Models and generates contours, or slope and aspect calculations.
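Sketches of those DEM tools, again with placeholder filenames:

```shell
# Contour lines every 10 elevation units, written to a shapefile,
# with the elevation value stored in an "elev" attribute
gdal_contour -a elev -i 10.0 dem.tif contours.shp

# Slope and aspect rasters derived from the same DEM
gdaldem slope dem.tif slope.tif
gdaldem aspect dem.tif aspect.tif
```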
95% of people use gdal_translate for format translation and gdalwarp for re-projection and re-sampling.
The command-line tools just do… what they do — which is not necessarily the only way you can use them. But that’s what comes in the standard GDAL download box.
ogr2ogr is a popular tool, and with it you can go from any OGR-supported vector format to any other.
Go from shapefile to geodatabase, Oracle to PostgreSQL, or from GeoJSON to CSV, or any other combination.
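For instance (filenames and the database connection string below are placeholders):

```shell
# Shapefile to GeoJSON
ogr2ogr -f GeoJSON output.geojson input.shp

# GeoJSON to CSV
ogr2ogr -f CSV output.csv input.geojson

# Shapefile into a PostgreSQL/PostGIS database
# (hypothetical connection parameters)
ogr2ogr -f PostgreSQL PG:"dbname=gis user=gis" input.shp
```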
gdal_translate does the same thing for raster formats, but doesn’t have that cool numeral two in its name.
There are over 160 formats in the raster domain.
If you’re a GIS professional, you’ve heard of JPEG 2000, MrSID, GeoTIFF, or USGS DEM as formats.
If you’re a cloud developer doing something on the cloud, you’ve heard of COGs, the cloud-optimized GeoTIFF format.
If you’re in data science or environmental science, you may have used HDF5 or NetCDF.
You may or may not have heard of GRIB, which is used for some obscure weather data.
If you’re as old as, or maybe older than, me, you might remember NITF or SDTS, US government formats that are largely defunct now.
It’s a lot of formats. Some date back to the dawn of GIS and some are brand new.
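You can ask your own GDAL build which of those formats it supports:

```shell
# List every raster driver compiled into this build
gdalinfo --formats

# List every vector (OGR) driver
ogrinfo --formats
```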
GDAL is written in C++ and ships with a C API.
Those are small implementation details. But the existence of a C API means it’s bindable to many languages.
It turns out that having a C API is the lowest common denominator, or lingua franca, to get your functionality to almost any language.
Because it’s C, you can call GDAL from C or C++.
Other compiled languages, newer ones like Rust or Go, happily call into C. You can build your Rust or Go program and call into GDAL, no problem.
Because there’s a C API, virtual-machine languages like .NET, Java, and Node.js have bindings too.
It really is a lingua franca.
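As one small illustration, assuming the GDAL Python bindings (the osgeo package) are installed, the same C API surfaces directly in Python:

```shell
# Ask the C library for its version string through the Python binding;
# the fallback message covers machines without the bindings installed
python3 -c "from osgeo import gdal; print(gdal.VersionInfo())" \
  || echo "GDAL Python bindings not installed"
```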
In the wild, unsurprisingly, you’ll find it in QGIS: the open-source desktop GIS uses the open-source format translation library. When you do things like import a raster into QGIS, that import comes in via the GDAL library abstraction.
When you publish rasters from GeoServer in non-standard formats, GeoServer pulls them through GDAL.
Inside Google Earth, you’ll find, lo-and-behold, there is the GDAL library.
Same thing in ArcGIS and in FME, the Feature Manipulation Engine: the raster reader for FME is based on GDAL.
Almost all the “Add raster” options in GIS programs end up being backed by GDAL.
It’s not entirely surprising. The value proposition is being able to program against one library, GDAL, and immediately have support for 160 formats.
If you’re writing a piece of software, this is a huge value proposition. The alternatives are terrible.
Are you going to do an in-house implementation of every format your customers request?
The largest GIS company, ESRI, did the value-proposition calculation and decided it made more sense to tie into GDAL than to write its own internal format translators for every format.
If such a large company, with access to truckloads of programmers, thinks GDAL’s value proposition makes sense, then it makes no sense for a smaller company to write its own format support.
The very existence of GDAL is like a black hole now that sucks everybody into it. It makes little sense to write a single point-to-point format converter when you can make one end GDAL and get a converter for 160 formats.
It’s a no-brainer.
Cloud computing is a setup where a single large company puts thousands of servers on racks in data centers and rents out access to those servers. Increasingly, those companies would rent access not only to servers but also to a piece of software running on those servers.
Economically, that only makes sense when there is no per-server software cost as these cloud companies expand.
When you spin up a server in AWS or in Google Cloud, 100% of the time it’s a server running the open-source Linux operating system.
Because it would cost way too much to license proprietary operating systems for all these thousands of servers they’re running.
The same thing is true for what’s happening in the world of raster imagery collection and processing.
One of the biggest step changes in our industry over the last 10 years is that we’ve gone from a few dozen sensors flying around the world in orbit to hundreds of sensors orbiting the globe and dumping back data continuously.
We’ve gone from having a few aerial photography companies to people flying aerial drones constantly.
A vast stream of raster imagery data is rolling into the world and into cloud data centers.
If you look for Landsat 8 mission results, you’ll find that they’re no longer stored on government servers at the EROS data center.
They’re stored on AWS cloud — they’re rolled up there more or less in real-time as they show up on the downlink. They get processed and pushed up to AWS.
There are two aspects to the processing: one, the processing before it’s pushed up to AWS, which is primarily done by GDAL. Two, once that is up there in AWS, the access to the raw data is also done via GDAL.
The data is being stuffed up on these clouds in what’s called the cloud-optimized GeoTIFF format, one that GDAL happily reads.
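Producing a cloud-optimized GeoTIFF is a one-liner (assuming GDAL 3.1 or later, which ships the COG driver; filenames are placeholders):

```shell
# Rewrite an ordinary GeoTIFF as a cloud-optimized GeoTIFF,
# internally tiled and with overviews, ready for range-request access
gdal_translate -of COG input.tif cloud_optimized.tif
```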
It’s now possible to spin up a computing server on AWS pointed at the cloud buckets where these vast corpora of data are maintained, run the processing to pull out the information you need, re-project it, do processing on the pixels, and pull the result down to your system.
All that work is being done with GDAL, and it only makes sense because the piece you’re using to do it is open-source and freely available, so you can afford to run thousands of copies of GDAL to process all that imagery.
This is what Planet Labs does. It is the poster child of having an unimaginably massive image firehose because they have hundreds of little satellites spinning around, collecting imagery for the entire globe every day.
The data never leaves the cloud.
It arrives in the cloud as soon as it comes off the sensor and off the downlink. Planet does not have any servers in its own closet; it exclusively uses the cloud. Each step of the processing, from raw image off the sensor, to orthorectification and correction, to the color-balanced mosaic, goes through the GDAL API in different ways.
Some steps use the built-in functionality of GDAL. Others use functionality Planet wrote themselves that calls through the GDAL access API to read and write the formats.
That only makes sense because they can spin up a thousand copies of GDAL, because it’s free and open source.
If you took away GDAL right now, not only would Planet stop running and Landsat stop moving up to the cloud, but Mapbox could no longer run all the processing that produces its imagery mosaics.
For Google, it’s the same processing chains for their image mosaics; Google also uses it to feed their Earth Engine. Microsoft uses it extensively in their new planetary computer project.
In the defense-intel space, MAXAR pulls down data, and it gets pushed through a GDAL processing pipeline.
All the newer tools looking at things like synthetic aperture radar use this abstraction library. Take it away, and everything stops.
Because it’s free and open-source software, that’s not something that could ever happen.
But as is true of other critical infrastructure, if you don’t keep it up, it inevitably degrades.
If you stop maintaining the bridge, it will eventually be unsafe to drive across. It seems like a strange thing to say about a digital good that it will degrade. GDAL is not strictly subject to wind and weather.
But it is subject to constantly changing context. The GDAL of 15 or 20 years ago that you can download may or may not compile on your current machine. It would undoubtedly have a lot of bugs that have been uncovered in the last 20 years. It will not run as fast as a modern version of GDAL that understands modern processors.
All that extra effort is stuff that happened over the years because of maintenance of the codebase.
Yes, it’s critical infrastructure. If it went away, the world would stop.
It will not go away, but if we don’t maintain it as we maintain our other critical infrastructure, it will gradually degrade, and quality will go down.
The historical business model of GDAL has been that there is one singular maintainer.
The original maintainer, the founder of GDAL, is Frank Warmerdam. He did it for over a decade.
He left for a different job, working for Google. He now works for Planet, a major user of GDAL, and continues to contribute to the project.
The new singular maintainer since then is Even Rouault.
The way his business works is that he makes his money by signing service contracts with people in the GDAL community, companies or governments, to add a format or a feature.
He gets paid for adding those formats and features; that’s how GDAL ends up with 160 formats on the raster side, more on the vector side, and other cool features.
He also does all the other work of the project: fixing reported bugs, integrating third-party contributions, bringing contributed code up to the quality of the rest of the project, answering questions on the mailing list, and keeping code quality high across the entire codebase of 160 formats, written over the years by many people.
He does security work, too. One thing about maintenance is the constant context switching: as platforms change and static code analysis improves, people find new bugs, fresh security problems, and documentation issues.
He makes sure that people can understand how to use it and how to contribute to it.
All that work is unpaid. Done for free.
As a loss leader, it’s a way of demonstrating that he’s the person to whom contracts should flow, because he does all this other work.
The good news is that the model has worked for the last 20 years. Still, the income it generates, balanced against the amount of effort required of the maintainer, is not tremendous.
Programmers of the quality of the people who have been maintaining GDAL for the last 20 years can make a good deal of money for a good deal less effort in different jobs in the private sector.
Having a maintainer model based on the goodwill and the love of the game of particular individuals is not the most risk-free model for the organizations and entities that depend on reliability, quality, and goodwill over the long term.
The first maintainer eventually tired of it, burnt out, and left for a newer, better-paying job.
If we lose the second maintainer, there probably won’t be an obvious immediate replacement.
If you look at the use of GDAL, it’s clear that most of the value is captured by the users and by big institutional users. They get an enormous value in having a zero-cost tool that handles a tough problem.
The only people who pay for maintenance so far, in the current model, are a narrow band of governments and companies who pay for new formats and new features.
The maintainer model is clearly not long-term sustainable, so the GDAL model is changing.
If you go to GDAL.org/sponsors, you’ll see that many, though not all, of the cloud providers have now committed to a multi-year maintainer sponsorship.
Those dollars will be managed by the GDAL Project Steering Committee, which comprises current and past maintainers and major contributors to the project over the years. The goal is to grow the number of maintainers and make sure they’re paid for their maintenance time.
The maintenance duties should be separate from the adding of features, and they should be given the same priority: maintenance is not the redheaded stepchild of the project.
This way, the maintainers will also be able to use this extra time to incubate new developers who can grow into the maintainer role. The goal is to have people doing this core work in the plural: not one, not two, but three, four, or even a half dozen.
The project will be stronger over the long term if we do that. People in organizations and large entities getting the value will be insulated from the existential risk of having the project go unmaintained.
There is no gate on access to any of the functionality.
Freemium is, in some respects, another word for shareware: a free price point gets people to try the software and, hopefully, incentivizes them to pay for the full-featured version.
GDAL only comes as the full-featured version. It’s also set up so that, as folks get more involved, they can contribute to it and put their oar in.
Freemium software-or-service models, like old-fashioned shareware, aren’t open source. They are corporate entities using free as a distribution model, not offering free access to the software, the source, and the collaboration.
There’s a difference between doing open-source software and what was formerly called freeware.
Freeware was put out there to be spread around and be used widely. Open-source software is put out there to spread around and be worked on together. It’s this togetherness aspect that defines open source.
So it’s not like freemium.
The closest thing you can come to with this model, and it’s the one we’ve laid out, is critical infrastructure.
The software becomes widely used, and the funding model has to recognize that it’s the roads upon which everyone drives. The people getting the value out of those roads need to put up some money for them. We haven’t gotten so far as to levy an open-source tax that everyone has to pay.
It’s still very much a matter of enlightened self-interest, but there is self-interest involved. We need to bring enlightenment to the people who have the self-interest: let them know how highly they depend on this project, so they exercise their self-interest muscles on it.
The solution is to make sure the people who have a self-interest in this project are enlightened about the fact that they use it.
There is a disconnect between the people who control the budgets and the people who control the computers.
If you’re a GDAL user and you want to help the project, the number one thing you can do is tell your boss you use it and do that frequently.
Tell her how you solved a problem by setting up GDAL. Tell her that the script that does the magic is running GDAL under the covers.
Many who use GDAL understand how useful it is and see the importance of keeping it maintained and useful.
But in the Venn diagram of those people and the people who control budgets, the two circles barely overlap.
It’s really important that the people who recognize the value communicate it to the budget-holding decision-makers.
The people who hold the budget know there is value in the ESRI software used in their organization. They know it because they get sent an invoice every year and they have to pay it.
“This must be worth a few thousand a year, because I constantly see this invoice. If I stop paying it, the GIS techs will come up and say, ‘but we need the software.’”
The loop of being told they need it and then having the dollar value assigned and having to cut the check is missing in the open-source software realm.
If the users do not communicate to the budget allocators that they’re getting value… budget allocators, very reasonably, will have no idea, and bad things will happen.
It’s 100% a communication problem.
If your use is large enough, tell your boss and your company they should sponsor.
Do not be silent.
Go to GDAL.org/sponsors, where there are different sponsorship levels. Not everyone can or should sponsor, because not everyone derives thousands of dollars in annual value from GDAL. But many organizations do, and they should probably think about making sure the tools they use are supported for the long term, so that they’re there in the future.
There are two aspects of open source that are not consistent with charging for access.
If you charge for, say, an online course, one thing course providers notice is that once the price rises above zero, the number of people coming in the door drops dramatically.
Open source has achieved ubiquity by being free both from a monetary perspective and from an intellectual-property perspective: free to use, reuse, and rework.
You can’t be a “little bit” proprietary. Once you put up those doors, you take away the things that give open source those superpowers.
I think we have to forge a fresh path.
Tell your boss about it. If you are using it in a cloud environment and using it a lot, tell your sales rep.
Also, there are many hot CPUs out there in the cloud world that people pay good money for. The software that makes the CPUs desirable and generates revenue for the cloud companies is… GDAL.
But cloud companies don’t know that — they just see it using their compute and say, “Good news for us.”
They don’t recognize they’re being paid because GDAL exists.
I think the number one misunderstanding relates to scope and scale.
Everyone approaches GDAL from their own particular needs. Those needs can be quite narrow compared to the range of use.
I’ve written several hundred blog posts. The number one blog post on my site, year on year, is always “GeoTIFF compression for dummies,” a walkthrough of five different compression options people can use on aerial or satellite imagery when writing out GeoTIFF files with GDAL.
But it’s the thing people consistently come back to repeatedly.
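The gist of choosing a compression, sketched with placeholder filenames, is passing a creation option (-co) when writing the GeoTIFF:

```shell
# Lossless: DEFLATE with a horizontal-differencing predictor
gdal_translate -co COMPRESS=DEFLATE -co PREDICTOR=2 input.tif out_deflate.tif

# Lossless: LZW, the old reliable
gdal_translate -co COMPRESS=LZW input.tif out_lzw.tif

# Lossy: JPEG, much smaller for photographic imagery
gdal_translate -co COMPRESS=JPEG -co JPEG_QUALITY=85 input.tif out_jpeg.tif
```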
It would be easy for folks to think that GDAL is simply a translation library full stop.
It’s not. It’s an access library.
You can build apps that don’t care what the formats are. Your imagery doesn’t have to be local to the machine — the processing on it can be off in the cloud, and it doesn’t have to be on a file system.
That’s a quantum leap in functionality compared to just five years ago. But people don’t know that it exists or even how to use it.
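A sketch of that remote access, with placeholder URLs and bucket names, using GDAL’s virtual file systems:

```shell
# Read a COG's metadata over HTTP without downloading the whole file
gdalinfo /vsicurl/https://example.com/imagery/scene.tif

# The same idea for an S3 bucket, via the /vsis3/ virtual file system
gdalinfo /vsis3/some-bucket/scene.tif
```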
It’s only the tip of the iceberg, and there is a huge chunk of stuff under the water that most people don’t know exists.
If you’re doing spatial programming, you probably already understand the value of GDAL.
If you’re a non-programmer, it’s still well worth checking it out. It ships with a bunch of incredibly powerful command-line tools which you can just download.
They just work. There’s nothing else to install.
Look through the documentation and understand what these tools can do for you if nothing else.
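The gentlest starting point is asking a file to describe itself (the filenames here are placeholders):

```shell
# Format, size, projection, and band metadata for a raster
gdalinfo input.tif

# Summary (-so) of all layers (-al) in a vector file
ogrinfo -al -so input.gpkg
```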
The story of GDAL is divided into two pieces.
There is the functionality side of it.
What can it do? How can we get access to these different tools? Where can we use them? Where is it being used today? What can it do for me? What problems can it solve?
Then we have the story of a piece of critical geospatial infrastructure and the business model behind that.
The functionality side of the conversation is much easier for us to understand and grasp because we can immediately see how this could benefit us.
The story of GDAL as infrastructure and the business model, which ensures that infrastructure is maintained when we need it, is a different story.
I hope you’re not hearing the story of a starving artist begging for recognition, but the story of a community working to develop and support a piece of critical geospatial infrastructure.
I hope you heard the story of the same community generously insisting the work continue so that we continue to benefit from decades of software development.
Once we understand that this is not a story about any of us, but a story about all of us, the question, at least for me, is,
How do we help support it?
If you work for an organization that derives significant value from using these tools, tell them, let them know that there’s an opportunity to support the tool, the community, and help them understand the value of the software product.
Maybe you need to tell them a story about why this is the right thing to do. Perhaps a story about the cheapest insurance policy they will ever buy or cost-benefit.
You pay more than nothing, but you get significantly more than what you pay for.
If you don’t work for such an organization and cannot influence purchasing decisions or spending habits, perhaps the next best thing you can do is celebrate the people who have contributed.
Go to gdal.org/sponsors, and you can say thank you to MAXAR, Microsoft, Planet Labs, Safe Software, Google, Esri, Sparkgeo, and MapGears.
Thank you for contributing and giving back. Thank you for making things better for all of us.