John Bryant the founder of Mammothgeospatial introduces us to the power of an opensource SQL database called PostgreSQL. We talk about some advantages that this database has over flat files and why you might want to invest time and energy into learning more about relational databases and the Structured Query Language “SQL”
What is PostgreSQL/PostGIS?
PostgreSQL is an open-source relational database management system, which means you can download and use it for free. PostGIS is a database extension that is run alongside PostgreSQL and allows you to do spatial operations, store data, geometries, build spatial indexes and more.
What are the advantages of using a relational database like PostgreSQL?
While working with folders full of spatial data files is a perfectly valid way of managing spatial data, it has its limitations. For example, a file format as the shapefile has its own limitations in terms of file size and column names are limited to a maximum of ten characters. You can do a lot more with a spatial database than with a folder of shapefiles, and when the complexity of a file system grows, you may find yourself in a situation where you want to start looking for alternatives. A spatial database is a natural progression in this regard.
Relational databases also offer multi-user functionality, so that multiple users can connect to a database and edit the data at the same time. Having a centralized data repository means that all users in an organization can work with the same data, that will always be the most up-to-date version of it. In terms of data management, this will save everyone a lot of time and force people to adhere to the data structure or data schema.
What are PostgreSQL views?
A view is the output after running an SQL statement on the data in one or more database tables. That output can be used as a virtual database table, which offers interesting use cases. For example, it is possible to create multiple views from one and the same dataset and add them to a desktop GIS environment. Another possibility is to add attributes from different database tables that share a spatial relationship with each other through a view.
The end user does not have to understand the underlying queries or logic that is needed to create these views, because the output of the database queries can be added to a map and shared within an organization directly. This way, the complexity of relational databases is hidden from the end user.
What is a trigger in PostgreSQL and how can it be used?
A trigger is a custom function that is executed when a database event takes place, for example inserting new rows in a table. Rather than just inserting that new data, a trigger can manipulate that data before it is being inserted, for example perform data validation to make sure that data is correct.
Such self-defined functions can replace multiple individual coding scripts that are operating in different places at different times on different datasets. Compared to scripts, triggers form an elegant solution to maintaining such scripts and getting the same results. Also, triggers allow spatial databases to maintain themselves so that there is always a single source of truth, which is harder to accomplish with a myriad of datasets and individual scripts.
How can databases help to create reproducible workflows?
GIS analysts can relate to the idea of a reproducible workflow, where the inputs can be changed easily without having to recreate the entire workflow from scratch. While there are existing solutions to do this, such as Safe Software’s FME (Feature Manipulation Engine) or the ModelBuilder tool in ArcGIS, a spatial database such as PostgreSQL/PostGIS can be a good alternative to existing solutions. For example, an SQL query can contain all the different processing steps that are required for an entire spatial analysis workflow. That code can be commented, so that other users are able to understand it or when that code is reused by its creator at a later point in time.
Is it worth investing time and energy in learning SQL?
SQL, short for Structured Query Language, is a programming language that is used in all kinds of places, such as PostgreSQL, SQL Server, Oracle Spatial or with SpatiaLite file-based databases. This means that learning SQL is not just an investment in learning PostgreSQL, but an investment in learning a powerful way of querying databases in general. Other than programming languages such as R or Python, that both can be used for many different purposes, SQL has a focused purpose, namely getting your hands on your data quickly. This is also its strength, and it is a very versatile language too.
How can I get started with PostgreSQL/PostGIS?
The first step in getting started with PostgreSQL/PostGIS is to download and install PostgreSQL on a local machine. There are free desktop installers available for Windows, Linux, and Mac. With the installation of PostgreSQL comes the option to install multiple extensions, including PostGIS. Then, you can choose to load some data and start poking around with it or choose a PostGIS database from QGIS Desktop and load some tables into it. There are special plugins available for connecting to a PostGIS database inside QGIS and push spatial data to a spatial database, such as DB Manager. There are many more free plugins available for connecting to a PostGIS database, such as DBeaver or pgAdmin, which comes with a default installation of PostgreSQL. Both DBeaver and pgAdmin come with a spatial previewer for displaying spatial data on a map.
How does a PostgreSQL/PostGIS database fit into a geospatial stack?
There are many different configurations and variations on what you can do with a PostgreSQL database. In the end, it comes down to what you are trying to do and who is the end user you are trying to reach. The good thing is that there is a lot of flexibility available when choosing a geospatial stack. To start off with something simple, the most basic stack would be a PostGIS database connected with QGIS Desktop. From there, it is possible to insert open-source geospatial server packages such as GeoServer or MapServer. These packages can expose data from a database and expose it through web services so they can be consumed over the web. All these tools are available freely so you can get started right away.
Where can people go and ask questions when they get stuck when learning new geospatial tools?
Open source is all about community and there are many channels and communities for when people need help. The GIS StackExchange is a stack exchange subsite for asking and answering questions about GIS topics. Also, Slack has a lot of specialized channels that focus on tools such as PostGIS, GeoServer or QGIS. Finally, visiting a local or international FOSS4G event or a local meetup group helps you to share your enthusiasm for open-source technology and tools with similar-minded people.