An Introduction to Geospatial Database Management
We live in an increasingly spatially aware world. Many of the decisions we make are based on location. Location data helps us to answer everyday questions like “Will my new apartment be close enough to the grocery store?”, “Where is the nearest park to walk my dog”, “What route can I take to avoid rush-hour traffic?”. Getting accurate answers to these questions requires high-quality, up to date, geospatial data. Data preparation is not a trivial task, and a massive amount of work has gone into compiling, organizing, and storing these data that enable us to make spatial decisions (Google Maps is a shining example).
In this blog, we will explore how geospatial data is stored, managed, and accessed using spatially enabled databases. We will cover a review of how databases are structured, then discuss how geospatial data is managed in a database with PostgreSQL and PostGIS.
Quick Review: What is a Database?
Let’s begin with a very high-level definition of a database:
“A database is an organized collection of structured information, or data, typically stored electronically in a computer system.” – Oracle
There are several logical models that define how data is organized and connected in a database. The three main types are:
We will discuss the Relational Database Model in this post as it is the most relevant and widely used for vector geospatial. There are some newer logical models used for geospatial purposes, such as an Object-Oriented database used by Oracle, but discussion on those is outside the scope of this article. A relational database is a type of database that allows us to access data records in relation to some attribute data.
“In a relational database, each row in the table is a record with a unique ID called the key. The columns of the table hold attributes of the data, and each record usually has a value for each attribute, making it easy to establish the relationships among data points.” – Oracle
A relational database table is organized into rows and columns. Each row represents a single entry, and every column contains an attribute value. This logical model presents a lot of flexibility to relate data contained in separate tables based on a common attribute column via joins.
Why Do We Need to Manage a Database?
A database represents a unified collection of data made available to users. The missing piece here is a set of rules and methods that govern how users interact with a database. For example, a single database may store many tables. There needs to be a way to access data, manage the connection between tables, add/remove data, and maintain consistent data definitions. This is where a Database Management System (DBMS) and the Structured Query Language (SQL) come in.
A DBMS provides methods and rules for how to organize, store, and retrieve data from a database. It can also control access to the database, ensure data integrity, and provide backup and recovery procedures. Traditional databases and DBMSs are concerned with tabular data in rows and columns. This is fine when considering only the attribute data discussed above. However, geospatial data also include geographies or geometries associated with each record that do not fit into a normal DBMS.
In addition to structured vector data (points, lines, and polygons), geospatial data can come in the form of unstructured data, like rasters. This makes geospatial data significantly different from traditional tabular data, and requires a somewhat specialized method of management.
Geospatial Database Management
PostgreSQL with PostGIS is an open-source solution for managing geospatial data in the PostgreSQL DBMS. PostGIS is a spatial database extender for PostgreSQL that adds support for geographic objects, allowing users to store, index, and query geospatial data. It supports a wide range of geospatial data types, including points, lines, and polygons, as well as raster data, and provides a variety of functions for performing spatial queries and analysis. Data models for most vector data rely on a relational model, while those for raster have a different model included with PostGIS.
Using PostgreSQL and PostGIS together, it is possible to store and manage large amounts of geospatial data in a single database, making it easy to perform complex spatial queries and analysis. This can be useful in a variety of applications, including urban planning, location-based services, and environmental modeling.
See this blog post for an introduction to PostGIS, this post for an introduction to PostGIS geographies and geometries, and this post for a guide on using PostGIS to create a routable topology in a spatial database.
The Geospatial Advantage
The powerful advantage of geospatial databases is that they enable joining separate datasets, not by a shared attribute, but by location using a Spatial Join. Some examples of the insights offered by a geospatial database could be:
- Find all Laundromats contained in the Borough of Brooklyn, NY using a table of businesses and administrative boundaries.
- Find all walking paths that have a slope less than 10-degrees using a walking path vector table and a raster table.
Check out the New York GIS Clearinghouse for a great example of the data stored in a geospatial database.
Another advantage of using PostgreSQL and PostGIS is the wide range of tools and libraries that are available for working with the database. Both QGIS and ArcGIS are compatible with PostGIS and provide a range of features for visualizing and analyzing geospatial data. This means a geospatial database can be a central repository for data that can be accessed by multiple users without copying data to their local environment. This (1) reduces redundancy and storage costs, (2) makes data easily available to users, and (3) separates the raw data stored in database tables from interpretations on a map or derivative tables in a spatial analysis.
Overall, PostgreSQL with PostGIS provides a powerful and flexible platform for geospatial database management. Whether you are working with small or large datasets, these tools offer a range of features and capabilities that make it easy to store, manipulate, and analyze geospatial data.
This was a very brief and high-level overview of geospatial database management, and the advantages of implementing a DBMS with geospatial capabilities. Geospatial database management is a critical component of many modern applications, workflows, and industries, enabling users to analyze and visualize data. Geospatial database management empowers unique spatial insights that have a powerful impact on industries like regional planning, city planning, sustainability, utilities, and emergency response, just to name a few.