Python is one of the most popular programming languages for geospatial analysis and for data science in general. Its popularity comes from its high-level, human-readable syntax and its flexibility. Before going any further, let’s define a couple of important terms to know when working with Python.
What Is A Python Library / Package?
A Python library is a collection of modules that extends base Python functionality. When you download Python, it comes with a collection of functions and classes known as the Python Standard Library. Including all Python functionality within the Standard Library isn’t feasible, so Python developers can pick and choose how to extend the Standard Library by installing additional libraries.
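To make the distinction concrete, here is a minimal sketch that checks whether a module can be imported in the current environment. The helper name `is_installed` is made up for this example, and the third-party names in the loop are only illustrations; the result for each depends on what you have installed.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# `json` ships with the Standard Library, so it is always available.
print("json:", is_installed("json"))

# Third-party geospatial libraries are only found if they have been installed.
for name in ("numpy", "shapely", "rasterio"):
    print(f"{name}: {is_installed(name)}")
```

Only top-level module names are checked here, which keeps the lookup simple.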
What Is A Python Virtual Environment?
Not all Python libraries are compatible with each other, and sometimes you will want a stable install of a collection of packages alongside a separate install for development with the latest versions. Python virtual environments are basically independent installs that allow you to keep multiple collections of Python libraries on the same computer. A package manager like conda, which ships with Anaconda, is typically used to create and manage virtual environments.
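As a rough illustration of what an "independent install" means, the sketch below creates a virtual environment with the `venv` module from the Standard Library. This is a different tool from Anaconda, used here only because it needs nothing extra installed. The environment name `geo-env` is made up, and the environment is created in a temporary directory that is deleted when the block finishes.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "geo-env"  # hypothetical environment name

    # with_pip=False keeps creation fast; set it to True to bootstrap pip as well.
    venv.create(env_dir, with_pip=False)

    # Every virtual environment records its base interpreter in pyvenv.cfg.
    config_file = env_dir / "pyvenv.cfg"
    env_created = config_file.exists()
    print("Environment created:", env_created)
    print(config_file.read_text())
```

In day-to-day work you would create the environment in a permanent location and activate it from the command line before installing libraries into it.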
A Python environment comes with useful functionality right off the bat, and libraries extend it further. Many open-source libraries can be installed into a Python environment with pip or Anaconda. We will get into how to install libraries and set up a geospatial Python environment in future articles.
All in due time. For now, here is a selection of the most widely used geospatial Python libraries, organized by primary data structure.
Essential Geospatial Python libraries For Esri Environments
If you work in the Esri ecosystem, you have probably been using ArcPy. This is Esri’s Python geoprocessing library that allows users to automate and extend raster, vector, and point cloud workflows in ArcGIS. Note that ArcPy requires an active Esri license.
A neighboring library to ArcPy, the ArcGIS API for Python, simply known as ‘arcgis’, includes several classes and functions to interface with ArcGIS Online and ArcGIS Enterprise. The ArcGIS API for Python is freely available with an Esri Developer account.
Essential Geospatial Python libraries For Raster Data
The Geospatial Data Abstraction Library (GDAL) is the raster processing powerhouse. It provides extremely flexible reading and writing capabilities for both raster (the GDAL part) and vector (OGR) formats, making it an essential tool in any Extract, Transform, Load (ETL) workflow. The vast majority of GIS software (and the geospatial industry at large) depends on GDAL. GDAL is written in C++ and is often run from the command line, but it also includes a set of Python bindings maintained by members of the GDAL community.
Rasterio is an alternative to GDAL’s Python bindings for open-source raster processing, developed by Mapbox. Rasterio uses GDAL under the hood and provides much of the same functionality, but with a more Pythonic style, making it more familiar for some users. Note that Rasterio and GDAL’s Python bindings can conflict when both are imported in the same program, so it is safest to choose one of the two within a single Python environment.
The Remote Sensing and Geographical Information Systems Library (RSGISLib) provides Python algorithms for several remote sensing workflows, including image segmentation, zonal statistics, change detection, and time series analysis.
Rasterstats is a library of functions for summarizing raster data within polygon vector features. It can also be used to query cell values at point features.
Scalable geospatial data analysis is in high demand. RasterFrames provides the ability to process raster data in a distributed, horizontally scalable environment using DataFrames inside Apache Spark.
Essential Geospatial Python libraries For Vector Data
OGR is the vector processing arm of the GDAL library. It provides the ability to read, write, and process vector data in many different formats. Like GDAL, OGR is a C++ library with Python bindings.
Shapely is a computational geometry library based on GEOS, the C/C++ geometry engine behind PostGIS, which is itself a port of the Java library JTS. It enables PostGIS-style geometric processing outside of a relational database management system, and outside of SQL. Shapely does need to be combined with other libraries to read and write spatial files.
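As a small sketch of what PostGIS-style processing looks like in plain Python, the example below builds a few geometries in memory and runs a predicate and an overlay on them. It assumes Shapely is installed, and the coordinates are made up for illustration.

```python
from shapely.geometry import Point, Polygon

# A 2 x 2 square with its lower-left corner at the origin (made-up coordinates).
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

# A point at the center of the square, and one well outside it.
inside = Point(1, 1)
outside = Point(5, 5)

print(square.area)               # area in squared coordinate units
print(square.contains(inside))   # spatial predicate, similar in spirit to ST_Contains
print(square.contains(outside))

# Buffer the center point into an approximate circle and overlay it on the square.
circle = inside.buffer(0.5)
overlap = square.intersection(circle)
print(round(overlap.area, 2))
```

Note that no file is read or written here; the geometries exist only in memory.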
Remember how Shapely needs other libraries to read and write spatial files? Well, Fiona fills that void. Fiona is a lightweight library of wrapper functions around OGR, concerned with reading and writing vector data. It has a more Pythonic syntax than OGR, working with standard Python data structures such as dictionaries and iterators rather than the C pointers used in the OGR bindings.
PyProj is a library of wrapper functions around PROJ, the C library for cartographic projections and coordinate transformations. PyProj offers functionality for converting coordinates from one spatial reference system to another.
GeoPandas extends the very useful Pandas library, enabling geometric data processing in Pandas DataFrames that otherwise would require a spatial relational database. GeoPandas depends on Fiona for reading and writing capabilities, and Shapely for computational geometry functions.
GeoMesa is another library for scalable geospatial analysis. It offers capabilities for large-scale vector data processing through distributed computing. Near real-time data streaming is also an option for point, line, and polygon vector features.
Essential Geospatial Python libraries For Point Clouds
The Point Data Abstraction Library (PDAL) is the powerhouse of point cloud processing. The library is great for point cloud workflows organized into data pipelines. Like its cousin GDAL, PDAL is a C++ library with Python bindings. The PDAL Python API enables developers to write pipelines in JSON format and run them from a Python script. It can also incorporate custom Python functions that extend PDAL’s base functionality. The downside is that PDAL has a somewhat steep learning curve, with syntax that may seem foreign to users only familiar with Python.
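To give a feel for the pipeline syntax, here is a minimal sketch that builds a pipeline definition as JSON using only the Standard Library. The file names are hypothetical placeholders, and the single filter stage keeps only points classified as ground. Running the pipeline requires the PDAL Python bindings, so that step is shown only as a comment.

```python
import json

# Hypothetical file names; replace them with real paths before running PDAL.
pipeline_definition = {
    "pipeline": [
        "input.las",
        {
            "type": "filters.range",
            "limits": "Classification[2:2]",  # keep ground points only (LAS class 2)
        },
        "ground.las",
    ]
}

pipeline_json = json.dumps(pipeline_definition, indent=2)
print(pipeline_json)

# With the PDAL Python bindings installed, the string could then be run like this:
#   import pdal
#   pipeline = pdal.Pipeline(pipeline_json)
#   point_count = pipeline.execute()
```

Each entry in the list is one stage: a reader inferred from the input file name, a filter, and a writer inferred from the output file name.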
LasPy is a convenient Python library purely for reading and writing point cloud data in the standard LAS format or its compressed version, LAZ. LasPy offers less functionality than PDAL, but it has the advantage of being easy to install and work with in cloud-hosted environments that might struggle with PDAL’s C++ dependencies.
NumPy is the classic Python library for working with multidimensional array data structures. Combined with a reader/writer library like LasPy, it lets us store point cloud data in a NumPy array and then filter and process that data (see this tutorial for an example). NumPy is also good for general use across the geospatial domain.
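The filtering step might look something like the sketch below. To keep it self-contained, the array is filled with made-up coordinates rather than values read from a real LAS file, and the height threshold is an arbitrary assumption.

```python
import numpy as np

# Synthetic stand-in for coordinates read from a LAS file: one row per point,
# columns are x, y, z. The values are made up for illustration.
points = np.array([
    [0.0, 0.0, 1.2],
    [1.0, 0.5, 3.8],
    [2.0, 1.5, 0.4],
    [3.0, 2.5, 7.1],
])

# Boolean mask selecting points at or above a hypothetical height threshold.
min_height = 2.0
mask = points[:, 2] >= min_height
high_points = points[mask]

print(high_points.shape)          # (number of points kept, 3)
print(high_points[:, 2].mean())   # mean height of the kept points
```

The same boolean-mask pattern works for any per-point attribute stored as an array column, such as intensity or classification.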
Essential Geospatial Python libraries For Visualization
Matplotlib provides powerful 2-D and 3-D plotting with map visualization capabilities for both raster and vector data. It also has support for interactive geographic visualizations and animations.
An alternative to Ipyleaflet, Folium is also a bridge to Leaflet.js. The difference between the two is that Folium is built toward static visualizations, whereas Ipyleaflet builds interactive widgets. Folium also makes it easy to export an interactive map to HTML, which makes it a useful tool in web development.