Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
podcast
Filter by Categories
Galleries
Print Category 1
Print Category 2
Uncategorized

Raster Processing with GDAL and Python

Tutorial: Getting started in Raster Processing with GDAL and Python

In this geoprocessing with Python guide, we will cover the basics of reading and analyzing raster data with GDAL. After following this guide, you’ll have the knowledge to read raster data stored on the cloud, extract metadata, load individual raster bands, and calculate band statistics.  

Working with GDAL

The “Geospatial Data Abstraction Library” is one of the most widely used raster processing libraries and is the power behind much of the geospatial software industry at present. It is written in C++ for reading and writing raster data. Its, free, open-source, and very well supported. However, for the lay person (and even experienced developers), coding in a low-level language like C or C++ can be daunting. Conveniently for us, some very smart people have built Python bindings for GDAL (bindings are a package of code that allows us to run the underlying GDAL C++ code using Python commands). 

Want to learn more about how GDAL works? Listen to this podcast episode with Paul Ramsey, the co-founder of PostGIS.

We have three main goals for this guide:

  • 1. Open a raster dataset with GDAL and read image bands
  • 2. Extract metadata 
  • 3. Calculate band statistics

Requirements 

This guide assumes that you have Python and Anaconda installed on your OS, and already have a Python setup environment with GDAL installed. If you do not, follow this tutorial to set up a Python environment with many of the essential geospatial packages.

Extract Metadata

Our first step is to import the required packages. GDAL is contained within the OSGeo library and is imported with the following command:

“`

  • from osgeo import gdal

“`

We will be working with a NAIP image of downtown New Orleans, Louisiana USA, hosted by the United States Geological Survey (USGS). GDAL has the capability to read raster data in from cloud storage using a GDAL virtual file system method. 

These data are publicly available, so we append the generic “/vsicurl/” before the raster URL.

“`

# Url to publicly available geo-image. A NAIP image of New Orleans, Louisiana, USA

url = “https://prd-tnm.s3.amazonaws.com/StagedProducts/NAIP/la_2015/29090/m_2909007_ne_15_1_20150430_20151019.jp2”

ds = gdal.Open(‘/vsicurl/%s’ %(url))

“`

We now have a variable “ds”, which is a GDAL raster dataset. Raster datasets are composed of one or more bands, where each band contains a single layer of data. For example, a Red/Green/Blue (RBG) image has three bands, one for each color, whereas a grayscale image has a single band. We can learn more about this dataset by extracting the metadata. The command below prints all of the metadata header included in the image.

“`

  • print(gdal.Info(ds))

“`

Here is the result

“`

  • Driver: JP2OpenJPEG/JPEG-2000 driver based on OpenJPEG library
  • Files: /vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/NAIP/la_2015/29090/m_2909007_ne_15_1_20150430_20151019.jp2
  • Size is 6963, 8037
  • Coordinate System is:
  • PROJCRS[“WGS 84 / Pseudo-Mercator”,
  •     BASEGEOGCRS[“WGS 84”,
  •         DATUM[“World Geodetic System 1984”,
  •             ELLIPSOID[“WGS 84”,6378137,298.257223563,
  •                 LENGTHUNIT[“metre”,1]]],
  •         PRIMEM[“Greenwich”,0,
  •             ANGLEUNIT[“degree”,0.0174532925199433]],
  •         ID[“EPSG”,4326]],
  •     CONVERSION[“unnamed”,
  •         METHOD[“Popular Visualisation Pseudo Mercator”,
  •             ID[“EPSG”,1024]],
  •         PARAMETER[“Latitude of natural origin”,0,
  •             ANGLEUNIT[“degree”,0.0174532925199433],
  •             ID[“EPSG”,8801]],
  •         PARAMETER[“Longitude of natural origin”,0,
  •             ANGLEUNIT[“degree”,0.0174532925199433],
  •             ID[“EPSG”,8802]],
  •         PARAMETER[“False easting”,0,
  •             LENGTHUNIT[“metre”,1],
  •             ID[“EPSG”,8806]],
  •         PARAMETER[“False northing”,0,
  •             LENGTHUNIT[“metre”,1],
  •             ID[“EPSG”,8807]]],
  •     CS[Cartesian,2],
  •         AXIS[“easting”,east,
  •             ORDER[1],
  •             LENGTHUNIT[“metre”,1,
  •                 ID[“EPSG”,9001]]],
  •         AXIS[“northing”,north,
  •             ORDER[2],
  •             LENGTHUNIT[“metre”,1,
  •                 ID[“EPSG”,9001]]]]
  • Data axis to CRS axis mapping: 1,2
  • Origin = (-10039629.790132820606232,3503552.430475980974734)
  • Pixel Size = (1.000000000000000,-1.000000000000000)
  • Image Structure Metadata:
  •   INTERLEAVE=PIXEL
  • Corner Coordinates:
  • Upper Left  (-10039629.790, 3503552.430) ( 90d11’15.10″W, 30d 0′ 0.07″N)
  • Lower Left  (-10039629.790, 3495515.430) ( 90d11’15.10″W, 29d56’14.91″N)
  • Upper Right (-10032666.790, 3503552.430) ( 90d 7’29.93″W, 30d 0′ 0.07″N)
  • Lower Right (-10032666.790, 3495515.430) ( 90d 7’29.93″W, 29d56’14.91″N)
  • Center      (-10036148.290, 3499533.930) ( 90d 9’22.51″W, 29d58′ 7.51″N)
  • Band 1 Block=4096×4096 Type=Byte, ColorInterp=Undefined
  •   Overviews: 3482×4019, 1741×2010, 871×1005, 436×503, 218×252, 109×126
  •   Overviews: arbitrary
  •   Image Structure Metadata:
  •     COMPRESSION=JPEG2000
  • Band 2 Block=4096×4096 Type=Byte, ColorInterp=Undefined
  •   Overviews: 3482×4019, 1741×2010, 871×1005, 436×503, 218×252, 109×126
  •   Overviews: arbitrary
  •   Image Structure Metadata:
  •     COMPRESSION=JPEG2000
  • Band 3 Block=4096×4096 Type=Byte, ColorInterp=Undefined
  •   Overviews: 3482×4019, 1741×2010, 871×1005, 436×503, 218×252, 109×126
  •   Overviews: arbitrary
  •   Image Structure Metadata:
  •     COMPRESSION=JPEG2000
  • Band 4 Block=4096×4096 Type=Byte, ColorInterp=Undefined
  •   Overviews: 3482×4019, 1741×2010, 871×1005, 436×503, 218×252, 109×126
  •   Overviews: arbitrary
  •   Image Structure Metadata:
  •     COMPRESSION=JPEG2000

“`

This output tells us a lot about the dataset. First, it tells us the image format (JPEG-2000) and the image size (6963, 8037). The next is the coordinate system specification of WGS 84 / Pseudo-Mercator with units of Meters. Below the projection is the origin coordinates and the pixel size. Units of meters was specified in the projection, so we can see that this image is 1m resolution. The lines following describe the image structure. First, the image bounding box in both Web Mercator coordinates (Meters) and Lat/Lon. Lastly, some information is provided about the image bands. We can see that the GDAL dataset has four bands.

The same metadata can be extracted and parameterized with the following code block:

“`

  • driver = ds.GetDriver().ShortName
  • size = (ds.RasterXSize, ds.RasterYSize)
  • bands = ds.RasterCount
  • proj = ds.GetProjection()
  • geotransform = ds.GetGeoTransform()
  • print(“Format: “, driver)
  • print(“Image size: “, size)
  • print(“Number of bands: “, bands)
  • print(“Projection: “, proj)
  • if geotransform:
  •     print(“Origin = ({}, {})”.format(geotransform[0], geotransform[3]))
  •     print(“Pixel Size = ({}, {})”.format(geotransform[1], geotransform[5]))

“`

Read Raster Bands and Calculate Statistics

Now we have opened a raster dataset, and have learned a bit about what’s inside. Specifically, we know that the image has four bands. The NAIP data model indicates that these bands represent Red, Green, Blue, and Near-Infrared (NIR) layers. Let’s drill into this dataset by fetching each raster band, and calculating statistics. 

Raster bands are fetched with the following command, where the number within the parentheses is the band index (starting from 1).

“`

  • red = ds.GetRasterBand(1)

“`

GDAL’s Python RasterBand class has many useful functions. For now, we will take a look at the methods that allow us to get statistics about our data. 

The command below returns the band datatype. In this case, our data is in “Bytes” binary format.

“`

  • gdal.GetDataTypeName(red.DataType)
  • Byte

“`

There are several ways to fetch band statistics. If the raster band has minimum and maximum data included, we can extract those values with the following:

“`

  • min = red.GetMinimum()
  • max = red.GetMaximum()
  • print(min, max)
  • 36.0 250.0

“`

The GetStatistics() method will return the minimum, maximum, mean, and standard deviation if it is stored in the band metadata. 

“`

  • min, max, mean, stdev = red.GetStatistics(True, True)
  • print([min, max, mean, stdev])
  • [36.0, 250.0, 119.06808859721, 34.933810008993]
  • “`
  • Not all bands will have statistics available. In that case, we can calculate the statistics with the following commands.
  • “`
  • min, max = red.ComputeRasterMinMax(True)
  • mean, stdev = red.ComputeBandStats() 
  • print([min, max, mean, stdev])
  • [36.0, 250.0, 119.06808859721, 34.933810008993]

“`

Those are the core methods we can use to extract information from raster bands and calculate statistics. Look at the Python GDAL Cookbook for more info on working with raster bands. The code block below will loop through each raster band in a dataset and print the statistics. If statistics are not included, the code with calculate them.

“`

  • for index in range(ds.RasterCount):
  •     index += 1
  •     band = ds.GetRasterBand(index)
  •     print(“Band: “, index)
  •     dtype = gdal.GetDataTypeName(band.DataType)
  •     print(“Data Type: “, dtype)
  •     stats = None
  •     stats = band.GetStatistics(True, True)
  •     if not stats:
  •         min = band.GetMinimum()
  •         max = band.GetMaximum()
  •         if not min or not max:
  •             min,max = band.ComputeRasterMinMax(True)
  •         mean, stdev = band.ComputeBandStats()
  •         stats = [min, max, mean, stdev]
  •     print(“Minimum=%.3f, Maximum=%.3f, Mean=%.3f, StdDev=%.3f” % ( \
  •                 stats[0], stats[1], stats[2], stats[3] ))
  • Band:  1
  • Data Type:  Byte
  • Minimum=36.000, Maximum=250.000, Mean=119.068, StdDev=34.934
  • Band:  2
  • Data Type:  Byte
  • Minimum=57.000, Maximum=245.000, Mean=124.498, StdDev=28.529
  • Band:  3
  • Data Type:  Byte
  • Minimum=50.000, Maximum=255.000, Mean=112.321, StdDev=30.924
  • Band:  4
  • Data Type:  Byte
  • Minimum=33.000, Maximum=233.000, Mean=141.057, StdDev=33.471

“`

In this tutorial, we’ve covered how to use the GDAL Python binaries to read in a raster dataset, extract metadata, fetch raster bands, and calculate band statistics. In the next post in this series, we will dig in deeper and do some raster processing that involves processing and writing raster data.

Recommended Podcast Episodes related to GDAL