Geospatial imagery is constantly being collected by sensors mounted on satellites, drones, planes, and other platforms, and then uploaded to cloud storage. Much of that data, like USGS Landsat data, is publicly available.
In the past, users who wanted to access data would have to download a copy of relevant files to a local machine. This led to long download times, data duplication, and unnecessary processing of unused data.
This is where the Cloud Optimized GeoTIFF (COG), the foundation of the cloud-native geospatial architecture, fits in. In this article, we will walk through what a COG is, compare COGs to traditional GeoTIFFs, and provide resources for working with COGs and creating your own.
What’s a TIFF?
TIFF stands for “Tagged Image File Format” and is an image format most often used to store images losslessly. It begins with an Image File Header (IFH) that records the file’s byte order and points to where the first image’s metadata is stored within the TIFF. TIFF files function like a container that can hold many images (or bands), as well as metadata.
Each image within a TIFF file has its own Image File Directory (IFD), which is made up of tags that describe the image and point to the image data itself. This makes the TIFF file format particularly well suited to satellite and aerial imagery, which often includes many different bands. Landsat images, for example, have anywhere from 4 to 11 bands.
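To make the header and directory layout more concrete, here is a minimal sketch that parses the 8-byte header of a classic TIFF and lists the tag IDs in its first IFD. The function names and the tiny in-memory file are our own illustration, and the sketch assumes a classic (non-BigTIFF) file; in practice a library such as GDAL or Rasterio does this for you.

```python
import struct


def parse_tiff_header(data: bytes) -> dict:
    """Parse the 8-byte header of a classic (non-BigTIFF) TIFF."""
    if len(data) < 8:
        raise ValueError("a classic TIFF header is 8 bytes long")
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif byte_order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file: unknown byte-order mark")
    # Bytes 2-3 hold the magic number, bytes 4-7 the offset of the first IFD.
    magic, first_ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:  # classic TIFF uses 42; BigTIFF uses 43 and a longer header
        raise ValueError(f"unsupported TIFF magic number: {magic}")
    return {
        "byte_order": "little" if endian == "<" else "big",
        "first_ifd_offset": first_ifd_offset,
    }


def list_ifd_tags(data: bytes, header: dict) -> list:
    """Return the tag IDs stored in the first IFD."""
    endian = "<" if header["byte_order"] == "little" else ">"
    offset = header["first_ifd_offset"]
    # An IFD starts with a 2-byte entry count, followed by 12-byte entries.
    (count,) = struct.unpack_from(endian + "H", data, offset)
    tags = []
    for i in range(count):
        (tag_id,) = struct.unpack_from(endian + "H", data, offset + 2 + 12 * i)
        tags.append(tag_id)
    return tags


# A hypothetical, hand-built TIFF fragment: header plus one IFD with two tags.
# Each entry is (tag ID, field type, value count, value); type 3 means SHORT.
entries = struct.pack("<HHII", 256, 3, 1, 512)   # 256 = ImageWidth
entries += struct.pack("<HHII", 257, 3, 1, 512)  # 257 = ImageLength
tiny_tiff = (
    b"II" + struct.pack("<HI", 42, 8)  # header: first IFD starts at byte 8
    + struct.pack("<H", 2)             # the IFD holds two entries
    + entries
    + struct.pack("<I", 0)             # offset of next IFD (0 = none)
)

header = parse_tiff_header(tiny_tiff)
print(header)
print(list_ifd_tags(tiny_tiff, header))
```

The header only tells a reader where to look next, so a client can learn a file's layout from a small number of bytes without touching the pixel data.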
What’s a GeoTIFF?
Location, Location, Location! A GeoTIFF is basically a normal TIFF file, but with additional metadata that describes the image’s location, and its spatial reference.
TIFF files on their own don’t contain the georeferencing information that makes imagery useful to geographers, so it has to be added as extra metadata. For more information about GeoTIFFs, see the Open Geospatial Consortium (OGC) standards (https://www.ogc.org/standards/geotiff).
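As a rough illustration of what that extra metadata does, the sketch below converts a pixel position into map coordinates using two GeoTIFF tags, ModelPixelScaleTag and ModelTiepointTag. It assumes a simple north-up image with no rotation, and the function name and sample values (30 m pixels, an arbitrary projected origin) are hypothetical.

```python
from typing import Sequence, Tuple


def pixel_to_world(
    col: float,
    row: float,
    pixel_scale: Sequence[float],
    tiepoint: Sequence[float],
) -> Tuple[float, float]:
    """Map a pixel (col, row) to map coordinates for a north-up image.

    pixel_scale: (scale_x, scale_y, scale_z) from ModelPixelScaleTag
    tiepoint:    (i, j, k, x, y, z) from ModelTiepointTag, tying raster
                 position (i, j) to map position (x, y)
    """
    scale_x, scale_y, _ = pixel_scale
    i, j, _, x0, y0, _ = tiepoint
    x = x0 + (col - i) * scale_x
    # Rows count downward from the top of the image, so y decreases.
    y = y0 - (row - j) * scale_y
    return x, y


# Hypothetical values: 30 m pixels, upper-left corner at (500000, 4100000).
scale = (30.0, 30.0, 0.0)
tie = (0.0, 0.0, 0.0, 500000.0, 4100000.0, 0.0)
print(pixel_to_world(10, 20, scale, tie))  # (500300.0, 4099400.0)
```

The coordinates only become meaningful together with the spatial reference stored alongside them, which is the other half of what a GeoTIFF adds.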
What’s a Cloud Optimized GeoTIFF?
A Cloud Optimized GeoTIFF, or COG, is defined as:
“a regular GeoTIFF file, aimed at being hosted on a HTTP file server, whose internal organization is friendly for consumption by clients issuing HTTP GET range request”
To put it simply, a COG is a GeoTIFF with an internal structure that is organized to make the reading of the data more efficient.
HTTP range requests let a client ask a server for only a specified range of bytes rather than the entire file. COGs enable data to be held in cloud-native storage and then streamed to users, much like you would stream videos or music. A COG makes this possible through two internal structures, tiling and overviews, used as follows:
Tiling organizes the image into internal tiles, so it is easy to access single tiles of data instead of the entire dataset.
Overviews are downsampled copies of the original image at lower resolutions. This makes rendering the image significantly faster when zoomed out because a viewer needs to render less data at lower zoom levels.
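The ideas above can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular library implements it: the function names are our own, the URL is a placeholder, the tile size of 512 is an assumed value, and the overview sizes assume each level simply halves the previous one. No request is actually sent.

```python
import urllib.request
from typing import Dict, List, Tuple


def range_header(offset: int, byte_count: int) -> Dict[str, str]:
    """Build an HTTP Range header for byte_count bytes starting at offset."""
    # HTTP byte ranges are inclusive at both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + byte_count - 1}"}


def build_range_request(url: str, offset: int, byte_count: int) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for one slice of a remote file."""
    return urllib.request.Request(url, headers=range_header(offset, byte_count))


def tiles_for_window(
    col_off: int, row_off: int, width: int, height: int, tile_size: int = 512
) -> List[Tuple[int, int]]:
    """List the (tile_col, tile_row) indices that a pixel window touches."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (c, r)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def overview_sizes(width: int, height: int, min_size: int = 512) -> List[Tuple[int, int]]:
    """Halve the image repeatedly until it fits within min_size pixels."""
    sizes = []
    while max(width, height) > min_size:
        width = -(-width // 2)    # ceiling division
        height = -(-height // 2)
        sizes.append((width, height))
    return sizes


# A 500 x 300 pixel window only touches two 512-pixel tiles.
print(tiles_for_window(600, 100, 500, 300))
# A 10000 x 8000 image gets progressively smaller overviews.
print(overview_sizes(10000, 8000))
# Fetching one tile means asking for just its bytes (placeholder URL and offsets).
request = build_range_request("https://example.com/image.tif", 1000, 1000)
print(request.get_header("Range"))
```

A reader first works out which tiles (or which overview) it needs, looks up where those tiles sit in the file, and then requests only those byte ranges.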
COGs are widely supported in coding libraries and software including QGIS, ArcGIS, Rasterio, and GDAL. For a comprehensive list of all libraries and software that support COGs, see https://www.cogeo.org/.
Although the COG format is relatively new, it is still a GeoTIFF at its core, meaning that it is backwards compatible with older software. COGs are great even if you’re hosting files locally because the overviews and tiling structure make rendering the image a breeze.
Want to see it in action? This viewer lets you enter the URL to any COG, and quickly view it in your browser.
At this point we have established that GeoTIFFs are TIFFs with georeferencing metadata, and that COGs are GeoTIFFs that have been reorganized to enable streaming directly from cloud storage.
The main takeaway is that COGs are well suited to cloud storage because users can read only the data they need rather than the entire file.
For these reasons, the COG format has been adopted by many organizations that work in geospatial cloud storage, including Planet and Google.
By now you might be asking how you can get your hands on a COG or, better yet, how you can make your own. There are a number of examples available on how to create a COG from a normal GeoTIFF. Below is a non-exhaustive list of useful resources for working with COGs.
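As one possible starting point for making your own, GDAL 3.1 and later ship a COG output driver that can be used through the gdal_translate command. The sketch below only assembles and prints the command. The helper names and file names are hypothetical, and the compression and block size values are example choices rather than recommendations.

```python
import shutil
import subprocess
from typing import List


def build_cog_command(
    src: str, dst: str, compress: str = "DEFLATE", blocksize: int = 512
) -> List[str]:
    """Assemble a gdal_translate call that writes dst with the COG driver."""
    return [
        "gdal_translate", src, dst,
        "-of", "COG",                    # COG output driver (GDAL 3.1+)
        "-co", f"COMPRESS={compress}",   # creation option: compression method
        "-co", f"BLOCKSIZE={blocksize}", # creation option: internal tile size
    ]


def convert_to_cog(src: str, dst: str) -> None:
    """Run the conversion, assuming gdal_translate is installed and on PATH."""
    if shutil.which("gdal_translate") is None:
        raise RuntimeError("gdal_translate was not found on PATH")
    subprocess.run(build_cog_command(src, dst), check=True)


# Print the command for two hypothetical file names without running it.
print(" ".join(build_cog_command("input.tif", "output_cog.tif")))
```

Calling convert_to_cog("input.tif", "output_cog.tif") would run the conversion on a machine with GDAL installed.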
COGs aren’t the only option for storing cloud-native raster data. Zarr and TileDB are two examples of formats that hold tiled n-dimensional array (ndarray) data structures. Both have GDAL read/write support, and appear to perform better than COGs at scale.
The advantage of COGs is that the GeoTIFF format can be read by almost all Geographic Information Systems, and holds its place as the gold standard for raster data within the Esri ecosystem.
Your choice of format should depend on your application’s specific needs and on how you expect it to scale over time.
Interested in what’s going on with cloud-native formats for other datatypes?
FlatGeobuf is a cloud-native format that enables streaming of vector data. The format is supported in QGIS 3.16+ and GDAL 3.1+, in addition to others.
The SparkGeo Terradactile project enables users to download a DEM COG for a given area of interest using GDAL. They also have some useful Python functions in their open-source repo.
QGIS 3.2+ supports reading COGs directly from cloud storage using GDAL’s virtual file systems (such as /vsicurl/).
You can check whether a GeoTIFF is cloud optimized by running GDAL’s validate_cloud_optimized_geotiff.py script.
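For reading a COG straight from cloud storage, GDAL-based tools accept a path that starts with the /vsicurl/ prefix followed by the file’s URL. The small helper below is our own illustration of building such a path, and the URL is a placeholder rather than a real dataset.

```python
def vsicurl_path(url: str) -> str:
    """Prefix an HTTP(S) URL so GDAL-based tools read it over the network."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http:// or https:// URL")
    return "/vsicurl/" + url


# Placeholder URL; swap in the address of a real COG.
path = vsicurl_path("https://example.com/data/scene.tif")
print(path)

# With the GDAL Python bindings installed, the path could then be opened
# like a local file, for example:
#   from osgeo import gdal
#   dataset = gdal.Open(path)
```

Because the file is a COG, opening it this way only fetches the header and the tiles that are actually read, not the whole file.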
Haven’t had enough of COGs? Check out our podcast episode on Cloud Native Geospatial to learn more with Chris Holmes from Planet.