Our guest on the show today is Chris Holmes, the Vice President of Product and Strategy at Planet. Chris entered the geospatial arena almost 20 years ago as an early contributor to the GeoServer project. In the beginning, he was writing code, but he eventually discovered his talents were better spent building a community and awareness around GeoServer and the greater Open Geospatial Consortium (OGC). Chasing innovation, he now works from his position at Planet to promote more widespread adoption of cloud native geospatial solutions such as the SpatioTemporal Asset Catalog (STAC) and Cloud Optimized GeoTIFFs (COGs).
The Basics of Cloud Native Geospatial
As one of the newest technologies on the GIS scene, cloud native geospatial can seem a bit intimidating. Realistically, it is a lot of familiar industry staples repackaged to take advantage of huge advancements in computing technology.
At its core, cloud native geospatial (CNG) is your classic geospatial infrastructure, without all those pesky computational power and storage limitations.
By leveraging the power of AWS, BigQuery, Snowflake, Google Earth Engine, or ArcGIS Server, users can access and analyze global-scale datasets without needing to purchase and maintain the physical servers traditionally used in on-premise setups.
This shift reduces the barrier to entry for scientists, and even casual users, allowing more spatial questions to be asked and answered.
CNG systems are flexible and scalable to most needs. One can choose to host some, or all of their data in the cloud, then access it for local analysis, complete that analysis in the cloud, or take a hybrid approach.
GIS work has traditionally followed a desktop-centered workflow. Using cloud native geospatial, it does not matter if you access and analyze your cloud data through a browser, or desktop application, although each platform will come with its own natural limitations.
GIS Data Formats and the Cloud
Cloud technology itself has existed for a while, but adopting it in GIS came with a learning curve, because cloud infrastructure has its own specific requirements and preferences.
This meant some people took the rare opportunity to essentially start from scratch and build data formats optimized specifically for cloud systems, while often still maintaining backwards compatibility with desktop and enterprise workflows.
It is not possible to talk about CNG without talking about Cloud Optimized GeoTIFFs (COGs). COGs are the backbone of cloud native geospatial, and are essentially responsible for starting the GIS industry’s race to the cloud. The beauty of COGs is that they can be used as a regular GeoTIFF in a desktop setting, or leveraged in the cloud to unlock fantastic real-time data streaming and analysis.
The key element of COGs is their support for HTTP range requests. Range requests are what make streaming efficient in many of our favorite applications, like Spotify, Netflix, and YouTube. Before requesting data, a client first checks the server's HTTP response headers to learn whether the server supports range requests at all.
If range requests are supported, the client can read what is essentially a table of contents for the file, then request only the byte ranges it is interested in and stream them back for use.
If you are familiar with Python, you can think of this almost like slicing a list with [x:y]. By pulling only the data relevant to the query, performance is greatly improved, since far fewer bytes need to be transferred back to the client.
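To make the slicing analogy concrete, here is a toy sketch of how the server side of a byte-range request maps onto a Python slice. The `serve_range` function and the sample bytes are invented for illustration; a real client would send a `Range: bytes=start-end` header over HTTP rather than call a local function.

```python
def serve_range(data: bytes, range_header: str) -> bytes:
    """Return only the requested byte range, like data[start:end + 1]."""
    unit, _, spec = range_header.partition("=")
    if unit != "bytes":
        raise ValueError("only byte ranges are supported")
    start_s, _, end_s = spec.partition("-")
    start, end = int(start_s), int(end_s)
    # HTTP ranges are inclusive on both ends, unlike Python slices.
    return data[start : end + 1]

file_bytes = bytes(range(256))  # stand-in for a large COG sitting on a server
chunk = serve_range(file_bytes, "bytes=16-31")
print(len(chunk))  # 16 bytes transferred instead of all 256
```

The point is the same one the paragraph makes: the client names a span of bytes, and only that span travels over the wire.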
At this time, streaming optimized geospatial formats are mostly limited to raster and point cloud data. There are nevertheless hopes that we will see options for optimized vector formats in the not too distant future.
Open vs Closed Geospatial Data Standards
In the past, GIS was often more or less siloed within an organization, making it reasonable to work in closed or proprietary formats. Today, in an increasingly interconnected world, sharing data and preparing it with interoperability in mind is essential.
Open data standards have been embraced by many as they can be used as-is, or maybe modified to work with existing infrastructure to create a better fit for an organization’s needs.
Having this groundwork in place gives developers a good spot to start from when creating a custom implementation, and leads to greater potential for innovation.
Although open data standards and formats have come to dominate the industry, closed data standards still have a presence. The best example is Google Earth Engine, which ultimately handles data internally in a closed, proprietary format.
They know, however, that consumers today are generally unwilling to accept the risks of holding all of their data in a closed format.
After all, if the provider went out of business, then the consumer would lose usability of their data. Google remedies this by allowing other data formats to essentially port into their own, allowing clients the flexibility and security they would get if they utilized open standard formats.
What’s Next for Cloud Native Geospatial?
CNG has already brought a huge paradigm shift to the industry, removing traditional access, storage, and processing barriers for those who enter the game. As more and more datasets are uploaded into the cloud, it raises the question of what the next great leap forward will be.
One potential development we may hope to see in the future is data that is optimized for retrieval by search engines. At this point, the data exists, but it needs to be described in a way that allows the search engines to find it, and match it to user needs. This means fully populated metadata, and plain text descriptions that allow data to be matched to a query.
More accessible and queryable geospatial datasets could play a huge role in bringing geospatial to the masses, rather than leaving it the preserve of those already working in the GIS industry. The SpatioTemporal Asset Catalog is a great example of what this may look like.
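As a rough illustration of the kind of metadata this requires, here is a hand-written sketch of a minimal STAC Item: a GeoJSON Feature carrying a timestamp, a bounding box, and an asset link to a COG. The field names follow the STAC specification, but the id, footprint, and URL are invented for this example.

```python
import json

# Minimal STAC Item sketch. The id, geometry, and asset URL are
# made up; the structure (type, stac_version, properties.datetime,
# assets) follows the STAC Item specification.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
    },
    "bbox": [0, 0, 1, 1],
    "properties": {"datetime": "2023-01-01T00:00:00Z"},
    "assets": {
        "visual": {
            "href": "https://example.com/scene.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
    "links": [],
}

print(json.dumps(item, indent=2))
```

Because every scene is described with the same well-defined fields, a search engine or catalog API can match a query like "imagery over this bounding box in January 2023" directly against the metadata, without ever opening the imagery itself.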
Another glimpse into the future of CNG is that eventually, pretty much all the data we could want will be available in the cloud, in open data standards. This can allow the focus to shift from data aggregation, to creative data analysis and applications. As young people come into the industry, they will not be clouded with the ideas of what cannot be done, but will rather see the wealth of options and resources available, and take it and run with it, hopefully leading to the next big thing.