COPC – A Cloud Efficient Data Format
In this episode, the discussion revolves around cloud-optimized point clouds. Our guest is Martin Dobias, the CTO at Lutra Consulting. With a background in computer science and a passion for geospatial, Martin has been part of a team that has done a great deal of interesting work in the open source geospatial world. Today, he talks about their latest state-of-the-art developments in working with point clouds on the web: Cloud Optimized Point Clouds (COPC).
What Does "Cloud Optimized Point Cloud" Even Mean?
Point clouds are sets of individual points plotted in 3D space. They are typically very large datasets, as they must capture a real space in great detail. For instance, point clouds for a whole country may easily be several trillions of points, and many terabytes of data. Handling these large datasets on the web requires a lot of bandwidth to download and process.
The files for Cloud Optimized Point Clouds are structured with indexes for each part of the dataset. The index structure makes it possible to stream only the parts of the data that are required, without having to download the entire dataset.
What is Point Cloud Indexing?
Point cloud indexing structures a point cloud file, making it possible to find any particular region of interest in the file without having to scan through the entire dataset. COPC files are internally indexed using a 3D hierarchy of cubes called an octree. One part of the file is the data itself, and the other part contains the hierarchical information of where to find each cube.
At the root level of the hierarchy is a single cube, which at the next level is split into eight smaller cubes. The splitting continues level by level down to the deepest level of the hierarchy. As the cubes get smaller, they contain a smaller amount of data, which saves bandwidth if a user is only interested in a small part of the data. It is similar to traditional tiling, but with the added 3D context.
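The splitting described above can be sketched in a few lines of Python. This is a minimal illustration of a generic octree, not code taken from any COPC library: the names `VoxelKey`, `children`, and `cube_bounds` are hypothetical, and the root cube geometry in the example is made up.

```python
from typing import NamedTuple


class VoxelKey(NamedTuple):
    """Identifies one cube: its depth in the tree plus its x/y/z position at that depth."""
    level: int
    x: int
    y: int
    z: int


def children(key: VoxelKey) -> list[VoxelKey]:
    """Split one cube into its eight half-size children on the next level."""
    return [
        VoxelKey(key.level + 1, 2 * key.x + dx, 2 * key.y + dy, 2 * key.z + dz)
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]


def cube_bounds(key: VoxelKey, root_min: tuple, root_size: float):
    """Return (min_corner, edge_length) of a cube, given the root cube's geometry."""
    edge = root_size / (2 ** key.level)  # each level halves the edge length
    min_corner = (
        root_min[0] + key.x * edge,
        root_min[1] + key.y * edge,
        root_min[2] + key.z * edge,
    )
    return min_corner, edge


root = VoxelKey(0, 0, 0, 0)
print(len(children(root)))  # 8
# Assumed root cube: origin at (0, 0, 0), 100 units along each edge.
print(cube_bounds(VoxelKey(1, 1, 0, 1), (0.0, 0.0, 0.0), 100.0))
```

Because every cube can be addressed by a small key like this, a reader only needs the key-to-location table to find the data for any region.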
How COPC Files Are Accessed Using HTTP Range Requests
Range requesting is a feature of the HTTP protocol used to access information more efficiently from web servers. Instead of a server sending an entire file to a browser, the range request feature allows the browser (client) to define a specific part of the data that the user is interested in. Subsequently, only the requested part is sent by the server.
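As a rough illustration, a range request is an ordinary GET request with an extra `Range` header. The sketch below uses only Python's standard library; the function names and the example URL are hypothetical, and the byte offsets are arbitrary.

```python
import urllib.request


def build_range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Create a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive on both ends
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={offset}-{last_byte}")
    return request


def fetch_range(url: str, offset: int, length: int) -> bytes:
    """Download only the requested bytes; fail if the server ignores the range."""
    with urllib.request.urlopen(build_range_request(url, offset, length)) as response:
        # 206 Partial Content means the range was honoured; 200 would be the whole file.
        if response.status != 206:
            raise RuntimeError(f"server did not honour the range (status {response.status})")
        return response.read()


# Hypothetical URL; no network access happens until fetch_range() is called.
request = build_range_request("https://example.com/data.copc.laz", 0, 549)
print(request.get_header("Range"))  # bytes=0-548
```

Checking for status 206 matters because a server without range support may answer with status 200 and send the entire file.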
For cloud-optimized point clouds, the client first reads the hierarchy from the file, finds the cube or cubes that cover the area of interest, and then requests only the byte ranges that hold them. This is much faster than searching through the entire dataset, and because only the matching cubes are transferred, it also reduces the bandwidth used.
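The lookup can be sketched as a walk over the hierarchy that collects the byte ranges of every cube touching a query box. This is a simplified illustration, not the actual COPC reader logic: the hierarchy is a plain in-memory dictionary, the offsets and sizes are invented, and names such as `Entry` and `select_ranges` are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    offset: int     # where the cube's points start in the file
    byte_size: int  # how many bytes they occupy


def cube_box(key, root_min, root_size):
    """Axis-aligned box (lo, hi) of the cube identified by (level, x, y, z)."""
    level, x, y, z = key
    edge = root_size / (2 ** level)
    lo = (root_min[0] + x * edge, root_min[1] + y * edge, root_min[2] + z * edge)
    hi = tuple(c + edge for c in lo)
    return lo, hi


def intersects(lo_a, hi_a, lo_b, hi_b):
    """True if two axis-aligned boxes overlap on all three axes."""
    return all(lo_a[i] <= hi_b[i] and lo_b[i] <= hi_a[i] for i in range(3))


def select_ranges(hierarchy, root_min, root_size, query_lo, query_hi, max_level):
    """Collect (offset, byte_size) for every stored cube that touches the query box."""
    ranges = []
    stack = [(0, 0, 0, 0)]  # start at the root cube
    while stack:
        key = stack.pop()
        if key not in hierarchy or key[0] > max_level:
            continue  # cube holds no data, or is more detailed than requested
        lo, hi = cube_box(key, root_min, root_size)
        if not intersects(lo, hi, query_lo, query_hi):
            continue  # skip this cube and everything beneath it
        ranges.append(tuple(hierarchy[key]))
        level, x, y, z = key
        stack.extend(
            (level + 1, 2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        )
    return sorted(ranges)


# Invented hierarchy: a root cube and two of its eight children.
hierarchy = {
    (0, 0, 0, 0): Entry(1000, 500),
    (1, 0, 0, 0): Entry(1500, 300),
    (1, 1, 1, 1): Entry(1800, 400),
}
print(select_ranges(hierarchy, (0, 0, 0), 100.0, (10, 10, 10), (20, 20, 20), max_level=1))
# [(1000, 500), (1500, 300)]
```

Each returned pair can then be turned into one HTTP range request, so the cube at offset 1800 is never downloaded.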
Converting LAS Files to COPC
LAS is an open standard format for point cloud data interchange. However, it is less efficient to work with in the cloud because the data in it is not indexed. This means the whole LAS file must be read before it can be queried, forfeiting the efficiency we see with COPC indexing.
Conversion of LAS to COPC can be done in QGIS, which uses Untwine for this. In practice, when a point cloud file is loaded into QGIS, it is automatically converted to COPC. QGIS organizes the data this way because operations are more efficient on indexed data than on unorganized datasets.
When point cloud files are converted to COPC in QGIS, the new COPC file contains all the information in the original dataset. Unlike some software that discards information when processing a file, the conversion to COPC discards nothing. This makes the COPC format great not only for visualization purposes, but for analysis as well.
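Outside QGIS, the same conversion can also be described as a PDAL pipeline (PDAL is introduced later in this article). The sketch below only builds the pipeline JSON; it assumes a PDAL build that includes the `writers.copc` stage, and the file names are placeholders.

```python
import json


def copc_conversion_pipeline(las_path: str, copc_path: str) -> dict:
    """Describe a two-stage PDAL pipeline: read a LAS/LAZ file, write a COPC file."""
    return {
        "pipeline": [
            {"type": "readers.las", "filename": las_path},
            {"type": "writers.copc", "filename": copc_path},
        ]
    }


# Placeholder file names; save the printed JSON to a file and run it
# with `pdal pipeline <file>.json`.
pipeline = copc_conversion_pipeline("input.las", "output.copc.laz")
print(json.dumps(pipeline, indent=2))
```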
What Infrastructure is Required to Serve COPC Data?
Serving cloud-optimized point clouds does not require any special infrastructure between the server and the client. It is easy to host the data and get it to the client, i.e. there is no need for something like GeoServer, MapServer, or QGIS. Placing the COPC file in blob storage somewhere is all the infrastructure needed.
Compatibility of COPC Format
A compressed LAS file is called a LAZ file. A COPC file is itself a valid LAZ file with a particular internal organization. This means that applications that read LAZ will also be able to read COPC files without having to implement special support. The only difference is that they will not be able to use the extra internal indexing in the COPC file.
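One consequence is that a reader can tell a COPC file from a plain LAZ file by inspecting the first few hundred bytes. The sketch below assumes the layout described in the COPC specification (a 375-byte LAS 1.4 header followed immediately by a record whose user ID is `copc` and whose record ID is 1); the function name is hypothetical and the test bytes are synthetic.

```python
import struct

LAS_HEADER_SIZE = 375                      # size of a LAS 1.4 header
USER_ID_OFFSET = LAS_HEADER_SIZE + 2       # skip the record's 2 reserved bytes
RECORD_ID_OFFSET = USER_ID_OFFSET + 16     # the user ID field is 16 bytes wide


def looks_like_copc(header: bytes) -> bool:
    """Check the LAS signature and the (assumed) COPC marker that follows the header."""
    if len(header) < RECORD_ID_OFFSET + 2:
        return False
    if header[:4] != b"LASF":
        return False
    user_id = header[USER_ID_OFFSET:USER_ID_OFFSET + 16].rstrip(b"\x00")
    (record_id,) = struct.unpack_from("<H", header, RECORD_ID_OFFSET)
    return user_id == b"copc" and record_id == 1


# Synthetic bytes standing in for the start of a file.
fake = bytearray(589)
fake[0:4] = b"LASF"
fake[USER_ID_OFFSET:USER_ID_OFFSET + 4] = b"copc"
struct.pack_into("<H", fake, RECORD_ID_OFFSET, 1)
print(looks_like_copc(bytes(fake)))  # True
```

An application without COPC support would simply ignore this record and read the file as ordinary LAZ.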
Where Can You View COPC data?
QGIS supports viewing COPC files stored either locally on your device or remotely in the cloud. Using a link that points to a server containing COPC data, QGIS will load the data on demand according to the queried range. The data is also cached in QGIS, which makes subsequent data loads and views much faster.
There are also a couple of projects coming to life that explicitly support cloud-optimized point clouds. An example is the web viewer built by Hobu. With the link to a COPC data server, the web viewer will fetch the relevant COPC files and render them in the browser.
The PDAL Library
PDAL (Point Data Abstraction Library) is a library that contains a set of tools for working with point cloud data. In the QGIS environment, many users know PDAL only as the component that reads point cloud data. Many of the library's other functions are unknown to a lot of users: it can classify, filter, export, and convert point clouds to rasters or meshes, among other things.
The main reason many PDAL functions are not popular among ordinary users is the complexity of using them. PDAL uses pipelines that need to be crafted manually when working with point cloud data. While this may work well for advanced users, ordinary users find it a bit too complicated.
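To give a sense of what crafting a pipeline by hand involves, the sketch below builds a small three-stage pipeline as JSON: read a file, keep only points classified as ground (class 2 in the LAS standard), and write a raster. The stage names are standard PDAL stages, but the function name, file names, and resolution are placeholders chosen for illustration.

```python
import json


def ground_raster_pipeline(input_path: str, output_path: str, resolution: float) -> list:
    """Describe a PDAL pipeline: read points, keep ground returns, rasterize them."""
    return [
        {"type": "readers.las", "filename": input_path},
        # Keep only points whose Classification is exactly 2 (ground).
        {"type": "filters.range", "limits": "Classification[2:2]"},
        {
            "type": "writers.gdal",
            "filename": output_path,
            "resolution": resolution,  # raster cell size, in the data's units
            "output_type": "mean",     # each cell stores the mean of nearby point values
        },
    ]


# Placeholder names; save the printed JSON and run it with `pdal pipeline <file>.json`.
print(json.dumps(ground_raster_pipeline("input.laz", "ground.tif", 1.0), indent=2))
```

Every stage and option has to be known and spelled correctly in advance, which is the hurdle a point-and-click toolbox removes.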
After a successful crowdfunding campaign, Lutra Consulting and several other partners are working to reduce this complexity and make the functionality in PDAL more user friendly. The project seeks to build a simple integrated toolbox within QGIS for point cloud data processing. In the same way that QGIS has integrated toolboxes for working with vector or raster data, there will also be one for working with point cloud data. These developments are expected across the next two QGIS releases, in February and June of 2023.
What is the STAC Protocol?
STAC (SpatioTemporal Asset Catalog) is a protocol for easy access to spatial and temporal data. It makes it easier to index, discover, and work with geospatial information. STAC is commonly used with satellite imagery, but it is increasingly being used for distribution of point clouds as well.
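As a rough picture of what a catalog entry looks like, the sketch below builds a minimal STAC Item that points at a COPC file. The top-level fields follow the STAC Item structure; the identifier, URL, bounding box, and date are invented, and the media type string is an assumption based on what some public catalogs use for COPC assets.

```python
import json


def make_copc_item(item_id: str, href: str, bbox: list, datetime_utc: str) -> dict:
    """Build a minimal STAC Item (a GeoJSON Feature) with one COPC asset."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": href,
                # Assumed media type for COPC; check the catalog you publish to.
                "type": "application/vnd.laszip+copc",
                "roles": ["data"],
            }
        },
    }


# All values below are made up for illustration.
item = make_copc_item(
    "example-tile-001",
    "https://example.com/tiles/example-tile-001.copc.laz",
    [17.0, 48.0, 17.1, 48.1],
    "2022-06-01T00:00:00Z",
)
print(json.dumps(item, indent=2))
```

A client searching the catalog by area and date gets back items like this one and can then read the linked COPC file directly with range requests.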
How Were Point Clouds Streamed Before COPC?
Before COPC, there were already formats for streaming point cloud data. In the open source world, one of them is the EPT format, built by Hobu Inc. The EPT format is comparable to raster tiles, but for point clouds: it is structured as a large directory of individual files (tiles). COPC has an advantage over EPT: instead of thousands or even millions of files in a folder structure, COPC is just a single file, which is much easier to work with. In the proprietary world there are a couple of formats as well, one being the I3S format from Esri, which supports point cloud data and other 3D data.
There is no doubt we will continue to see explosive growth in support for point cloud data management. Stay tuned to the MapScaping Podcast to make sure you stay current on the latest and greatest developments!