Some Background on Point Clouds
To put it simply, a point cloud is a collection of XYZ points that represents a real-world object or scene at nearly any scale.
They can be generated in a few ways. As geospatial scientists, we mostly work with LAS/LAZ data collected by aerial LiDAR (light detection and ranging) scanners at varying scales, from landscapes, down to project sites. We may also derive point clouds from highly detailed orthoimagery of an area, such as from the products of a drone flight.
Another way point clouds can be generated is via standalone laser scanners. This is generally associated with smaller objects, think statues, down to items you can hold in your hand, like an apple.
There are hundreds of LiDAR/laser scanners on the market, and while the core collection concepts are mostly the same, their output formats can differ widely. There is no one ubiquitous point cloud format, but the closest we get is LAS.
In addition to being a bit ambiguous, point clouds are huge. Storing, managing, classifying, and visualizing collections of billions on billions of points is no simple task, for man or machine.
Despite these obstacles, point clouds and the information derived from them are invaluable, so brave software engineers everywhere have stepped up to the challenge.
Let’s talk about what that looks like…
What is PDAL?
PDAL (pronounced “poodle”) is the Point Data Abstraction Library. It is an open source library whose standout strength is translating and processing point cloud data across a wide range of formats.
PDAL is not the only program for handling point cloud data, but it is the leader for handling point clouds in a geospatial context.
Think back to our hypothetical laser scan of an apple. It really does not matter where that apple is in space, and knowing where it is does not change how we handle its data. It can be picked up and moved, or after lunch, cease to exist. This is perfectly fine for a wide variety of point cloud applications.
When it comes to landscapes, project sites, etc., the real world location matters. You may want to add imagery, impose some CAD objects, or create an inventory of items from the collection. In this case, space matters, and that is where PDAL shines.
PDAL is a developer tool written in C/C++. Workflows are described with a JSON pipeline schema, and the project comes with documentation, command line utilities, and packages that can be installed through conda.
Despite this treasure trove of utility, it does lack a GUI. This does not mean, however, that it is not accessible. PDAL can be embedded in alternative or custom applications. One notable application is QGIS, which is ramping up its point cloud functionalities. That may take some time though, as supporting a GUI integration was never in the scope of the original development team.
Once you are up and running with PDAL, you can begin working with the powerful tools it gives you to filter and manage your data. We will get into that in more detail in a bit…
The Origins Of The Point Data Abstraction Library
PDAL has its humble beginnings in libLAS, a project built to support the Iowa Department of Natural Resources statewide LiDAR project (the first of its kind).
It was an impressive open-source tool that solved a niche problem.
libLAS began to evolve when it was noticed by the US Army Corps of Engineers. They liked what they saw, but they wanted more. They needed something geared towards supporting a large scale data warehousing scenario. This was a “green field” in the market. The titular project ensued, and PDAL, a child of Hobu Inc., was born, managing to maintain its open source origins.
Due to the massive scale of data that comes from working with point clouds, there are some unique challenges. How can one manage all this data, without losing its integrity? You need tools for data management, compression and translation, and access. Existing libraries, such as GDAL, were not prepared for this. While point clouds contain massive overhead in the form of points that may be treated (to a degree) as vector data, they also have uniquely raster needs, such as for subsampling and resampling. For these reasons, point clouds can functionally be treated as a third data type.
To address the needs of point cloud data, we have a few options that work in tandem to create a sort of stack.
PDAL works as the core data management option. Entwine supports data access and streaming (from the scale of your neighborhood, to entire continents). The task of compression is best tackled with LASzip to turn your LAS files into a compact LAZ.
Note: LAStools is also an awesome resource for working with LiDAR data.
How Does PDAL Work?
To get started with PDAL, head over to their GitHub page to download the code.
It ships with the following commands: delta, density, ground, hausdorff, info, merge, pipeline, random, sort, split, tile, tindex, and translate. This is a pretty short list, but PDAL pulls its power from its ability to string together filters into a reusable pipeline. To conceptualize this, think of your classic model builder. If you want to accomplish a task, you need to put together the right tools, fill their parameters, connect all of your elements, cross your fingers, and hit Run.
PDAL follows a similar structure and calls it a pipeline: a series of algorithm-based steps, called filters.
Let’s say you want to classify the noise in your point cloud. You cannot just call a Classify Noise tool. For your first step, you would need to apply an algorithm that is sensitive to rogue, outlying points. Next, you would apply another algorithm that filters for clusters of a certain density threshold at high altitudes (to filter out planes, UFOs, freakishly large birds, etc.). You would define these steps in your pipeline and save them for future use, hence the reusability aspect of pipelines. Imagine now you have been approached by a top secret government agency with a time-critical mission of identifying UFOs. You can easily access your previous pipeline workflow, modify your density parameters, and now you have a new pipeline suitable for your super secret purposes.
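The reusable pipeline idea can be sketched in a few lines of Python that only build the pipeline JSON (nothing here runs PDAL itself). The stage choices are assumptions made for illustration, not a recipe from the PDAL team: filters.outlier with the statistical method stands in for the "rogue point" step, and the same filter with the radius method stands in for a density-based step. It does not attempt the high-altitude part, and the file names and threshold values are hypothetical.

```python
import json


def noise_pipeline(infile, outfile, mean_k=8, multiplier=2.0, radius=1.0, min_k=4):
    """Build a PDAL pipeline (as a dict) that labels likely noise points.

    filters.outlier marks the points it flags with a noise classification
    (class 7 by default, per the PDAL docs); it does not delete them.
    """
    return {
        "pipeline": [
            infile,  # reader is inferred from the file extension
            {
                # Step 1: flag points whose mean distance to their
                # neighbors is unusually large
                "type": "filters.outlier",
                "method": "statistical",
                "mean_k": mean_k,
                "multiplier": multiplier,
            },
            {
                # Step 2: flag points with fewer than min_k neighbors
                # inside the given radius (a simple density test)
                "type": "filters.outlier",
                "method": "radius",
                "radius": radius,
                "min_k": min_k,
            },
            outfile,  # writer is inferred from the file extension
        ]
    }


# The saved "template" with its default thresholds
survey = noise_pipeline("site.las", "site_noise.las")

# Same workflow, different density parameters for the UFO hunt
ufo_hunt = noise_pipeline("site.las", "ufo_candidates.las", radius=2.0, min_k=10)

print(json.dumps(ufo_hunt, indent=2))
```

Writing the printed JSON to a file such as noise.json and running `pdal pipeline noise.json` would execute it on a machine with PDAL installed. The thresholds would need tuning against real data.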
One of the most important and beloved features of PDAL is its ability to translate between spatial reference systems and file formats. Below is the skeleton of the highly flexible ‘translate’ command, followed by an example of each.
Skeleton:
$ pdal translate [options] input output [filter]
File translation:
$ pdal translate myfile output.las --metadata=meta.json -r readers.text \
    --json="{ \"pipeline\": [ { \"type\":\"filters.stats\" } ] }"
Spatial reference translation:
$ pdal translate input.las output.las -f filters.reprojection \
    --filters.reprojection.out_srs="EPSG:4326"
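Hand-escaping the quotes inside the --json argument is easy to get wrong. As a minimal sketch (an illustration, not something PDAL ships), the file translation command above can be assembled in Python, letting json.dumps produce the pipeline text and shlex.join handle the shell quoting. The file names are the ones from that example, and nothing here invokes PDAL.

```python
import json
import shlex

# Filters to extract, mirroring the --json argument in the command above
filters = {"pipeline": [{"type": "filters.stats"}]}

argv = [
    "pdal", "translate", "myfile", "output.las",
    "--metadata=meta.json",
    "-r", "readers.text",
    "--json=" + json.dumps(filters),
]

# A copy-pasteable command line with the JSON safely quoted for the shell
print(shlex.join(argv))
```

On a machine with PDAL installed, the same list could be handed to subprocess.run(argv, check=True), which bypasses shell quoting entirely.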
To round it all out, in addition to commands and filters, we have Readers and Writers. The job of Readers is to interpret ‘Dimensions’, i.e., XYZ and Intensity values, from the source data, and the job of Writers is to consume this data and write it out. For example, you could use readers.i3s to read data stored in the Esri I3S format, then use writers.gdal to create a raster of the data using an interpolation algorithm, all without leaving your couch. Ok, maybe you will have to leave your couch for some of it…
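As a rough sketch of that reader-to-writer handoff, the Python below builds the pipeline JSON for an I3S-to-raster conversion. The service URL and output file name are hypothetical placeholders, and the resolution and interpolation choice are assumptions that would depend on the dataset. The code only prints the JSON and does not run PDAL.

```python
import json

pipeline = {
    "pipeline": [
        {
            "type": "readers.i3s",
            # Hypothetical scene service; remote I3S sources are given
            # with an "i3s://" prefix in front of the URL
            "filename": "i3s://https://example.com/arcgis/rest/services/MyScan/SceneServer",
        },
        {
            "type": "writers.gdal",
            "filename": "surface.tif",  # hypothetical output raster
            "dimension": "Z",           # which dimension to rasterize
            "resolution": 1.0,          # cell size, in the data's horizontal units
            "output_type": "idw",       # inverse distance weighted interpolation
        },
    ]
}

print(json.dumps(pipeline, indent=2))
```

Each stage names its type explicitly here, so nothing depends on PDAL guessing the driver from a file extension.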
The Point Cloud Collection Pipeline
Quality point cloud data, for the time being, does not manifest itself out of thin air. It goes through a number of steps and processes before it is ready to be visualized as meaningful data, like a finely detailed DTM or DSM.
First, the raw data needs to be collected. This can be via active (LiDAR, RADAR, Sonar) or passive (orthoimagery) methods. For a long time, availability of data was a massive barrier to entry into the market. Collection was limited to government projects, generally of large landscapes in order to support use cases like assessing flood risk. Once collected, data was not necessarily readily available to the public either. This began to change circa 2015.
Google Street View introduced a new scale of collection, at the more human level, and how much more human does it get than streets and buildings? Vehicle mounted scanners began a sort of democratization of point cloud data. Scans could now be collected at the municipal level, introducing a whole new world of applications and possibilities.
There can also be a mentality where you “might as well” collect one type of data while collecting another. If you have already rented a plane, or bought a high quality drone to collect LiDAR data of an area, you “might as well” collect some orthoimagery while you’re at it.
“Great! We have all this data. Can I get to work on building my augmented reality build of NYC?”
Not quite.
While the data may be available, it is still not necessarily accessible. Remember that we are working at massive scales of data. For context, Denmark, a country covering only 43,000 sq km, takes up 2 TB of data as a point cloud. This is far more than your average analyst will have the resources to manipulate, process, and, the real feat, visualize.
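To make that figure concrete, here is a back-of-the-envelope calculation using only the two numbers quoted above. It assumes decimal terabytes (10^12 bytes) and says nothing about compression or survey density, so treat the result as an order of magnitude.

```python
AREA_KM2 = 43_000         # Denmark's area as quoted above
SIZE_BYTES = 2 * 10**12   # 2 TB, assuming decimal terabytes

bytes_per_km2 = SIZE_BYTES / AREA_KM2
bytes_per_m2 = bytes_per_km2 / 1_000_000  # 1 sq km = 1,000,000 sq m

print(f"{bytes_per_km2 / 10**6:.1f} MB per square kilometer")  # 46.5 MB
print(f"{bytes_per_m2:.1f} bytes per square meter")            # 46.5 bytes
```

At that average, a 10 km by 10 km project area would come to roughly 4.65 GB, which is why working on a small subset first is usually the practical route.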
Most point cloud processing is still going to be a server scale workflow, maybe a desktop workflow if you are well invested in your setup and realistic with your scope.
This does not mean that your laptop needs to collect dust if you find yourself heavily involved in this area. Mobile workstations provide a lot of flexibility, and can be great for working out the kinks on a small subset of your larger dataset before moving up to production scale.
There are resources for streaming and managing data at scale. The most important here is Entwine. You can hit the ground running by installing the package from the conda-forge channel using conda. Entwine gives you the ability to access your preferred scale of point cloud data via Entwine Point Tiles. These are essentially your classic take on tile-based data, but optimized for point cloud streaming. Get into some data manipulation with PDAL, then move on to visualizing your findings with Potree or Plasio.
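As a sketch of what "your preferred scale" can look like in practice, the Python below builds a PDAL pipeline that uses readers.ept to pull only a small window of an Entwine Point Tiles dataset at a reduced resolution. The EPT URL, coordinates, and output name are hypothetical, and the bounds are assumed to be in the dataset's own coordinate system. The code only prints the JSON and does not contact any server.

```python
import json


def ept_subset(ept_url, xmin, xmax, ymin, ymax, resolution, outfile):
    """Build a PDAL pipeline that fetches a window from an EPT dataset."""
    return {
        "pipeline": [
            {
                "type": "readers.ept",
                "filename": ept_url,
                # 2D bounds are written as "([xmin, xmax], [ymin, ymax])"
                "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])",
                # A coarser resolution limit means fewer points are fetched
                "resolution": resolution,
            },
            outfile,  # writer is inferred from the file extension
        ]
    }


pipeline = ept_subset(
    "https://example.com/my-city/ept.json",  # hypothetical EPT endpoint
    xmin=500000, xmax=500500, ymin=4100000, ymax=4100500,
    resolution=2.0,
    outfile="neighborhood.laz",
)

print(json.dumps(pipeline, indent=2))
```

Widening the bounds or lowering the resolution value scales the same request from a neighborhood up toward a full dataset, without changing anything else in the pipeline.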
The Future
Point clouds are an exciting technology, with huge amounts of potential on the horizon as hardware and software race forward. As resources for managing these behemoths of data scale, we will see more and more value and applications in the geospatial industry, and the broader world itself.
One of the flagship examples of this merging of worlds is the iPhone 12 with its built in LiDAR scanning technology. Tell a photogrammetrist 10 years ago that high schoolers would be able to carry an (albeit limited) LiDAR scanner in their pocket, and they probably would have laughed. Today, that is very much a reality. LiDAR and point cloud technology have taken the world by storm.
The expanding presence of high quality data in our world raises a number of questions and concerns about the societal impact of data collection, in addition to the well-justified excitement.
When someone takes a 3D scan of a sculpture, who owns that data product? The scanner, or the artist?
This scales to risk and error as well. If an automated vehicle equipped with LiDAR sensing, which is responsible for capturing the road’s basemap on the fly, is involved in a crash, where does the blame get put? The inputs to the vehicle’s software, the software itself, the engineers, the driver? Science often precedes ethical discourse, and there are still many discussions to be had as these technologies disseminate to the masses.
Another aspect of the exponential growth of available data is the concept of data as infrastructure. Data has represented infrastructure since the beginning of GIS (and before), but as coverage grows, when does the data itself become the infrastructure? When countless organizations and entities rely on data, and quality data at that, where does the burden of maintenance and responsibility get placed? Who owns the risk?