Simone Giannecchini is the director and founder of GeoSolutions.
He’s been in the geospatial industry for 17 years. But before that, he worked in a completely different field — real-time systems. Then, he got a contract at the NATO center in Italy where he started exploring GIS.
It was supposed to be a six-month contract, and that was 17 years ago.
It’s an open source product developed in Java, using an enterprise Java architecture, that allows you to take your geospatial data (shapefiles, spatial DBMSs like PostGIS) and publish it to the web.
You get good-looking maps or other types of services where you can directly access your data.
You can also use raster data like GeoTIFF, and protocol standards from the OGC (Open Geospatial Consortium) and ISO, plus de facto standards such as TMS, etc.
The idea is to serve your data on the web quickly, with well-known protocols.
GeoServer is a platform and a product.
The product you download from the web is what the GeoServer team decided the product should be — a selection of the most important plugins, at least in our opinion.
These plugins carry the input format, as well as services focusing on simplicity and usability. They allow people to go from “I want to use GeoServer” to “I’m using GeoServer to publish a map” — fast.
Several formats are supported by default — for example, GeoTIFF, PostGIS, and shapefile — with some requiring a free and open source extension.
Depending on the licensing of the “plugin” itself, you can very quickly publish a shapefile in GeoServer through a decent admin GUI. You can go from a shapefile to having a map published on the web quickly, and then query the shapefile through specific OGC protocols like WFS (Web Feature Service).
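To make this concrete, here is what a request against such a published map can look like. This is only an illustrative sketch: the localhost URL and the `topp:states` layer name are assumptions standing in for whatever GeoServer instance and layer you have published.

```python
from urllib.parse import urlencode

# Hypothetical GeoServer endpoint and layer name -- adjust for your install.
BASE = "http://localhost:8080/geoserver/wms"

params = {
    "service": "WMS",
    "version": "1.1.1",
    "request": "GetMap",
    "layers": "topp:states",   # workspace:layer as published in GeoServer
    "bbox": "-125,24,-66,50",  # area of interest
    "width": 768,
    "height": 330,
    "srs": "EPSG:4326",
    "format": "image/png",
}

url = f"{BASE}?{urlencode(params)}"
print(url)
```

Opening the printed URL in a browser would return the rendered map image, assuming a GeoServer that actually publishes that layer.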
It can be a shapefile or any number of other geospatial file formats. It could also be an enterprise-level database — PostgreSQL or Oracle — or NoSQL services such as MongoDB or Elasticsearch.
GeoPackage is also gaining momentum as a format for serving.
The number of plugins you can select and the number of input formats you can serve are extensive.
Your mileage may vary.
When you have a GeoTIFF, you can point GeoServer at it and serve it. However, we recommend pre-processing it a little if it’s big — with GDAL, QGIS, or ArcGIS: building overviews, tiling, and so on.
The baseline is quick to do.
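As a rough illustration of that pre-processing, the commands below are the kind of thing we mean: internal tiling and overviews with the standard GDAL tools, or a one-step Cloud Optimized GeoTIFF with GDAL 3.1 or later. The filenames are hypothetical; the commands are shown as Python argument lists so they can be passed to `subprocess.run` on a machine with GDAL installed.

```python
# Typical GeoTIFF optimization before publishing in GeoServer.
# Filenames here are placeholders.

retile = [
    "gdal_translate",
    "-co", "TILED=YES",           # internal tiling for fast windowed reads
    "-co", "COMPRESS=DEFLATE",    # lossless compression
    "input.tif", "optimized.tif",
]

overviews = [
    "gdaladdo", "-r", "average",  # build reduced-resolution overviews
    "optimized.tif",
    "2", "4", "8", "16",
]

# With a recent GDAL, the COG driver does tiling + overviews in one pass.
cog = [
    "gdal_translate", "-of", "COG",
    "input.tif", "output_cog.tif",
]

# e.g. subprocess.run(retile, check=True) and so on, in order.
```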
Sometimes, we have installations of GeoServer serving petabytes of raster data from observation missions or from drones. Those cases are more complicated.
We use COG (Cloud Optimized GeoTIFF) a lot lately.
GeoServer doesn’t have an internal format that is proprietary or specific to GeoServer. Instead, it uses the data in the format you have. However, we usually ask people to pre-process and optimize it a little.
You can do it with GeoServer — but we usually don’t.
It’s computationally too intensive. It’s not something you want to do on the same servers where you serve, for example, “get map” requests for mapping — they would compete for the same resources. It would end up in poor performance.
Usually, in a large infrastructure, pre-processing is part of an ingestion chain, which runs separately from GeoServer.
If I want to build tiles, should I be doing that with GDAL somewhere else and then move them to the server and point GeoServer at them?
For the use cases we work with, it’s nearly impossible to generate a complete cache upfront.
Allow me to take a step back here.
GeoWebCache now ships by default with GeoServer. You can still use it standalone, but it’s integrated.
Most of the time, we let GeoWebCache create the cache on the fly — tiles are requested and the cache is created as needed, with no seeding (though sometimes we do seed at least some levels).
Let’s say you have a web application, and you know your application lands at certain zoom levels. It’s good to cache at least at those levels. But on average, we don’t cache in advance.
The only cases where we do this are background maps. This is because they don’t change frequently, and you need them for a long time. In that case, you can, as part of the pre-processing, ask GeoWebCache itself to generate the cache for you.
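As a sketch of what asking GeoWebCache to generate a cache can look like: GeoWebCache exposes a REST endpoint for seeding, and a request body along these lines asks it to pre-generate tiles for a range of zoom levels. The layer name, zoom range, host, and credentials here are all assumptions.

```python
# Hypothetical layer; the endpoint path follows GeoWebCache's REST seeding API.
layer = "topp:states"

seed_xml = f"""<seedRequest>
  <name>{layer}</name>
  <gridSetId>EPSG:900913</gridSetId>
  <zoomStart>0</zoomStart>
  <zoomStop>8</zoomStop>
  <format>image/png</format>
  <type>seed</type>
  <threadCount>2</threadCount>
</seedRequest>"""

seed_url = f"http://localhost:8080/geoserver/gwc/rest/seed/{layer}.xml"

# e.g. POST seed_xml to seed_url with Content-Type: text/xml and
# the GeoServer admin credentials, using your HTTP client of choice.
print(seed_url)
```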
Once the feature is released, you’ll also be able to generate them offline as a GeoPackage and put them behind GeoWebCache — this is still in development for the German space agency, though.
Most of our enterprise clients work with corporate databases — Oracle or PostgreSQL. We use GeoServer in larger workflows for a big city, a region, or an organization producing and selling data. The GeoServer sits on top of a database where it’s the endpoint of a longer data production pipeline — which can be automated like a sensor or something generated by humans, such as water and river authorities.
There is a lot of manual data production for vector data.
For raster data, it depends.
GeoTIFF is like the Swiss Army knife of raster data. We work with COG and NetCDF (Network Common Data Form) as well. In addition, we’ve worked with a METOC — meteorological, oceanographic, and atmospheric — organization. They run models for forecasting and produce NetCDF. The key in this arena is to get the data to the web as quickly as possible.
The importance of the data decreases quickly as the “freshness” of the data decreases.
These are the most common formats for raster and vector data.
For services, it’s maps. Everybody likes maps. Everybody wants to see maps.
The main protocols involve map generation such as WMS (Web Map Service), WMTS (Web Map Tile Service) and so on.
In the enterprise environment, people want more than just maps — they want to drill down data, put it on a chart, and run some processes.
We use OGC’s WFS (Web Feature Service) a lot, allowing us to access vector data.
It’s like a geospatially enabled SQL interface you can call from the web, getting results back in JSON or GML (Geography Markup Language).
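That SQL-like flavor is easiest to see in a GetFeature request with an attribute filter. A sketch, again assuming a local GeoServer and a hypothetical `topp:states` layer with a `PERSONS` attribute:

```python
from urllib.parse import urlencode

# WFS GetFeature with a CQL filter, returning GeoJSON instead of GML.
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "topp:states",
    "cql_filter": "PERSONS > 10000000",  # SQL-like attribute predicate
    "outputFormat": "application/json",  # ask for GeoJSON
}

wfs_url = "http://localhost:8080/geoserver/wfs?" + urlencode(params)
print(wfs_url)
```

Fetching that URL from a live server would return the matching features as a GeoJSON feature collection.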
We work a lot with WPS — a processing service for extended functionality, like the ability to compute statistics or aggregations on vector data and turn them into charts you can see in our web client, MapStore.
We work quite a bit with WCS (Web Coverage Service) — the raster counterpart to WFS. It’s used for subsetting, cropping, reprojecting, sub-sampling, or resampling raster data — you get back numbers rather than an image.
COG shifts some of the processing to the client. If you want to do analysis and visualization on the web, you can transfer numbers from the GeoTIFF to the web quickly and do the visualization there.
The distinction between what you can do server-side with WCS and what you can do with COG is fading more and more but let’s put it this way:
Having a large GeoTIFF is one thing. Accessing a 20-petabyte mosaic seamlessly and subsetting a portion of it without implementing client-side logic to mosaic is another.
Perhaps in the future, we can think about COG not as a GeoTIFF but as a WCS-like interface.
It could be an output format for WCS, while you still have WCS in the back end doing all the magic. So rather than accessing every GeoTIFF through COG, you could have a WCS that does everything behind it.
WPS is one of the most exciting services from OGC (Open Geospatial Consortium). But, unfortunately, it is also the least understood service because it’s a black box to a certain extent.
It’s a way to expose processing, like routines or existing services, to the web and call them in a standard way. The magic is done by the process itself that you expose.
Until a few years ago, everybody implemented the classic buffer example. But in reality, buffering vector data isn’t what you need WPS for.
Sometimes standards are good, but they don’t cover all needs, especially when you have to do a project.
WPS gives you a quick way to implement functionalities that you might not find in the existing protocols.
For example, in our web client, MapStore, we use a few WPS processes by default to do more advanced things, like the aggregations on vector data for the charts, or to ask GeoServer simple things we couldn’t get through WFS.
From asking for the unique values of an attribute on a particular dataset to more advanced things: in an OGC interoperability experiment, I’ve seen people use WPS to expose machine learning and artificial intelligence, back when there wasn’t a straightforward way or a specific service for these.
WPS is a black box you can fill with things you want to expose to the web in a relatively standard way.
Bit of both.
Several processes are available by default in GeoServer, such as an aggregate process and a distinct process.
These give you things like the unique values of an attribute on a data set. There are processes, for example, for computing statistics and for contouring. There are processes for doing heat maps, and so on.
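For instance, you can ask the WPS service to describe one of these built-in processes before calling it. A sketch, assuming a local GeoServer and using the built-in aggregate process name `gs:Aggregate`:

```python
from urllib.parse import urlencode

# WPS DescribeProcess: asks the server what inputs and outputs a
# process expects before you execute it.
params = {
    "service": "WPS",
    "version": "1.0.0",
    "request": "DescribeProcess",
    "identifier": "gs:Aggregate",  # one of GeoServer's default processes
}

wps_url = "http://localhost:8080/geoserver/wps?" + urlencode(params)
print(wps_url)
```

The response (from a live server) is an XML description of the process inputs, which you then fill in with an Execute request.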
These are pre-built in GeoServer and freely available. There is also support for scripting in Groovy, a popular scripting language built for the Java Virtual Machine.
You can also write more compelling or advanced processes in Java and deploy them in GeoServer. There are also ways to call processes from the command line, but that’s not what we use most of the time.
The real power of the WPS implementation in GeoServer is that you can write processes that work in close cooperation with the other services. There is the concept of a rendering transformation: a few are available by default, and you can plug a process into the server’s rendering pipeline — for example, as part of a “get map” request.
You can have a process that contours your data on the fly. Suppose you publish a digital elevation model in GeoServer — you can ask GeoServer to, as part of the rendering of the DEM, transform the DEM from raster into vector and render the contour lines at the end, rather than the original raster data.
The actual power of GeoServer is to manipulate data that is published in GeoServer quickly, and if possible, as part of the other services like WMS.
And when you do a “get feature info,” you get information about the underlying geometry as it’s being created by the contour process. The process is called every time, as part of the rendering itself.
Yes, there is a free and open source extension for GeoServer, available as part of the downloads, which is well supported and serves vector tiles via WMTS and WMS. However, it’s more than just vector tiles.
Lately, we have also worked on adding support for TileJSON.
So we added it to GeoServer. You can style vector tiles from GeoServer using Mapbox styles, one of the most widely known styling languages for vector tiles.
The GeoServer product you download is a standard composition of these extensions — the minimum set of extensions we think people need as a part of GeoServer.
But there are two levels of extensions — the official extensions and community extensions.
This is both a community decision and a maintainability decision — we try to lower the bar for contributing to GeoServer by introducing the community extension.
Suppose you develop new functionality and you think it’s suitable for the public. You can publish it as a community extension — we don’t ask too much in terms of quality or support. It’s just something you can say, “I want people to get familiar with this and see if it’s of interest.”
The official extension is more complicated; you need to meet strict quality thresholds for your code. In addition, you need to be responsive on the mailing list, as you will be part of the full GeoServer build and testing. If there is an error, you need to fix it right away. Otherwise, we kick it out of the build.
The next step is becoming part of GeoServer, the core product, but this process moves slowly. We don’t want everything that is contributed to become part of the core. If you do too many things simultaneously, it’s hard to do any of them very well.
That’s our goal.
GeoServer can be administered via the GUI and via a REST API, which is a programmatic interface.
For the smallest scenarios, use cases and deployments, there might just be a single GeoServer installation — someone doing things infrequently manually via the GUI.
For large installations — with tens, hundreds, or thousands of GeoServer instances running in parallel in the cloud, auto-scaling up and down depending on the load — you can’t simply go there and change the configuration by hand.
There are well-defined procedures and guidelines for changing the development environment, test environment, QA, production, and things that have to move between environments in an automated way.
To do that, you use the REST API. GeoServer allows you, via the REST API, to programmatically administer it and do 99% of the tasks you need — only a few things still require the web interface.
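As a minimal sketch of that kind of scripted administration, here is a REST call that creates a workspace. The host, credentials, and workspace name are assumptions; the request is only constructed here, not sent.

```python
import base64
import urllib.request

# Build a POST to GeoServer's REST workspaces endpoint.
# Hypothetical host and default-style admin credentials.
body = b'{"workspace": {"name": "demo"}}'
token = base64.b64encode(b"admin:geoserver").decode()

req = urllib.request.Request(
    "http://localhost:8080/geoserver/rest/workspaces",
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    },
)

# Against a live GeoServer you would then call urllib.request.urlopen(req).
print(req.get_method(), req.full_url)
```

The same pattern (POST/PUT JSON or XML to `/geoserver/rest/...`) covers stores, layers, and styles, which is what makes fully automated ingestion pipelines possible.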
This is how we do things mostly in our installations where GeoServer is serving data that is continuously ingested — via the REST API without human intervention.
For the largest installations we’ve done, we have used GeoServer with time-series data.
The Luxembourg Collaborative Ground Segment is a project from the Luxembourg space agency — GeoServer is serving around 25 petabytes of Sentinel-1 and Sentinel-2 data, exposed with a time dimension so you can move back and forth in time.
We have several clients doing the same thing. For example, we work with ship position data and moving objects. The time dimension is an important part of it.
We do work a lot with sensor data — it’s pure time series. You have a position and have records generated at this position over time.
GeoServer supports the time dimension in WMS very well, with a few extensions. It does the same for WCS.
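As an illustration, a WMS GetMap request for a time-enabled layer just adds a `time` parameter. The host, layer name, and timestamp below are hypothetical:

```python
from urllib.parse import urlencode

# GetMap against a time-enabled layer: identical to a normal map request
# except for the ISO 8601 "time" parameter selecting the instant to render.
params = {
    "service": "WMS",
    "version": "1.1.1",
    "request": "GetMap",
    "layers": "nurc:watertemp",          # hypothetical time-series layer
    "bbox": "-180,-90,180,90",
    "width": 512,
    "height": 256,
    "srs": "EPSG:4326",
    "format": "image/png",
    "time": "2008-10-31T00:00:00.000Z",  # which time slice to draw
}

timed_url = "http://localhost:8080/geoserver/wms?" + urlencode(params)
print(timed_url)
```

Stepping the `time` value backward or forward is how a client moves through the series.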
Like all open source products, the new things are mandated by what clients ask.
Not promising anything, but we’re working on some interesting stuff.
We want to be more cloud-friendly.
GeoServer is not cloud-native, obviously — it was born before the cloud even existed.
People ask, “Is GeoServer cloud-ready/cloud available?”
Yes and no, depending on who you’re talking to.
With Docker and Kubernetes, we have large installations of GeoServer that scale up and down with no issues.
It’s our goal to make GeoServer more cloud-friendly.
But what is cloud, anyway?
Public cloud, like Google Cloud or AWS (Amazon Web Services)? Or a private cloud?
It’s a vast term.
The problem I see is that usually, people put technology before the problem. Instead of understanding the problem, they throw technology at it and make it worse.
Another thing we are working on — mind you, this is an internal experiment — is more support for tiling.
We are interested in implementing 3D tiles specification on the client-side.
We are also looking at ways to serve data according to the specification server-side — this is at a prototype stage and ongoing. COG is already supported, but we want to extend support to as many cloud platforms as possible.
COG is supported on Amazon S3, as well as over plain HTTP, but we are working with Google Cloud, and soon with Azure, to streamline the support.
We’ve also talked to our clients about becoming more Esri-friendly for serving our data to ArcGIS online and ArcGIS desktop, or the various desktop versions of the Esri stuff.
We are all for open source, but we simply can’t say no to people using Esri who want to connect to GeoServer.
Public organizations serving data to the public tend to work with Esri stuff.
There is also some work going on internally to improve vector tiles generation, more advanced visualizations, and more use of the database.
GeoServer abstracts away from the underlying database, be it PostgreSQL or Oracle, although we have optimized SQL for each of them.
There are things that, in our opinion, could still be further optimized at the cost of writing more database-specific code. We can leverage PostGIS to generate vector tiles directly in the database: when we see you’re working with PostGIS, we can generate the vector tiles in it rather than outside it.
This will be a lot of work, writing database-specific code, but it’s worth it. The same goes for advanced visualizations like heat maps, hexagonal binning, and clustering — if we push these down to the specific database, they will be faster and more scalable.
People compare what you can do with software like CARTO or, in the past, Google Maps. We need to be independent of the various databases, but we also need to make sure we exploit them as much as possible.
If you look at CARTO, it was built in the past for PostgreSQL — it used advanced PostgreSQL functionality.
We’re not trying to do the same, but we need to go in that direction.
As a company director, it’s stressful because you need to keep up and innovate.
You need to make sure that the things you do translate and work well in production.
From a technical standpoint, it’s all exciting and challenging. It’s good to be in this field right now. With COVID, everybody knows what the dashboard is — I don’t need to explain it anymore.
In the past, when someone asked me what we did, I’d say, “Something like Google Maps, just on a smaller scale but probably more flexible.”
Smaller scale didn’t sound good. So I added the flexible bit.
Now I say, “Did you see those dashboards?”
That’s what we do, but it’s more for technical people than the public.
A couple of times during this interview, we mentioned cloud-optimized GeoTIFF.
If you’re not already familiar with it, it’s sometimes referred to as a “COG.”
It’s an amazing concept.
You can already produce these using pretty much every open source piece of software built on GDAL.
This is going to be a game-changer. It’ll change the way we share raster data on the internet.
It’s worth finding out more about it if you’re not familiar with it already.
Be sure to subscribe to our podcast for weekly episodes that connect the geospatial community.
For more exclusive content, join our email list. No spam — just insightful content about the geospatial industry.