Geospatial data - What is it?

November 18, 2019 3 min read

Geospatial Data

Geospatial data is data that has a machine-readable spatial component. There is a common saying in the geospatial industry that 80% of all data has a geospatial component, but there is no numerical proof that this is actually the case. While it is true that a great deal of data refers to a spot on the earth's surface or can be related to a location, that does not mean that it is geospatial data.

Take a street address as an example. Humans can understand that “redruth 19, in the city of Christchurch” refers to a location on a street in the city of Christchurch, and because it refers to a location you could be forgiven for assuming that this automatically makes it geospatial data. The problem is that street addresses are not machine readable (not yet anyway), so while the address refers to a location, it does not specify that location in terms that computers can understand. This data is therefore spatial but not spatially enabled: the technologies used in the geospatial industry cannot make sense of it and are unable to position it on the earth. Applications like Google Maps are able to convert addresses to geospatial data on the fly through a process called geocoding. Geocoding is the process of translating human-readable addresses into machine-readable locations. Reverse geocoding is the opposite process: converting machine-readable locations into human-readable addresses.
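To make the two directions concrete, here is a minimal sketch of geocoding and reverse geocoding. It does not show how Google Maps or any real geocoder works: it assumes a tiny hand-made lookup table, and the names GAZETTEER, geocode and reverse_geocode are hypothetical. The coordinates are illustrative placeholders, not the real position of the address.

```python
from typing import Optional, Tuple

# Hypothetical gazetteer: maps a normalised address string to (latitude, longitude).
# The coordinates are illustrative placeholders, not surveyed positions.
GAZETTEER = {
    "redruth 19, christchurch": (-43.5, 172.6),
}


def normalise(address: str) -> str:
    """Lower-case the address and collapse repeated whitespace."""
    return " ".join(address.lower().split())


def geocode(address: str) -> Optional[Tuple[float, float]]:
    """Human-readable address -> machine-readable location (or None if unknown)."""
    return GAZETTEER.get(normalise(address))


def reverse_geocode(lat: float, lon: float, tolerance: float = 1e-6) -> Optional[str]:
    """Machine-readable location -> human-readable address (or None if unknown)."""
    for address, (a_lat, a_lon) in GAZETTEER.items():
        if abs(a_lat - lat) <= tolerance and abs(a_lon - lon) <= tolerance:
            return address
    return None


location = geocode("Redruth 19,   Christchurch")
print(location)
print(reverse_geocode(-43.5, 172.6))
```

A real geocoder has to cope with misspellings, missing parts and ambiguous addresses, which is why it is usually offered as a service instead of a simple lookup like this one.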

You will find a more detailed discussion on geocoding here

So now we have established that geospatial data needs to be machine readable, which means it needs to be formatted in a structured way so that computers can understand it.

Geospatial data is often stored as binary files or as text-based files. Both have their pros and cons.

Binary - spatial data is often stored as binary objects. This is a way of packing data into a file that software can read and access. Binary storage is often very efficient and makes it easy to transfer data between software that understands how to read and write the binary file type. Storing data as a binary file is also a way of restricting which software can read and write your data file, and in some cases it is also a way of locking users into a certain kind of software. The discussion about binary data formats leads very quickly into a conversation about open and proprietary data formats, but this is beyond the scope of this post.
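As a rough illustration of what "packing" means, the sketch below encodes a single point as well-known binary (WKB), an open binary encoding that this post does not otherwise cover. The helper names point_to_wkb and wkb_to_point are hypothetical, and the coordinates are placeholder values.

```python
import struct

# WKB layout for a 2D point, little-endian:
#   1 byte   byte-order flag (1 = little-endian)
#   4 bytes  geometry type   (1 = Point)
#   8 bytes  x as a double
#   8 bytes  y as a double
POINT_FORMAT = "<BIdd"


def point_to_wkb(x: float, y: float) -> bytes:
    """Pack a point into 21 bytes of well-known binary."""
    return struct.pack(POINT_FORMAT, 1, 1, x, y)


def wkb_to_point(data: bytes) -> tuple:
    """Unpack a little-endian WKB point back into (x, y)."""
    byte_order, geometry_type, x, y = struct.unpack(POINT_FORMAT, data)
    if byte_order != 1 or geometry_type != 1:
        raise ValueError("this sketch only reads little-endian WKB points")
    return (x, y)


wkb = point_to_wkb(172.6, -43.5)  # placeholder coordinates
print(len(wkb))           # size in bytes
print(wkb.hex())          # compact, but not something a person can read
print(wkb_to_point(wkb))  # only software that knows the layout gets the point back
```

The hex dump shows both sides of the trade-off: the point fits in a fixed, small number of bytes, but you need software that knows the layout to make sense of it.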

Text data - storing data as text can take up significantly more storage space, but it has the advantage that the data is also readable by humans. This means that you can open a data file and read it in a text editor, which can help you understand what the data is and how it fits together. Some examples of text-based geospatial data formats are GML, well-known text (WKT), TopoJSON, KML and GeoJSON. Text-based geospatial data can also be written directly into code, so geospatial data objects don’t have to be loaded from a separate file; they can exist in the code. This can be a huge advantage for portability: a single file can hold both the data and the code, which makes an application much easier to share or distribute. That said, text-based geodata files can very quickly become too big to manage, and there is no built-in indexing available in text-based geospatial data files.
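Here is a small sketch of geospatial data written directly into code, using only the Python standard library. The feature, its name and its coordinates are made-up placeholders, and point_to_wkt is a hypothetical helper that only handles points. Note that GeoJSON stores coordinates as longitude first, then latitude.

```python
import json

# A GeoJSON Feature written straight into the code as a Python dict.
# Coordinates are [longitude, latitude]; the values are placeholders.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [172.6, -43.5]},
    "properties": {"name": "example point"},
}


def point_to_wkt(geometry: dict) -> str:
    """Write a GeoJSON Point geometry as well-known text (WKT)."""
    if geometry["type"] != "Point":
        raise ValueError("this sketch only handles Point geometries")
    lon, lat = geometry["coordinates"]
    return f"POINT ({lon} {lat})"


geojson_text = json.dumps(feature, indent=2)
wkt_text = point_to_wkt(feature["geometry"])

print(geojson_text)  # readable in any text editor
print(wkt_text)
```

Both outputs are plain text that you can read, copy into an email or paste into another script, which is exactly the portability advantage described above.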

Now that we have talked about how geospatial data is stored, I should mention that geospatial data represents features that have a physical location, but how a feature is represented can vary greatly depending on what data type you use.

Geospatial data can be separated into two main types: vector data and raster data. Vector data represents features as points, lines or polygons. These are discrete representations of objects; features represented in this way have clearly defined edges. Representing a house as a vector object would be a perfectly valid choice. But representing a continuous feature, one that does not have a hard boundary, like air temperature, as a vector might not be the best choice. A raster, or image, might be a better way of showing air temperature. Raster data is made up of cells (pixels), and each pixel can hold a different value, so the gradual change in air temperature can be shown over a geographic area.
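The difference is easier to see side by side. The sketch below is one possible way to hold each type in plain Python, not a standard format: the house footprint, the temperature values, the grid layout and the function names are all made up for illustration.

```python
# Vector: a house footprint as a polygon ring of (x, y) corners.
# The ring is closed, so the first and last corner are the same.
house = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0), (0.0, 0.0)]


def polygon_area(ring):
    """Area of a closed ring using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Raster: air temperature as a grid of cells.
# "origin" is the top-left corner and every cell is cell_size units wide and tall.
temperature = {
    "origin": (0.0, 30.0),
    "cell_size": 10.0,
    "cells": [
        [12.0, 12.5, 13.0],
        [12.4, 13.1, 13.7],
        [13.0, 13.8, 14.5],
    ],
}


def sample(raster, x, y):
    """Return the value of the cell that contains the point (x, y)."""
    origin_x, origin_y = raster["origin"]
    size = raster["cell_size"]
    col = int((x - origin_x) // size)
    row = int((origin_y - y) // size)  # rows count downwards from the top edge
    cells = raster["cells"]
    if not (0 <= row < len(cells) and 0 <= col < len(cells[0])):
        raise ValueError("point is outside the raster")
    return cells[row][col]


print(polygon_area(house))             # the house has hard edges, so it has an exact area
print(sample(temperature, 5.0, 25.0))  # top-left cell
print(sample(temperature, 25.0, 5.0))  # bottom-right cell
```

The vector feature is described entirely by its corners, while the raster says nothing about edges and simply holds a value for every cell in the area it covers.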

OK, so now we know that geospatial data is data that has a machine-readable spatial component, that it can be stored in a binary or text-based format, and that geodata can represent the world as either vector or raster data.