The Difference between Vector and Raster Data in GIS
In this article, we will cover the fundamental differences between raster and vector data.
Geospatial data can be represented using either the vector or the raster data type. The two differ in their internal representation, in the operations you can perform on them, and in their look and feel. The figures below show two representations of the same geographical location.
The first map is composed of vector layers representing different features put together to form a map with labels. The second one is a raster satellite image representation of the same area with additional labels.
If there is one thing the maps above should tell you, it is that both data types do a good job of representing geographical data.
What is the Difference Between Vector and Raster Data Formats?
Vector data represents features using geometry (according to the Oxford dictionary, geometry is the shape and relative arrangement of the parts of something). One or more vertices, each representing a specific position in space using coordinates, are combined to form a geometry.
A vertex existing alone represents a point, while two or more connected vertices form either a polyline or a polygon feature. A line is formed by two connected vertices, and a polyline consists of several lines connected end to end. In a polyline, the first vertex and the last vertex do not occupy the same location.
On the other hand, a polygon is formed if the last vertex and the first vertex occupy the same location.
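The rules above can be sketched in a few lines of Python. This is only an illustration: the function name classify_geometry is hypothetical, vertices are assumed to be plain (x, y) tuples, and a polygon is assumed to need at least four vertices (three distinct ones plus the closing vertex).

```python
def classify_geometry(vertices):
    """Classify an ordered list of (x, y) vertices by the rules described above."""
    if not vertices:
        raise ValueError("at least one vertex is required")
    if len(vertices) == 1:
        return "point"
    if len(vertices) == 2:
        return "line"
    # Three or more vertices: the shape is closed when the first and last
    # vertex coincide. A closed ring needs at least four vertices.
    if len(vertices) >= 4 and vertices[0] == vertices[-1]:
        return "polygon"
    return "polyline"


town = [(36.82, -1.29)]
road = [(0, 0), (1, 1), (2, 0)]
parcel = [(0, 0), (4, 0), (4, 3), (0, 0)]

print(classify_geometry(town))    # point
print(classify_geometry(road))    # polyline
print(classify_geometry(parcel))  # polygon
```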
Raster data, on the other hand, represents a phenomenon as a grid of cells (pixels) arranged in rows and columns. Each cell stores a value representing the information being portrayed by the raster data.
The cell value can be a positive or negative integer, a floating point value, or a NoData value that represents the absence of information. Depending on the phenomenon being represented, the cell value can describe either the center point of the cell or the entire area covered by the cell.
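As a rough illustration, a raster can be thought of as a list of rows, each holding one value per cell. The sketch below uses plain Python lists and uses None as a stand-in for NoData. Both choices are assumptions made for this example, since real raster formats store the NoData marker in their own way.

```python
NODATA = None  # stand-in for the NoData marker (an assumption of this sketch)

# A 3 x 3 elevation raster: rows run top to bottom, columns left to right.
elevation = [
    [120.5, 121.0, 123.2],
    [119.8, NODATA, 122.7],
    [118.9, 120.1, 121.4],
]


def valid_values(grid):
    """Return every cell value that is not NoData."""
    return [value for row in grid for value in row if value is not NODATA]


def grid_mean(grid):
    """Mean of the valid cells, or NoData when no valid cell exists."""
    values = valid_values(grid)
    return sum(values) / len(values) if values else NODATA


print(elevation[0][2])               # value at row 0, column 2: 123.2
print(len(valid_values(elevation)))  # 8 valid cells out of 9
```

Skipping NoData cells matters: treating the marker as a number would distort any statistic computed from the raster.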
Spatial Processes for Raster and Vector
Typical Raster Spatial Processes
These are some examples of the operations that are best suited to raster data.
- Neighborhood operations: These operations involve analyzing the values of cells in the vicinity of a particular cell. Examples include calculating the mean, median, or standard deviation of the cells within a specified neighborhood, or identifying the cells that fall within a particular range of values.
- Reclassification: This operation involves reassigning new values to cells based on their original values. For example, you could reclassify a raster layer of land use data, where each cell has a value that corresponds to a specific land use type, to create a new layer where cells with the same land use type have the same new value.
- Slope and aspect calculations: These operations involve calculating the slope and aspect (direction of maximum downhill slope) of a raster surface, such as a digital elevation model (DEM). These calculations are often used in fields such as hydrology and civil engineering.
- Surface analysis: This operation involves analyzing the surface characteristics of a raster dataset, such as identifying the local maxima or minima, or detecting the presence of specific patterns on a surface.
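Three of the operations above can be sketched on a small grid stored as a list of rows. The function names are hypothetical, the neighborhood is assumed to be a 3 x 3 window clipped at the raster edge, and the slope uses simple central differences on an interior cell. GIS packages typically use more elaborate methods, so treat this as an illustration only.

```python
import math


def focal_mean(grid, row, col):
    """Mean of the 3 x 3 neighborhood around (row, col), clipped at the edges."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [
        grid[r][c]
        for r in range(max(0, row - 1), min(n_rows, row + 2))
        for c in range(max(0, col - 1), min(n_cols, col + 2))
    ]
    return sum(values) / len(values)


def reclassify(grid, mapping):
    """Replace every cell value with the new value given in mapping."""
    return [[mapping[value] for value in row] for row in grid]


def slope_degrees(dem, row, col, cell_size):
    """Slope at an interior cell from central differences (a simplification)."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


dem = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(focal_mean(dem, 1, 1))  # 5.0
print(focal_mean(dem, 0, 0))  # 3.0 (only four cells fall inside the raster)

# Merge land use codes 1 and 2 into class 10, and code 3 into class 20.
land_use = [[1, 1, 2], [3, 2, 2], [3, 3, 1]]
print(reclassify(land_use, {1: 10, 2: 10, 3: 20}))
```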
Typical Vector Spatial Processes
These are some examples of the operations that are best suited to vector data.
- Topological operations: These operations involve analyzing the relationships between different vector features, such as determining if one feature is contained within another or if two features share a common boundary. Closely related geoprocessing operations, which create new geometries from existing ones, include buffering, intersection, union, difference and symmetrical difference.
- Network analysis: Vector data can be used to model a transportation or utility network, such as roads or pipelines, and then perform analysis on that network, such as finding the shortest route between two points or identifying which parts of the network are most vulnerable to failure.
- Overlay operations: Vector data can be overlaid with other data, such as raster data, to combine the information. For example, you can overlay a vector layer of property boundaries with a raster layer of land use data to determine the land use within each property.
- Geometry operations: These operations involve analyzing and modifying the shape of vector features. For example, you can measure the length of a line, the area of a polygon, or calculate the centroid of a feature.
- Geographic feature manipulation: vector data can be used to create, edit, and update geographic features. This could include creating new points, lines or polygons, or splitting, merging or deleting existing features.
- Select by attribute and Select by location: Vector data can be queried based on the attribute information and spatial location. For example, you can select all the parcels in a city that have a certain value in a specific attribute column, or select all the points within a certain distance from a given point.
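A few of these operations are simple enough to sketch without a GIS library. The example below assumes planar (x, y) coordinates, polygons given as closed rings, and features stored as plain dictionaries. The function names and the sample parcels are hypothetical.

```python
import math


def line_length(vertices):
    """Length of a polyline: the sum of the lengths of its segments."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area of a closed ring (first vertex == last vertex), shoelace formula."""
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2


def select_by_attribute(features, field, value):
    """Keep the features whose attribute field equals value."""
    return [f for f in features if f[field] == value]


def select_by_distance(features, origin, max_distance):
    """Keep the features located within max_distance of origin."""
    return [f for f in features if math.dist(f["location"], origin) <= max_distance]


parcels = [
    {"id": 1, "zoning": "residential", "location": (0.0, 0.0)},
    {"id": 2, "zoning": "commercial", "location": (3.0, 4.0)},
    {"id": 3, "zoning": "residential", "location": (10.0, 10.0)},
]

print(line_length([(0, 0), (3, 4), (3, 10)]))                 # 11.0
print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]))  # 12.0
print([p["id"] for p in select_by_attribute(parcels, "zoning", "residential")])  # [1, 3]
print([p["id"] for p in select_by_distance(parcels, (0.0, 0.0), 5.0)])           # [1, 2]
```

With geographic coordinates (longitude and latitude) these planar formulas do not give meaningful lengths or areas, so the data would first need to be in a projected coordinate system.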
Types of Vector Data.
Point: A single x,y coordinate that represents a specific location, such as a city or a building.
Whether a feature is to be represented as a point is largely dependent on scale, type of feature as well as convenience. For instance, on a small-scale map a town can be represented as a point. However, on a large-scale map, it is represented as a polygon. Information about the represented features is stored as attributes in an attribute table.
MultiPoint: A collection of Point geometries.
A typical use case for storing point data as a MultiPoint geometry would be when you have a collection of individual points that are related to one another in some way. For example, if you are working with a dataset that contains information about the location of trees in a forest, a MultiPoint geometry could be used to represent a cluster of trees that are growing close together.
Because a MultiPoint geometry can store multiple points in a single feature, it can be more efficient to work with than a collection of individual point features, especially if you need to perform spatial analysis or make map visualizations that involve multiple points at once.
PolyLine: A set of connected x,y coordinates that represent a linear feature, such as a road or a river.
The information relating to the features represented by polylines is stored as attributes.
MultiLine: A collection of Line geometries.
A typical use case for storing polylines as a MultiLine geometry would be when you have a collection of individual polylines that are related to one another in some way. For example, you might use a MultiLine geometry to store the locations of multiple road segments that form a route, or multiple power lines that are part of a power grid.
Just as with MultiPoint, using a MultiLine geometry can be more efficient when you are working with a large number of lines and performing spatial analysis or map visualization that needs to consider multiple lines at once.
It’s important to note that the individual lines in a MultiLine geometry (called MultiLineString in the OGC Simple Features specification) do not have to connect to one another: they may be separated by gaps and they may cross. If you need to represent lines with curved segments, such as circular arcs, you might consider using a geometry type like MultiCurve.
Polygon: A closed set of x,y coordinates that define an area, such as a forest or a lake.
Polygons represent enclosed features such as administrative boundaries and water bodies. Just as with points and polylines, information about the features represented by polygons is stored as attributes. Additionally, there are topology rules that are specific to polygons, such as the sharing of boundaries by two adjacent features.
MultiPolygon: A collection of Polygon geometries.
A typical use case for storing polygons as a MultiPolygon geometry would be when you have a collection of individual polygons that are related to one another in some way. For example, you might use a MultiPolygon geometry to store the locations of multiple parcels of land that form a larger piece of property or to store the locations of multiple islands that belong to the same archipelago.
Another use case for MultiPolygon could be when you have multiple polygons that are split by some other feature, such as a river, and you want to keep them in the same feature. For example, a wetland area is split by a stream and you want to keep track of it as one feature.
Also, in fields like urban planning and land use, a MultiPolygon can be used to represent a single land-use class, such as residential, commercial, or agricultural, that is made up of several separate areas.
Just like MultiPoint and MultiLine geometries, MultiPolygon geometries can be more efficient when you are working with a large number of polygons and performing spatial analysis or map visualization that needs to evaluate multiple polygons at once.
It’s also important to note that MultiPolygon geometry can be used to represent polygons with holes. Each polygon in the MultiPolygon geometry can have one or more internal rings that represent holes in the polygon. This can be useful for representing complex shapes like islands with lakes or shapes with multiple smaller polygons within them.
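To make the idea of holes concrete, here is a small sketch that computes the area of a MultiPolygon whose members may contain holes. It assumes planar coordinates and closed rings, and represents each polygon as a list of rings where the first ring is the outer boundary and any further rings are holes. That layout and the function names are assumptions of this example.

```python
def ring_area(ring):
    """Area enclosed by a closed ring (first vertex == last vertex)."""
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2


def polygon_area(rings):
    """Area of the outer ring minus the area of every hole."""
    exterior, *holes = rings
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


def multipolygon_area(polygons):
    """Total area of all member polygons."""
    return sum(polygon_area(rings) for rings in polygons)


# An island (10 x 10) containing a lake (2 x 2), plus a small islet (2 x 2).
island = [
    [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],  # outer boundary
    [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)],      # lake, stored as a hole
]
islet = [[(20, 0), (22, 0), (22, 2), (20, 2), (20, 0)]]
archipelago = [island, islet]

print(polygon_area(island))            # 96.0
print(multipolygon_area(archipelago))  # 100.0
```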
Geometry Collection: A collection of different geometry types.
A typical use case for using a Geometry Collection would be when you have a collection of different types of geometries that are related to one another in some way and you want to store them together in a single feature.
One use case for a Geometry Collection could be to store all the different types of geometries that make up a complex feature. For example, you could use a Geometry Collection to store the locations of a building, including its Point location, the Line locations of its walls, and the Polygon locations of its roof and various rooms.
Another use case could be a collection of geometries that have been derived from a single source, such as a set of building footprints, building heights, and address points all captured during a single survey.
A Geometry Collection can also be useful in fields such as cadastral mapping, where it can be used to store various types of geometries, such as points, lines and polygons, related to a single parcel.
It’s worth noting that while a Geometry Collection can store multiple different types of geometries in a single feature, it can make analysis and visualization more complex, since the user has to differentiate between the geometries and select the appropriate one for analysis. Therefore, it’s usually recommended to use a specific geometry type if possible.
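One common way to see these geometry types side by side is the GeoJSON format, shown here as plain Python dictionaries. Note that GeoJSON names the line types LineString and MultiLineString rather than PolyLine and MultiLine. The coordinates are made up for illustration, and member_types is a hypothetical helper.

```python
import json

point = {"type": "Point", "coordinates": [36.82, -1.29]}
multipoint = {"type": "MultiPoint", "coordinates": [[0, 0], [1, 1]]}
linestring = {"type": "LineString", "coordinates": [[0, 0], [1, 1], [2, 1]]}
multilinestring = {
    "type": "MultiLineString",
    "coordinates": [[[0, 0], [1, 1]], [[5, 5], [6, 7]]],
}
# A polygon is a list of rings: the outer boundary first, then any holes.
polygon_with_hole = {
    "type": "Polygon",
    "coordinates": [
        [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
        [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]],
    ],
}
# A MultiPolygon is a list of polygons, each of which is a list of rings.
multipolygon = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
        [[[5, 5], [7, 5], [7, 7], [5, 7], [5, 5]]],
    ],
}
collection = {
    "type": "GeometryCollection",
    "geometries": [point, linestring, polygon_with_hole],
}


def member_types(geometry):
    """List the geometry types held by a collection, or the geometry's own type."""
    if geometry["type"] == "GeometryCollection":
        return [member["type"] for member in geometry["geometries"]]
    return [geometry["type"]]


print(member_types(collection))  # ['Point', 'LineString', 'Polygon']
print(json.dumps(point))
```

The extra branching in member_types hints at why mixed collections make analysis more involved: code has to check what each member is before it can work with it.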
Types of Raster Data.
Raster Data can be either discrete or continuous.
Discrete Raster Data.
You will sometimes find discrete raster data referred to as thematic, classified, discontinuous, or categorical raster data. Discrete raster data represents features with discrete (well-defined) boundaries, e.g., buildings, parcels, land use and dams.
Continuous Raster Data.
Continuous (surface, non-discrete, or field) raster data represents a phenomenon that changes continuously across space, or a phenomenon where the value at each location is a measure of its relationship to an emitting source, e.g., elevation in a Digital Elevation Model, soil properties, and noise. Such a phenomenon does not have a discrete boundary.
For more details on raster data, check out this article on Fundamentals of Rasters and Imagery.
Digitizing Raster Data in QGIS
To create vector data from raster format you can digitize the individual features. This is the process of drawing the individual features in a desktop application.
To do this in QGIS, create a new shapefile using the New Shapefile Layer tool.
After creating the new shapefile with an appropriate file name and attribute fields, select the layer in the Layers panel and click Toggle Editing. This puts the layer in edit mode.
Depending on the type of vector data you want to digitize, click the Add Point Feature, Add Line Feature, or Add Polygon Feature button to start drawing. For instance, to digitize roads you would click Add Line Feature and then trace the roads over the raster data on the QGIS canvas.
Apart from digitizing from raster data, you can also convert raster to vector using the Polygonize (Raster to Vector) tool.
To open the tool dialogue, click on Raster ► Conversion ► Polygonize (Raster to Vector).
In the dialogue specify the input raster layer and output file where your vector layer will be saved. You can also provide optional input depending on your data requirements.
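Conceptually, converting a raster to polygons starts by grouping neighboring cells that share the same value. The sketch below shows only that grouping step on a small grid. It is not how QGIS implements the tool, it does not trace polygon outlines, and it assumes that cells count as neighbors only when they share an edge (4-connectivity). The function name label_regions is hypothetical.

```python
from collections import deque


def label_regions(grid):
    """Group edge-connected cells with equal values; return (value, cells) pairs."""
    n_rows, n_cols = len(grid), len(grid[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    regions = []
    for row in range(n_rows):
        for col in range(n_cols):
            if seen[row][col]:
                continue
            # Start a new region and flood outwards from this cell.
            value = grid[row][col]
            seen[row][col] = True
            queue = deque([(row, col)])
            cells = []
            while queue:
                r, c = queue.popleft()
                cells.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if inside and not seen[nr][nc] and grid[nr][nc] == value:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append((value, cells))
    return regions


land_cover = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 1],
]
for value, cells in label_regions(land_cover):
    print(value, len(cells))
```

Each region would become one polygon feature, with the cell value stored as an attribute. Note that the lone cell with value 1 in the bottom-right corner forms its own region because it does not share an edge with the other cells of value 1.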
When digitizing, you should look out for vector-associated errors such as overshoots, undershoots, and slivers.
Slivers occur when two adjacent polygons do not meet properly, leaving a narrow gap between them. Overshoots and undershoots occur when line features do not intersect properly: an overshoot goes past the point of intersection, while an undershoot stops short of it.
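The difference between an overshoot and an undershoot can be checked with a little geometry. The sketch below looks at the last segment of a digitized line (from p to q) and the line it was meant to end on (through a and b). It treats the target as an infinite straight line and uses a hypothetical function name and tolerance, so it is a simplified illustration and not a replacement for the topology checking tools in a GIS.

```python
def classify_endpoint(p, q, a, b, tolerance=1e-9):
    """Classify how the segment p -> q ends relative to the line through a and b."""

    def cross(u, v):
        return u[0] * v[1] - u[1] * v[0]

    direction = (q[0] - p[0], q[1] - p[1])
    target = (b[0] - a[0], b[1] - a[1])
    denominator = cross(direction, target)
    if abs(denominator) < tolerance:
        return "parallel"  # the two lines never meet
    # t is the position of the crossing along p -> q: 0 at p, 1 at q.
    t = cross((a[0] - p[0], a[1] - p[1]), target) / denominator
    if t <= 0:
        return "no contact"  # the target lies behind the start of the segment
    if abs(t - 1) <= tolerance:
        return "snapped"     # q lies exactly on the target line
    return "overshoot" if t < 1 else "undershoot"


main_road = ((10, 0), (10, 10))  # the line that the side road should end on

print(classify_endpoint((0, 5), (9.5, 5), *main_road))   # undershoot
print(classify_endpoint((0, 5), (10.5, 5), *main_road))  # overshoot
print(classify_endpoint((0, 5), (10, 5), *main_road))    # snapped
```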
Rasterizing Vector Data in QGIS.
To convert vector data to raster data, you can use the Rasterize tool that comes standard with the QGIS desktop.
To launch the tool, navigate to Raster ► Conversion ► Rasterize (Vector to Raster).
Then fill in the provided fields with the correct information.
On clicking the Run button the algorithm will then transform the vector data to raster data using the input you provide.
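To get a feel for what rasterizing involves, the sketch below burns a single polygon into a small grid by testing whether the center of each cell falls inside the polygon. This is a simplified illustration, not the algorithm QGIS runs. It assumes planar coordinates, a closed ring without holes, and a grid whose bottom-left corner sits at (0, 0). The function names and parameters are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the closed ring?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge spans the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, n_rows, n_cols, cell_size, burn_value=1, background=0):
    """Return a grid in which cells whose center lies in the ring get burn_value."""
    y_top = n_rows * cell_size  # row 0 is the top row of the raster
    grid = []
    for row in range(n_rows):
        y = y_top - (row + 0.5) * cell_size
        grid.append([
            burn_value if point_in_ring((col + 0.5) * cell_size, y, ring) else background
            for col in range(n_cols)
        ])
    return grid


building = [(1, 1), (3, 1), (3, 3), (1, 3), (1, 1)]
for row in rasterize(building, n_rows=4, n_cols=4, cell_size=1):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```

The cell size controls how closely the raster follows the original shape: larger cells are faster to process but lose more of the polygon's outline.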
Since you can always convert between raster and vector data formats, it is not a question of one data format over the other, but rather a question of which one is best for the given task.
Best could mean:
- The fastest to process
- The highest level of accuracy
- The easiest to work with and incorporate into an existing workflow
- The format that best preserves the original shape of the data
The point here is that there is no “one size fits all”, and no “one data type to rule them all”.
You can approach most GIS problems using either of the two formats, and by converting between them you can also use analysis techniques that are unique to each.