Machine Learning in QGIS for Satellite Image Classification
From this satellite image, we want to create a land use and land cover map by extracting classes such as built-up, vegetation (forest and grassland), water, and bare land. There are two broad machine learning approaches for doing this:
- Supervised Machine Learning.
- Unsupervised Machine Learning.
If all these things sound new to you, or you would like to refresh your understanding of the concepts, here are two resources on urban land cover and image classification respectively:
Machine learning has many applications, and image classification is one of the most common. It is an expansive subject of study, so the main aim of this article is to show you its capabilities and get you excited to explore it on your own.
Image classification using machine learning algorithms can be achieved through the following sequential steps:
- Collect training data
- Generate signature data
- Apply a classification algorithm
- Assess accuracy
The process varies slightly depending on whether you are doing supervised or unsupervised machine learning. Additionally, when you work in software such as QGIS through a plugin or a library, most of these steps are abstracted away from you and handled by the software under the hood.
Image classification in QGIS.
We are going to use the Semi-Automatic Classification Plugin (SCP), which provides tools for machine learning and digital image analysis. SCP allows for the classification of remote sensing images and provides tools for downloading, preprocessing, and postprocessing them. To install it, go to the Plugins menu on the top menu bar, open Manage and Install Plugins, search for the Semi-Automatic Classification Plugin, and install it.
After you install the plugin, an SCP menu is added to the top menu bar. If you want to carry out some preprocessing on your image, click SCP > Band Set, which opens a dialog box with various tools for both pre-classification and post-classification operations. If your image is noisy, correct it before proceeding to the next stage.
Unsupervised machine learning in QGIS.
Unsupervised classification clusters pixels in a dataset based on statistics only, without requiring you to define training classes.
Image clustering using the K-means machine learning algorithm.
K-means is an iterative algorithm that groups pixels into a predefined number of non-overlapping clusters, with each pixel belonging to exactly one cluster. A pixel is assigned to the cluster whose centroid is closest to it, measured by the squared distance between the pixel and the cluster’s centroid. Because the clusters carry no labels, you need to assign class names to them manually, based on your knowledge of the area. K-means is mainly used as an exploratory algorithm for getting an intuition of how the data is structured rather than for actual classification.
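The assign-then-update loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code SCP runs: the `kmeans` function, the fixed iteration count, the random initialization, and the two-band reflectance values are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(pixels, k, iterations=20, seed=0):
    """Cluster pixel vectors into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(pixels, k)  # start from k randomly chosen pixels
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in pixels
        ]
        # Update step: move each centroid to the mean of its member pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical (red, near-infrared) reflectance values:
# three dark, water-like pixels and three bright, vegetation-like pixels.
pixels = [(0.05, 0.02), (0.06, 0.03), (0.04, 0.02),
          (0.08, 0.45), (0.07, 0.50), (0.09, 0.48)]
labels, centroids = kmeans(pixels, k=2)
print(labels)
print(centroids)
```

The cluster numbers in `labels` have no meaning on their own, which is why you still have to decide which cluster is water and which is vegetation.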
Supervised machine learning in QGIS.
Supervised classification assigns the pixels in a dataset to classes based on user-defined training data.
Training sample selection and labeling in QGIS.
How many distinct classes can you extract from your satellite image? This depends on factors such as the spatial resolution of your imagery, your use case, and the homogeneity of your area of study. For simplicity, we will have five classes: forest (thick vegetation), built-up, grassland, bare land, and water.
On the SCP dock, click on Training input to open a dialog from which we will create training samples for our classification. As the screenshot above shows, you have to enter the macroclass ID (MC ID) and the class ID (C ID). SCP allows you to create sub-classes within a class. For instance, for the built-up class, we can create informal settlements and formal settlements as subclasses. Before collecting samples, create a .scp signature file, which we will use to store our training signatures, using the create new training input button at the top of the SCP dock.
After entering the class name and class ID, use the create ROI (Region Of Interest) polygon tool to select training samples from the image. To save the selected training samples, use the save temporary ROI to training input button at the bottom of the SCP dock. Repeat this for all the classes to create training signatures.
You can plot the reflectance values of the different training sample classes using SCP's Spectral Signature Plot. Additionally, to check the quality of your training samples, you can preview the classification output with the classification preview option. To do this, click on the plus sign, then click on the part of the image where you want to see the preview. If you feel the preview is a good representation of the land cover on the ground, go ahead and run the classification algorithm. Otherwise, go back and select new training samples.
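To make the idea of a training signature concrete, here is a small sketch that summarizes the ROI pixels of each class as a per-band mean and standard deviation, which is the kind of information a spectral signature plot displays. The `spectral_signatures` function and the reflectance values are hypothetical and are not taken from SCP.

```python
from statistics import mean, stdev


def spectral_signatures(samples):
    """Return {class_name: [(band_mean, band_stdev), ...]} from ROI pixels."""
    signatures = {}
    for class_name, pixels in samples.items():
        bands = list(zip(*pixels))  # regroup the values band by band
        signatures[class_name] = [(mean(b), stdev(b)) for b in bands]
    return signatures


# Hypothetical ROI pixels as (green, red, near-infrared) reflectance.
samples = {
    "water": [(0.06, 0.04, 0.02), (0.07, 0.05, 0.03), (0.05, 0.03, 0.01)],
    "forest": [(0.05, 0.04, 0.45), (0.06, 0.03, 0.50), (0.04, 0.05, 0.40)],
}
signatures = spectral_signatures(samples)
for name, signature in signatures.items():
    print(name, [f"{m:.2f} +/- {s:.2f}" for m, s in signature])
```

Classes whose means are far apart relative to their spread, such as water and forest in the near-infrared band here, are the ones a classifier can separate easily. Overlapping signatures are a hint that the training samples need another look.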
Maximum likelihood machine learning algorithm.
The maximum likelihood algorithm assigns each pixel to the class it has the highest probability of belonging to. You can specify a minimum probability threshold: if a pixel's highest class probability is below that threshold, the pixel remains unclassified.
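The sketch below illustrates the decision rule under several simplifying assumptions that are mine and not SCP's: each class is modelled as a Gaussian with independent bands (a diagonal covariance, whereas a full implementation would use the complete covariance matrix), all classes are equally likely a priori, and the threshold is applied to the normalized class probability. The function names and sample values are hypothetical.

```python
import math
from statistics import mean, pvariance


def train(samples):
    """Estimate a per-band mean and variance for every class."""
    model = {}
    for name, pixels in samples.items():
        bands = list(zip(*pixels))
        # A small floor keeps the variance away from zero.
        model[name] = [(mean(b), pvariance(b) + 1e-6) for b in bands]
    return model


def log_likelihood(pixel, stats):
    """Log of a Gaussian density with independent bands (assumption)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x, (mu, var) in zip(pixel, stats)
    )


def classify_ml(pixel, model, threshold=0.0):
    """Return the most probable class, or None if it falls below the threshold."""
    scores = {name: log_likelihood(pixel, stats) for name, stats in model.items()}
    best = max(scores, key=scores.get)
    # Normalize to class probabilities, assuming equal priors.
    total = sum(math.exp(s - scores[best]) for s in scores.values())
    probability = 1.0 / total
    return best if probability >= threshold else None


# Hypothetical training pixels as (red, near-infrared) reflectance.
samples = {
    "water": [(0.04, 0.02), (0.05, 0.03), (0.06, 0.04)],
    "bare land": [(0.24, 0.32), (0.25, 0.33), (0.26, 0.34)],
}
model = train(samples)
print(classify_ml((0.05, 0.03), model, threshold=0.9))
print(classify_ml((0.15, 0.18), model, threshold=0.9))
```

The second pixel sits halfway between the two class means, so neither class reaches the 0.9 threshold and the pixel is left unclassified.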
After creating enough training samples, click on SCP in the QGIS menu bar, select Band processing from the dropdown menu, and then select Classification to open the image classification dialog. Apart from the maximum likelihood algorithm, you can also use the minimum distance and spectral angle mapping algorithms. Both of these come bundled with the plugin.
Select whether to use the macroclass ID or the class ID for classification. Also, try out several threshold values to see which one returns good results. After providing all the parameters for your algorithm, click on run, and the program will prompt you for a path to save your classification results. Check your results, and if you are not satisfied with them, feel free to tweak your parameters as well as your training samples. A good understanding of your area of interest will go a long way when judging the quality of your classification!
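The two alternative algorithms differ only in how they measure similarity between a pixel and a class signature. The sketch below is illustrative rather than SCP's implementation, and the signatures and helper names are hypothetical: minimum distance picks the class with the smallest Euclidean distance, while spectral angle mapping picks the class whose signature points in the most similar direction.

```python
import math


def euclidean_distance(pixel, signature):
    """Straight-line distance between a pixel and a class signature."""
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(pixel, signature)))


def spectral_angle(pixel, signature):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * s for p, s in zip(pixel, signature))
    norm = (math.sqrt(sum(p * p for p in pixel))
            * math.sqrt(sum(s * s for s in signature)))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def classify_by(pixel, signatures, measure):
    """Assign the pixel to the class with the smallest measure value."""
    return min(signatures, key=lambda name: measure(pixel, signatures[name]))


# Hypothetical mean signatures as (green, red, near-infrared) reflectance.
signatures = {
    "water": (0.06, 0.04, 0.02),
    "forest": (0.05, 0.04, 0.45),
    "bare land": (0.20, 0.25, 0.30),
}
# A pixel with the spectral shape of bare land but half its brightness.
pixel = (0.10, 0.125, 0.15)
print(classify_by(pixel, signatures, euclidean_distance))
print(classify_by(pixel, signatures, spectral_angle))
```

With these values, minimum distance labels the darker pixel as water because it is numerically closest to that signature, while the spectral angle ignores overall brightness and labels it bare land. This is one reason to try more than one algorithm on the same training samples.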
After classification, you need to check the integrity of your results and ensure that your final output is reliable. Accuracy assessment is done by comparing our classification results with data from a source with higher accuracy or ground truth data. A common way to carry out accuracy assessment is to take random samples from ground truth data or a data source with higher accuracy and then compare the samples to the classification results in a confusion matrix.
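As a rough illustration of that comparison, the sketch below builds a confusion matrix from paired reference and classified labels and derives overall accuracy along with producer's and user's accuracy, two standard per-class measures that are not named in the text above. The function names and the eight sample points are hypothetical.

```python
from collections import Counter


def confusion_matrix(reference, predicted):
    """Count how often each (reference, predicted) label pair occurs."""
    return Counter(zip(reference, predicted))


def accuracy_report(reference, predicted):
    """Return overall accuracy plus per-class producer's and user's accuracy."""
    matrix = confusion_matrix(reference, predicted)
    classes = sorted(set(reference) | set(predicted))
    correct = sum(matrix[(c, c)] for c in classes)
    overall = correct / len(reference)
    producers, users = {}, {}
    for c in classes:
        ref_total = sum(matrix[(c, p)] for p in classes)   # reference points of class c
        pred_total = sum(matrix[(r, c)] for r in classes)  # points classified as c
        producers[c] = matrix[(c, c)] / ref_total if ref_total else None
        users[c] = matrix[(c, c)] / pred_total if pred_total else None
    return overall, producers, users


# Hypothetical sample points: ground truth versus classification result.
reference = ["water", "water", "forest", "forest", "forest",
             "built-up", "built-up", "built-up"]
predicted = ["water", "water", "forest", "forest", "built-up",
             "built-up", "built-up", "forest"]
overall, producers, users = accuracy_report(reference, predicted)
print(f"overall accuracy: {overall:.2f}")
print("producer's accuracy:", producers)
print("user's accuracy:", users)
```

Producer's accuracy is the share of reference points of a class that were classified correctly, and user's accuracy is the share of points assigned to a class that truly belong to it. Looking at both shows which classes are being confused with each other, which overall accuracy alone hides.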