Our guest today is Mallory Dodd, a Solutions Architect at iMerit, a company that specializes in producing labeled training data for clients in the early stages of their deep learning and machine learning projects. Dodd studied Anthropology at Texas State University, later supplementing it with a GIS Certificate. She progressed through a few GIS roles before arriving at her current one with iMerit, where her job is to educate new clients and guide them through their experience with the company’s training data labeling services.
What is Training Data?
To back up a bit, deep learning and machine learning are both types of artificial intelligence (AI). Ultimately, the goal of AI is to help humans solve problems more efficiently. Although it takes a hefty investment of time and resources to set up, a deep learning or machine learning model can replace a variety of manual processes and save many hours of work in the long run. If a model is trained well, it is not uncommon for the finished AI model to surpass human accuracy in some situations.
Training data is a collection of annotated data (images, videos, vector data, etc.) that demonstrates to a deep learning or machine learning model’s algorithm how to repeatedly and consistently extract the desired information. Basically, training data is created by human annotators and then fed into a model to “teach” it what to look for and train it to mimic the annotators’ decision process.
Training data can be very time consuming to build. The more complicated the objective of your model, the more training data it will take to train it. Additionally, if you hope to pull multiple data points and add attributes from each training data source (e.g., this is a car + this is a car with four doors), this adds time and complexity, but the payoff in the end may be a better fit for what you need.
Types of Annotation for Machine and Deep Learning
Annotation is the human part of the process of generating training data for a model. Annotators go in and manually mark out the features they want the model to learn to identify, and may add additional tags to help describe the image if needed. There are different types of annotation and labeling, and some are better matched to certain use cases than others.
The simplest form of annotation is basic classification of images. These are binary decisions about an image: is it a dog or a cat? Is it day or night? Is the target present or not? This kind of training data can be generated quickly, but it does not produce very detailed or informative outputs compared to other annotation methods.
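To make that concrete, classification-style training data can be as simple as a table pairing each image with a single tag. The sketch below writes a hypothetical label file in Python; the image names and labels are made up purely for illustration.

```python
import csv

# Hypothetical classification labels: one simple tag per image.
labels = [
    ("frame_0001.jpg", "dog"),
    ("frame_0002.jpg", "cat"),
    ("frame_0003.jpg", "dog"),
]

# Write the labels out as a small table the model-training code can read later.
with open("classification_labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "label"])
    writer.writerows(labels)
```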
The coarsest method of marked annotation is the bounding box. Annotators mark out the lower-left and upper-right bounds of the feature they are highlighting, and the generalized box represents the feature. It is even possible to create 3D bounding boxes, called cuboids, to use point clouds as training data. This method is great for uses like object tracking, where there is so much movement going on that precise marking would be difficult, and an increased level of granularity would not add much to the output.
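In practice, a 2D bounding box boils down to four numbers per object. The snippet below is a rough sketch of what a single annotation record might look like; the image name, label, and coordinates are hypothetical.

```python
# Hypothetical bounding-box annotation: lower-left corner (x_min, y_min)
# and upper-right corner (x_max, y_max), in pixel coordinates.
annotation = {
    "image": "street_scene_042.jpg",
    "label": "car",
    "bbox": {"x_min": 134, "y_min": 210, "x_max": 298, "y_max": 305},
}

def bbox_area(bbox):
    """Area of the axis-aligned box, a common sanity check on annotations."""
    return (bbox["x_max"] - bbox["x_min"]) * (bbox["y_max"] - bbox["y_min"])

print(bbox_area(annotation["bbox"]))  # 164 * 95 = 15580 square pixels
```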
A step up from the bounding box is the polygon, which is essentially digitizing the target feature. Polygons allow the annotator to delineate the feature’s extent more precisely and collect more specific information. For example, when collecting polygons of cars, this could help train the model to more accurately identify whether a car’s doors are open or closed. Of course, this method takes more of the annotator’s time than the bounding box, but the additional information and improved accuracy may be important for your use case. Other vector geometries can also be collected, such as points to track facial movements or lines to track and predict routes, but polygons are the most common of the big three vector data types used here.
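A polygon annotation stores the full outline of the feature rather than just its extent, so extra attributes can ride along with the geometry. Here is a minimal sketch using a GeoJSON-style structure; the coordinates and attribute values are invented for illustration.

```python
# Hypothetical polygon annotation in a GeoJSON-like structure.
# Coordinates are pixel (column, row) pairs tracing the car's outline.
polygon_annotation = {
    "type": "Feature",
    "properties": {"label": "car", "doors": "closed"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [120, 300], [290, 300], [295, 240],
            [250, 205], [150, 210], [120, 300],  # ring closes on the first vertex
        ]],
    },
}

print(len(polygon_annotation["geometry"]["coordinates"][0]) - 1, "vertices")
```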
As far as raster training data methods go, we have semantic segmentation, instance segmentation, and panoptic segmentation to choose from. Semantic segmentation is the practice of marking all of the pixels of the desired object as the “correct answer.” Instance segmentation is very similar, but adds the extra step of assigning a unique identity to each feature. For example, with semantic segmentation we would have data tagged “car, car, car,” whereas with instance segmentation we would have “car 1, car 2, car 3.” Panoptic segmentation is the practice of marking each pixel as something: instead of just giving an identity to the target feature, we also tag the pixels of the background, sky, buildings, etc.
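One way to see the difference between the three is to look at the label masks themselves. The toy arrays below (a made-up 4x4 image containing two cars) are only meant to illustrate how the values differ between the methods.

```python
import numpy as np

# Semantic segmentation: every car pixel gets the same class id (1 = car, 0 = unlabeled).
semantic = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Instance segmentation: each car gets its own id ("car 1", "car 2").
instance = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
])

# Panoptic segmentation: every pixel is marked as something (3 = road, 4 = sky),
# so nothing is left unlabeled.
panoptic = np.array([
    [4, 1, 1, 4],
    [3, 1, 1, 3],
    [3, 3, 2, 2],
    [3, 3, 2, 2],
])
```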
Raster training data classification methods are generally the most resource-intensive to create, but they are known to produce the most accurate and precise results from the resulting model. Considering the potential payoff, this is definitely appealing, but there are a few things to take into consideration if you want reliable and consistent results.
Creating and Maintaining Quality Training Data
Your algorithm will only ever be as good as your training data. This is why it is vital to understand what makes for high quality training data before getting deeper into the process.
The key to getting the expected results from your final model is to use training data that is as close as possible to the data the final model will be run on. This is known as ground truthing. If you will be using the model on 3-band rasters with a 1x1 m cell size, your training data should be of the same type. Differences in resolution can cause issues in your model because it is now seeing the target in a different context than the one it was trained on, resulting in a loss of quality in the results, if it works at all. Some other differences to consider are the type of sensor used, the angle the target is viewed from, and lighting and weather conditions.
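A simple way to catch these mismatches early is to compare the properties of the training imagery against the imagery the model will actually run on. The sketch below assumes the rasterio library and uses hypothetical file names.

```python
import rasterio

# Hypothetical paths: a tile used to build training data, and a tile the
# finished model will be run on.
TRAINING_RASTER = "training_tile.tif"
PRODUCTION_RASTER = "production_tile.tif"

with rasterio.open(TRAINING_RASTER) as train, rasterio.open(PRODUCTION_RASTER) as prod:
    checks = {
        "band count": (train.count, prod.count),
        "cell size": (train.res, prod.res),
        "coordinate system": (train.crs, prod.crs),
    }

for name, (train_value, prod_value) in checks.items():
    status = "OK" if train_value == prod_value else "MISMATCH"
    print(f"{name}: {train_value} vs {prod_value} -> {status}")
```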
Let’s say you have gone through the whole process of creating a machine learning model to identify cars in an image. You have annotated your training data, built and trained the model, and are getting the results you expected, but now your organization’s needs have changed and they need to know what color each car is as well. Do you need to start the model building process over from scratch? Thankfully, no, you don’t.
If you have a working model, you can update its training data to adjust the model to your new use case. In our car example, the annotator can go through and tag each of the training samples with just the color, since the cars have already been delineated. Another option is to use the outputs from the previously created model to train your new one. If your model is giving you image chips of cars, then you are already most of the way there, and can simply classify the car colors using those outputs and plug them into your next model.
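As a rough sketch of the first option, the existing annotations already carry the geometry, so the update is just a new attribute per sample. Everything below (file names, labels, colors) is hypothetical.

```python
# Existing annotations from the working "find cars" model; the geometry is
# already done, so the annotator only supplies the new attribute.
existing_annotations = [
    {"image": "lot_001.jpg", "label": "car", "bbox": [134, 210, 298, 305]},
    {"image": "lot_001.jpg", "label": "car", "bbox": [402, 198, 560, 290]},
]

# New attribute added by the annotator for each existing sample.
colors = ["red", "white"]

for annotation, color in zip(existing_annotations, colors):
    annotation["color"] = color

print(existing_annotations)
```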
A logical question is to wonder whether the human role in this process can be removed. That is unlikely, as introducing human logic and thinking into the system is necessary to make sure the end product is meaningful to humans. When computers are left to their own devices, they will take shortcuts and make interpretations that make no sense to people, which can render the output useless. Keeping humans in the loop encourages transparency throughout the development of the model and ultimately results in a better product.
Artificial intelligence is, of course, still a young field. Ten years from now the landscape will have changed, and things we think are impossible today may simply be possibilities we have not yet considered. As the technology finds its way into more markets, new use cases will develop, and with them, new ideas.