Dr. Markus Müller is the lead data science engineer at UP42. He and his team develop algorithms such as super-resolution, an intelligent way of upsampling images.
Markus studied geography and basic remote sensing at university. Only when he worked in Indonesia a decade or so later did he use it for development work. He went to Borneo’s most isolated places and gave GIS training on carbon accounting.
Suddenly, remote sensing was exciting and he was happy to learn more. He went on to work with forest classification, carbon accounting, and training others on the topics.
Super-resolution originates from the computer vision domain.
The quality of an image is defined by its resolution. Super-resolution applies an algorithm to give you a better, higher-resolution image.
It’s like upsampling, just smarter.
It’s done for two reasons.
First, the images become sharper and visually more pleasing — people can identify objects better.
Second, algorithms do better object detection when they process super-resolved images.
Based on what’s been happening in the last decade with deep learning and computer vision algorithms, there’s a strong connection.
Modern computer vision algorithms use deep learning or ConvNets (Convolutional Neural Networks). These use the same principles as the human visual system.
What a human can perceive in an image and what modern algorithms can do is similar.
When you develop a deep learning algorithm, and you want to know if it performs well enough, you take what a human can do as your baseline. Humans are good at interpreting images, so if you can get past that threshold, you’ve got an excellent algorithm.
For now, the human cognition system is still the reference.
A single-image super-resolution is an image you’ve upsampled in an intelligent way and ended up with a sharper, more advanced image.
You train a neural network with images, including low-resolution and high-resolution images—the neural network figures out how to do the smart upsampling.
Multiple-image super-resolution uses several images of the same scene, such as the frames of a video.
These images were recorded from slightly different angles, but you can still combine them into a higher-resolution image.
Multiband-image super-resolution is like pansharpening (as in panchromatic “PAN” sharpening).
You take a satellite image whose bands have different resolutions, for example from a sensor with RGB bands, a near-infrared band, and a panchromatic band.
Using the panchromatic band, which has a higher spatial resolution, you can inform the upsampling of the lower-resolution bands to get a higher-resolution version.
This is ideal for Sentinel-2 images, which have bands in three resolutions (10 meters, 20 meters, and 60 meters). Using a super-resolution algorithm, you can re-sample these bands and end up with all of them being at 10 meters.
This is one of the first algorithms that we implemented. We did not develop it ourselves; we used something already published as open source.
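As a rough illustration of the Sentinel-2 case, the sketch below works out how much each band would have to be upsampled per axis to reach 10 meters. The band-to-resolution table reflects the standard Sentinel-2 MSI band layout; the function name `upsampling_factors` is hypothetical, and this only computes the factors. It is not the open-source super-resolution algorithm mentioned above.

```python
# Native ground sampling distance (meters) of the Sentinel-2 MSI bands.
S2_BAND_GSD_M = {
    "B02": 10, "B03": 10, "B04": 10, "B08": 10,
    "B05": 20, "B06": 20, "B07": 20, "B8A": 20, "B11": 20, "B12": 20,
    "B01": 60, "B09": 60, "B10": 60,
}


def upsampling_factors(band_gsd, target_gsd=10):
    """Return the per-axis factor each band needs to reach target_gsd (hypothetical helper)."""
    factors = {}
    for band, gsd in band_gsd.items():
        if gsd % target_gsd != 0:
            raise ValueError(f"{band}: {gsd} m is not a multiple of {target_gsd} m")
        factors[band] = gsd // target_gsd
    return factors


factors = upsampling_factors(S2_BAND_GSD_M)
print(factors["B04"], factors["B11"], factors["B01"])  # 1 2 6
```

The 10-meter bands stay as they are, while the 20-meter and 60-meter bands need factors of 2 and 6 per axis.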
Let’s think of a SPOT image with a panchromatic band that has four times the resolution of the other bands.
Pansharpening brings the other bands up to the resolution of the panchromatic band, so that all the bands end up with the same, higher resolution. That resolution is the limit.
The panchromatic band informs the upsampling.
Now let’s think of a Sentinel-2 image and apply the super-resolution algorithm. It’d be challenging to upsample that many bands to the same resolution. Plus, Sentinel-2 doesn’t have a panchromatic band.
With pansharpening alone, you can only get up to the resolution of the highest given band.
It’s worth applying super-resolution to see if you can get to a higher resolution for these cases.
In the past, we’ve worked with Pleiades imagery, which had a 50-centimeter resolution after pansharpening. After giving it the super-resolution treatment, we ended up with a resolution of 12.5 centimeters.
That’s four times the resolution of the original image.
There’s that, but let’s keep things real.
The super-resolution algorithms result in a better-resolved image. It looks better and the objects are sharper.
But in quality, it’ll never be comparable with an image recorded at a higher GSD.
Suppose an image was originally recorded at 0.5-meter resolution. In that case, it’ll always be sharper and you’ll always find more detail than in an image upsampled from 2 meters.
The underlying assumption of the super-resolution algorithms is that they’re scale-independent.
If you develop an algorithm that resamples 2 meters to 0.5 meters, you’d assume the same upsampling will work from 0.5 meters to 12.5 centimeters or on an old LANDSAT image from 60 meters to 15 meters.
It’s possible, but it can never be the same quality as if you had an image with a higher GSD.
When you went to university, the idea behind ground-truthing was to know if your model or analysis reflects what’s happening in the real world.
You went to the place, maybe even flew there, and looked at what’s happening on the ground. You examined it and you came up with an accuracy.
That’s not how it is in the modern world of deep learning and computer vision.
We look at accuracy values and so on and compare them with other data sets.
But who makes these other data sets?
Humans, by looking at the images. The human visual system and what humans can do is still the reference for what is achievable.
Ground-truthing is often problematic because you rarely have an image at the high resolution you’d want.
The classic approach for this situation in computer vision, where most of these algorithms come from, is to create an image, downsample it to a lower resolution, then apply a super-resolution algorithm.
Then you compare this to your original image. The scale independence lets you go back even more in resolution, upsample it again, and compare it using a metric.
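The downsample, upsample, and compare loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the evaluation used at UP42: the image is synthetic placeholder data, `upsample_nearest` is a naive baseline standing in for a super-resolution model, and PSNR is just one common choice of quality metric. All function names are hypothetical.

```python
import math


def downsample(img, f):
    """Block-average a 2D list of pixel values by an integer factor f."""
    rows, cols = len(img) // f, len(img[0]) // f
    return [
        [
            sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) / (f * f)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def upsample_nearest(img, f):
    """Baseline upsampling: repeat every pixel f times along both axes."""
    return [
        [img[r // f][c // f] for c in range(len(img[0]) * f)]
        for r in range(len(img) * f)
    ]


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    n = len(reference) * len(reference[0])
    mse = sum(
        (a - b) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for a, b in zip(ref_row, est_row)
    ) / n
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


# Synthetic 8x8 "original" image: a simple gradient (placeholder data).
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
low_res = downsample(original, 2)        # simulate the coarser image
baseline = upsample_nearest(low_res, 2)  # a model's output would go here instead
print(f"baseline PSNR: {psnr(original, baseline):.1f} dB")
```

A super-resolution model would take the place of `upsample_nearest`; the point of the metric is to check whether its score beats that of plain upsampling.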
Yes, there is a possibility of that.
Especially if all you do is create training data sets and apply a straightforward mathematical operation to downsample. The neural networks just figure out how to invert their downsampling method — mostly cubic re-sampling.
That’s why super-resolution is something different.
You can take a single image without downsampling or upsampling it. Use it at the original resolution at 2 meters and pansharpen it.
You get a new image at a 50-centimeter resolution. Then you train the algorithm with these pairs.
The reasoning behind this mechanism is that you don’t apply a straightforward mathematical operation on all of your pixels, but you have a genuine upsampling.
If then your algorithm figures out how to do that, it learns out of the context how to create a higher resolution image — at least that’s the hypothesis around the model.
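To make the idea of training pairs more concrete, here is a small sketch of how patches from the original 2-meter image could be matched with the corresponding windows in the 50-centimeter pansharpened image. The patch size of 32 pixels, the non-overlapping grid, and the function names are all illustrative assumptions; only the factor of 4 between the two resolutions comes from the discussion above.

```python
def matching_window(row, col, size, factor=4):
    """Map a low-resolution patch (row, col, size) to its high-resolution window."""
    return (row * factor, col * factor, size * factor)


def patch_pairs(lr_height, lr_width, patch=32, factor=4):
    """List (low_res_window, high_res_window) pairs over a non-overlapping grid."""
    pairs = []
    for row in range(0, lr_height - patch + 1, patch):
        for col in range(0, lr_width - patch + 1, patch):
            pairs.append(((row, col, patch), matching_window(row, col, patch, factor)))
    return pairs


# A 64 x 96 pixel low-resolution image yields a 2 x 3 grid of patches.
pairs = patch_pairs(lr_height=64, lr_width=96)
print(len(pairs))  # 6
print(pairs[-1])   # ((32, 64, 32), (128, 256, 128))
```

Each pair tells a training loop which low-resolution input window belongs to which high-resolution target window.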
I have a high-resolution image I created through the pansharpening process.
This is the result I want.
I take a super-resolution algorithm and apply it to the images before they were pansharpened and say, “This is what I have, and this is what I want. Please figure out how to make the connection between the two.”
Is that what happens?
And then, you apply your metrics.
It’s important to visually interpret your results and apply quality metrics, which show if you got to something better than just upsampling.
The application of the algorithm is for going one scale higher. You trained the algorithm with image pairs of 2-meter and 50-centimeter resolutions, but after the application, you apply it on the pansharpened images at 50-centimeter resolution — ending up with a 12.5-centimeter resolution.
The goal of super-resolution is to go beyond what you could achieve with pansharpening. You test it against the pansharpened image.
The combination of pansharpening and the super-resolution algorithm makes this so powerful.
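The arithmetic behind "going one scale higher" can be written down as a tiny sketch. The resolutions are the ones quoted above; the function names are hypothetical, and the sketch relies on the scale-independence assumption discussed earlier (a factor learned at one scale is applied at the next).

```python
def scale_factor(input_gsd_m, target_gsd_m):
    """Per-axis factor between a training input and its target resolution."""
    return input_gsd_m / target_gsd_m


def apply_factor(gsd_m, factor):
    """Resolution after applying the same factor to a different input."""
    return gsd_m / factor


# Trained on pairs of 2 m (original) and 0.5 m (pansharpened) imagery.
factor = scale_factor(2.0, 0.5)
# Applied to the 0.5 m pansharpened image instead of the 2 m original.
output_gsd_m = apply_factor(0.5, factor)
print(factor, output_gsd_m * 100)  # 4.0 12.5
```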
Let’s say I have high-resolution satellite imagery of a particular area of 25 centimeters. I’ve got a stack of images, or a single image from a different sensor, over the same geographic area.
Can I then say, “I want an extremely high-resolution image like this. Here’s my coarser-resolution image. Algorithm, please figure out how I can get my coarse resolution over to the high resolution?”
Yes, you could do that. In theory, that would be an ideal case.
I say in theory because almost no studies are doing this. Very few people have these image pairs because they need to be from the same area and be taken at exactly the same time.
There are almost no training data sets for doing something like that.
Again, in theory, yes. I haven’t read any papers so far doing something like that, though.
The nature of speckle might pose a few problems there. But I’d still go for theoretically, yes, you could do that.
It would probably be wiser to do data fusion for such a use case — to combine the best of two or more sensors.
Perhaps your sensor only has two bands with high temporal resolution. You might want to combine it with another sensor with a better spectral resolution, more bands, and so on. You could try developing an algorithm to fuse these two together, so you have a high revisit, but also plenty of information in your data set.
A super-resolution algorithm is suited for some things, but not others.
There are two primary use cases, number one being simply creating sharper images for the visual observer.
Even in the modern age of artificial intelligence, it’s still taking too many people to interpret satellite images and aerial imagery manually. I’m talking about intelligence agencies, for example, employing teams of people interpreting satellite and aerial images, looking for specific objects.
A super-resolution algorithm and a super-resolved image can help because they let a human identify objects better and find them more easily.
Number two is essentially the same — creating sharper images, only for algorithms.
Computer vision algorithms that detect objects mostly pick up on the shapes of objects and, to some degree, the intensity and texture. If a super-resolution algorithm can produce clearer, more sharply defined objects, it makes the detection algorithms’ work easier.
This has been proven, and so has the fact that super-resolution might also help with classification or segmentation problems.
Yes, it’s just the way things developed in the last few years in the deep neural network and ConvNets space.
The first breakthrough came after these competitions, where researchers tried to find cats in images.
Then, about ten years ago, suddenly, these competitions were won by deep learning algorithms. That’s where it all started, and then it took a while until the Earth observation community picked up on those algorithms.
First, people in the Earth observation domain wanted to apply them, but the algorithms were tied to three-band natural images, which also have a lower bit depth than what we are used to in remote sensing.
People used to throw away the other bands; they took the RGB images, downsampled them to 8-bit so the algorithm would work, and applied it.
That was the starting point of the love affair between computer vision and remote sensing. Over time, Earth observation developed its scientific research for applying deep learning algorithms to remote sensing problems.
New algorithmic developments are still coming out of the computer vision domain, but there’s also a significant development in the Earth observation domain, where these algorithms apply to problems specific to multi-band images, SAR imagery, and so on.
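The early workaround mentioned above (keep only RGB, reduce to 8-bit) can be illustrated with a short sketch. The 12-bit input depth, the band order, the linear rescaling, and the function names are assumptions made for the example; real workflows often use other stretches.

```python
def to_8bit(value, bit_depth):
    """Linearly rescale an integer sample from bit_depth bits to the range 0..255."""
    max_in = (1 << bit_depth) - 1
    return round(value * 255 / max_in)


def rgb_8bit(pixel, band_order, bit_depth=12):
    """Keep only red, green, and blue from a multi-band pixel and rescale to 8-bit."""
    return tuple(
        to_8bit(pixel[band_order.index(band)], bit_depth)
        for band in ("red", "green", "blue")
    )


# Hypothetical four-band, 12-bit pixel; the near-infrared value is discarded.
band_order = ["blue", "green", "red", "nir"]
pixel = [0, 2048, 4095, 1234]
print(rgb_8bit(pixel, band_order))  # (255, 128, 0)
```

The example shows what gets lost: the near-infrared band disappears entirely, and 4,096 possible values per band collapse into 256.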
Not really. It’s more the other way around.
It’s a convergence.
If you go to the big computer vision conferences, there’s often a track for Earth observation. People working on Earth observation problems also attend these conferences and talk about what they’re doing.
Earth observation is more and more accepted as one of the application domains, like medical imaging, which is already integrated. The core technology is the same — computer vision based on deep learning, and there are several application domains.
These communities and domains grow together. When we published our super-resolution paper at UP42, we followed what most computer vision people do and published a preprint on arXiv because that’s the practice in the computer vision domain.
I’m not sure I ever had such a title, even though I have an extensive background and a Ph.D. in it, and I work with the data inside that domain.
I’ve been called a development advisor, environmental informatics specialist…
Then data science came about — a technical convergence. The tools we use are the same tools used at Facebook for making recommendations. We can switch between these domains with relative ease.
There is a strong community developing these tools with Python, TensorFlow, and SciKit.
I wonder the same.
I can’t really say — data science is still a young discipline. These generic titles that cover many things were the starting point.
We already see specializations in data science. We talk of data scientists, machine learning engineers, and deep learning engineers — their sub-domains are developing.
Some do already, but it’s a personal decision at the moment. Most job descriptions still say data scientists, and you must read on to see what domain the position is in.
Apply at scale? Yes.
With some hype.
A bit of both — and that’s normal in business.
Research-led literature has shown us that there is value in super-resolution. When you have a problem, you’ll need to decide on a case-by-case basis if there is value in it for you.
The Sentinel-2 super-resolution algorithm has immediate value in it — I would use it myself right away.
You want to do an analysis. You’ve got some Sentinel-2 images with different less-than-ideal resolutions. You upsample them in a smart way, and you can process them.
For super-resolution of Pleiades images, I’m more cautious.
The upsampling might help your algorithm identify objects, but it comes at the cost of a higher amount of data; more pixels need more processing power to analyze them. That’s one of the limiting factors of deep learning and affects the processing power and the cost.
If you take an image and upsample it by a factor of four per axis with a super-resolution algorithm, you end up with 16 times the number of pixels. You need to invest 16 times the amount of compute power if you want to develop your model.
Anyone considering developing a new model needs to consider if the advantages are worth the additional cost.
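The factor of 16 follows from upsampling by 4 along both image axes. A minimal sketch of that arithmetic, with hypothetical function names and an arbitrary example image size:

```python
def pixel_multiplier(factor_per_axis):
    """Upsampling both axes by the same factor multiplies the pixel count by its square."""
    return factor_per_axis ** 2


def upsampled_pixels(width, height, factor_per_axis):
    """Total pixel count after upsampling a width x height image."""
    return width * height * pixel_multiplier(factor_per_axis)


print(pixel_multiplier(4))              # 16
print(upsampled_pixels(1000, 1000, 4))  # 16000000
```

Treating compute cost as proportional to pixel count, as in the discussion above, is a simplification; the actual cost depends on the model and hardware.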
It’s hard to predict.
Every other week, a new startup flies another satellite. We’ll have more and more data available. There are some use cases where super-resolution can clearly help, but we’ll have to see how powerful it becomes and how much it gets used in the future.
Algorithms are also difficult to predict. The developments in the artificial intelligence community over the last five years have been amazing.
We don’t know what we’ll be doing in the next five years.
It applies… but the algorithms are also getting hungrier. The neural networks over the last few years became deeper and deeper.
It’s hard to predict.
Where is deep learning going, anyway? There’s talk of unsupervised or semi-supervised learning, which would bring on a whole new era for deep learning, and then things might change entirely.
Yes, I believe so.
The exciting applications of super-resolution are, for example, for upsampling movies.
If you have a movie that’s not well resolved, you can apply an algorithm to it, and voilà. It’s much sharper.
Plus, it’s cool. People like it.
For Earth observation, I’m not sure the same excitement applies.
The Sentinel-2 pansharpening and super-resolution have value, and with it, we’ll see algorithmic improvements.
But that’s it.
For multi-image super-resolution, there are already problems.
Multiple images are not taken at the same time. If you’re resolving several images, not taken at the same time, it means they may contain moving objects, and the algorithm will get rid of them, as it would do with a car. After the super-resolution treatment, it won’t be visible anymore in your image.
There is some value in this for the Earth observation domain, but it’s not a huge topic for the future like other topics might be.
The fusion of data sets is going to be significant.
Segmentation for land cover classification is also already decent. With the relevant segmentation algorithms and good training data, you get excellent results.
The algorithms that we have right now are decent at detecting objects and they’ll get even better.
The exciting applications will be for data fusion, such as how to combine SAR with optical imagery, or imagery with other data sources like Twitter feeds and other social media data, or from tracking telecommunication devices.
I hope you understand a bit more about super-resolution and the origins of computer vision in Earth observation: where we are today and where we might head in the future.
I love talking to people like Markus; it’s so clear that he’s not a hype-man.
He’s not here to sell us anything.
He’s not here to convince us of anything.
He’s simply presenting the facts.
And I’m sure he would take it as a compliment when I refer to him as a scientist. He’s someone who really believes in the scientific method and scientific rigor.
UP42 offers a free trial. If you go to UP42.com/pricing you’ll find there a free tier you can try. It’s so worth experimenting with these algorithms that Markus and his team are building.
Plus, you can build your own.