Our guest today is Ron Hagensieker, Ph.D., the founder and CEO of OSIR.IO, an artificial intelligence and earth observation company. Ron has been interested in remote sensing from the beginning, starting with his BA in geography and continuing all the way to his PhD in remote sensing. All along the way, he has found ways to incorporate machine learning. One project in particular was thiscitydoesnotexist.com, a site that uses AI to randomly generate a fake Landsat-style image of, you guessed it, a city that does not actually exist.
How is Fake Aerial Imagery Created?
To create fake imagery, you must first obtain a great deal of real imagery. These images are used to train your neural network to recognize the unique landscape characteristics that make a city a city. The machine learning models learn to identify and recreate the patterns we associate with small-scale views of cities (e.g., Landsat or Sentinel imagery), mimicking the transitions from city center to suburbs to farmland. They can even be trained to create fake digital elevation models (DEMs).
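As a rough illustration of that data-preparation step, here is a minimal Python sketch that slices one large scene into fixed-size training chips. The file name and chip size are placeholders, not details of the actual project.

```python
# A minimal sketch of cutting a large satellite scene into fixed-size
# training chips. The input file and chip size are illustrative.
import numpy as np
import rasterio

CHIP = 256  # pixels per side of each training chip

with rasterio.open("landsat_scene.tif") as src:  # hypothetical input file
    scene = src.read()  # shape: (bands, rows, cols)

chips = []
_, rows, cols = scene.shape
for r in range(0, rows - CHIP + 1, CHIP):
    for c in range(0, cols - CHIP + 1, CHIP):
        chips.append(scene[:, r:r + CHIP, c:c + CHIP])

chips = np.stack(chips)  # (n_chips, bands, CHIP, CHIP), ready for training
print(chips.shape)
```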
ThisCityDoesNotExist.com’s approach actually involves two competing machine learning models: the generator and the discriminator.
The generator produces images, attempting to fool the discriminator, which labels each image as a ‘real’ or ‘fake’ city.
Based on the discriminator’s responses, the generator adapts its next image and tries again. This technique is called a generative adversarial network (GAN). GANs are especially powerful algorithms and can even be pitted against other algorithms to expose their weaknesses.
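Here is a minimal sketch of that adversarial loop in PyTorch. The tiny fully connected networks and toy sizes are purely illustrative; an image GAN like the one behind thiscitydoesnotexist.com would use convolutional architectures, and this is not the site’s actual code.

```python
# A toy GAN training step: the generator tries to fool the discriminator,
# and the discriminator tries to label images as real or fake.
import torch
import torch.nn as nn

LATENT, IMG = 64, 32 * 32  # noise size and flattened image size (toy values)

generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, IMG), nn.Tanh()
)
discriminator = nn.Sequential(
    nn.Linear(IMG, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)
)

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    batch = real_batch.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) Discriminator: score real chips as 1, generated chips as 0.
    fake = generator(torch.randn(batch, LATENT))
    d_loss = loss_fn(discriminator(real_batch), ones) + \
             loss_fn(discriminator(fake.detach()), zeros)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Generator: adapt so the discriminator scores its fakes as "real".
    g_loss = loss_fn(discriminator(generator(torch.randn(batch, LATENT))), ones)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

# One step on random stand-in data (replace with real image chips).
print(train_step(torch.rand(16, IMG) * 2 - 1))
```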
The feedback loop between generation and discrimination can hypothetically go on forever, but it is ultimately limited by computing resources and by the finite number of images originally used to train the model.
These algorithms use unsupervised classification, in which the computer groups similar pixels into a prescribed number of classes.
Unlike supervised methods, unsupervised classification does not require drawing bounding boxes around features, so the training data may be faster to prepare.
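As a simple illustration, unsupervised classification can be done in a few lines with k-means clustering; the random array below stands in for a real multi-band scene.

```python
# Unsupervised classification of a multi-band image: k-means groups
# similar pixels into a prescribed number of classes with no labels at all.
import numpy as np
from sklearn.cluster import KMeans

bands, rows, cols = 4, 100, 100
image = np.random.rand(bands, rows, cols)  # stand-in for a real scene

pixels = image.reshape(bands, -1).T        # one row per pixel, one column per band
labels = KMeans(n_clusters=5, n_init=10).fit_predict(pixels)
class_map = labels.reshape(rows, cols)     # a per-pixel class raster
```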
It is possible to steer the process with supervised input. This involves feeding in a dataset of city-center points, telling the model “here, these are cities”, potentially improving results.
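One generic way to feed that kind of supervision in (not necessarily what the project does) is to rasterize the known city-center points into an extra input channel; everything in this sketch is made up for illustration.

```python
# Inject "these are cities" supervision by rasterizing city-center points
# into an extra channel the model can condition on.
import numpy as np

rows, cols = 256, 256
city_centers = [(40, 60), (180, 200)]  # (row, col) of known city centers

label_channel = np.zeros((rows, cols), dtype=np.float32)
for r, c in city_centers:
    label_channel[r, c] = 1.0  # in practice a blurred disk works better than one pixel

image = np.random.rand(3, rows, cols).astype(np.float32)   # stand-in imagery
conditioned = np.concatenate([image, label_channel[None]]) # (4, rows, cols) model input
```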
Why Create Fake Imagery?
In a data-rich world with so much access to all varieties of raster data, why would we want to add fake imagery into the mix? Well, the truth is we aren’t quite sure yet. When plenty of real data exists, there is little incentive to put energy into simulating it. While there are no especially compelling uses for large amounts of fake imagery, doctored imagery is a different story.
There may be military applications for faking parts of images, if not whole landscapes: identifying and disguising sensitive military operations in satellite imagery, or even simulating the result of damage to landscape elements.
Randomly generated AI landscapes don’t have much use on their own, but once you add more channels to the data (elevation models, population density, etc.) and start adding constraints, things can get really interesting.
The trick here is that the channels must be coregistered to be meaningful. Coregistering means the images have all been lined up with each other spatially, pixel for pixel.
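For example, here is one common way to coregister a raster onto a reference grid using rasterio; the file names are hypothetical.

```python
# Coregistering one raster onto another's grid with rasterio, so every
# channel lines up pixel-for-pixel. File names are hypothetical.
import numpy as np
import rasterio
from rasterio.warp import reproject, Resampling

with rasterio.open("reference.tif") as ref, rasterio.open("elevation.tif") as src:
    aligned = np.empty((ref.height, ref.width), dtype=src.dtypes[0])
    reproject(
        source=rasterio.band(src, 1),
        destination=aligned,
        dst_transform=ref.transform,   # target grid: the reference image
        dst_crs=ref.crs,
        resampling=Resampling.bilinear,
    )
# "aligned" now shares the reference raster's grid and can be stacked with it.
```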
By combining information from many channels, you can create a living modeling environment. It gets even more exciting if you can pull off adding time. This allows predictive simulation of urban planning scenarios, deforestation patterns, and even events resulting from climate change. By simulating different population counts and densities, we can begin to identify common patterns in how people grow into their cities.
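Once coregistered, the channels (and timesteps, if you have them) can be stacked into a single data cube; here is a toy sketch with stand-in arrays.

```python
# Stack coregistered channels into one array, with an extra axis for time.
import numpy as np

rows, cols, timesteps = 256, 256, 12
imagery = np.random.rand(timesteps, 3, rows, cols)   # stand-in optical bands
elevation = np.random.rand(rows, cols)               # static channel
population = np.random.rand(timesteps, rows, cols)   # changes over time

elev_t = np.broadcast_to(elevation, (timesteps, rows, cols))
cube = np.concatenate(
    [imagery, elev_t[:, None], population[:, None]], axis=1
)  # (time, channels, rows, cols): one "living" modeling cube
```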
Fun Fact: If you place oat flakes at the positions of Tokyo-area train stations and let a slime mold grow among them, the resulting network looks just like the actual Tokyo rail system.
The reason landscape modeling does not yet qualify as a compelling application of fake imagery is that it is incredibly costly in terms of computing resources. Without adequate financial backing, it does not make sense to build a program like this. For now, coders are just having fun without funding.
What Are the Risks of Fake Satellite Imagery?
The most obvious risk associated with imagery created by machine learning models is that people will use the technology maliciously. Deep-faked images can represent extreme scenarios on a landscape. A popular example is rendering the same landscape in all four seasons, but you could also simulate flooding or wildfires.
Luckily, there is not much genuine risk of misinformation about events (amongst free media, at least), since these things are so easily verifiable.
If someone actually takes the time to simulate media coverage of a catastrophic event somewhere using deep-faked images, fact-checkers can check half a dozen public imagery sources to verify whether the event actually occurred (or simply call someone there).
So if you come across a fake image in the wild, how would you be able to tell? One characteristic is that fake imagery often looks downscaled. Depending on the logic of the algorithm used, you can sometimes pick out a fake by the lack of truly straight road segments. The same goes for the lack of natural angles on building corners, or the presence of especially blurry building footprints.
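As a rough heuristic for the “no truly straight roads” tell, you could count long straight segments with a Hough transform. A low count only hints that an image may be generated; it is not a reliable detector on its own, and the file name here is hypothetical.

```python
# Count long straight segments as a weak "is this image fake?" signal.
import cv2
import numpy as np

image = cv2.imread("suspect_image.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
assert image is not None, "could not read image"

edges = cv2.Canny(image, 50, 150)
lines = cv2.HoughLinesP(
    edges, rho=1, theta=np.pi / 180, threshold=80,
    minLineLength=100, maxLineGap=5,   # only fairly long, unbroken segments
)
n_lines = 0 if lines is None else len(lines)
print(f"long straight segments found: {n_lines}")
```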
Characteristics like haze or odd colors in areas are not reliable tells of fake imagery, as these elements appear in most real imagery as well due to atmospheric effects.
Overall, fake imagery has the potential to be very powerful, but in the current technical landscape there is not much need for it. Given the skill, time, hardware, and financial investment required, it is unlikely we will see much growth in this area. The one thing we have learned from artificial intelligence, however, is to expect surprises.