Introducing Google Earth Engine
Qiusheng Wu is an assistant professor in the Department of Geography at the University of Tennessee.
His research focuses on GIS, remote sensing and cloud computing. He’s been using Google Earth Engine since 2017 for his research and teaching.
And for the last year, he’s been developing a Python package called geemap, which is widely used by the Google Earth Engine community.
WHAT IS GOOGLE EARTH ENGINE?
Google Earth Engine is a cloud computing platform for scientific analysis and visualization of geospatial data sets.
It is free to use for research, education, and nonprofit purposes.
WHY DO WE NEED IT?
In the past, before Netflix and Amazon Prime, if you wanted to watch a movie, you either went to the movie theater or bought a DVD from the store.
Once you got your DVD home, you needed a DVD or Blu-ray player to read it and watch the movie.
Fast forward to today, you can watch a movie using streaming services, like Netflix. All you need is a browser and an internet connection — you can stream movies online without having to worry about a DVD or loading it into a player.
Traditional remote sensing is like your old DVD experience. You go online to various agencies and download the remote sensing data sets. Then you still need professional software, like ArcGIS or ENVI, to load the data, geoprocess it, and get the results.
Google Earth Engine is essentially streaming data. You don’t need to go online to download the data — you just need a browser, and you can access the entire Google Earth Engine data catalog and a bunch of tools to do the analysis and visualization.
Previously, extracting features from imagery with traditional remote sensing took a lot of work.
Using Google Earth Engine, things become simple. With some filters and algorithms, you can do some cool analysis without worrying about data storage and computing power.
Plus, it’s free.
It’s going to save you tons of time not having to download a data set and do pre-processing — most of the data sets in the data catalog are analysis-ready.
The catalog currently has over 35 petabytes of data. 1 petabyte is 1000 terabytes, so you get 35,000 terabytes of data you can access using a browser. It’s a massive timesaver for research and teaching.
IS THE DATA PIXEL DATA?
Both. The catalog contains pixel (raster) data and vector data.
The raster data includes land use, land cover, weather, and climate data.
There is also vector data, such as polygons and census data for the US. Then there are private data sets that are not available in the public data catalog; users upload them to their own accounts.
Every account comes with 250 gigabytes of storage for private or commercial data that belongs only to you. It isn’t public unless you change its sharing settings. You can use your private data for parallel computing within the Google Earth Engine platform.
HOW DO I GET ACCESS TO IT?
Go to the Google Earth Engine website and sign up using your Gmail account.
If you have a university or .edu email address, the approval might even be instant. If you have a regular Gmail address, it may take a couple of days to get approval because they need to verify your status.
HOW DO I FIND THE DATA THAT I’M INTERESTED IN?
On the Google Earth Engine website, there is a section called Datasets.
From there, you can search for any data you want.
In the data catalog, each data set has a unique ID. Inside each set, there is an image collection — a stack of images or time series images in a hierarchical structure.
For example, “Landsat 8” is a set.
Inside it, you’ll find multiple products, such as raw data and surface reflectance. The surface reflectance product is itself an image collection with many images, just like a folder on your computer. In that folder, you’ll find individual GeoTIFFs, each with a unique ID.
When you filter the images, for example to find images for the US, you start by saying,
“I want the Landsat surface reflectance data set.”
You load it with ee.ImageCollection(ID), where ID is that data set’s unique ID. Think of it as the “parent” collection, and then apply filters to it.
Filter by date, location, or metadata.
Perhaps you want cloud cover less than 10% or 5%. Continue step by step applying the filter. You’d need three or four lines of code before you get the data set you want.
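The filter chain described above can be sketched in plain Python. This is not the Earth Engine API. The ImageRecord class, its fields, and the scene IDs are hypothetical stand-ins for image metadata. In the Earth Engine Python API, the equivalent chain would look roughly like ee.ImageCollection(ID).filterDate(start, end).filterBounds(geometry).filter(ee.Filter.lt('CLOUD_COVER', 10)). The name of the cloud-cover property depends on the collection.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ImageRecord:
    """Hypothetical per-image metadata, standing in for an Earth Engine image's properties."""
    image_id: str
    acquired: date
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat), hypothetical footprint
    cloud_cover: float  # percent of the scene covered by cloud


def intersects(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_date(images, start, end):
    # Keep images acquired in [start, end): start inclusive, end exclusive.
    return [im for im in images if start <= im.acquired < end]


def filter_bounds(images, region):
    # Keep images whose footprint overlaps the area of interest.
    return [im for im in images if intersects(im.bbox, region)]


def filter_cloud_cover(images, max_cloud):
    # Keep images with cloud cover strictly below the threshold.
    return [im for im in images if im.cloud_cover < max_cloud]


# Hypothetical catalog entries
catalog = [
    ImageRecord("scene_a", date(2020, 6, 3), (-90.0, 35.0, -87.0, 37.0), 2.5),
    ImageRecord("scene_b", date(2020, 6, 19), (-90.0, 35.0, -87.0, 37.0), 41.0),
    ImageRecord("scene_c", date(2020, 7, 5), (10.0, 45.0, 13.0, 47.0), 1.0),
    ImageRecord("scene_d", date(2019, 6, 10), (-90.0, 35.0, -87.0, 37.0), 0.5),
]

region = (-89.0, 35.5, -88.0, 36.5)  # hypothetical area of interest

# Apply the filters step by step: date, then location, then metadata.
result = filter_cloud_cover(
    filter_bounds(filter_date(catalog, date(2020, 1, 1), date(2021, 1, 1)), region),
    max_cloud=10,
)
print([im.image_id for im in result])
```

Changing the date range or the region only changes the arguments. The filtering steps stay the same.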
In the past, you’d have to go online and download the data set. Sometimes, if you downloaded the wrong one, the wrong date, or the wrong location, you’d have to start over again.
If something goes wrong in Google Earth Engine, you change the code and get the new results instantly.
It’s going to save you tons of time.
WHAT ABOUT ATMOSPHERIC CORRECTIONS? IS IT ANALYSIS READY?
Landsat and Sentinel, the most common ones, have multiple image collections.
They have the raw data, and you can correct them by yourself — if you need to.
They also have other data products called Top of Atmosphere and Surface Reflectance, plus various derived data sets, such as the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI).
You can use a raw or an analysis-ready data set, depending on what you’re trying to do.
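The derived indices mentioned above are simple normalized band ratios. This is a minimal sketch, assuming surface reflectance values already scaled to the 0 to 1 range. The NDWI shown is the McFeeters green/NIR variant; other NDWI definitions use shortwave infrared instead. In Earth Engine itself, Image.normalizedDifference computes the same ratio for a pair of bands.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    # NDVI = (NIR - Red) / (NIR + Red); dense vegetation approaches 1.
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # McFeeters NDWI = (Green - NIR) / (Green + NIR); open water is positive.
    return normalized_difference(green, nir)


# Hypothetical reflectance values for a vegetated pixel
print(round(ndvi(nir=0.45, red=0.05), 3))
print(round(ndwi(green=0.06, nir=0.45), 3))
```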
DOES IT HAVE MACHINE LEARNING ALGORITHMS?
Yes, it has several machine learning algorithms, such as the traditional random forest, decision tree, and support vector machine. You have access to roughly seven or eight machine learning algorithms for image classification.
There’s also unsupervised clustering, such as X-means and k-means, exactly like you’d do traditionally with other remote sensing software packages.
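The following is a minimal k-means (Lloyd's algorithm) sketch that makes the unsupervised option concrete. It runs on hypothetical two-band (red, NIR) pixel values. It illustrates the technique and is not Earth Engine's implementation; Earth Engine exposes its clusterers through ee.Clusterer (for example, wekaKMeans). Seeding the centroids from the first k points is a simplification that keeps the example deterministic.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on a list of tuples (e.g., per-pixel band values).

    Seeds centroids with the first k points for determinism; real implementations
    usually use random or k-means++ seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return labels, centroids


# Hypothetical (red, NIR) reflectance: three water-like pixels, then three vegetation pixels
pixels = [(0.05, 0.02), (0.04, 0.03), (0.06, 0.02),
          (0.05, 0.45), (0.04, 0.50), (0.06, 0.48)]
labels, centroids = kmeans(pixels, k=2)
print(labels)
```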
The advantage of Google Earth Engine is that scale doesn’t matter. Once you figure out your method, you can easily apply it to the entire globe without worrying about computing limitations. The process is the same.
Say you’re doing land use or land cover classification for the US. Once your algorithm is designed, you can apply it to any other country, a whole continent, or the globe.
Just change the location or the date. The image collection filter lets you filter by date or location, but the algorithm remains the same, so you can run it anywhere.
This is a tremendous advantage. You get the results almost instantly, without the traditional workflow of re-downloading the data set for each new location and repeating the analysis.
Using traditional methods, it can take a couple of days or weeks to get a product. If something goes wrong or a parameter is not set correctly, you need to re-run the analysis,
and that takes another couple of days or weeks.
In Google Earth Engine, you can change the settings and get the new result within seconds or minutes.
I CAN TEST MY ALGORITHM ON A GLOBAL SCALE? HOW IS THAT POSSIBLE?
Google Earth Engine computes only what is displayed on the map, at a resolution determined by the zoom level. It does not run the computation at the native resolution, which for Landsat is 30 meters.
If you look at an overview of the globe, it’s probably computed at a resolution of kilometers or tens of kilometers. When you zoom in, you get a higher resolution, but the area in view also shrinks, so only a small piece is computed.
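The standard Web Mercator tiling arithmetic shows how resolution depends on zoom. This sketch assumes 256-pixel tiles, with the pixel size halving at each zoom level, as in most web maps. It gives the approximate ground size of one screen pixel; it does not describe Earth Engine's internal scheduling.

```python
import math

EARTH_CIRCUMFERENCE_M = 40_075_016.686  # WGS84 equatorial circumference in meters
TILE_SIZE_PX = 256  # assumed standard web-map tile size


def meters_per_pixel(zoom, latitude_deg=0.0):
    """Approximate ground size of one screen pixel in Web Mercator at a zoom level."""
    return (EARTH_CIRCUMFERENCE_M * math.cos(math.radians(latitude_deg))
            / (TILE_SIZE_PX * 2 ** zoom))


for z in (0, 4, 8, 12, 13):
    print(z, round(meters_per_pixel(z)))
```

At the equator, zoom 0 works out to roughly 156 km per pixel. The 30-meter Landsat resolution is only reached between zoom 12 and 13.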
Behind the scenes, tens of thousands of Google Earth Engine servers do the computation. The data is subdivided into small pieces, and each piece is sent to an individual server. The servers run the computation and send their results back, and the results are aggregated and displayed in the browser.
That’s why it’s so computationally efficient. Whether we’re working at a global or regional scale, the algorithms behind the scenes are much the same.
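The split, compute, and aggregate pattern can be sketched with a thread pool standing in for the servers. The tile sizes and the 4x5 grid are hypothetical. Note that each tile returns a (sum, count) pair rather than its own mean. Partial sums combine correctly across tiles of uneven size, while per-tile means do not.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(grid, tile_rows, tile_cols):
    """Yield non-overlapping sub-grids (lists of rows) that cover the whole grid."""
    for r in range(0, len(grid), tile_rows):
        for c in range(0, len(grid[0]), tile_cols):
            yield [row[c:c + tile_cols] for row in grid[r:r + tile_rows]]


def tile_stats(tile):
    """Partial result for one tile, standing in for the work done on one server."""
    values = [v for row in tile for v in row]
    return sum(values), len(values)


def parallel_mean(grid, tile_rows=2, tile_cols=2, workers=4):
    tiles = list(split_into_tiles(grid, tile_rows, tile_cols))
    # Compute each tile independently, in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(tile_stats, tiles))
    # Aggregate the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


grid = [[r * 5 + c for c in range(5)] for r in range(4)]  # hypothetical 4x5 "image"
print(parallel_mean(grid))
```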
WHAT HAPPENS IF WE INTRODUCE GEOMETRIES TO PARALLEL COMPUTING? IS THERE AN OVERLAP IN THE PIXELS OR WHEN WE DO COMPUTATION AROUND A NEIGHBORHOOD?
Google Earth Engine is best suited for pixel-based analysis.
You can still work with neighborhoods. Google Earth Engine has functionality for convolution and neighborhood reduction, with neighborhoods of three by three, five by five, or segment by segment.
But you don’t want to go too far away. When a pixel is related to something far away, it’s challenging to do this in parallel, to subdivide into smaller pieces. This is especially true for dealing with vector data.
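A focal (moving-window) mean illustrates the overlap question. When an image is split into tiles, each tile needs a halo of extra pixels as wide as the neighborhood radius. Without it, pixels near tile edges would not see the same window as in the full image. This sketch is plain Python, not Earth Engine code; Earth Engine's own tools include Image.reduceNeighborhood and Image.focal_mean. It checks that tiling with a halo reproduces the untiled result.

```python
def focal_mean(grid, radius):
    """Mean over a (2*radius+1)^2 window around each pixel, clipped at image edges."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out


def tiled_focal_mean(grid, radius, tile_size):
    """Compute focal_mean tile by tile, padding each tile with a halo of `radius` pixels."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile_size):
        for c0 in range(0, cols, tile_size):
            # Expand the tile by the halo, clipped to the image bounds.
            hr0, hc0 = max(0, r0 - radius), max(0, c0 - radius)
            hr1 = min(rows, r0 + tile_size + radius)
            hc1 = min(cols, c0 + tile_size + radius)
            padded = [row[hc0:hc1] for row in grid[hr0:hr1]]
            local = focal_mean(padded, radius)
            # Keep only the tile's own pixels and discard the halo.
            for r in range(r0, min(rows, r0 + tile_size)):
                for c in range(c0, min(cols, c0 + tile_size)):
                    out[r][c] = local[r - hr0][c - hc0]
    return out


grid = [[(r * 7 + c * 3) % 10 for c in range(9)] for r in range(8)]  # hypothetical image
print(tiled_focal_mean(grid, radius=1, tile_size=3) == focal_mean(grid, radius=1))
```

The halo grows with the radius. A pixel that depends on something far away would need a halo approaching the whole image, and at that point splitting the work no longer helps.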
With vector data, a single detailed polygon cannot be used efficiently for parallel processing. Google Earth Engine still has capabilities for dealing with vector data, but there are limitations.
If you’re pulling in tons of vertices, it’ll most likely run into memory issues because it’s difficult to subdivide.
Sometimes you need to pull the entire polygon, or all the vertices, into the same server — the memory runs out, and your computation stops.
You need to figure out what’s wrong. There are some best practices to follow to avoid the memory issue.
Google Earth Engine documentation is very comprehensive on best practices and avoiding this computation memory issue.
IS THERE ANY INTEGRATION BETWEEN GOOGLE EARTH ENGINE AND BIGQUERY GIS?
I’m not an expert on BigQuery GIS.
But I’d imagine there’s some shared technology behind the scenes of the two. It’s not officially documented; the Google Earth Engine website makes no mention of BigQuery GIS.
I’d assume they are integrated somehow, and it just hasn’t been released yet.
When you view satellite imagery in Google Earth, you don’t see clouds unless you view MODIS data at coarse resolution.
At the city and street level, the images are pretty good. Some of them come from commercial imagery. But a lot of the imagery also comes from Landsat sensors, and when it looks cloud-free, it isn’t a single real acquisition.
Some of those data sets were processed in Google Earth Engine into a cloud-free mosaic, and Google Earth pulled that mosaic from the Earth Engine data catalog.
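The interview doesn't say how that mosaic was produced. One common approach, shown here purely as an assumption, is a per-pixel median over a time stack with cloudy pixels masked out; in Earth Engine, ImageCollection.median() computes a per-pixel median. The None-as-cloud convention and the values below are hypothetical.

```python
from statistics import median

CLOUD = None  # hypothetical marker for a pixel flagged as cloud


def median_composite(stack):
    """Per-pixel median over a time stack of same-size grids, ignoring cloudy pixels.

    Returns None where every observation of a pixel is cloudy.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            clear = [img[r][c] for img in stack if img[r][c] is not CLOUD]
            row.append(median(clear) if clear else None)
        out.append(row)
    return out


# Three hypothetical acquisitions of a 2x2 area; None marks a cloudy pixel
stack = [
    [[0.10, None], [0.30, 0.40]],
    [[0.12, 0.22], [None, 0.42]],
    [[0.11, 0.20], [0.31, None]],
]
print(median_composite(stack))
```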
CAN I EXPORT THE RESULTS? CAN I SAVE THE INTERMEDIATE RESULTS FOR LATER?
If you want to save your result, there are a couple ways you can do that.
One way is to save them to your Google Drive.
Another way is to export the results to your Google Earth Engine account, which comes with 250 gigabytes of storage.
You need to understand the differences between different export locations.
If you export data to your Google Drive, you can then download it from your Google Drive to your computer. However, it becomes a regular data set. It’s no longer a cloud data set — it’s not optimized for parallel processing anymore.
Once you export the data to Google Drive, you can no longer use it with your Google Earth Engine script. You’d have to upload the data again to your account — it’ll take some time again to ingest the data set because it needs to be optimized for parallel processing for it to be used in your script.
If you want the end product and you no longer need it for computation, it’s better to export it to your Google Drive.
If you export your data to your Google Earth Engine account and it stays in your account, it remains a cloud data set, and it’s analysis-ready. You can directly pull it into your script to do computation.
If you want to use the intermediate results later for computation, it’s better to export them to your Google Earth Engine account.
Bonus tip: use the geemap package to export data (rasters as GeoTIFF, vectors as shapefiles or GeoJSON) directly to your local computer without going through Google Drive.
CAN I SHARE MY RESULTS WITH OTHERS?
The sharing protocol is similar to other Google products.
You can share a folder with anyone publicly, or you can share a folder with a group.
You can create folders within your Google Earth Engine account, and when you export something, you can share the folder with a Google group.
If you want your data set to reach more people, consider submitting it. If you come up with a fantastic data product, you can submit a data request to the Google Earth Engine forum.
You can report bugs, but you can also submit feature or data requests. If the data gets popular, it can be adopted by the Google Earth Engine team, and it’ll become part of the data catalog.
You’ll get more exposure, and your dataset will be used by tons of people if it’s available in the data catalog.
WHAT’S MISSING FROM IT?
Google Earth Engine is excellent for pixel-based analysis.
Some of my work involves hydrological studies, where a pixel is related to other pixels far away, and that is sometimes difficult to scale.
That scenario is not a good fit for parallel processing. You’ll still need a desktop computer to do some processing.
If your research involves several steps, some of those might be best suited for Google Earth Engine. However, others might still need a local computer.
If that’s the case, get the intermediate results first, upload them to Google Earth Engine, and use them for the steps best suited for parallel computing.
Google Earth Engine is an excellent tool, but it cannot do everything we want. There are certain limitations in its design.
Still, it’s a huge timesaver. No more downloading data sets from USGS and NASA. Just use the data catalog, quickly visualize the data set, and do filtering.
You can now create so-called Earth Engine apps. A few years ago, people shared their products through journal publications.
Now we have open repositories where people can upload and download data sets, but users still need professional software to view them.
If you developed your algorithm in Google Earth Engine, you can release your end product, save everything on your Google Earth Engine account, and develop a web interface. People get a URL and can visualize your data.
Not only that, they can do a query, click on the map, get the pixel values, or do filtering.
This is ideal when you want to make your data public or accessible to a wider audience, because anyone can look at it in a browser without installing anything on their computer.
THAT’S A GIANT LEAP FORWARD FOR REPRODUCIBLE RESEARCH
I’ve been using it precisely for those reasons — reproducibility and transparency.
You can make an algorithm, share it using one click, and send the URL to anyone. They open it and run the source code, just like you did, and get the same result. You can release your products, and people can build on top of them.
So much duplicated effort is avoided this way: less paperwork and less step-by-step trial and error. No reinventing the wheel; people just take your algorithm, improve it, and build on top of it.
Human resources are better used by working on the same baseline. If everyone works on their own version of a data set, the result is fragmented.
Use the centralized data catalog and the products already there. Take deforestation: global forest data products already exist, and so do global surface water products covering the past several decades, such as the one developed by the European Commission’s Joint Research Centre. It’s popular, and many people build on top of it to improve the algorithm and create an even better product.
We save time by having something to refer to as the baseline, and we keep improving it.
IS THIS A CODING OR DRAG-AND-DROP INTERFACE?
You need to call each function and then put all the functions together to do an operation. But you get comprehensive documentation to help you.
In Google Earth Engine’s Code Editor, you can also look up each individual function, and there’s a list of snippets and sample scripts you can learn from. Then you can change your settings or the image collection.
On the documentation website, they provide a lot of sample code, usually with a button that opens it in the Code Editor. Click the button, and the code loads in the Code Editor, where you can run it and see the results.
Undoubtedly, the process has a learning curve. But once you get used to it, it’s intuitive.
You have a lot more control over the data set and the design, and you understand what happens behind the scenes. With a point-and-click interface, sometimes you don’t know what went wrong or how to fix it.
In Google Earth Engine, you write everything from scratch or build your algorithm on top of other people’s source code. You can see step by step which line does what, and you can go back and revise the script.
THERE’S ALSO A PYTHON API
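Google Earth Engine has two components: computation and visualization. The Python API handles the computation, but unlike the JavaScript Code Editor, it has no built-in interactive map for visualization.
To cover this gap, I developed the geemap Python package.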
I’m also simplifying things, and eventually, the user will click a few buttons to do analysis. They can adjust the parameters using an interactive user interface without having to write a line of code.
My goal is to make it easier for people to use Google Earth Engine. Users with no programming background will be able to use it and do scientific analysis of geospatial data sets.
WILL GOOGLE EARTH ENGINE BECOME A STANDARD TOOL FOR EARTH OBSERVATION SCIENTISTS?
It’s getting popular.
But it’s not open-source. A lot of government agencies are discouraged from using Google Earth Engine.
For people in academia, though, it’s useful. Environmental problems are getting more and more challenging, and none of them is going to be solved by an individual or a select group.
People working together can solve major global issues. Google Earth Engine is a fantastic tool because people can work together, build on top of each other’s work, and continue to improve on it.
With the increasing number of satellite sensors, the data sets’ volume is getting much larger than 20 or 30 years ago.
Now that most things can be stored in the cloud, we need efficient ways to access the data and do the computation. Desktop computing cannot handle this. We have a lot more upcoming data sets from NASA and the European Space Agency.
For regular research, data resolution keeps getting higher. Landsat is 30 meters, and there are many data sets at 3 to 5 meters. Revisit intervals are also much shorter. An individual can’t handle that volume of data and computation.
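The jump in volume follows from simple arithmetic. Pixel count scales with the inverse square of the resolution, so a 10 times finer resolution means 100 times as many pixels for the same area, per band and per acquisition. Here is a short sketch using a hypothetical 100 km by 100 km area.

```python
def pixel_count(area_km2, resolution_m):
    """Number of square pixels of side `resolution_m` meters needed to cover `area_km2`."""
    return area_km2 * 1_000_000 / resolution_m ** 2


area_km2 = 100 * 100  # hypothetical 100 km x 100 km study area
for res in (30, 5, 3):
    print(f"{res} m: {pixel_count(area_km2, res):,.0f} pixels per band per date")
```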
Google Earth Engine is just one example of great tools coming up. I’d imagine there will be similar data products coming from other companies or agencies in the next couple of years. I’m excited to see the development in cloud computing and in the open-source geospatial community.
HOW LONG DOES IT TAKE FOR A LANDSAT IMAGE TO APPEAR IN GOOGLE EARTH ENGINE?
One or two days.
The ingestion is automated. Once the agency acquires and releases the data, it is ingested almost instantly. This is especially true for weather and climate data sets, which you can often access in the Google Earth Engine data catalog within a couple of hours.
ARE THERE PAID PRODUCTS ON TOP OF THE OPEN-SOURCE, FREE STUFF?
Not that I’m aware of.
Google Earth Engine itself does not sell data sets. You buy the data set through different vendors and upload it to your account, which you can keep private to do computation.
IS THERE ONE THING THAT YOU’RE MOST EXCITED ABOUT?
My research and teaching have greatly benefited from using it.
With commercial software, my students need a good computer to run it, and that’s not available to everyone.
If we use Google Earth Engine, students only need a browser on a laptop or a cell phone to access the data catalog. It saves me the time of setting up the environment for them.
For my research, I no longer need to download data sets. I run everything in the cloud using a browser.
I am enjoying building up the geemap package and making it easier for other people to use Earth Engine, especially people in developing countries without access to commercial software packages.
They have access to the internet and can use Google Earth Engine, geemap, Jupyter Notebook, or Google Colaboratory. They can do scientific analysis and contribute to solving the global challenges we face right now.
We need to work together. Without a powerful tool, it’s impossible: we can neither process data sets like these nor store them on our own computers.
With Google Earth Engine, all we need is a browser.