Stace discovered GIS and science when he was completing his undergraduate work in archaeology. For his senior thesis, he did a logistic regression analysis on archaeological site location and environmental variables: things like distance to permanent water, elevation, slope, and aspect. He knew nothing about geographic information systems or making maps with computers, so he did everything analog. He used USDA soil maps and USGS topo maps, blew those up, and scaled them to each other on acetate so he could mark where sites were and record their various attributes for the analysis.
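The kind of model in that thesis is easy to sketch today. Here's a minimal, self-contained logistic regression fit by gradient descent, with synthetic stand-ins for the environmental variables he describes (the data and variable ordering are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical training data: each row is a location with
# [distance_to_water, elevation, slope, aspect] (standardized),
# and y marks whether an archaeological site was found there.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Synthetic rule: sites favor short distance-to-water and gentle slopes.
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) < 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit weights by gradient descent on the logistic loss.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)          # predicted site probabilities
    w -= lr * (X.T @ (p - y) / len(y))
    b -= lr * np.mean(p - y)

# Predicted probability of a site at a new, unsurveyed location.
p_new = sigmoid(np.array([-1.0, 0.2, -0.5, 0.1]) @ w + b)
```

The analysis itself is a few milliseconds of compute; as the story above makes clear, the months of work were always in getting the locations and attributes digitized in the first place.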
This was in the mid-90s, and data collection took months. Around that time, he was experimenting with computers and graphics. He knew that might be another way to do his analysis. He discovered that the University of North Texas had a program, so he went up there and started digitizing his data. It took him three weekends to digitize all of his data after he'd been working on it analog for six months. Once the data was digitized, the actual analysis on a mid-90s slow computer took about five minutes.
That was the moment he was sold on using computers for managing, analyzing, and capturing archaeological data. From that point, he worked on creating spatial archaeological databases explicitly for geospatial data collection in the field and working on ways to capture data. He used kite and balloon cameras from the air. Archaeologists are notoriously poor, you see. Everything they do has to be done on a shoestring. They would never have the budget for an aerial imagery flight, but they can afford a kite and a camera.
I started in academic GIS for research and teaching 25 years ago. Back then, it was all about making the data. The geospatial world at that time was, especially in academia, a one platform world. Everything was Esri. It was great at the time; it was magical to do the things that you could do with spatial data, especially the stuff for archaeology.
The problem was we didn't have the data. There wasn't this massive push to create data.
Fast forward 25 years, and we're in a world where there's too much data. Most of the spatial data that we're producing right now will never be looked at or used because there's too much of it.
Earth-observing satellites are creating publicly accessible datasets with massive amounts of data that anyone can go and leverage themselves. It's harrowing to think about how to manage these datasets and come at them with questions. At the same time, we've seen the rise of data centers, parallelized computing, and other methods of scaling our compute power.
The first time I saw Google Earth Engine was in 2013. It blew me away. I was used to spending hours preparing Landsat and the high-resolution imagery that DigitalGlobe and similar providers were creating.
Now all of that processing and management is taken out of the way. People can do science with massive amounts of data, at a scale of access we've never had before.
Not at all.
I can't wait to see what we're going to be talking about as the next big thing in ten years. Google Earth Engine is still face-melting in its ability to move the processing and the things that are hard about doing remote sensing out of the way and allow people to get to the science. I think it's a mic drop, but then, we get those every ten years.
Google Maps, OpenStreetMap, Humanitarian OpenStreetMap, and the tools that sit on top of those were a mic drop. These platforms have taken the idea of scale and laid it on top of what was a fragmented and localized GIS world.
When I came out of UT Dallas, the GIS world was still primarily people coming out of college ready to work for municipalities and state GIS offices, working locally, designing things within their environment. Google Maps, in particular, put everyone at the center of the map. That suddenly opened up opportunities for services that you could provide to people when they were at the center of the map. Without doing that, services like Uber, Lyft, and DoorDash wouldn't work.
It's all about changing the map perspective from a high level, authoritative, administrative functionality to that personalized, individual, human-centric viewpoint that has allowed these services and companies to thrive.
By redesigning things.
My job is to support the use of geospatial technologies, research, and teaching at Stanford. I need to have a broad general knowledge of the spatial data science playing field. Who are the big players at any given time? What are the exciting and innovative technologies that are happening? How might those be leveraged in research?
One thing that's immensely satisfying about my job is that I sit in the Library. My center is part of the Stanford University Library system. It's ideal. There are other types of support there, like people supporting the use of R for statistics, or people supporting Python for programming and managing large datasets, and so on. Having this set up in the Library allows that service to be departmentally agnostic.
I'll have a meeting later today with a researcher from the Med School. I'm helping her find nomadic pastoralists using daily imagery from planet.com so she can build randomized public health surveys for this population that's never been surveyed for their public health needs. I'll also have a meeting with some folks who are working on the deep history of Rio de Janeiro. They need their geocoders for the 17th, 18th, 19th, and 20th century Rio de Janeiro up and running on an arcgis.com service so that their scholars all over the world can use those geocoders to take their historical data and geocode it to the historical addresses that they're interested in.
That requires a broad knowledge of the infrastructure that runs spatial data in general: things like ArcGIS, PostGIS, Mapbox, and the infrastructure and grid computing underneath them. Probably a more accurate description of what I do is general contracting—I know who all the subcontractors are.
A researcher may come to me and say they need to find people in the middle of nowhere, and they need to find them and get to them within a couple of weeks of finding those locations.
I have to know where that data comes from, if the resolution is going to be sufficient, and how to manage that massive amount of data. How to process it to get it ready, how to create an application that allows several people in tandem to crowdsource the survey of that imagery to locate the settlements. Then get that data into a format that's usable for the statistician to build the public health survey.
Where I sit, I see a lot of things and I learn by supporting projects. For instance, public health surveys for Med School researchers are cross-pollinating research in areas like digital humanities, social sciences, or even archaeology. It's an interesting nexus—I get to see what everybody's doing. Everybody comes at things a little differently and uses different parts of the universe of geospatial technology.
I can help take ideas from one place and test them out in other places and make suggestions on what's being done in that space or application area. This could be interesting, and it might intersect with what others are trying to do.
We shouldn't limit ourselves.
Everything is somewhere. And that somewhere matters. It doesn't matter if you're doing medical research, or if you're interested in car accidents, or if you were interested in incidents of COVID. Those events you're interested in took place in a geographic location; the things around that location influenced the event or the development of that thing you're interested in.
Back in the 40s and 50s, a lot of the geography departments in academia folded or pulled back. Geography atomized and became a set of tools that other application areas began using to measure, analyze, and manage their data explicitly spatially. Everyone should learn spatial thinking.
Previously, this course was taught traditionally. It's a Fundamentals of Geographic Information Science course that covers basic vector and raster analysis and basic introduction to remote sensing.
It's been taught in a traditional manner from a common textbook that most programs still use. What I wanted to do for Stanford researchers, in particular, was to move away from that canned approach to spatial data, which is prescriptive, and single platform focused; it's teaching people to do their research on the Esri platform.
There's nothing wrong with doing that; we use lots of Esri products to get our work done. But Esri is not the best way to do everything.
You look around the Stanford campus, and 80% of the students are on Macs. That makes it problematic to put the Esri software on their computers, so considerations like that force you to broaden the landscape of solutions you're teaching students about and exposing them to.
I'm replacing that single platform, uniform experience with something a little more akin to the annual hackathons that we put on at Stanford. One's called TreeHacks. It's gigantic, and last year it involved 1,600 students from all over the world coming together to work on problems that stakeholders had presented and pitched to them. They work with mentors from companies like Google, Mapbox, Planet, and so on. The other annual hackathon is in the School of Earth. It's focused on environmental applications, and the last one was on wildfire.
We bring mentors from companies that are making the platforms we're using to manage these massive amounts of spatial data. We're dealing with commercial products mostly when we work with geospatial data. Even though Google Earth Engine is freely available to anyone with a Gmail account, it's being integrated now into the Google Cloud Platform. Eventually, they want to place metering on that and allow people to build businesses on it.
I'm exposing students to this entire universe. Ten years ago, there were a few players in the game. After Google Maps and the mashups craze happened 15 years or so ago, companies like Mapbox, Carto, Cesium, Boundless, Planet, and these other players that are now big in the spatial data world were small startups, and they were doing exciting things.
2013 was the first time I saw Dave Thau from Google Earth Engine bring up all of North America's Landsat imagery and create a composite for an entire year: a pseudo-image of the best pixels from all the Landsat scenes for a whole year, for all of North America. In seconds. It blew me away, and I knew it was a game-changer.
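The best-pixel idea itself is simple to sketch outside Earth Engine: for every pixel location, keep the value from the least-cloudy scene in the stack. Here's a toy NumPy version with synthetic data (this illustrates the per-pixel selection logic, not the Earth Engine API, and the array sizes are arbitrary):

```python
import numpy as np

# Toy stand-in for a year of Landsat scenes over one small tile:
# `stack` holds per-scene pixel values, `cloud` per-scene cloud scores.
rng = np.random.default_rng(1)
stack = rng.uniform(0.0, 1.0, size=(12, 4, 4))   # 12 scenes, 4x4 pixels
cloud = rng.uniform(0.0, 1.0, size=(12, 4, 4))   # 0 = clear, 1 = cloudy

# For each pixel, pick the scene with the lowest cloud score --
# the "best pixel" for that location across the whole year.
best_scene = np.argmin(cloud, axis=0)            # (4, 4) scene indices
rows, cols = np.indices(best_scene.shape)
composite = stack[best_scene, rows, cols]        # (4, 4) cloud-free mosaic
```

The trick that made the 2013 demo jaw-dropping wasn't the selection rule, which is a one-liner; it was running that rule over a continent's worth of pixels in seconds.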
Same thing happened when I saw the Carto database in its earliest iteration and how it was a powerful back end of PostGIS with a beautiful rendering engine on top. That was the vector version of Google Earth Engine to manage massive amounts of vector data in the same way we can manage raster data.
I want to expose students to all of those things. When we talk about projections and coordinate systems, we'll bring someone in from, say, Trimble to talk about surveying and how projections and coordinate systems are critical to accurate surveying technologies. When we talk about remote sensing, we're going to bring the Google Earth Engine developers in, and they'll talk about how and why they develop this platform, particularly for students.
I want my students to be exposed to the bleeding edge technology so they can make critical decisions about what's the best tool for their particular problem. You shouldn't be a shop with just a hammer. There are lots of shops that have just hammers. They have the Esri hammer, and that hammer is ready and useful. It’ll do nine out of the ten things that you need it to do. It'll do them in a way that's easy for people even without a deep technical background to execute, but it may not be the best way to do things.
If you're going to do fieldwork and manage a hundred people in the field collecting data, ArcGIS Collector is the platform to do that with. If you're helping a grad student do personal research and they need a map of frogs in the rainforest, a PDF map might be the best thing in the world for them. I want students to know what the entire landscape looks like, and we're going to expose them to that.
We'll move through everything we can. For example, we'll create spatial data with the Humanitarian OpenStreetMap platform for active response projects that are being posted by Missing Maps, Médecins Sans Frontières, and the Red Cross. We'll do remote sensing in Google Earth Engine and manage the extraction of features with a platform called RoboSat (now called RoboSat.pink).
We're going to play with all of those tools. Nobody's going to come out of my Fundamentals of Geographic Information Science course as an expert in Esri software. They'll understand what they can do with it, and they'll have used it. They'll be ready to decide whether it's appropriate for what they want to do. But they'll also have been exposed to R, to the command line with GDAL, and to working with Python notebooks to do machine learning work. I want to show students all the tools that they can manage their research with.
What we'll be learning as we investigate and use these tools is spatial thinking. That's the most important thing to get out of the course. We'll be learning about the power of co-locating datasets against one another in geographic space to analyze features and phenomena against one another in that space.
One of the most powerful things you can teach a social scientist is how to bring a set of their events into a geographic information system, add a bunch of demographic data into that same system, and transfer the data from the demographics to their dataset. They can go back to R, or Stata, or SPSS, or whatever it is they're interested in doing the actual analysis in. But the real lesson is spatial thinking: proximity, overlap, containment, adjacency, and the ways of measuring geographic space and the relationships of features and phenomena to one another. That's the thing you're teaching with these tools.
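That transfer step can be sketched without any GIS software at all. Here is a minimal spatial join by containment, using the standard ray-casting point-in-polygon test; the tracts, attributes, and events are hypothetical, invented purely to show the mechanics:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting containment test: does (x, y) fall inside
    `polygon`, given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this polygon edge crosses the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical "census tracts" with a demographic attribute attached.
tracts = [
    {"name": "tract_a", "median_income": 52000,
     "shape": [(0, 0), (5, 0), (5, 5), (0, 5)]},
    {"name": "tract_b", "median_income": 71000,
     "shape": [(5, 0), (10, 0), (10, 5), (5, 5)]},
]

# Point events (say, car accidents) inherit the attributes of the
# tract that contains them -- the essence of a spatial join.
events = [{"id": 1, "xy": (2, 3)}, {"id": 2, "xy": (7, 1)}]
for ev in events:
    for tract in tracts:
        if point_in_polygon(*ev["xy"], tract["shape"]):
            ev["tract"] = tract["name"]
            ev["median_income"] = tract["median_income"]
```

Once the events carry the tract attributes, the table can go straight back into R, Stata, or SPSS for the statistical work; the GIS only had to answer "which polygon contains this point?"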
You're teaching folks that you can create vector datasets that allow you to get at something, like population density, through an image of the surface of the earth. You need to select the right tool to get to that final dataset from capturing an image of the planet, vectorizing a sample set to implementing that in a model, and then making the prediction on the larger dataset of imagery that you're interested in. Just knowing that the opportunity is there is essential.
I learned early on at Yale that not everyone needs to apply these tools. Some people just need to know that these tools exist. I learned this with a faculty member, Durland Fish. He's a famous professor from Yale who works on Lyme disease and other vector-borne diseases. He would always sit in the back of my class, and he would never turn on the computer. He said he didn't need to. He just needed to know the vocabulary so he could talk to his research assistants about it.
That's when I realized that I need to build in not just the ability to use these tools, but the ability to understand and communicate about these tools into everything that I'm teaching. Not everyone needs to sit down and use these tools. Some people are managing the folks who need to use them. It's important to make sure that those folks understand spatial thinking, the power of geographic data, and the tools we use to manage and analyze that data.
Learning how to learn is the most important thing.
I've spent the last 20 years building the filter bubble that Google places onto every search that I make. Google has built up this picture of what I'm interested in. My filter bubble is particularly well-tuned to geospatial data technologies, machine learning on imagery, things of that sort. I can find things quickly.
That's what you're after. You don't necessarily need someone to know the name of every machine learning platform that can handle geospatial data and earth imagery efficiently. You want them to know that the platforms exist, and to understand enough about the infrastructure of geospatial data in 2020 to figure out what the state of the art in creating spatial training datasets might be. Should they use Label Maker from Development Seed? Should they use Mapbox? Or should they use a platform like Labelbox that has some level of geospatial capability?
Just the fact that they know there is multispectral satellite imagery and they can vectorize it; just those terms will be enough for them in the future to explore the landscape at any point.
That's what we try to get across. If you understand at a fundamental level how things work in geospatial data and technology, then, if you're thinking about novel applications, you can anticipate what it is that you're trying to get towards.
We should stop being mean to each other. There's no sense in compartmentalization.
There's an existential compartmentalization that I see in the geospatial data and technology world. Esri versus open source. That's not useful.
Esri is a 50-year-old geospatial data technology company. They know what they're doing, and they do it well—some things better than anyone else. Taking an existential stance that you're never going to consider using those tools will only hurt your projects and your results.
However, being boxed into that Esri world limits your exposure to some of the most interesting and innovative work being done in geospatial at any given time. Machine learning on satellite imagery didn't start at Esri.
It started with a bunch of techies working for other geospatial startup companies that were based on open source. They grabbed the open source software that folks in imagery recognition used, and found cats and hamburgers in photos. They used and tinkered with the software until it worked for geospatial data, and then released that to everybody. It caught on, and people started focusing on that.
Next thing, there were five different platforms for managing geospatial data for machine learning projects. They were all open source. Those two worlds shouldn't be mutually exclusive, if you want to do effective work and see what's coming on the horizon. If you're going to build lasting solutions, you have to be open to using whatever works best at any given time.
These things have five-year shelf lives. You're thinking about five-year futures when you're talking about these technologies, and that's a long shelf life for some of these technologies; these things turn over much quicker than that sometimes.
I would also love to see companies that are beginning to image the surface of the Earth apply the same sort of model that they're applying to the creation and deployment of those satellites, like the microsatellite model. It makes everything nice and cheap and builds robustness into the launch system.
I would like to see them apply that same model to their pricing.
What you see in imagery right now is an innovative model to deploy, capture, analyze, and distribute satellite imagery, but a very .gov and .military-based pricing scheme. You're not finding out what's the most innovative thing that can be done with your product.
Working with brilliant students, faculty, and staff who are always teaching me things. The real secret is to be a lifetime learner. That's probably the most important thing in life. Always place yourself in an attitude of learning. There are folks that I work with who I love having meetings with because every time I walk out of a meeting with them, I've become more knowledgeable, or smarter, or I can see things differently.
That's an obsession with learning, knowing, and developing your instrument, which is your knowledge of the tools and technologies and application areas that we do these things in.
One of the smartest things that Esri has done for the last couple of decades is pouring resources into universities and K-12 education.
From a business perspective, they're playing the long game. Most people who come out of a university GIS program are Esri users.
Other companies are being smart about that, too. Planet is reaching out to research and education. Mapbox provides resources for research and education.
The bottleneck at this point is the incredibly useful high cadence, high-resolution Earth Observing data that is still primarily the domain of the government and the military. Once you start opening that data up, you'll see an explosion of applications that will end up building profits. Those companies eventually are going to discover the long tail that has made Amazon successful.
Until you open up that long tail, you don't know what's there. You don't know what markets and application areas are there. I hope that folks at Maxar and other companies that are producing the high-resolution data that is applicable in so many areas, where we need good research, open that up more in the future.
This autumn, Stace will be releasing his GIS Fundamentals lectures as an open course shortly after he delivers them in class. Watch out for that; it's going to be a people's guide to GIS. Can't wait for it. Can you?
Commercial satellite providers produce somewhere between 100 and 200 terabytes of imagery a day, a monstrous amount of information. Sentinel-2 has five years of daily-refresh data. We have 40+ years of Landsat data. It's a massive amount, particularly in the temporal dimension, where you can do longitudinal studies. Apache Spark and RasterFrames might just be the tools we need to handle this much data.
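The arithmetic on the quoted figures is worth spelling out; even the annual flow, before touching the decades of archive, lands in the tens of petabytes:

```python
# Back-of-the-envelope scale of the commercial imagery stream
# quoted above (100-200 TB per day), using 1 PB = 1000 TB.
TB_PER_DAY_LOW, TB_PER_DAY_HIGH = 100, 200

# A year of collection at those rates, in petabytes.
pb_per_year_low = TB_PER_DAY_LOW * 365 / 1000    # 36.5 PB
pb_per_year_high = TB_PER_DAY_HIGH * 365 / 1000  # 73.0 PB
```

At roughly 36 to 73 petabytes per year from commercial sensors alone, it's clear why single-workstation workflows give way to distributed frameworks like Spark for this class of data.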
With the open data movement, there's a ubiquity of data. We can let students pick their own data on topics that interest them. They find their own data for a geographic area they're interested in, perhaps where they live or where they'd love to travel. They make connections to their own interests and lives. The more they see the relevance of what they're learning, the more motivated they are.