Frank’s mission is access to data, and he reminds us what access to data can mean for unexpected contributors, like Dr. John Snow in 1854.
He also shares a few exciting examples of how ease of access has dramatically contributed to the local economy.
The National States Geographic Information Council (NSGIC) is a community of people across the US who hold statewide leadership roles in geospatial data.
I’m grateful to be part of a community that has something in common: we want to leave the world a better place through the use and application of, and our contribution to, spatial data.
Plus, now there’s a worldwide community for people like “us.”
If you’re reading this, you’re part of that community.
Joining the community spirit wasn’t so evident to me early in my career — I was trying to work hard and make my own contributions.
Then, my intention shifted to empowering others I knew and worked with.
Now I want to empower people I’ve never met and never will, and to do so sustainably, in a way that will outlast us.
In the last year, something else became important: maximizing the impact of the time we spend on anything, be it at our desk, on the phone, or recording a podcast.
Are we doing what we can during those times to leave the world in a better shape?
What unique contributions do we bring to the table in figuring out how to apply spatial data or make it available?
Can we and do we trust each other and our roles?
These questions are our guiding stars at NSGIC.
Let’s go back to London, England, in 1854.
The cholera outbreak was devastating the city when an unexpected contributor to the events changed everything.
Dr. John Snow wasn’t even an internal medicine doctor. He was a surgeon, and no one expected him to be the one to find the answer to the cholera outbreak.
Cholera was incorrectly understood to be transmitted through the air, much like COVID-19. Dr. Snow had a different idea, and he solved the cholera outbreak not with test tubes and microscopes but with a map.
He mapped the location of the people affected by the illness. He determined that those who became ill were closest to a particular water well.
Let’s jump back to the current pandemic in 2021.
We’ve got a tough question to ask ourselves,
“Are we making data available to those unexpected contributors? And are we making it available at a resolution sharp enough for the Dr. John Snows of today to find that well?”
Sometimes, the answer is yes.
In other cases, we’re working hard across the country and the world to be better equipped for the next pandemic.
Take the example of the State of Wisconsin mapping positivity rates by census tract and making that data publicly available.
MIT picked up that data and assigned students to it. They did some outstanding analysis to explain the vectors, the hotspots, and the types of facilities affecting the spread most.
Or another example in the Northeast US, in Vermont, where the state has a tourism-driven economy.
Analysts carefully studied the travel restrictions; they used trillions of cell phone tracks for where people came from and where the spread happened. They drove policy decisions on travel restrictions based on real-life data.
These are excellent examples of what we can do when we make data available.
The fact is, it’s difficult to share data, or at least it seems to be because it doesn’t happen very often — we have concerns about privacy issues that Dr. John Snow wasn’t facing. He didn’t have to be concerned about HIPAA.
You can take the records of people who have been tested or vaccinated. Then you can aggregate those to something like the census tract (the unit of geography the US Census Bureau uses to protect privacy under the American Community Survey).
Once you’ve aggregated that data, it’s no longer about an individual; it’s about a rate in a polygon. You’ve addressed some of those privacy concerns.
Sure, your data is still in there, but there’s a level of anonymity.
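As a sketch of what that aggregation looks like in practice, individual records collapse into per-polygon rates. Everything here is illustrative: the tract IDs and column names are made up, not from any real health dataset.

```python
# Illustrative sketch: aggregating individual test records to census-tract
# rates so the published data describes polygons, not people.
import pandas as pd

# Assumed input: one row per (anonymized) individual test record.
records = pd.DataFrame({
    "tract_id": ["36083052000", "36083052000", "36083052100", "36083052100"],
    "result":   ["positive", "negative", "positive", "positive"],
})

# Collapse individuals into a per-tract count and positivity rate.
rates = (
    records.assign(positive=records["result"].eq("positive"))
           .groupby("tract_id")["positive"]
           .agg(tests="count", positives="sum")
)
rates["positivity_rate"] = rates["positives"] / rates["tests"]
print(rates)
```

What gets published is the `rates` table, a number per polygon, while the row-level `records` never leave the building.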
There are reasons to protect data and a lot of them are codified in law. But there are also insufficient reasons to protect data. For example, you may think your data is incomplete. Or, you think it’s too expensive to let it out in the wild. Or you’re afraid someone will crush your servers when they download it.
To some people, the right unit of resolution was clear; for others, it was up for debate.
And so the county was agreed on as the unit of resolution.
As we watch the news, we can see how one county is doing better than another, or how travel restrictions differ from one county to another.
That’s all good.
But as events unfold, the critical details may come down to the local variances — neighborhood by neighborhood. There are different criteria, and the spread may not be homogeneous.
To see what’s happening, you’d need a unit of geography — like the census tract.
In Rockland County, New York, analysts took the same data, mapped it by county, zip code, and census tract.
Then they mapped it to a grid cell.
Exact same data.
The result gave them a radically different appreciation of what was going on with the spread.
Depending on the problem you’re trying to solve, you need a unit of resolution that allows you to resolve the slight differences.
One reason people hold data back is mindset.
Some folks are of the mind that it’s my data. Data is power, and it’s my power. If I give it away, I give away my power.
That’s a mindset about 20 years out of date.
We can get our power by empowering others.
We can clearly see computer architecture scaling. How about scaling the use of our data, and its impact? Can we put it out there so that thousands or millions of people have their job or life impacted a tiny bit by the availability of that data?
That’s where it really takes off.
Another reason for not making data available is that many people are detail-oriented and do a magnificent job getting their data dialed in… but does the data have to be perfect to be useful?
It might be perfect three years from now, or it might never be perfect. You’re never going to be satisfied.
Just put your data out in a way that works. As you refine it, it’ll work better.
For instance, for our New York address points, we have a dot on the roof of some six million addresses in the state.
Sometimes that’s not good enough, sometimes we need a dot on the entrance too — apartments 1-6 enter this door, apartments 7-30 enter that door.
For 911 purposes, we don’t have to have all those sub-addresses all figured out and dialed in for the dots on the roof to still be helpful.
That’s an example of when we don’t have to wait for the data to be perfect — we’ll eat on the way to the dream. In the meantime, we’ll do something useful and keep refining it: business requirements and the cost-benefit drive towards that.
Think about the young people entering the workforce.
20-year-olds don’t call for a pizza on their phones — yet they get pizza.
Suppose your pizza ordering system is not available and you’re asking people to call instead. In that case, they’ll just order from another place.
They don’t want to sign up, get a demo, sign an agreement or make a phone call.
That’s the expectation we’re working with.
It’s cheaper and easier to make data publicly available and support the computing infrastructure efficiently than to manage the user accounts needed to constrain access.
Web services are a big part of that. You should be able to use your standard web search and find the data you’re after.
If you’re looking for parcels in New York State, here’s an explanation of how to use the web service:
There’s a URL you plug into your GIS software (or for developers into your app).
The data comes to you through that URL as you need it.
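To make that concrete, here is a minimal sketch of building such a query from code. The endpoint URL and parameter names are assumptions (an ArcGIS-style REST pattern), not the actual New York State service.

```python
# Hypothetical sketch: constructing a feature-query URL for a map web
# service. GIS software builds URLs like this under the hood when you
# "plug in" a service endpoint.
from urllib.parse import urlencode

# Placeholder endpoint, not a real service.
BASE_URL = "https://gis.example.ny.gov/arcgis/rest/services/Parcels/MapServer/0/query"

def build_query(where: str, out_fields: str = "*") -> str:
    """Build a query URL asking the service for matching features."""
    params = {
        "where": where,           # attribute filter, e.g. a county name
        "outFields": out_fields,  # which attribute columns to return
        "f": "geojson",           # ask for GeoJSON so any client can parse it
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = build_query("COUNTY_NAME = 'Rensselaer'")
print(url)
```

A plain HTTP GET on a URL like this, from a browser, curl, or any HTTP library, returns the matching features; no account, demo, or agreement in the way.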
There are excellent ways to remove any caveats. But it’s up to the data author to be clear about their data and how it’s being used via licensing.
We have a team of eight people. Last week they made 9000 edits to our streets and address data.
The streets and address data are the underpinnings of the Geocoder — a URL. It’s a web link sitting out there; you send it an address, it sends you back a correctly spelled address, and if you are close enough, it sends you back a coordinate.
That’s all it does — one simple thing.
This is publicly available for all those addresses in New York State.
Our small contribution to hungry residents — if they got their pizza faster and hotter, then we’ve improved the world a little.
The data is now also lined up to go into different systems to support 911 dispatch. It’s publicly available and it’s a live data set.
You do it with 200 partners.
We have every addressing authority in the state working with us to keep the data fresh and continue to improve it.
That job is complicated.
But for the user, who doesn’t want to be weighed down with unnecessary complications around the data, it’s a simple URL.
For a developer writing a little web form asking for addresses, customers can hit that URL and respond to “Did you mean this one?”
Now it’s all properly formatted and spelled correctly.
If you also have to ask the customer which county they’re in, you can leave that off your form. Take that coordinate and bounce it off another web service.
What county is this?
It drops a point there, just as if you had clicked your mouse on a map, and now the county comes back, and St. Lawrence is always abbreviated the same.
Not only have I made my user interface more straightforward, but I’ve also improved my data quality just by interacting with the spatial data.
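The two-step flow just described can be sketched in a few lines. Everything here is hypothetical: the endpoint URLs, parameter names, and JSON field names are stand-ins for illustration, not the real New York State services.

```python
# Hedged sketch of the workflow: geocode an address to get a corrected
# spelling plus a coordinate, then reverse-look-up the county from that
# coordinate. Endpoints and JSON shapes are invented for this example.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODER_URL = "https://gis.example.ny.gov/geocoder"  # hypothetical
COUNTY_URL = "https://gis.example.ny.gov/county"      # hypothetical

def geocode(address: str, fetch=None) -> dict:
    """Send an address; get back the corrected address and a coordinate."""
    url = f"{GEOCODER_URL}?{urlencode({'address': address})}"
    raw = fetch(url) if fetch else urlopen(url).read()
    return json.loads(raw)  # assumed shape: {"address": ..., "x": ..., "y": ...}

def county_for(x: float, y: float, fetch=None) -> str:
    """Send a coordinate; get back the county name, always spelled the same."""
    url = f"{COUNTY_URL}?{urlencode({'x': x, 'y': y})}"
    raw = fetch(url) if fetch else urlopen(url).read()
    return json.loads(raw)["county"]

def fill_form(address: str, fetch=None) -> dict:
    """Chain the two services so a web form never has to ask 'which county?'."""
    hit = geocode(address, fetch)
    return {"address": hit["address"],
            "county": county_for(hit["x"], hit["y"], fetch)}
```

The `fetch` parameter just lets you swap in a stub while testing; in production the functions would hit the live URLs directly.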
If we can make enough data to move the dial on the economy, in this case for New York, even one tiny percentage to become more attractive, more accessible, or more efficient to site a business or to do site development, we would have improved our economic competitiveness and efficiency.
If we can move the dial just a tiny bit, we’ve paid for our salaries and other infrastructure over and over again.
The best examples I have are around parcel data.
(Quick statement of respect here: local governments have various policies, and it’s their prerogative to have various policies in place for how they control parcel data. They do that out of a sense of stewardship.)
With permission, we publicly share their parcel data through these web services — around half the counties in the state have said yes, share our parcel data publicly.
This example is from Rensselaer County, New York, based on statistics from 2015 to 2019.
Before Rensselaer County made its parcel data public, there were 127 downloads of the data. The year they made that data public, through a data-sharing agreement we have in the state, the number went to 2,730.
Clearly, more people downloaded the data once it was public.
We studied the weblogs and determined what type of businesses and URLs were downloading the data.
854 of those downloads were from businesses; one of those businesses was Amazon.
Subsequently, in 2018, an Amazon distribution facility was applied for and is now open in Rensselaer County.
Amazon put another facility up in the state — which just happens to be in another county that makes their parcel data publicly available. Amazon had also downloaded their parcel data six months before that project started.
We’ve been told by the economic development experts that municipalities or regions have been left out of consideration for some economic development projects simply because their data was not publicly available.
The timelines for making those project location decisions have shrunk to a point where companies start facilities where they can grab the data, decide, and move on.
It used to be months — now it’s a day or two.
Going back to the Amazon example. Not everyone in the area thought that putting a facility there was a great idea.
A neighborhood group also downloaded the parcel data to hold the developer’s feet to the fire. Residents knew of a regulation that if Amazon were to do blasting on the site, all property owners within a mile had to be notified.
They used the parcel data on the other side of that debate.
It’s not just the deep-pocket company that has access to the data — it leveled the playing field for a reasonable discussion. The Neighborhood Association had the parcel data for free, as well.
We’ve uncovered several other examples where a company downloaded or accessed the web services and economic activity happened — Regeneron, a pharmaceutical company, downloaded the parcel data and then opened a facility in Rensselaer County.
The first Amazon facility created 800 full-time jobs, plus their tax increase of USD 1.5 million a year to the county (after county and town taxes and investments the county made to handle the load and facilities).
Regeneron created 1500 jobs in that area.
We’ve had conversations with some of the more rural counties in the state where they’re not convinced a smaller county would do so well.
I’d ask the question,
What about your ability for your citizens to connect to the internet? Do you have universal broadband? Wouldn’t it be helpful for broadband developers, or solar energy developers, to understand the property ownership to better plan and expand those facilities?
There’ll be more and more of those decisions and we expect things will head towards open data.
If your data is still behind a password, you’re leaving your constituents out of a segment of the economy.
You need to start with strategy.
If you have to add another staff member for every thousand GIS users, you’ve failed.
You can’t scale your workforce to come anywhere near the demand you’re hoping will use your services.
You need to set things up so that people can self-serve, with simple but clear documentation.
In our case, we have to have URLs that are caveat-free.
The key is to not try doing too much. Our Geocoder does one thing — send it an address, it sends you back that address correctly spelled and gives you a coordinate.
Can it also return the school district?
Yes, it can… But no, we’re not going to.
We’d rather have another service that’s just the school districts — you send it a coordinate, it sends back the school district; you turn it on, and you see a map. There’s your map of school districts.
We can stack up a synergy of simple things, microservices, each one very understandable. We don’t make it over-complicated by trying to do too many things in the same service.
The Geospatial Advisory Council in New York State has existed for two decades. It has representation from every sector, most heavily weighted with local and federal government, private and state sector folks, and academia, and we get together quarterly.
Their role is to advise me on our programs. For decades, we have been taking the community’s pulse by listening, having them understand what we’re up to, and then having them shape what we do.
The things we do are driven mainly by the community. It’s similar to activities NSGIC takes on nationwide and works with federal agencies and private sector partners to understand which ways the industry will shape things.
We proactively go out and understand the demand and what’s next, both nationwide and in the state. Not everyone is part of our community — yet — so we have to do more of this.
I want every developer in the state to understand how to interact with our web services.
We’re not there yet.
It’ll take a lot of outreach.
It’s a combination of understanding the business needs of those we can serve, understanding our role, and understanding whose lane to stay out of.
I’m not mapping the wetlands that the Department of Environmental Conservation is responsible for. Just because we could doesn’t mean we should.
I can, however, build those trust relationships, so when I need wetland mapping, I know who to go to. And when they need ortho imagery, they can come to us. We can both focus on the pieces we are uniquely qualified to line up.
There’s an awful lot of outreach, partnerships, and miles on rental cars necessary to get these partnerships lined up. There’s nothing like showing up at someone’s office.
I really miss that.
I look forward to getting back to the face-to-face meetings, the handshake, the cup of coffee with someone while we sort out how we can both play the best role combination of all those things.
There is a trend towards open data.
It’s also been the hallmark of NSGIC strategy over the last several years — to remove some of those barriers.
The passage of the Geospatial Data Act is an exciting piece of legislation that compels federal agencies that invest in data to do so in a way that’s coordinated with the states and leans towards making more data openly available.
We’ll see more and more of this.
The lack of common licensing and the surrounding language has come up as an issue in the last year.
Folks need to understand how exactly you expect your data to be used.
If you put data in the public domain but wrap it in your own custom wording, attorneys can still question, for example, whether a commercial mapping activity may use it.
If you have data coming in from all around the world, you want to be clean about it: do you have permission to use that data?
You could get a custom license agreement from everyone that contributes, and your attorneys might still say, “Aren’t you making that data publicly available? We can’t sign that license.”
There are different versions of Creative Commons licensing, but Creative Commons Zero (CC0) is a standard license worldwide.
If you’ve published data with CC0, no one needs to review it again if they have already looked at the terms.
Being clear and consistent about the terms of a license not only helps remove the need to keep reviewing it, but compliance with those terms goes up.
Imagine if you had 100 partners, and every one of them tweaked the license and gave you different wording on how you should control the data or how you can pass it on to your users and staff.
We need to agree that if we put data in the public domain, we use CC0. With everyone on the same page, you can tell folks about the terms of CC0 once, no matter how many datasets it applies to.
There are some contributors to this data ecosystem that are commercial ventures; they can’t just put everything out there for free, or the company would go under in a year. In these cases, their license terms are their responsibility. They need to be clear about how others can and can’t use their data.
It’s the licensing, a key part of data governance, that allows that trust relationship that I talked about to happen. I am clear about how you expect me to use your data, and here’s how I expect you to use my data.
Plus, here are the caveats. We’ve worked hard to remove them, but you’re not going to rely on this as being perfect. For instance, you’re not going to rely on a warranty from me, but go ahead and knock your socks off and use it.
It’s all in the license terms.
That’s the next hurrah for understanding a consistent approach towards publicly available data as well as commercial or somewhat restricted data we can use.
Policy? Tech? Culture?
It’s not technology — we have plenty of technology. Technology will continue to be the straightforward part.
I’m optimistic that the trend is in the right direction and that people will get their heads around sharing data more.
There’s a public perception around privacy that we have to be respectful of. We’ll see that there will be a move towards anonymity as opposed to privacy. Sure, your data is part of this, but you’re anonymous in it.
Think of your music streaming service. You have CDs in the basement you own, but you still subscribe to an internet music service and share your playlist with your kids or spouse.
There’s no privacy there.
But are you just as comfortable sharing your spatial data?
So we’ve got to bridge that gap, and while doing so, we’ll need to be deliberate about our decisions. We have to look over the horizon, beyond our office or agency walls, and see what the real impact of that data could be.
A good dose of empathy needs to come in there. People need to understand the benefits in a way that resonates with the things they’re trying to accomplish.
Going back to the NY street address example: how many transactions happen at your address?
What percentage of those are 911 calls?
Hopefully, very low.
But then we need to exercise that data through lots and lots and lots of transactions, absorb the feedback, and use that to make sure that we’ve got your address right.
So that when the 911 call happens, this is not the first time we’re responding or using that address to find it.
Whether we’re collecting census forms or delivering pizzas, we’ve exercised that address and we’ve got it right.
For my first mapping job, I was in a canoe. I was collecting water samples around the clock on some 30 points. We worked 24 hours straight to collect water samples for a study in Lake Champlain, an environmental impact study.
I collected 30 points for six people to use that data. I was really proud of it. I was uniquely qualified to be up all night and paddling around with a headlamp on.
Now, we’re collecting millions of points, and we’re on pace to come close to a billion interactions with our web services in 2021.
We had 75 million hits on our web services last month — some of those may be frivolous, people messing around on the internet, but also people planning their camping trip or setting up a facility.
We need to have faith that there’s a lot of inspiring things people are doing because it scales so widely.
The industry has changed. Players are looking around and noticing the autonomous vehicle industry, the big mapping companies, cell phone companies selling phones partly by the marketing around whose map is better.
Over the last couple of years, NSGIC has pulled that community together in exciting ways.
For instance, an exit number changes on the road. We’re setting up a structure where commercial companies and map providers can see that change right away. That also makes the road safer, because drivers’ directions in the virtual world reflect the real world the same day the change happens.
And those connections are super exciting. I am even more stoked about being part of this industry than when I decided grad school was a thing.
I remember to this day my mission statement for my grad school application,
“I want to learn everything I can about how geographic data is collected, analyzed, and portrayed.”
Just broad enough that it still holds up today. Just focused enough to give that drive.
An empathetic mindset and assuming the role of the translator.
I have to meet people where they are, or else I don’t get the value of their insights, and they don’t get the value of using our stuff.
I talk about RGB values to the graphic artists, APIs to the developers, environmental impact to environmental specialists, and public safety to the police.
I need to understand enough about what they’re trying to do so that we meet them where they are without dumbing things down.
You don’t have to dumb anything down.
You have to make it applicable for somebody who has different things on their mind — such as putting a user interface in a police car.
When police cars are on the side of the road in a dangerous situation, with all the things officers have to keep track of in an intense situation, the interface has to be ready for that. If it’s the slightest bit hard, we’ve missed the mark.
Being empathetic about how my team’s work impacts someone so they can consume it is the most exciting skill I can think of.