In this episode, I welcome Jason Gilman, a Principal Software Engineer at Element 84, to explore the exciting world of natural language geocoding.
Key Topics Discussed:
- Introduction to Natural Language Geocoding: Jason explains the concept of natural language geocoding and its significance in converting textual descriptions of locations into precise geographical data. This involves using large language models to interpret a user’s natural language input, such as “the coast of Florida south of Miami,” and transform it into an accurate polygon that represents that specific area on a map. This process automates and simplifies how users interact with geospatial data, making it more accessible and user-friendly.
- The Evolution of AI and ML in Geospatial Work: Over the last six months, Jason has shifted focus to AI and machine learning, leveraging large language models to enhance geospatial data processing.
- Challenges and Solutions: Jason discusses the challenges of interpreting natural language descriptions and the solutions they’ve implemented, such as using JSON schemas and OpenStreetMap data.
- Applications and Use Cases: From finding specific datasets to processing geographical queries, the applications of natural language geocoding are vast. Jason shares some real-world examples and potential future uses.
- Future of Geospatial AI/ML: Jason touches on the broader implications of geospatial AI and ML, including the potential for natural language geoprocessing and its impact on scientific research and everyday applications.
Interesting Insights:
- The use of large language models can simplify complex geospatial queries, making advanced geospatial analysis accessible to non-experts.
- Integration of AI and machine learning with traditional geospatial tools opens new avenues for research and application, from environmental monitoring to urban planning.
Quotes:
- “Natural language geocoding is about turning a user’s textual description of a place on Earth into a precise polygon.”
- “The combination of vision models and large language models allows us to automate complex tasks that previously required manual effort.”
Additional Resources:
- Element 84 Website
- State of the Map US Conference Talk on YouTube
- Blog Posts on Natural Language Geocoding
Connect with Jason:
- Visit Element 84’s website for more information and contact details.
- Google “Element 84 Natural Language Geocoding” for additional resources and talks.
In Conversation
This episode is a recording of Jason Gilman’s State of the Map US conference talk on natural language geocoding, followed by audience questions.
Answering Big Questions in Natural Language
I’m Jason Gilman, from a small company called Element 84 — a woman-owned company that works with public and private sector clients, commercial and government, building things like geospatial data processing pipelines and applications that help answer big questions about our health, infrastructure, and changing planet. Every geospatial project begins with a quest for answers, and with generative AI we can now start to directly answer the questions users express in natural language. Users have medical questions — “where are the current cases of Lyme disease within 20 miles of Boulder, Colorado?” — real estate questions, urban planning questions, ecological questions. They all have one thing in common: they refer to a specific place on Earth, expressed in natural language. To answer them, we need to take those natural language descriptions and convert them into a real polygon. We came up with a technique to do that, and we call it natural language geocoding.
What Natural Language Geocoding Does
We can take a phrase like “within 10 kilometres of the coast of the Iberian Peninsula” and convert it into exactly that polygon. If you’ve used traditional geocoding in OpenStreetMap — typing in an address and getting a point back — this takes it to the next level. OpenStreetMap also has polygons, and that’s one of the things we rely on to make natural language geocoding work; we use the Nominatim API on the back end. So you can search “Salt Lake City,” or go further: “Salt Lake City east of the airport” — we figure out from context that they mean the Salt Lake City airport. We can do conjunctions — “Salt Lake City and West Valley City” gives the combined area — and buffers, so “within three miles” or “within 5.2 kilometres of Salt Lake City.” It can get more complicated: “within a few miles of Salt Lake City, West Valley City, and Millcreek, except for the airport” — we take the polygons from each place, join them together, add the buffer, cut out the airport, and it works pretty quickly.
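For readers who have not used Nominatim, here is a minimal sketch of a search request that asks for polygons rather than points. It only builds the URL and makes no network call. The `q`, `format`, `polygon_geojson`, and `limit` parameters belong to Nominatim's public search API; the function name and the default limit are illustrative, and the talk does not show how Element 84 actually calls the service.

```python
from urllib.parse import urlencode

# Public Nominatim endpoint; a self-hosted instance would use its own base URL.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place: str, limit: int = 5) -> str:
    """Build a Nominatim search URL that asks for polygon geometry."""
    params = {
        "q": place,
        "format": "jsonv2",
        "polygon_geojson": 1,  # include a GeoJSON geometry in each result
        "limit": limit,
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


url = build_search_url("Salt Lake City")
print(url)
```

Anyone sending this request to the public server should also set an identifying User-Agent header, as the Nominatim usage policy requires.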
How It Works: Directed Graphs and JSON Schemas
We are using large language models, but LLMs on their own aren’t enough. If you go into ChatGPT and ask it for the buffer around the coast of the Iberian Peninsula, it will generate a GeoJSON polygon for you — and I was actually surprised it’s roughly in the right area — but it’s not exactly what we want; the one I tested had a twist where the polygon crossed itself. So, like a lot of things in programming, we represent the solution as a directed graph — nodes and edges, the same way Facebook represents friends and friends of friends. When a user says “within 10 kilometres of the coast of the Iberian Peninsula,” that’s almost like pseudocode: I want a buffer of 10 kilometres around the coast of this area. We execute the graph from the bottom of the tree: look up the area in Nominatim, calculate the coastline, then add the buffer.
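The bottom-up execution described here can be sketched as a small tree of operation nodes that is evaluated children first. This is an illustration under assumptions, not Element 84's implementation: the `Node` class, the operation names, and the `execute` function are hypothetical, and the operations return description strings instead of real geometries so the order of evaluation is easy to see.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in the graph; children are the inputs it depends on."""
    op: str
    args: dict = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)


def execute(node: Node, trace: list[str]) -> str:
    """Evaluate the children first (post-order), then the node itself.

    Real code would return geometries. This stand-in returns a description
    string and records the order in which the operations run.
    """
    inputs = [execute(child, trace) for child in node.children]
    trace.append(node.op)
    if node.op == "lookup":
        return node.args["name"]
    if node.op == "coastline":
        return f"coastline({inputs[0]})"
    if node.op == "buffer":
        return f"buffer({inputs[0]}, {node.args['km']} km)"
    raise ValueError(f"unknown operation: {node.op}")


# "within 10 kilometres of the coast of the Iberian Peninsula"
graph = Node("buffer", {"km": 10}, [
    Node("coastline", children=[
        Node("lookup", {"name": "Iberian Peninsula"}),
    ]),
])

trace: list[str] = []
result = execute(graph, trace)
print(trace)   # ['lookup', 'coastline', 'buffer']
print(result)  # buffer(coastline(Iberian Peninsula), 10 km)
```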
To actually get that graph from the user’s words, we lean on what LLMs are best at — generating text, and specifically generating JSON that matches a schema, something they’ve been explicitly trained on. We have a JSON schema that represents all the different spatial operations — place lookups and similar things — and the LLM is pretty good at taking the user’s request and converting it into JSON we can execute. Then we do the calculations on it.
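As a rough illustration of what schema-shaped LLM output might look like, the sketch below parses a JSON tree for the earlier request "within a few miles of Salt Lake City, West Valley City, and Millcreek, except for the airport" and checks it before anything is executed. The operation names, the keys, and the reading of "a few miles" as 3 are all assumptions made for this example; the talk does not show the actual schema, and a production system would more likely validate against a full JSON Schema than use a hand-written check like this one.

```python
import json

# Hypothetical operation names and the keys each one requires.
REQUIRED_KEYS = {
    "lookup": {"name"},
    "union": {"inputs"},
    "buffer": {"input", "distance", "unit"},
    "difference": {"input", "subtract"},
}

# Stand-in for text returned by the LLM.
llm_output = """
{
  "op": "difference",
  "input": {
    "op": "buffer",
    "distance": 3,
    "unit": "miles",
    "input": {
      "op": "union",
      "inputs": [
        {"op": "lookup", "name": "Salt Lake City"},
        {"op": "lookup", "name": "West Valley City"},
        {"op": "lookup", "name": "Millcreek"}
      ]
    }
  },
  "subtract": {"op": "lookup", "name": "Salt Lake City International Airport"}
}
"""


def validate(node: dict) -> list[str]:
    """Check required keys recursively and return the place names to look up."""
    op = node.get("op")
    if op not in REQUIRED_KEYS:
        raise ValueError(f"unknown op: {op!r}")
    missing = REQUIRED_KEYS[op] - set(node)
    if missing:
        raise ValueError(f"{op} is missing {sorted(missing)}")
    if op == "lookup":
        return [node["name"]]
    children = node.get("inputs", []) + [
        node[key] for key in ("input", "subtract") if key in node
    ]
    return [name for child in children for name in validate(child)]


places = validate(json.loads(llm_output))
print(places)
```

Rejecting malformed output before execution means a bad generation can be retried instead of producing a wrong polygon.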
Challenges: Missing Polygons and Ambiguity
Like anything, there are challenges to making this work. One is finding the right thing in OpenStreetMap: things that actually have polygons. Sugarloaf Key returns an area, but larger, more nebulous places like “the Florida Keys” sometimes have just a single point. We see the same with the Rocky Mountains, which come back as a single point in Wyoming. Some things might be missing, like the Congo River Basin, or there are multiple entries and you have to figure out which one is right: search “Mississippi River” and the first river entry is only part of it. I’m actually a new user of OpenStreetMap and Nominatim, so I’m hoping to learn better ways to use them. Another problem is ambiguity: when a user says “Paris” they probably mean Paris, France, but there’s a Paris, New York; Paris, Ohio; Paris, Kentucky; Paris, Texas; even a place in Las Vegas called Paris.
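One way to handle the point-versus-polygon problem is to inspect the geometry type of each search result and keep only the ones that are real areas. The sketch below assumes results were requested with `polygon_geojson=1`, so each carries a `geojson` member. The sample data is made up for illustration and only mimics the general shape of a Nominatim response; the helper name is hypothetical.

```python
from typing import Optional

# GeoJSON geometry types that describe an area rather than a point or line.
AREA_TYPES = {"Polygon", "MultiPolygon"}


def first_area_result(results: list[dict]) -> Optional[dict]:
    """Return the first result whose geometry is an area, else None."""
    for result in results:
        if result.get("geojson", {}).get("type") in AREA_TYPES:
            return result
    return None


# Made-up results: one place stored as a point, one stored as a polygon.
florida_keys = [
    {"display_name": "Florida Keys",
     "geojson": {"type": "Point", "coordinates": [-81.0, 24.7]}},
]
sugarloaf_key = [
    {"display_name": "Sugarloaf Key",
     "geojson": {"type": "Polygon", "coordinates": [[
         [-81.6, 24.6], [-81.5, 24.6], [-81.5, 24.7], [-81.6, 24.6],
     ]]}},
]

for results in (florida_keys, sugarloaf_key):
    match = first_area_result(results)
    print(match["display_name"] if match else "no polygon found")
```

When nothing qualifies, the caller can fall back to another data source or ask the user, which is where the curated datasets discussed next come in.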
Solutions: Curated Data and Context
A few solutions we’ve been thinking about. For the terminology our users use in a particular domain, we can curate our own version of OpenStreetMap or Nominatim — it’s open source, so we can deploy it ourselves and add different sources. There are a lot of great government sources — the Bureau of Land Management and other agencies — and datasets like HydroSHEDS, which has things like the Congo River Basin delimited, and a GitHub repo with all the coastlines of the entire world. Depending on who we deploy this for — real estate, geologists — they may have their own terminology for their own areas.
For the ambiguous cases, we use context: “find me ice cream stores between Austin and Paris” probably means Paris, Texas, because you’re on a road trip, while “where can I get some traditional ice cream in Paris” probably means Paris, France. With good prompts, large language models are pretty good at figuring that out from context, thanks to their general training. We can also ask the user: the LLM can decide whether it has enough information to answer or should ask a clarifying question. And we display the answer back, showing the spatial area and other things we pulled out, so the user can see whether it understood them and adjust.
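The talk relies on the LLM itself to resolve ambiguity from context. As a much simpler stand-in that shows the same two outcomes (resolve from context, or ask the user), the sketch below picks the candidate closest to another place mentioned in the request and falls back to a clarifying question when there is no context. This distance heuristic is a swapped-in technique, not the method described in the talk. The coordinates are approximate, and all names here are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Optional

# Approximate (latitude, longitude) pairs, for illustration only.
CANDIDATES = {
    "Paris, France": (48.86, 2.35),
    "Paris, Texas": (33.66, -95.56),
    "Paris, Kentucky": (38.21, -84.25),
}
AUSTIN = (30.27, -97.74)


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def resolve(candidates: dict, context_point: Optional[tuple] = None) -> dict:
    """Pick the candidate nearest the context, or ask when there is none."""
    if context_point is None:
        options = ", ".join(sorted(candidates))
        return {"status": "clarify", "question": f"Which one did you mean: {options}?"}
    nearest = min(candidates, key=lambda name: haversine_km(candidates[name], context_point))
    return {"status": "resolved", "place": nearest}


print(resolve(CANDIDATES, AUSTIN))  # context: "between Austin and Paris"
print(resolve(CANDIDATES))          # no context: ask the user
```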
Part of a Bigger Picture: Queryable Earth
Natural language geocoding is just part of a bigger solution. We want to answer those questions from the beginning, and the spatial parts are only some of it. We’ve built a tool called Queryable Earth where a user can ask something like “show me algae within two kilometres of the coast of Cape Cod.” We break that request apart into different pieces of information: the “algae” part goes to a model we’ve previously used to index satellite data broken into chunks, and the natural language geocoding gives us the spatial area to search the database with. Bringing these things together is something we’re really excited about.
Questions from the Audience
Audience: How well do the LLMs do at geographic entity recognition generally?
Jason: Pretty good. If someone types in the name of an area, they tend to be really good, because they’ve been so well trained on so many different names people use for different areas, and I prime them by describing their job in the prompt. A lot of it is just converting the request into a Nominatim search, and that’s where some of the challenge is: getting the right thing out of Nominatim.
Audience: It looks like you’re using computer vision models as well as LLMs to answer the algae question. Which vision models are you using — proprietary or open source — and which LLMs?
Jason: The vision model in that demo is called SkyCLIP, and it’s open source. It was trained on OpenStreetMap tags paired with satellite images, so when you run the model on an image it generates a vector embedding, and you can also pass text to the model and get a vector embedding back. Then we do a semantic similarity search in the database to find images close to what the user wanted. SkyCLIP is on Hugging Face, which is a repository for models. For the geocoding LLM, I was using Claude 3.
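The semantic similarity search mentioned in this answer can be illustrated with cosine similarity over embedding vectors. In the sketch below the vectors are tiny made-up stand-ins; real embeddings would come from the vision model and have hundreds of dimensions, and a real system would use a vector database rather than a Python dictionary. The chip names and function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query: list[float], index: dict, k: int = 2) -> list[str]:
    """Return the names of the k indexed images most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Made-up image embeddings, keyed by a hypothetical image chip identifier.
image_index = {
    "chip_001": [0.9, 0.1, 0.0],
    "chip_002": [0.1, 0.8, 0.3],
    "chip_003": [0.7, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]  # stand-in for the text embedding of "algae"

print(top_matches(query_vector, image_index))  # ['chip_001', 'chip_003']
```

In the full pipeline described in the talk, the polygon from natural language geocoding would restrict which images are candidates before or alongside this ranking step.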
Audience: Would it be possible to use your software with a written description — say, from a will or a deed — like “a property that starts on the southernmost point of this river and goes to this island”?
Jason: You can definitely do that — extracting information out of documents is something people use LLMs for, so we could use natural language geocoding that way if there’s a description of a spatial area. If an old treaty said the territory surrendered was up to the ridgeline of a certain mountain range, we’d probably need to optimise for that use case — I couldn’t just send “the ridgeline way up high” to Nominatim — but you could certainly build a focused version for that kind of problem. Hopefully folks are connecting the dots: wills and treaties often reference historical points, and as we build up OpenStreetMap, we might be able to solve both.

