How to convert addresses ↔ coordinates at scale using nothing but Python and OpenStreetMap
Context: This guide complements our ArcGIS‑based batch geocoding and reverse‑geocoding articles. If you’re looking for a no‑credit, fully open‑source workflow, read on.
1 Why choose an open‑source stack?
Proprietary locator (ArcGIS, Google) | Nominatim + Geopy |
---|---|
Pay‑as‑you‑go credits or API fees | 100 % free to use* |
Global coverage, consistent quality | Community‑driven OSM data (excellent in cities, variable elsewhere) |
Closed‑source algorithms | Transparent, replicable |
Needs internet or local licence | Can run 100 % offline (self‑hosted) |
*You must respect the OpenStreetMap Nominatim usage policy: at most 1 request per second to the public API, and a unique User-Agent string on every request.
2 Prerequisites
- Python 3.8+ (Anaconda or system install).
- Packages: geopy, pandas, requests, tqdm (for progress bars).
pip install geopy pandas requests tqdm
- Input data: CSV or Excel file with either an address column (forward geocode) or lat, lon columns (reverse geocode).
- (Optional) Docker Desktop if you plan to self-host Nominatim for unlimited throughput.
3 Method 1 — Quick one‑liner look‑ups with Geopy
For small jobs (≤ 500 addresses) the short script below is often enough.
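from geopy.geocoders import Nominatim
# The public API requires a unique User-Agent; keep it plain ASCII
geolocator = Nominatim(user_agent="my-gis-blog/0.1 (+your-email@example.com)")
location = geolocator.geocode("380 New York St Redlands CA")
# geocode() returns None when nothing matches
if location is not None:
    print(location.latitude, location.longitude)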
Change .geocode to .reverse("34.056,-117.195", language="en", zoom=18) for reverse geocoding.
Rate-limit reminder: Always add time.sleep(1) between requests if you are using the public API.
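For a handful of look-ups, the reminder above can be wrapped in a small helper. This is a minimal sketch and not part of Geopy: geocode_slowly is a hypothetical name, and it accepts any callable (for example geolocator.geocode), so the pause logic can be tried without touching the network.

```python
import time


def geocode_slowly(queries, geocode, delay=1.0, sleep=time.sleep):
    """Call geocode(query) for each query, pausing `delay` seconds between calls.

    `geocode` is any callable, e.g. geolocator.geocode.
    `sleep` is injectable so the helper can be exercised without real waiting.
    """
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            sleep(delay)  # pause between requests, not before the first one
        results.append(geocode(query))
    return results
```

With the geolocator defined earlier, a call would look like geocode_slowly(["380 New York St Redlands CA"], geolocator.geocode).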
4 Method 2 — Bulk CSV geocoding (public API, respectful)
Below is a fully‑commented script that reads a CSV, geocodes each address, and writes results to a new file.
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from tqdm import tqdm
tqdm.pandas()  # registers progress_apply on pandas objects
df = pd.read_csv("customers.csv")  # needs column 'address'
# Identify your application; the User-Agent must be plain ASCII
geolocator = Nominatim(user_agent="my-gis-blog/0.1", timeout=10)
# Wrap with RateLimiter: min 1 sec between calls as per Nominatim policy
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
df["location"] = df["address"].progress_apply(geocode)
# Unmatched addresses come back as None, so guard before reading coordinates
df["lat"] = df["location"].apply(lambda loc: loc.latitude if loc else None)
df["lon"] = df["location"].apply(lambda loc: loc.longitude if loc else None)
df.to_csv("customers_geocoded.csv", index=False)
Typical throughput: ~3 500 addresses per hour on the public endpoint (the 1 request/second cap allows at most 3 600).
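The same pattern works for reverse geocoding a file with lat and lon columns. The sketch below is an illustration only: reverse_geocode_rows is a hypothetical helper, the added address column name is an assumption, and reverse stands for any callable that takes a "lat, lon" string, such as RateLimiter(geolocator.reverse, min_delay_seconds=1).

```python
def reverse_geocode_rows(rows, reverse):
    """Yield each row with an added 'address' key.

    `rows` is an iterable of dicts with 'lat' and 'lon' keys
    (for example the rows produced by csv.DictReader).
    `reverse` may return None when nothing is found at a coordinate.
    """
    for row in rows:
        query = f"{row['lat']}, {row['lon']}"
        location = reverse(query)
        # Geopy Location objects expose .address; keep None for misses
        address = getattr(location, "address", None)
        yield {**row, "address": address}
```

Because the helper is a generator, rows can be written out as they arrive, which keeps partial results if a long run is interrupted.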
5 Method 3 — Self‑hosted Nominatim via Docker (high‑volume, offline)
When you need millions of requests or must keep data on‑prem, spin up your own Nominatim server.
5.1 Spin up a container
git clone https://github.com/mediagis/nominatim-docker.git
cd nominatim-docker
# download a country extract from geofabrik.de (the US file is several GB; check the listed size first)
wget https://download.geofabrik.de/north-america/us-latest.osm.pbf
# edit .env to point to the PBF file
sudo docker compose up -d
The stack provisions PostgreSQL/PostGIS + the Nominatim service exposed on http://localhost:8080.
5.2 Change your Geopy endpoint
# domain is host:port only; Geopy prepends the scheme itself
geolocator = Nominatim(domain="localhost:8080", scheme="http", user_agent="local-nominatim")
No more rate limits! Add a load‑balancer or additional read‑replicas for extra concurrency.
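Once the rate limit is gone, requests can be issued in parallel. The following is a rough sketch for a self-hosted server only (never the public API); geocode_concurrently is a hypothetical helper, max_workers=4 is an arbitrary starting point, and it assumes the callable you pass is safe to call from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def geocode_concurrently(queries, geocode, max_workers=4):
    """Run geocode(query) in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `queries` in its output
        return list(pool.map(geocode, queries))
```

Threads suit this workload because each call spends most of its time waiting on the HTTP response. Raise max_workers gradually while watching the load on the server.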
6 Choosing between public vs self‑hosted
Metric | Public OSM Nominatim | Self‑hosted Nominatim |
---|---|---|
Requests/day | 86 400 (1/s) | Unlimited (hardware‑bound) |
Cost | Free | VPS ≈ $40/mo or on‑prem server |
Latency | 300‑500 ms | 50‑100 ms (local network) |
Data freshness | Planet file updated weekly | You control import schedule |
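The requests/day row translates directly into wall-clock time. As a back-of-the-envelope sketch (hours_to_geocode is a hypothetical helper, and it ignores latency, retries and failed look-ups, so real runs take longer):

```python
def hours_to_geocode(n_addresses, requests_per_second=1.0):
    """Lower-bound wall-clock hours for n_addresses at a fixed request rate."""
    return n_addresses / requests_per_second / 3600


print(round(hours_to_geocode(86_400), 1))          # 24.0 hours: one day's quota
print(round(hours_to_geocode(1_000_000) / 24, 1))  # 11.6 days at 1 request/second
```

A job that needs more than a few days on the public endpoint is usually a sign that self-hosting is worth the setup effort.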
7 Data quality & troubleshooting
- No result: The address may not exist in OSM; try a wider country_codes filter or a partial address.
- Multiple results: Use exactly_one=False and pick the top-ranked result or ask for user input.
- Timeout errors: Increase timeout in the Geopy constructor or reduce the batch size.
- Encoding issues: Ensure UTF-8; strip emojis.
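For the timeout case, one option is a retry wrapper with exponential back-off. This is a sketch under stated assumptions: geocode_with_retry is a hypothetical helper, and the defaults (3 attempts, 2-second base delay) are illustrative. With Geopy you would pass retry_on=(geopy.exc.GeocoderTimedOut, geopy.exc.GeocoderUnavailable). Geopy's RateLimiter also has its own max_retries and error_wait_seconds options, which may be enough on their own.

```python
import time


def geocode_with_retry(geocode, query, retries=3, base_delay=2.0,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call geocode(query); on a retryable error wait and try again.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Returns None if every attempt fails, so one bad row does not stop a batch.
    """
    for attempt in range(retries):
        try:
            return geocode(query)
        except retry_on:
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

Returning None on failure matches how unmatched addresses are already handled, so failed rows can be filtered and retried later in one pass.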
8 Exporting geocoded data to GIS formats
- CSV → Shapefile/GeoPackage
ogr2ogr -f GPKG customers.gpkg customers_geocoded.csv -oo X_POSSIBLE_NAMES=lon -oo Y_POSSIBLE_NAMES=lat -a_srs EPSG:4326
- Load directly into QGIS, symbolise by match quality.
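If GDAL is not installed, GeoJSON is an alternative that QGIS also opens directly. The sketch below uses only the standard library; rows_to_geojson and csv_to_geojson are hypothetical helpers, and they assume the lat and lon column names written by the bulk script.

```python
import csv
import json


def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection from dicts with 'lat' and 'lon' keys.

    Rows without coordinates (failed matches) are skipped.
    """
    features = []
    for row in rows:
        if row.get("lat") in (None, "") or row.get("lon") in (None, ""):
            continue
        props = {k: v for k, v in row.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point",
                         "coordinates": [float(row["lon"]), float(row["lat"])]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


def csv_to_geojson(csv_path, geojson_path):
    """Convert a geocoded CSV to GeoJSON; returns the number of points written."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        collection = rows_to_geojson(csv.DictReader(src))
    with open(geojson_path, "w", encoding="utf-8") as dst:
        json.dump(collection, dst)
    return len(collection["features"])
```

For example, csv_to_geojson("customers_geocoded.csv", "customers.geojson") would write one point per matched address.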
9 Linking into your ArcGIS workflow
Even if you primarily work in ArcGIS Pro, you can import this open‑source output:
- Use Add Data ➜ customers.gpkg.
- Join to enterprise geodatabase tables for further analysis.
- Combine with the Reverse Geocoding table to build QA dashboards.
FAQ
Q: Is it legal to use Nominatim for commercial projects?
A: Yes, provided you comply with the OSM licence (ODbL) and the Nominatim usage policy. Attribution to OpenStreetMap contributors is required.
Q: How accurate is OSM geocoding compared to paid services?
A: In major cities accuracy is comparable; coverage and address completeness can be lower in rural or under‑mapped regions. Always validate samples.
Q: Can I cache results to avoid duplicate requests?
A: Absolutely. Store the place_id or lat/lon in a local database and query it before calling the API.
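One way to sketch such a cache is with SQLite from the standard library. GeocodeCache is a hypothetical class, the table layout is an assumption, and it keys on a lower-cased, whitespace-normalised address string rather than place_id.

```python
import sqlite3


class GeocodeCache:
    """Tiny SQLite-backed cache keyed by the normalised address string."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(address TEXT PRIMARY KEY, lat REAL, lon REAL)"
        )

    def lookup(self, address, geocode):
        """Return (lat, lon) from the cache, calling geocode(address) on a miss."""
        key = " ".join(address.lower().split())  # collapse case and whitespace
        row = self.db.execute(
            "SELECT lat, lon FROM cache WHERE address = ?", (key,)
        ).fetchone()
        if row is not None:
            return row
        location = geocode(address)
        # Store misses as NULLs too, so unknown addresses are not re-requested
        lat = getattr(location, "latitude", None)
        lon = getattr(location, "longitude", None)
        self.db.execute("INSERT INTO cache VALUES (?, ?, ?)", (key, lat, lon))
        self.db.commit()
        return (lat, lon)
```

Pass a file path instead of ":memory:" to keep the cache between runs, for example GeocodeCache("geocode_cache.sqlite").lookup(address, geocode).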
Now you have a full, cost-free pipeline for geocoding at any scale. Pair it with our ArcGIS and reverse-geocoding guides to choose the right tool for every budget and project.