Mark Varley has been providing geocoding and location intelligence services for insurers since 2015. His UK-based company, Addresscloud, has recently started branching out to financial institutions, banks, lenders, and logistics companies.
Mark is yet another accidental geographer. He stumbled upon a GIS project some 20 years ago when the insurer he worked for needed to bring their GIS and geospatial analytics to the front of what they did and embed it in their quote and buy process.
WHAT CAN ELASTICSEARCH OFFER GEOSPATIAL USERS?
Elasticsearch is mostly open-source software. As the name suggests, it does two things.
It’s elastic because it runs as a server that you can scale across a cluster of machines and multiple applications. You can even do sharding: breaking your database down into chunks and replicating them across many servers. Some prefer to use a managed service, like AWS, for that.
It’s also a search engine. It comes with a bunch of cool tools, a nice API, and a set of indexes, plus other tools that let you search at scale across the large, nuanced data you may have.
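As a rough illustration of the sharding described above, the sketch below builds the settings body you might send when creating an index. `number_of_shards` and `number_of_replicas` are standard Elasticsearch index settings; the values and variable names are assumptions chosen for illustration, not a recommendation.

```python
import json

# Index settings: split the index into 3 primary shards and keep
# 1 replica copy of each shard on another node.
# The numbers are illustrative assumptions.
index_settings = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    }
}

# Total shard copies the cluster has to place: primaries plus their replicas.
primaries = index_settings["settings"]["number_of_shards"]
replicas = index_settings["settings"]["number_of_replicas"]
total_shard_copies = primaries * (1 + replicas)

print(json.dumps(index_settings, indent=2))
print(total_shard_copies)
```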
IS THIS A DATABASE OR A SEARCH ENGINE?
Both.
It’s a full database, but not in a Postgres way. It’s more like a distributed document store with JSON documents.
You can download it for free today and start pushing your data. Run a query or two and put the search to the test.
Its inverted index means you don’t have to sit down and plan for weeks, the way you would with Postgres, where you need to decide on your tables, attributes, indexes, and primary keys before you do anything.
Out of the box, Elasticsearch can do a lot with those JSON documents. It’ll do analysis and identify things like numbers, sets of geometry coordinates, date stamps, or something that looks like text. Unless you tell it not to, it will index everything. Now you’re ready to run your search.
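To make the idea of an inverted index concrete, here is a toy version in plain Python. It is a teaching sketch only: the tokenizer is a bare lowercase-and-split, and the sample addresses are made up. Elasticsearch's real analysis and index structures are far more sophisticated.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (real analyzers do much more)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


# Made-up sample documents.
docs = {
    1: "10 High Street Worcester",
    2: "22 High Road Exeter",
    3: "5 Station Road Worcester",
}
index = build_inverted_index(docs)
print(search(index, "high worcester"))
```

Because every term points straight at the documents that contain it, a lookup does not have to scan the documents themselves.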
CAN IT BUILD GEOMETRIES AND MAKE A SPATIAL INDEX BASED ON THE INDEXES?
Depending on how you push the data in, it can. When you start out, the power of the inverted index is that it indexes everything. Unless you use it for the most basic use case all the time, you’ll have to tune it and give it more information to help with the indexing.
Elasticsearch supports two spatial field types, geo point (geo_point) and geo shape (geo_shape), with several tools and ways to use them. (These terms may be confusing to people with a Postgres background, where the equivalent concept in PostGIS is geometry.)
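A minimal sketch of what "giving it more information" could look like: an explicit mapping that declares one `geo_point` field and one `geo_shape` field, plus a document that fits it. The field names (`address`, `location`, `footprint`) and the coordinates are hypothetical; only the two type names come from Elasticsearch itself.

```python
import json

# Explicit mapping: tell Elasticsearch which fields are spatial
# instead of leaving it to guess. Field names are hypothetical.
index_mapping = {
    "mappings": {
        "properties": {
            "address": {"type": "text"},
            "location": {"type": "geo_point"},   # a single lat/lon position
            "footprint": {"type": "geo_shape"},  # lines, polygons, and so on
        }
    }
}

# A document matching that mapping (made-up coordinates).
document = {
    "address": "10 High Street, Worcester",
    "location": {"lat": 52.1920, "lon": -2.2200},
    # GeoJSON order is [lon, lat]; the ring's first and last points match.
    "footprint": {
        "type": "Polygon",
        "coordinates": [[
            [-2.2201, 52.1919],
            [-2.2199, 52.1919],
            [-2.2199, 52.1921],
            [-2.2201, 52.1921],
            [-2.2201, 52.1919],
        ]],
    },
}

print(json.dumps(index_mapping, indent=2))
```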
WHAT’S THE DIFFERENCE BETWEEN THIS AND POSTGRES?
Postgres with PostGIS extension is for users who mainly use it for GIS and a bit of search.
Elasticsearch is for those who want to do mainly search and some geo.
I use both when architecting our services, and I choose between them depending on the use case.
Postgres can be generic, a black and white situation. Elasticsearch can cover shades of grey and is more nuanced. It has more power for doing complex search type use cases and problems that would be tricky to solve with Postgres out of the box.
ARE WE STILL TALKING ROWS AND SCHEMA?
Until recently, Elasticsearch had this notion of types, but that was made redundant a couple of versions ago.
You can think of an index as more like a table. If you’re storing credit card transactions, the transaction will go in an index. Then, you might have a separate index for your customer data.
What Elasticsearch doesn’t do very well is handle joins and relations.
If you’re used to a Postgres-type view, with lots of different tables you join together, that doesn’t work well in Elasticsearch, or in any kind of NoSQL database. Elasticsearch is a specialist subset of NoSQL databases.
You want to structure your data the way you want to see it in the output, as the thing it ultimately relates to in the real world. Think of indexes as tables. Then within those indexes, you’ve got your documents, which would be your rows.
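One way to picture "structure it like the output" is to do the join yourself before indexing, so each document already carries what the reader will see. The sketch below is an assumption about how that might look for the credit card example; all field names and values are made up.

```python
# Hypothetical relational-style inputs: transactions reference customers by id.
customers = {
    "c1": {"name": "A. Example", "postcode": "WR1 1AA"},
}
transactions = [
    {"id": "t1", "customer_id": "c1", "amount": 42.50},
    {"id": "t2", "customer_id": "c1", "amount": 9.99},
]


def denormalize(transaction, customers):
    """Embed the customer record in the transaction instead of joining at query time."""
    doc = dict(transaction)  # copy so the input is left untouched
    doc["customer"] = customers[doc.pop("customer_id")]
    return doc


# Each document is now self-contained and ready to index.
documents = [denormalize(t, customers) for t in transactions]
print(documents[0])
```

The trade-off is duplication: the customer details are repeated in every transaction document, which is the price of not needing a join when you search.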
WHY IS ELASTICSEARCH A GREAT CHOICE FOR GEOCODING?
Back in 2015, I looked into several technologies. Elasticsearch was recommended to me by a friend. Once I looked at it, I fell in love with it straight away. It was perfect; it felt like the right technology at the right time.
Since then, others have developed geocoders using Elasticsearch. It’s an excellent choice, with a small caution on the side. You’ll still have to put in an additional 20% of work that you can’t immediately get out of the box. You’ll need to understand how it performs well, but if you’re serious about geocoding and you want to give the best experience for your users, it’ll be worth the investment.
What Elasticsearch does really well is combine full-text search with other capabilities:
- Complex search with fuzzy matching, typos, and mistakes. It combines those with queries and filters.
- Text search plus a filter that’s almost like a WHERE clause in Postgres.
- Geo stuff.
- Autocomplete, or type-ahead.
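The combination in the list above can be sketched as a single `bool` query: a fuzzy `match` for the text, a `term` filter playing the role of a WHERE clause, and a `geo_distance` filter for the geo part. The query types are standard Elasticsearch query DSL; the field names (`address`, `postcode_area`, `location`) and the helper function are hypothetical.

```python
import json


def address_search(text, postcode_area, lat, lon, radius="1km"):
    """Build a search body combining fuzzy text, an exact filter, and a distance filter."""
    return {
        "query": {
            "bool": {
                # Scored full-text match that tolerates typos.
                "must": [
                    {"match": {"address": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Unscored yes/no filters, similar in spirit to a WHERE clause.
                "filter": [
                    {"term": {"postcode_area": postcode_area}},
                    {
                        "geo_distance": {
                            "distance": radius,
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                ],
            }
        }
    }


body = address_search("10 Hihg Stret", "WR", 52.192, -2.22)
print(json.dumps(body, indent=2))
```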
CAN IT CALCULATE ON THE FLY? OR DOES IT NEED ALL THE INFORMATION TO BE PRE-CALCULATED?
Elasticsearch gets you a relevant and accurate response quickly. It indexes and optimizes in advance, so it’s not particularly good at calculating things on the fly. The means are there to do so if you want to. You can have a go with a scripting language.
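For a sense of what calculating on the fly with a script might look like, the sketch below builds a search body with a `script_fields` entry written in Painless, Elasticsearch's scripting language. The `premium` field, the `premium_with_tax` name, and the rate are hypothetical values used only for illustration.

```python
import json

# A search body that computes a value per hit at query time.
# `premium` is a hypothetical numeric field; the rate is made up.
on_the_fly_query = {
    "query": {"match_all": {}},
    "script_fields": {
        "premium_with_tax": {
            "script": {
                "lang": "painless",
                "source": "doc['premium'].value * params.rate",
                "params": {"rate": 1.12},
            }
        }
    },
}

print(json.dumps(on_the_fly_query, indent=2))
```

Scripts like this run for every matching document at query time, which is why pre-computing and indexing the value is usually the faster route.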
What can you do with a new attribute, for example?
Suppose you’re in the UK, and you want to start recording counties. The Royal Mail doesn’t recognize counties, but people still often use them. Add the counties into your Elasticsearch document, and it will recognize and index them based on whatever it thinks is the correct setting.
What do you do with people abbreviating? Street to st./str., or road to rd.?
The county of Worcestershire in the UK is often written Worcs. Elasticsearch treats abbreviations and alternatives as synonyms, and you can add your list of synonyms to your index settings. When you add your document, and it spots the words Worcestershire, road, or street, it stores not only the full version but the abbreviated versions too (Worcs., rd., st., str.). That gets around the complex ways humans talk about addresses. This would be difficult to do in Postgres.
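A sketch of how such a synonym list might be wired into index settings. The `synonym` token filter and the comma-separated rule format are standard Elasticsearch features; the filter and analyzer names are hypothetical. The small `expand` function only mimics the effect in Python so you can see what ends up searchable.

```python
import json

# Groups of equivalent terms, lowercase and without trailing dots.
SYNONYM_GROUPS = [
    {"worcs", "worcestershire"},
    {"rd", "road"},
    {"st", "str", "street"},
]

# Analysis settings with a synonym filter (names are hypothetical).
analysis_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "address_synonyms": {
                    "type": "synonym",
                    "synonyms": [", ".join(sorted(g)) for g in SYNONYM_GROUPS],
                }
            },
            "analyzer": {
                "address_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "address_synonyms"],
                }
            },
        }
    }
}


def expand(token):
    """Toy stand-in for the synonym filter: return every equivalent form."""
    token = token.lower().rstrip(".")
    for group in SYNONYM_GROUPS:
        if token in group:
            return set(group)
    return {token}


print(json.dumps(analysis_settings, indent=2))
print(expand("Worcs."))
```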
WHAT ABOUT DOING REALLY COMPLEX THINGS?
I want to search for the best place near me that does Italian dishes. Can I?
Yes, that’s quite a common use case for it. Elasticsearch can show you all the restaurants that have Italian foods within one kilometer and are open at 11pm with at least four stars, sorted by price with the closest first.
A lot is happening there—an element of location and distance, plus Italian food that may be pizza, spaghetti, or lasagna, which are all synonyms of the term Italian dish.
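The restaurant search could be expressed roughly as follows. The query and sort clauses are standard Elasticsearch DSL, but every field name here is an assumption about how the restaurant data is modeled. In particular, `open_hours` is assumed to hold the hours of the day (0 to 23) during which the restaurant is open, so 11pm becomes the value 23.

```python
import json


def italian_nearby(lat, lon):
    """Italian food within 1 km, open at 11pm, rated 4+, cheapest then closest first."""
    origin = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                # With a synonym list, "italian" could also match pizza, lasagna, etc.
                "must": [{"match": {"cuisine": "italian"}}],
                "filter": [
                    {"geo_distance": {"distance": "1km", "location": origin}},
                    {"range": {"rating": {"gte": 4}}},
                    {"term": {"open_hours": 23}},  # hypothetical modeling of opening times
                ],
            }
        },
        "sort": [
            {"price": "asc"},
            {"_geo_distance": {"location": origin, "order": "asc", "unit": "km"}},
        ],
    }


restaurant_query = italian_nearby(52.192, -2.22)
print(json.dumps(restaurant_query, indent=2))
```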
But then, you may be more familiar with how Tinder powers their platform with it. It lets you search for a date or a mate in your area, meeting specific criteria. That’s a great example of how complex the searches can be, combining lots of different search facets with some geo.
SO IT’S A SEARCH ENGINE WITH AN ADDED FILTERING PLUS GEO CAPABILITY
Yes, and more.
Initially, there was limited geo support in Elasticsearch. But in the last 4-5 years, Nick Knize came along. He has a more technical geographic background, and he’s introduced new ideas, like a lot of what you’d see in PostGIS. Things like supporting shapes and queries. It’s not full-spectrum and still not in the same category as PostGIS with its richness of functions. Yet there have been improvements in the performance and the available functions. There’s a lot more you can do today than five years ago.
The log analytics function is used by large logging software companies. It searches, filters, and aggregates millions of records on the fly quickly.
With a new maps interface and maps tool, everyone will find something to like about Elasticsearch.
IS THIS BUILT FOR THE WEB?
Definitely. That’s the idea behind the whole NoSQL movement, anyway. What goes in is what you get out.
Remember IMDb from ten years ago? You needed a lot of relational databases and tables joining together, bringing them through, showing them on the screen, updating them, propagating them, and splitting them up again. That was a lot of work just to publish movie descriptions and reviews.
Today’s idea is representing data as an object in almost the same way it’s going to be presented.
Elasticsearch has documents rather than the tables and relations we think of from traditional databases.
WHERE IS ELASTICSEARCH GOING WITH THIS?
It’s still a hidden secret amongst the geo communities. At one point, several Elastic things were coming out, and they were popular. I even taught an Elasticsearch workshop at FOSS4G in Bonn four years ago. Then the interest seemed to fade.
I think people may struggle to get their heads around the geo point and the geo shape. It looks like they’ve introduced a new spatial index in the latest version, and I think it’s a robust toolkit.
I’m working on a use case for the insurance industry and around exposure management with a plan to launch by the end of the year. These companies insure several million properties or buildings across Europe, or even globally.
They need to see on a map where their hotspots are. Exposures are the places where they’d have a lot of properties insured in one place. They intersect those with things like high risk of flooding, crime, or even wildfire. This is quite hard to do at the moment with traditional geospatial.
We’ll be using Elasticsearch and Uber’s H3 index. We bring in the data, enrich it, and tag it with the H3 index. We can deal with millions of documents with this power. We’ll use spatial indexes for things like intersecting with flood or wildfire boundaries. We’ll use aggregations to quickly aggregate up by either Uber’s H3 cell or some admin boundary. It’s exciting, and we’ve already built a demo on it.
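A rough sketch of the pattern described above, not Addresscloud's actual implementation: documents are assumed to carry a pre-computed `h3_cell` keyword, a `geometry` field mapped as geo_shape, and a `sum_insured` amount (all hypothetical names). The query filters to a flood polygon and aggregates by cell. The cell ids and coordinates are placeholders, and `aggregate_locally` only mimics what the aggregation returns.

```python
from collections import defaultdict

# Placeholder flood boundary as GeoJSON ([lon, lat] order, closed ring).
flood_zone = {
    "type": "Polygon",
    "coordinates": [[
        [-2.3, 52.1], [-2.1, 52.1], [-2.1, 52.3], [-2.3, 52.3], [-2.3, 52.1],
    ]],
}

exposure_query = {
    "size": 0,  # only the aggregation is wanted, not the individual hits
    "query": {
        "bool": {
            "filter": [
                {"geo_shape": {"geometry": {"shape": flood_zone, "relation": "intersects"}}}
            ]
        }
    },
    "aggs": {
        "by_h3_cell": {
            "terms": {"field": "h3_cell", "size": 1000},
            "aggs": {"total_insured": {"sum": {"field": "sum_insured"}}},
        }
    },
}


def aggregate_locally(docs):
    """Mimic the terms + sum aggregation on an in-memory list of documents."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["h3_cell"]] += doc["sum_insured"]
    return dict(totals)


# Made-up documents with placeholder cell ids.
sample = [
    {"h3_cell": "cell-a", "sum_insured": 250000.0},
    {"h3_cell": "cell-a", "sum_insured": 100000.0},
    {"h3_cell": "cell-b", "sum_insured": 400000.0},
]
print(aggregate_locally(sample))
```

Tagging each document with its cell at ingest time means the hotspot view is a plain keyword aggregation, which is the kind of work Elasticsearch handles quickly at scale.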
I think Elasticsearch is a great tool. If you have lots of data (maybe not the most sophisticated geo stuff in the world, but lots of it), and you want to visualize and aggregate it up, it’s definitely worth a look.
IS THERE AN ELASTICSEARCH ECOSYSTEM?
Elastic is the company that makes Elasticsearch. There are a couple of sister products of Elasticsearch that you might hear people refer to.
The ELK stack is Elasticsearch, Logstash, and Kibana. They work really well together.
We talked about dashboards. Kibana is like a front end to Elasticsearch. With Elasticsearch out of the box, all you get is a download that runs in the background. If you pull up port 9200, you’ve got an API there that you can start using. Kibana points to your Elasticsearch installation and gives you a spectrum of powerful dashboard and widget capabilities in front of that.
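As a small illustration of talking to that API, the sketch below prepares a search request with Python's standard library. It assumes a local installation at `http://localhost:9200` and a hypothetical index called `addresses`. The request is only built here; the commented lines show how it might be sent once a cluster is running.

```python
import json
import urllib.request


def build_search_request(index, body, host="http://localhost:9200"):
    """Prepare (but do not send) a POST to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("addresses", {"query": {"match_all": {}}})
print(request.get_method(), request.full_url)

# To send it, with Elasticsearch running locally:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```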
It supports geo maps as well. You can start ingesting a bunch of log data or addresses for free. You put a geo point on them, and you get a nice visual view; you get lots of cool graphs and charts, and a map showing and representing some of those aggregations we’ve talked about.
Logstash is an excellent way of building pipelines. Often, Postgres might still be your system of record, and you’ll want to create a pipeline to take your data from Postgres, transform it, tag it along the way, and then put it into Elasticsearch.
Logstash is another part of the toolkit that allows you to do that easily. It has adapters for things like Apache and others. The list is as long as your arm. There are add-ons and plug-ins for any data source.
I have to say, for our own requirements, we found Logstash a little heavy because we only move data over weekly or monthly. We wrote our own scripts for doing that. But Logstash is an excellent solution, particularly if your data is dynamic and is changing all the time.
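A home-grown pipeline script of that kind might look something like the sketch below. It is an assumption, not the scripts mentioned above: rows are stand-ins for what a Postgres query would return, the transform adds a geo point and a tag, and the output is a newline-delimited payload in the format Elasticsearch's bulk API expects. The index name and field names are hypothetical.

```python
import json

# Stand-in for rows fetched from Postgres: (id, address, lat, lon). Made-up values.
rows = [
    (1, "10 High Street, Worcester", 52.1920, -2.2200),
    (2, "22 High Road, Exeter", 50.7260, -3.5270),
]


def to_document(row):
    """Transform a row into an Elasticsearch document, tagging it along the way."""
    address_id, address, lat, lon = row
    doc = {
        "address": address,
        "location": {"lat": lat, "lon": lon},  # becomes a geo point if mapped as one
        "source": "postgres",
    }
    return address_id, doc


def build_bulk_payload(rows, index_name):
    """Build a bulk body: an action line, then the document, one JSON object per line."""
    lines = []
    for row in rows:
        doc_id, doc = to_document(row)
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


payload = build_bulk_payload(rows, "addresses")
print(payload)
```

For a weekly or monthly load, a script like this can be enough; a tool such as Logstash earns its keep when the source data changes continuously.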
Thanks to Mark for his insights on Elasticsearch. I’m sure it’s not the last we’ll hear of it. Also, don’t forget to check out what placekey.io is doing. They’re creating an industry standard for identifying physical places. This could change the way we do spatial joins and combine datasets. It’s worth investigating.