Salta: a small-footprint reverse geocoder

While developing Pixwire, we wanted to add a simple feature: automatic location tags for pictures, derived from their embedded GPS coordinates. The problem? Existing reverse geocoders are either paid, require expensive infrastructure, or forbid storing the results in a database. This is why we created Salta: a simple, low-precision reverse geocoder that is super easy to deploy and operate!

Want to see Salta in action? Join the Pixwire beta!

Pixwire is the chat app for photos that matter. With conversations and photo management all in one place, it is the best way to share high-quality photos and videos with the people you care about!

What is reverse geocoding?

Geocoding is the process of converting a physical address to GPS coordinates, for example determining that the Eiffel Tower is at coordinates (48.8583, 2.2945). Reverse geocoding is the opposite: transforming GPS coordinates into a physical address.

This is usually useful when you have GPS coordinates that you want to transform into a human-readable format.

Existing reverse geocoders

Multiple open-source and commercial reverse geocoders are available, but if you have simple needs they are often too complex or too expensive to operate.

For example, although both are amazing open-source projects:

  • Nominatim requires a large Postgres database with PostGIS installed and is complex to install and update.
  • Pelias relies on Elasticsearch.

Commercial APIs often include free tiers but limit what you can do with the data: they usually prevent you from permanently storing the returned results. Examples include HERE (“Storing results from geocodes is not allowed for Location Services products in our Freemium and Pro plans.”) and the Google Maps API (section 10.5.d). Mapbox also excludes permanent storage from their free plan.

So, while all of these services are great, none of them was really a good match for our needs.

Introducing Salta

Salta is a low-footprint reverse geocoder that:

  • Has no dependencies
  • Is easy to deploy: it only requires a minimal config file
  • Is easy to update: just restart the service
  • Has a low footprint: it only uses what’s strictly necessary

The trade-off for all of this is low precision (e.g. results at the city level).

Image: Representation of Atlas, on his trip to New Zealand

So, how does it work?

Who’s On First

Under the hood, Salta uses the Who’s On First database, a database containing most places, organized by country and stored in Git repositories. This means we can download only the data we need and update it with a simple git pull.

Optimising memory usage and start-up time

Salta simplifies the polygons using Visvalingam’s algorithm, which reduces memory usage in exchange for less precision. Because simplification is expensive to run on every polygon, the simplified polygons are then cached on disk.
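To make the simplification step more concrete, here is a minimal sketch of Visvalingam’s algorithm in Python. It is not Salta’s code: the function names, the `min_area` threshold and the planar area calculation are assumptions chosen for illustration, and the quadratic loop favours readability over speed.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of the triangle formed by three points (planar approximation)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_ring(ring: List[Point], min_area: float, min_points: int = 4) -> List[Point]:
    """Simplify a closed ring (first point == last point).

    Repeatedly drops the interior vertex whose triangle with its two
    neighbours has the smallest area, until that area reaches min_area
    or only min_points vertices remain. min_area is a hypothetical
    tuning parameter, not a Salta setting.
    """
    points = list(ring)
    while len(points) > min_points:
        # Effective area of every vertex except the two endpoints.
        areas = [
            triangle_area(points[i - 1], points[i], points[i + 1])
            for i in range(1, len(points) - 1)
        ]
        smallest = min(areas)
        if smallest >= min_area:
            break
        # +1 because areas[0] belongs to points[1].
        del points[areas.index(smallest) + 1]
    return points


# A square with one nearly collinear vertex on its bottom edge.
ring = [(0, 0), (5, 0.01), (10, 0), (10, 10), (0, 10), (0, 0)]
simplified = simplify_ring(ring, min_area=1.0)
```

A larger `min_area` removes more vertices, which is the memory-versus-precision trade-off described above.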

Salta can be run in two modes: regular or cache-only.

In regular mode, on start-up, Salta pulls the Git repositories and then computes the hash of every GeoJSON file. When the source hasn’t changed, the cached file is used; otherwise, the simplified polygons are recomputed and cached. This allows for a faster start-up, and it also means multiple instances can share the same cached files.

The cache-only mode is available once Salta has been launched in regular mode at least once, or once the cache files have been synced from elsewhere. In this mode, Salta only loads the available cache files, making start-up super fast: loading the countries and cities data for the whole world takes only a few seconds.
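The hash check in regular mode could look roughly like the sketch below. The hash function (SHA-256), the cache file naming and the `simplify` callback are all assumptions made for this example; the article does not specify how Salta implements them.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_simplified(source: Path, cache_dir: Path,
                    simplify: Callable[[bytes], bytes]) -> bytes:
    """Return the simplified data for source, reusing the cache when possible.

    The cache file is named after the hash of the source content
    (a hypothetical layout), so a changed source file misses the cache.
    """
    cache_file = cache_dir / (file_sha256(source) + ".cache")
    if cache_file.exists():
        # Source unchanged since the last run: skip the expensive step.
        return cache_file.read_bytes()
    result = simplify(source.read_bytes())
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(result)
    return result
```

Because the cache key depends only on the file content, several instances pointing at the same cache directory would reuse each other’s results.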

Reverse Geocoding

Once every polygon is generated and ready to be used, the next step is to load them into some sort of index that will allow us to quickly find all the polygons containing the given point (GPS coordinates). To do this, Salta uses the S2 library. S2 provides fast geometry operations and is designed to deal with spherical shapes (useful when you live on a globe!) and geography databases, just what we need!

Polygons are loaded into a shape index and associated with their location data. When a request comes in, the index is queried and returns all polygons containing the requested coordinates, one for each level of location information (country, region, city, …).
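The lookup can be illustrated with a toy index. Note that this sketch does not use S2: it swaps in a bounding-box filter followed by a planar ray-casting test, which is far less efficient and ignores the Earth’s curvature. The class, method and place names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def contains(ring: List[Point], point: Point) -> bool:
    """Ray-casting test: is point inside the closed ring? (planar approximation)"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Count edges crossed by a horizontal ray going right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class PlaceIndex:
    """Toy index storing (bounding box, ring, place type, name) entries."""

    def __init__(self) -> None:
        self._entries = []

    def add(self, place_type: str, name: str, ring: List[Point]) -> None:
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        bbox = (min(xs), min(ys), max(xs), max(ys))
        self._entries.append((bbox, ring, place_type, name))

    def lookup(self, point: Point) -> Dict[str, str]:
        """Return one name per place type whose polygon contains point."""
        x, y = point
        result: Dict[str, str] = {}
        for (min_x, min_y, max_x, max_y), ring, place_type, name in self._entries:
            # Cheap bounding-box rejection before the exact test.
            if min_x <= x <= max_x and min_y <= y <= max_y and contains(ring, point):
                result.setdefault(place_type, name)
        return result


index = PlaceIndex()
index.add("country", "Examplia", [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)])
index.add("locality", "Sampleton", [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)])
inside_city = index.lookup((3, 3))
outside_city = index.lookup((8, 8))
```

A point inside the inner square matches both place types, while a point elsewhere in the outer square only matches the country, mirroring the "one result per level" behaviour described above.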

Performance

First, let’s have a look at the load times and memory usage for different countries:

| Country | Load time (with cache) | Initial load time (no cache) | Memory usage | Total disk usage (cloned repositories/cache files) |
| --- | --- | --- | --- | --- |
| New Zealand | 1s | 4min | 23MB | 791MB (738MB/53MB) |
| France | 3s | 3min | 152MB | 4GB (2.9GB/914MB) |
| United States | 6s | 15min | 259MB | 8.6GB (6.8GB/1.8GB) |

Loading all place types. Memory usage measured after garbage collection. Hardware: SSD drive, AMD Ryzen 3700X.

In cache-only mode, start-up can be significantly faster. Loading the whole world takes:

  • A few seconds to load countries and localities
  • Around five minutes to load all place types

Memory usage when loading the whole world depends on the place types loaded:

| Place type | Memory usage |
| --- | --- |
| Countries | 73MB |
| Countries + Localities | 512MB |
| Countries + Regions + Localities | 818MB |
| All | 1.3GB |

The mean response time for HTTP requests is around 200-300μs.

When should I use Salta?

Salta is a great fit when:

  • You don’t need high precision: city- or suburb-level precision is enough
  • You want an easy deployment with no dependencies
  • You need a lightweight service

Conclusion

While there is still plenty that could be improved, we have been using Salta in production for Pixwire for more than a year now with no issues to report so far. If you need a simple, low-precision, reverse geocoder, give it a try!

Salta is available on GitHub: https://github.com/pixwire/salta

Written by

Sylvain Cleymans

Founder of Pixwire.