Unlocking the secret: How to Find Nearby Restaurants in Seconds!
In our day to day we use features like find restaurants near me or show dates near 10km, but ever wondered how does it work? Well don't worry because this blog will help you design the most scalable system to find anything within any radius.
Note: We will be using PostgresQL Db and NodeJs to create the proximity server.
Finding distance between 2 Coordinates
We all know about latitudes and longitudes and how they can be used to get the exact location. But what if we have 2 co-ordinates, for example:
Co-ordinate A (72.2, 73.4)
Co-ordinate B (73.2, 74.8)
Any idea how to find the distance between them?
Arithmetically there is Famous Formula called Haversine Formula which is used to find distance between any 2 coordinates on a sphere.
Which goes like,
Distance=2r⋅arcsin(hav(Δlat)+cos(lat1)⋅cos(lat2)⋅hav(Δlon))
You can read more about it here
Database Design
As you can see that we can't use Haversine formula for all the records to find distance since its pretty compute heavy. Well Thank god PostGIS creator to save us.
In PostgreSQL Database, We have PostGIS extension which is a spatial database extender. It enables handling spatial data (geographic information systems or GIS data). It adds support for geographic objects, allowing location queries to be run in SQL.
Some key features and benefits of using PostGIS include:
Geometric Data Types: PostGIS adds support for geometric data types like points, lines, polygons, and other spatial objects, enabling storage and manipulation of such data in the database.
Spatial Indexes: PostGIS provides spatial indexing capabilities, which greatly improve the performance of spatial queries by allowing the database to quickly find spatial objects within a specified area or distance.
Enough with the Theory, Lets start coding!
Setting up Postgres DB and PostGIS!
If setting postgresQL using docker, then PostGIS extension needs to be externally downloaded inside the PostgresQL or you can run a PostGIS image container and connect it to the PostgresQL Image container.
For the blog we will use postgis/postgis Image to run our PostgresQL DB.
But if you are really interested to install postgis as a seperate extension in our postgresDB container then you can follow this blog.
First step, Setup a docker-compose.yml to run postgis/postgis Image
version: "3.8"
services:
postgresqls:
container_name: "postgresqlpostgis"
image: "postgis/postgis"
ports:
- "5433:5432"
volumes:
- db_persdata:/var/lib/postgrespers
environment:
POSTGRES_USER: myuser
POSTGRES_PASSWORD: mypassword
POSTGRES_DB: mydatabase
networks:
- default
networks:
default:
volumes:
db_persdata : {}
Now run up your postgresQL service up using,
docker-compose up -d postgresqls
Lets verify if you have postgis extension in your DB. To do it connect to your postgresQL db using psql or using Dbeaver application just like me.
SELECT name, default_version, installed_version
FROM pg_available_extensions
where name like '%postgis%';
This should output something like this
Setting up Zomato Sample Dataset
Time to get our hands dirty!
We will use the Sample Zomato Dataset to test out our Queriesss!
Note: the dataset might have some duplicate keys, so just skip those rows when importing.
After importing the dataset from CSV to your Postgres DB, it will look like this
Total rows: 24,300.
Finding Distance using Queries
Now to find distance between any 2 coordinates, the query will go like:
select ST_DistanceSphere(
ST_MakePoint(77.046515, 28.410273), -- (long, lat)
ST_MakePoint(77.07077, 28.477117)
)*0.001 as distance_in_km
This query returns me distance in km's but do note that this is the smallest distance between two coordinates on a map. so if you are testing this distance by checking them on google maps then think again!
Lets see how fast our query to find the restaurants that our 10km near me is!
select *
from restaurants r
where ST_DistanceSphere(
ST_MakePoint(longitude , latitude),
ST_MakePoint(77.97858560, 27.20814170) -- this is ur coords
)*0.001 < 10;
The query returns me 200 records out of 24k in 134ms.
That's good speed but we are only catering to 24k records which are less. For a heavy application we will need to optimize it even further.
Optimizing our Query
On hearing optimization, the first thing that would come to any developers mind is to put index. But we have 2 properties here: lat and lang and putting index on both the records wont do justice. Why so?
When we index with latitude and longitude, it covers a very large area which is unnecessary. We will only need the points which lie within the box that covers both latitude and longitude mentioned.
We have many unique ways to index geo spatial data, but 2 common ways are GeoHashing, bounding box. These both are unique in their own ways and serve different purpose.
Bounding box
A bounding box index, also known as an n-dimensional index or spatial index. A bounding box is a rectangular box that completely encloses a geometric object.
When we create a bounding box index on a spatial column(geography column), the database creates these bounding boxes for each geometric object in the column and stores them in the index. This allows the database to quickly determine which objects might intersect or be near a given query geometry without having to perform expensive geometric calculations on all objects in the database.
This is a bounding box that is closed between (min lat, max lat) and (min long, max long).
Implementing Bounding Box using POSTGIS GIST
To implement bounding box in our Zomato dataset, we will need to create a special geography column. The geography
type in postgresDb column is designed to store geospatial data using geographic coordinates (latitude and longitude). This helps in geospatial indexing and thereby helps create optimized queries to find distance, area between coordinates.
Let's update our Zomato dataset now:
Step 1: add geography column named location
ALTER TABLE restaurants ADD COLUMN location geography(Point, 4326);
Step 2: update all the rows to insert the POINT(long, lat) in the location column created.
update restaurants
set location = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)
Note: 4326 here is to tell the DB that its WGS 84 coordinate system.
Testing our queries on update DB:
Now, lets write a new query to fetch all the restaurants near 10km using the geography column
SELECT *
FROM restaurants r
WHERE ST_DWithin(
r.location,
ST_SetSRID(ST_MakePoint(77.97858560, 27.20814170), 4326),
10000
);
The query returns 200 records under 366ms. Ah shucks! That's pretty bad again.
Well, we haven't added any indexing yet. So lets do that first.
Run this query to add indexing on geography column.
CREATE INDEX zomato_geo_idx ON restaurants USING GIST (location);
Lets Run our Query again to see the new faster execution time.
61ms🥳. Well that's a pretty good achievement! and also a good point to end it here.
Bounding box is fast and precise way to create proximity server but they are not very compact in size and can be slow for large scale applications where GeoHashing would be a ideal choice.
For now!
Congrats🎉 on making till here!
Happy coding fellow developers:)
Subscribe to my newsletter
Read articles from vansh Kapoor directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
vansh Kapoor
vansh Kapoor
Developing large scale application for multiple clients in Thoughtworks. Love to share my knowledge in React, Js and clean coding paractices and travel.