Demystifying Consistent Hashing — The Backbone of Scalable Distributed Systems

We will approach this topic as a case study in order to better understand its practical application.
Title - Scaling Request Routing in an Online Food Delivery Platform
Background:
"QuickEats" is a growing online food delivery platform serving millions of users across different cities. As demand increases, the platform has adopted a microservices architecture deployed across a distributed system.
One of the critical components is the Order Assignment Service, which matches incoming food orders to the most suitable restaurant node (based on location, load, and availability). These restaurant services are deployed across multiple servers for scalability and resilience.
The Challenge:
With increasing traffic and a growing number of restaurant services, QuickEats faces two key problems:
1. Load Imbalance – Some servers are overloaded with too many requests while others are underutilized.
2. Frequent Node Changes – When new restaurant service nodes are added or removed (due to scaling or failure), many existing order mappings become invalid, causing:
   - Cache invalidation
   - Session mismatches
   - Disrupted customer experience
Problem Statement:
How can QuickEats design a scalable, fault-tolerant, and efficient load balancing strategy that ensures:
- Even distribution of orders across restaurant nodes
- Minimal disruption when nodes are added or removed
- High availability and low latency
You are tasked with proposing a solution that incorporates:
- Load balancing strategies to distribute traffic evenly
- Consistent hashing to ensure minimal re-routing of orders during scaling events
Goals:
- Demonstrate how traditional hash-based routing leads to inefficiencies, and show how consistent hashing helps minimize remapping.
- Explain how consistent hashing combined with virtual nodes or load-aware mechanisms can ensure balanced and scalable traffic routing.
Solution:
Let's dive into traditional hash-based routing to understand how a load balancer works. Suppose we have N requests which have to be balanced evenly across M servers.
Taking M = 4, we will work through the case study by distributing N incoming requests across 4 servers.
Our primary approach is to create a mapping between requests and servers. We will use hashing for this purpose. Passing each requestId and serverId through a hash function h and taking the result modulo M, we get the node index as shown below:
| Hash of servers | Hash of requests | Node index per server | Node index per request |
| --- | --- | --- | --- |
| h(s1) | h(r1) | h(s1) % M = f1 | h(r1) % M = f4 |
| h(s2) | h(r2) | h(s2) % M = f2 | h(r2) % M = f3 |
| h(s3) | h(r3) | h(s3) % M = f3 | h(r3) % M = f2 |
| h(s4) | h(r4) | h(s4) % M = f4 | h(r4) % M = f1 |
The node index for each request signifies the index of the server it is mapped to.
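As a minimal sketch, the routing above might look like this in Python; the MD5-based hash h and the request IDs are illustrative stand-ins, not QuickEats code:

```python
import hashlib

M = 4  # number of servers

def h(key: str) -> int:
    """Stable hash: first 8 hex digits of the key's MD5 digest."""
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

def route(request_id: str, num_servers: int) -> int:
    """Node index = h(requestId) % M, as in the table above."""
    return h(request_id) % num_servers

for rid in ["r1", "r2", "r3", "r4"]:
    print(rid, "-> server index", route(rid, M))
```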
The requestId carries some information about the user, and in practice the hash of a request never changes as long as the hash function is not altered. Hence, to speed up communication, each server has its own cache in which some of this per-request data is stored.
The above mechanism works fine until we add new nodes to, or remove existing nodes from, the system.
Adding a new node or removing an existing server changes the value of M, which re-routes almost all requests. As a result, most of the data stored in the cache of each node also becomes unusable; the sketch after the list below quantifies this.
Here we come to our next problems:
- Re-routing of almost all requests whenever M changes
- Invalidation of each server's cache
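To see the scale of the damage, here is a quick sketch (using the same illustrative MD5-based hash and hypothetical request IDs) that counts how many requests change servers when M grows from 4 to 5:

```python
import hashlib

def h(key: str) -> int:
    """Stable hash: first 8 hex digits of the key's MD5 digest."""
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16)

requests = [f"r{i}" for i in range(10_000)]

# Node index for every request before and after adding a 5th server.
before = {rid: h(rid) % 4 for rid in requests}
after = {rid: h(rid) % 5 for rid in requests}

moved = sum(1 for rid in requests if before[rid] != after[rid])
print(f"{moved / len(requests):.0%} of requests were re-routed")
# Expect roughly 80%: a key keeps its server only when
# h % 4 == h % 5, i.e. for about 4 out of every 20 hash values.
```

With around 80% of requests landing on a different server, nearly every cached entry is wasted.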
The best solution is to minimize the impact that adding or removing nodes has on the number of requests that need re-routing, as well as on the caches.
This is achieved through Consistent Hashing.
Consistent Hashing
The Consistent Hashing procedure can be divided into three steps:
1. Create a Hash Ring (practically spanning 0 to 2³²).
2. Evenly distribute servers along the Hash Ring.
3. Map the hash of each requestId onto the Hash Ring and route it to the nearest server.
Step-1 : Create a Hash Ring
For the sake of simplicity, we will consider a hash range of 0-100. We already have 4 servers up and running, but we now want to introduce a 5th server. We will hash all our servers and incoming requests using our hash function h and calculate the node indices from the hashed values.
Step-2 : Evenly Distribute Servers along the Hash Ring
After calculating the node indices, we map each server to its respective node index as shown below.
s4, s1, s2 and s3 are mapped to indices 1, 25, 50 and 75 respectively.
Step-3 : Map the hashed requestId on the Hash Ring and route it to the nearest server.
Now we hash all requestIds and calculate their respective positions on the ring using h(requestId). To map each requestId to a server, we move in the clockwise direction and allocate the nearest server to the requestId.
r1, r2, r3 and r4 are requests mapped to their nearest servers.
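Putting the three steps together, here is a sketch of the toy ring on the 0-100 range; the server positions are the assumed hash outputs from the example above:

```python
import bisect

RING_SIZE = 100  # toy hash range 0-100; real rings span 0 to 2**32

# Step 2: server positions on the ring, taken from the example above.
ring = {1: "s4", 25: "s1", 50: "s2", 75: "s3"}
positions = sorted(ring)  # [1, 25, 50, 75]

# Step 3: walk clockwise from the request's hash to the nearest server.
def lookup(request_hash: int) -> str:
    idx = bisect.bisect_left(positions, request_hash % RING_SIZE)
    return ring[positions[idx % len(positions)]]  # wrap past the end

print(lookup(30))  # s2: the next server clockwise from 30 sits at 50
print(lookup(90))  # s4: wraps around the ring to position 1
```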
Now let's add server s5. Assume that h(s5) equals 45, so the new server s5 is added at position 45. This implies that only the requests mapped between 25 and 45 will be remapped from s2 to s5; all other requests remain as they are, keeping the impact minimal. Similarly, if s3 is removed, all requests mapped between 50 and 75 will be remapped to s4 (wrapping around the ring), without affecting other parts of the Hash Ring.
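Continuing the sketch (the lookup helper is repeated so the snippet runs on its own), we can count exactly how many of the 100 ring positions change owner during these two events:

```python
import bisect

RING_SIZE = 100
ring = {1: "s4", 25: "s1", 50: "s2", 75: "s3"}

def lookup(ring: dict, request_hash: int) -> str:
    positions = sorted(ring)
    idx = bisect.bisect_left(positions, request_hash % RING_SIZE)
    return ring[positions[idx % len(positions)]]

def owners(ring: dict) -> list:
    """Which server owns each of the 100 positions on the ring."""
    return [lookup(ring, p) for p in range(RING_SIZE)]

base = owners(ring)

ring[45] = "s5"                     # add s5 at h(s5) = 45
with_s5 = owners(ring)
print(sum(a != b for a, b in zip(base, with_s5)))
# 20: only requests hashed into (25, 45] switch from s2 to s5

del ring[75]                        # now remove s3
without_s3 = owners(ring)
print(sum(a != b for a, b in zip(with_s5, without_s3)))
# 25: only requests hashed into (50, 75] move to s4
```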
The above approach solves the problem of adding/removing nodes, but it can lead to a skewed distribution of requests, which may cause one server to overload and crash. A workaround for this situation is to use Virtual Nodes.
Using multiple hash functions to hash each serverId gives us multiple indices for a single node. Hence, the total number of server mappings on the Hash Ring increases, providing a wider spread of positions for requests to land on. Doing this also pushes each server's share of the data towards 1/N. Virtual nodes are nothing but the multiple indices generated for each server, giving us the illusion of more nodes being present.
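The paragraph above describes using multiple hash functions; an equivalent and common trick, sketched below, is to hash each serverId several times with a replica suffix ("s1#0", "s1#1", ...) under a single hash function. The VNODES count and the naming scheme are illustrative assumptions:

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2**32
VNODES = 100  # virtual nodes per physical server (a tunable assumption)

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest()[:8], 16) % RING_SIZE

# Each server is hashed VNODES times, so it appears at many
# scattered positions on the ring instead of just one.
ring = {h(f"{s}#{i}"): s
        for s in ["s1", "s2", "s3", "s4"]
        for i in range(VNODES)}
positions = sorted(ring)

def lookup(request_id: str) -> str:
    idx = bisect.bisect_left(positions, h(request_id))
    return ring[positions[idx % len(positions)]]

# With enough virtual nodes, each server owns roughly 1/N of the traffic.
print(Counter(lookup(f"r{i}") for i in range(100_000)))
```

With a few hundred virtual nodes per server the counts come out close to equal, and adding or removing a server now shifts many small arcs around the ring instead of one large one.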