Consistent Hashing in Load Balancers
Introduction:
With the increase of load on servers pumped by lots of requests and customer urge for instant responses, the load balancer became an important aspect of the distributed system. Along came a chain of problems on how to distribute the requests consistently among the servers, what if we introduce new servers or lose some? The same problem was faced by someone else and they found a way out, so we don't have to. Just read it out.
fig: How does a load balancer look?
Consistent Hashing:
The Problem:
Now, As a simple solution for the even distribution of requests, we can simply create a hash key and use a modular function to distribute among servers.
Example:
4 servers -> (Hash key) % 4
Hash key | (Hash key) % 4 | Server |
123456 | 0 | server 1 |
234560 | 0 | server 1 |
345661 | 1 | server 2 |
456662 | 2 | server 3 |
567563 | 3 | server 4 |
3 servers -> (hash key) % 3
2 servers -> (hash key) % 2
1 server -> No need of distribution
This looks pretty easy and solves the problem, but like all easy-looking solutions, it has loopholes.
What if I remove a server from the 4-server setup, the hash formula be:
(Hash key) % 3.
This distributes the incoming requests well but what about the already mapped requests? They are lost if not mapped again.
The Solution:
Consistent Hashing helps evenly distribute the requests without remapping all the requests but only a fraction of them.
This is a Hash space:
X0 | X1 | ... | Xm |
Now collect both ends and make a ring, this is a Hash ring:
Now we assign places to not only the keys on this ring but the servers too:
So if you want to ask where is the hash key 0 (K0) saved, We read it clockwise and see the next server which is server 0 (S0). This simple change in POV, to see a space as a ring solved many problems.
Now when we add or remove a server, only a fraction of the keys will be remapped. Saving from cache misses, the approach is more consistent than the traditional approach.
Cache misses: refers to a state where data requested by a component (such as a processor) is not found in the cache
Fig: Adding a node: When we add the S4 server, reading clockwise from K0, now K0 is saved in S4 and S0 is free. No other key is affected.
Fig: Removing a node: When we remove the S1 server, reading clockwise from K1, now K1 is saved in S3. No other key is affected.
Problems in approach:
1) Uneven partitions of servers on the ring: Since we assign random space for servers and also servers get removed or added, there is a large possibility of uneven partitions between servers.
Fig: Partition 1 < Partition 2
2) possibility of non-uniform key distribution: There is also the possibility a lot of keys are assigned to a single server while others are free.
Fig: All 3 keys are assigned to S3 while others are free.
To cater to these problems we introduce virtual nodes.
Virtual nodes
Here all the servers and denoted by several virtual nodes. For eg: S0 is denoted by s0_0, s0_1, and s0_2 whereas S1 is denoted by s1_0, s1_1 and s1_2. The partitioning (the edges) is controlled by the server denoted over them,i.e., the S0 edge is controlled by Server 0.
Partitioning in Server: It occurs when the CPUs on a server are separated into individual sections where each section acts as a separate system.
As the number of virtual nodes increases, the keys are more balanced. This is because the standard deviation gets smaller with a larger no of virtual nodes.
Standard deviation: It measures the spreadness of the points on a cartesian plane. The more the standard deviation, the more the spread.
Conclusion:
To wrap it up, Consistent hashing is an effective technique to easily scale your system horizontally. Its benefits include:
Minimal redistribution of keys when servers are added or removed.
Even distribution of data
Consistent hashing is also used in Databases for data partitioning. Between consistency and availability, developers have always made tough choices choosing availability. Even though consistent hashing has brought some C of the CAP theorem, it still cannot be called a 100% consistent system. A system that is consistent and available, imagine what wonders It would bring to Hotstar(a live streaming platform) for IPL(Indian Premier League) seasons. Now this sounds like another interesting blog, but for next time.
Bibliography:
Subscribe to my newsletter
Read articles from Ishita Chauhan directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by