Understanding Consistent Hash Ring: A Simple Explanation

hai nguyenhai nguyen
2 min read

Problem 1: We have a set of servers and need to distribute data and requests to those servers in a manageable, uniform way. If the data size increases (or the number of requests we need to serve increases), we can add or remove servers from the cluster as needed.

Solution: Consistent hashing ring works differently from traditional hashing used in hash tables or dictionaries (in the Python world). Instead of placing data in a normal "array," we place the hash value into a hash "ring".

We hash both servers and requests, then place the results into a ring. In the picture above, request 1 will be served by node 1 (the closest clockwise node).

Problem 2: What happens if we need to add or remove servers from the cluster?

Solution: Simply remove them from the ring. Of course, we need to redistribute data to active nodes. The redistribution process is unavoidable, but the term "consistent" in "consistent hash ring" ensures that the amount of redistribution work is reasonable.

Problem 3: How do we ensure each server gets the same amount of data/requests (balancing property)? And if a server has more capacity than others, how do we allocate more work to that server?

Solution: There is no way to create a perfect balance cluster. However, there is a technique to make the balance "more perfect." This technique involves using multiple hash functions and virtual nodes.

For each server, we use multiple different hash functions (3 in the picture above) and place the results in the ring as "virtual nodes." Server 1 has a larger capacity than Server 2 and Server 3, so we hash Server 1 three times, giving it a higher chance of being allocated more work. Meanwhile, we apply 2 hash functions to Server 2 and Server 3. As a result, the ring will contain 3 virtual nodes for Server 1 and 2 virtual nodes for Server 2 and Server 3.

For requests, we still use only one hash function. The concept of virtual nodes and multiple hash functions applies to data and servers, not to requests. In the example above, Server 2 will now handle request 1 because a virtual node of Server 2 is the closest clockwise node to request 1.

Conclusions: The idea of consistent hash ring algorithms is simple but very effective. It plays a vital role in many distributed systems such as Cassandra, DynamoDB, Riak, and Memcached.

References:

Consistent hashing - Wikipedia

Virtual nodes | DSE 6.7 Arch guide (datastax.com)

System Design: Consistent Hashing | by Vyacheslav Efimov | Towards Data Science

0
Subscribe to my newsletter

Read articles from hai nguyen directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

hai nguyen
hai nguyen