Consistent Mapping | System Design

Arpit Singh

Understanding Consistent Hashing in System Design

Consistent hashing is a core technique for distributing data across the nodes of a distributed system. In this post, I walk through how it works, a simple implementation, its advantages, and where it fits in system design.

What is Consistent Hashing?

Consistent hashing is a hashing scheme that minimizes how much data must be remapped when nodes are added to or removed from a distributed system. With traditional modulo-based placement (hash(key) mod N), changing the number of nodes N reassigns most keys. Consistent hashing avoids this by mapping both data objects and nodes onto the same hash ring, so a membership change only affects the keys next to the node that changed.
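
To make the cost of modulo-based placement concrete, here is a minimal sketch that counts how many hash values change nodes when a cluster grows from 3 to 4 nodes. The class and method names (ModuloRemapSketch, countMoved) are hypothetical, and the integers 0 to 11999 simply stand in for the hash values of real keys.

```java
final class ModuloRemapSketch {
    // Counts how many hash values land on a different node when the cluster
    // size changes from oldNodes to newNodes under "hash mod N" placement.
    static int countMoved(int hashCount, int oldNodes, int newNodes) {
        int moved = 0;
        for (int hash = 0; hash < hashCount; hash++) {
            if (hash % oldNodes != hash % newNodes) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Assumption: hash values 0..11999 stand in for uniformly hashed keys.
        int moved = countMoved(12_000, 3, 4);
        System.out.println(moved + " of 12000 hash values change nodes");
        // prints "9000 of 12000 hash values change nodes"
    }
}
```

A hash value keeps its node only when its remainder mod 3 equals its remainder mod 4, which holds for 3 of every 12 consecutive values, so 75% of the keys move for a single added node.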

How Does Consistent Hashing Work?

  1. Hash Function Selection: The first step involves choosing a suitable hash function. This function should produce a uniform distribution of hash values. Common choices include MD5, SHA-1, or SHA-256.

  2. Creating the Hash Ring: We represent the output range of the hash function as a circular structure known as the hash ring. The hash values wrap around, allowing for a continuous space.

  3. Node Assignment: Each node in the system is assigned a position on this ring based on its hash value. For instance, if we have three nodes, they might be placed at different points on the ring corresponding to their hashed identifiers.

  4. Data Assignment: When we want to assign a data item to a node, we hash the data item using the same hash function and find its position on the ring. The data item is then assigned to the first node encountered when traversing clockwise from its position, wrapping around to the start of the ring if no node lies beyond it.

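  5. Handling Node Changes: When nodes are added or removed, only a small subset of keys needs to be remapped. For example, a new node takes over only the keys that fall between its predecessor on the ring and its own position, all of which previously belonged to its clockwise successor; every other key stays where it is. When a node is removed, its keys move to its successor.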
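
The five steps above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not a production design: the names HashRingSketch, position, and ownerOf are hypothetical, each node gets a single ring position, and the ring position is taken from the first 8 bytes of an MD5 digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class HashRingSketch {
    // Step 2: the sorted map is the ring; lookups wrap from the last key to the first.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Step 1: hash function. Assumption: the first 8 bytes of MD5 form the position.
    static long position(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
    }

    // Step 3: a node sits at the position given by its hashed identifier.
    void addNode(String node) {
        ring.put(position(node), node);
    }

    // Step 5: removing a node hands its keys to the next node clockwise.
    void removeNode(String node) {
        ring.remove(position(node), node);
    }

    // Step 4: a key belongs to the first node at or after its position,
    // wrapping around to the first node on the ring when none is found.
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(position(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

With this sketch, adding a fourth node changes the owner only for keys that now map to the new node, and removing that node again restores every original assignment.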

Advantages of Consistent Hashing

  • Load Balancing: With a uniform hash function and several ring positions (virtual nodes) per physical node, data spreads roughly evenly across nodes, which helps keep workloads balanced even as data volume grows.

  • Scalability: It allows systems to scale easily by adding or removing nodes with minimal disruption.

  • Minimal Remapping: Only a fraction of keys need to be reassigned when nodes change, preserving system stability.

  • Fault Tolerance: When each key is also replicated to the next few nodes on the ring, data remains accessible even if some nodes fail.
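
The fault-tolerance point depends on how replicas are placed, which consistent hashing itself does not prescribe. One common approach, sketched here as an assumption rather than as part of the scheme, is to store each key on the first few distinct nodes found clockwise from its position. The names ReplicaPlacementSketch and replicasFor are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

final class ReplicaPlacementSketch {
    // Returns up to replicaCount distinct nodes, in clockwise order starting at
    // keyHash. The ring maps position -> node and may list a node several times.
    static List<String> replicasFor(TreeMap<Integer, String> ring, int keyHash, int replicaCount) {
        Set<String> owners = new LinkedHashSet<>();
        // Positions at or after the key come first in clockwise order.
        for (String node : ring.tailMap(keyHash, true).values()) {
            if (owners.size() == replicaCount) break;
            owners.add(node);
        }
        // Then wrap around to the positions before the key.
        for (String node : ring.headMap(keyHash, false).values()) {
            if (owners.size() == replicaCount) break;
            owners.add(node);
        }
        return new ArrayList<>(owners);
    }
}
```

Skipping duplicate nodes matters when a node occupies several positions on the ring; otherwise two "replicas" could end up on the same machine.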

Practical Implementation

To illustrate how consistent hashing can be implemented programmatically, let’s consider a simple Java example:

import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashing {
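    // The hash ring: ring position -> node that owns that position.
    private final SortedMap<Integer, String> circle = new TreeMap<>();
    // Number of positions (virtual nodes) placed on the ring for each physical node.
    private final int numberOfReplicas;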

    public ConsistentHashing(int numberOfReplicas) {
        this.numberOfReplicas = numberOfReplicas;
    }

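    public void addNode(String node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            // The "#" separator keeps ("node1", 0) and ("node", 10) from producing the same string.
            int hash = getHash(node + "#" + i);
            // Keep the existing owner if another node already holds this position.
            circle.putIfAbsent(hash, node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            int hash = getHash(node + "#" + i);
            // Remove the position only if it still belongs to this node.
            circle.remove(hash, node);
        }
    }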

    public String getNode(String key) {
        if (circle.isEmpty()) return null;
        int hash = getHash(key);
        SortedMap<Integer, String> tailMap = circle.tailMap(hash);
        Integer targetHash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        return circle.get(targetHash);
    }

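    private int getHash(String key) {
        // Simplified for demonstration: String.hashCode() gives similar strings nearby
        // values, so a real ring would use a stronger hash such as MD5 or SHA-256.
        return key.hashCode();
    }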
}

In this code snippet:

  • We create a ConsistentHashing class that maintains a sorted map representing our hash ring.

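  • Nodes can be added or removed with minimal impact on existing data assignments. Each node is placed at numberOfReplicas positions (virtual nodes) on the ring rather than at a single point.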

  • The getNode method retrieves the appropriate node for any given key by finding its position on the ring.
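
To see what the number of positions per node does to the spread of keys, the following self-contained sketch counts how many synthetic keys each node owns for a given number of virtual nodes. It is an illustration only: VirtualNodeLoadSketch, keysPerNode, the node names, and the MD5-based position function are all assumptions, not part of the class above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class VirtualNodeLoadSketch {
    // Assumption: the first 8 bytes of an MD5 digest serve as the ring position.
    static long position(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds a ring with virtualNodes positions per node, then counts how many
    // of keyCount synthetic keys ("key-0", "key-1", ...) each node owns.
    static Map<String, Integer> keysPerNode(String[] nodes, int virtualNodes, int keyCount) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(position(node + "#" + i), node);
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (String node : nodes) {
            counts.put(node, 0);
        }
        for (int k = 0; k < keyCount; k++) {
            Map.Entry<Long, String> entry = ring.ceilingEntry(position("key-" + k));
            String owner = (entry != null ? entry : ring.firstEntry()).getValue();
            counts.merge(owner, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] nodes = {"node-A", "node-B", "node-C"};
        System.out.println("1 position per node:    " + keysPerNode(nodes, 1, 30_000));
        System.out.println("200 positions per node: " + keysPerNode(nodes, 200, 30_000));
    }
}
```

Running main prints the per-node key counts for both settings, so the two distributions can be compared directly.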

Conclusion

Consistent hashing is an elegant solution for managing data distribution in distributed systems. Its ability to minimize disruption during scaling operations makes it invaluable in real-world applications where uptime and performance are critical. By understanding and implementing consistent hashing effectively, we can build robust systems that gracefully handle changes in load and infrastructure.

Thanks!

Written by

Arpit Singh

AI engineer at Proplens AI, a final year student pursuing bachelor's in computer science and engineering.