Shuffle Sharding - the secret sauce behind AWS reliability

Snehasish Roy

In a typical client-server architecture, requests from the client are forwarded to a random application server by the Application Load Balancer. When a request is a poison pill, i.e., it crashes or hangs the server that receives it, then all of your application servers will eventually crash or hang one by one if the client keeps retrying, since the retried requests will eventually be forwarded to every server at random.

Architecture Diagram with Application Load Balancer and eight worker nodes marked unavailable



Will Simple Sharding help?

One way to isolate this problem is to create virtual shards, i.e., partition requests from clients so that each client is served only by a specific subset of instances. This is generally done with hashing, e.g., modulo or consistent hashing.
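As a minimal sketch of the modulo approach, the hash of a customer identifier can be reduced modulo the shard count. The `shardFor` helper and the shard count of 4 below are illustrative assumptions, not an AWS API:

```java
import java.util.Objects;

public class SimpleSharding {
    // Maps a customer id to one of `totalShards` shards using modulo hashing.
    static int shardFor(String customerId, int totalShards) {
        // Math.floorMod guards against negative hashCode values
        return Math.floorMod(Objects.hashCode(customerId), totalShards);
    }

    public static void main(String[] args) {
        // The same customer always lands on the same shard
        System.out.println("Alpha -> shard " + shardFor("Alpha", 4));
        System.out.println("Bravo -> shard " + shardFor("Bravo", 4));
    }
}
```

Because the mapping is deterministic, a poison-pill customer keeps hitting the same shard, which is exactly the isolation property discussed below.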

In the below diagram, all requests originating from client names starting with A or B go to Worker1 and Worker2. So if Alpha's request is a poison pill, only two servers are impacted. The remaining servers are unaffected, as the requests never reach them.

Do note that the Bravo request won't be fulfilled either, because both Worker1 and Worker2 have crashed or hung due to the poison pill request from Alpha. One dirty fish has poisoned the entire lake.

This strategy has definitely reduced the blast radius, but requests from Bravo are still not being served. Can we do better?

Architecture Diagram with Application Load Balancer, shards and flow of customer to worker



Can increasing the shards help?

The issue with the above approach was how we created the shards: there were simply too few combinations available, since each instance was mapped to only one shard.

If we allow each instance to be mapped to multiple shards, then we can increase the number of combinations available and reduce our unavailability.

In the below diagram, we create 8 shards to map 8 clients. Previously we had only 4 shards.

Architecture Diagram with Application Load Balancer, eight shards with two worker nodes per shard, and each worker being assigned to two different shards


Note that each worker is mapped to multiple shards, e.g., Worker2 is mapped to Shard1 and Shard2.

Now if requests from Alpha crash or hang Worker1 and Worker2, requests from Bravo would still continue to work, as Bravo is mapped to Worker2 and Worker3. Since Worker3 is still alive, requests from Bravo would still get served.

Customer Name | Workers
------------- | ---------------------
Alpha         | Worker-1 and Worker-2
Bravo         | Worker-2 and Worker-3
Charlie       | Worker-3 and Worker-4
Delta         | Worker-4 and Worker-5
Echo          | Worker-5 and Worker-6
Foxtrot       | Worker-6 and Worker-7
Golf          | Worker-7 and Worker-8
Hotel         | Worker-8 and Worker-1
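For illustration, the ring-style mapping in the table above (each customer gets two consecutive workers, wrapping around so Hotel shares Worker-1 with Alpha) could be generated like this; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class RingAssignment {
    static final String[] CLIENTS = {"Alpha", "Bravo", "Charlie", "Delta",
                                     "Echo", "Foxtrot", "Golf", "Hotel"};

    // Client at position i is served by worker i+1 and worker i+2 (1-based),
    // with the index wrapping around the ring of workers.
    static List<String> workersFor(int clientIndex, int totalWorkers) {
        List<String> workers = new ArrayList<>();
        workers.add("Worker-" + (clientIndex % totalWorkers + 1));
        workers.add("Worker-" + ((clientIndex + 1) % totalWorkers + 1));
        return workers;
    }

    public static void main(String[] args) {
        for (int i = 0; i < CLIENTS.length; i++) {
            System.out.println(CLIENTS[i] + " -> " + workersFor(i, 8));
        }
    }
}
```

With 8 workers and 8 clients, any single poison pill takes down two workers but leaves at least one live worker for every neighbouring customer.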

Let’s Shuffle!

Hope you were able to grasp the fundamentals from the above example. The idea is to create more shards with as little overlap as possible. Let's see how we can build a generic solution.

Given n application servers, if we randomly choose k instances and make them part of one virtual shard, then each shard has k instances. The probability of two shards overlapping 100% drops drastically as we increase the value of k.
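One simple way to sketch "randomly choose k of n" in Java is to shuffle the list of instance ids with a seed derived from the customer id and take the first k. The seeding scheme here is an assumption for illustration (`String.hashCode` is convenient but not a production-grade seed):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleShard {
    // Picks k distinct instances out of n to form one virtual shard.
    // Seeding the shuffle from the customer id keeps assignment deterministic.
    static List<Integer> buildShard(String customerId, int n, int k) {
        List<Integer> instances = new ArrayList<>();
        for (int i = 0; i < n; i++) instances.add(i);
        Collections.shuffle(instances, new Random(customerId.hashCode()));
        return new ArrayList<>(instances.subList(0, k));
    }

    public static void main(String[] args) {
        System.out.println("Alpha -> " + buildShard("Alpha", 10, 4));
        System.out.println("Bravo -> " + buildShard("Bravo", 10, 4));
    }
}
```

Two customers get independent pseudo-random subsets, so a full overlap between them is possible but rare, which is the property quantified below.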

To simplify, if we have 10 cards and want to choose 4 of them, there are 10 choose 4 = 210 possible combinations. If we randomly generate 2 such combinations, the probability of them having 100% overlap, i.e., all 4 cards being the same, is (1 / 210) ~= 0.47%.

Now to relate this analogy to our problem statement, given 10 application instances, if we randomly choose 4 instances to create a virtual shard, the probability of 2 shards with 100% overlap would be ~0.47%. This indicates that a poison request can only impact 0.47% of shards. We can further reduce this by increasing the total number of instances or increasing the shard size.
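The arithmetic above can be checked with a small helper; the `choose` implementation below is a standard multiplicative formula, not tied to any particular library:

```java
public class OverlapMath {
    // n choose k via the multiplicative formula, evaluated left to right
    // so every intermediate division is exact.
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        long combos = choose(10, 4);            // 210 possible shards
        double pFullOverlap = 100.0 / combos;   // full-overlap probability, in percent
        System.out.printf("C(10,4) = %d, full-overlap probability ~= %.3f%%%n",
                combos, pFullOverlap);
    }
}
```

Increasing n or k grows the combination count rapidly; e.g., C(100, 5) is over 75 million, making a full overlap between two random shards vanishingly unlikely.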


Talk is cheap, show me the code!

Shuffle sharding can be implemented in two ways — stateless or stateful.

Stateless sharding, as the name indicates, does not persist any state data (i.e., shard information) in a datastore. It simply derives the target application instances from an identifier; the target instances are the members of the virtual shard mapped to that request.

Stateful sharding goes a bit further and persists the information of the shards to a database, which allows further customization of the way shards are created, e.g., customizing the shard assignment strategy by tuning weights.

Let’s see how we can implement stateless shuffle sharding using a simple strategy of generating multiple hashes from a unique identifier, followed by mapping that hash to a unique node.

public Set<Integer> assignNodes(String customerId) {
    Set<Integer> assignedNodes = new HashSet<>();

    // Need to find 4 nodes for the virtual shard
    for (int i = 0; i < 4; i++) {
        // Create a unique input for each node selection
        String hashInput = customerId + ":" + i;
        // Map the hash to a node id in [0, totalNodes)
        int nodeId = hashToNodeId(hashInput);

        // Handle collisions by trying the next node in the ring
        while (assignedNodes.contains(nodeId)) {
            nodeId = (nodeId + 1) % config.getTotalNodes();
        }
        assignedNodes.add(nodeId);
    }
    return assignedNodes;
}

// config is assumed to expose the total number of application instances
private int hashToNodeId(String input) {
    // floorMod guards against negative hashCode values
    return Math.floorMod(input.hashCode(), config.getTotalNodes());
}

We can also leverage multiple hash functions, similar to a Bloom Filter, to generate multiple hashes instead of generating multiple inputs from the identifier, in case multiple unique inputs can't be generated.
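A sketch of that Bloom-filter-style variant: instead of appending a counter to the identifier, salt a SHA-256 digest with a different round number, effectively simulating k independent hash functions. The salting scheme and the `totalNodes`/`k` parameters are assumptions for illustration (and `totalNodes` must be at least `k` for the loop to terminate):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashSet;
import java.util.Set;

public class MultiHashSharding {
    // Derives k distinct node ids from one identifier by salting the digest
    // with a round counter, similar to the k hash functions of a Bloom filter.
    static Set<Integer> assignNodes(String customerId, int totalNodes, int k) {
        Set<Integer> nodes = new LinkedHashSet<>();
        int salt = 0;
        while (nodes.size() < k) {
            byte[] digest = sha256(salt++ + ":" + customerId);
            // fold the first four bytes of the digest into an int
            int h = ((digest[0] & 0xFF) << 24) | ((digest[1] & 0xFF) << 16)
                  | ((digest[2] & 0xFF) << 8) | (digest[3] & 0xFF);
            nodes.add(Math.floorMod(h, totalNodes));
        }
        return nodes;
    }

    static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is a mandatory JDK algorithm", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Alpha -> " + assignNodes("Alpha", 10, 4));
    }
}
```

Because duplicates are simply skipped by the `Set`, this avoids the linear-probing collision loop of the previous snippet, at the cost of computing extra digests when rounds collide.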


Shuffle Sharding is a very powerful and versatile technique, used not just by AWS but also in popular open-source projects like Grafana Loki and Grafana Mimir. Thank you for reading. Hope you learnt something new. If you have any questions, please do comment.
