Scaling Caching at Etsy: Handling Millions of Lookups per Second
How Etsy optimised trillions of cached data accesses at millions of lookups per second.
A story of consistent hashing with a secret sauce.
Etsy uses memcached (to cache database queries, expensive calculations, etc.) and Varnish (to cache internal HTTP requests) to improve performance and reduce load.
They run a pool of caching servers, each running memcached or Varnish.
How to distribute Cached data across Servers?
1. Modulo Hashing
It works as follows:
- Hash the key to a number
- Take that number modulo the size of the cache pool; the result picks the server that stores the key-value pair
Example
We need to store `some_key` and `some_value`. We have 3 host servers, numbered 0, 1, and 2.
We map `some_key` to a number via a hash function; let's say the number is 4.
Taking the modulo, `4 % 3 == 1`, so we store `some_key` on server 1.
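The steps above can be sketched in Python (illustrative only, not Etsy's actual client code; the host names are made up):

```python
import hashlib

HOSTS = ["cache0", "cache1", "cache2"]  # a pool of 3 cache servers (hypothetical names)

def stable_hash(key: str) -> int:
    # Python's built-in hash() is randomized per process, so use MD5
    # to get a deterministic number for the key.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def host_for(key: str) -> str:
    # Modulo by the pool size picks the server for this key.
    return HOSTS[stable_hash(key) % len(HOSTS)]
```

`host_for("some_key")` always returns the same server, as long as the pool size stays unchanged.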
Pros
- Each cached value is stored on a single host
Cons
- Changing the size of the cache pool causes most cache keys to hash to a new server, leading to tons of cache misses.
In the above example, suppose we add a fourth server.
Taking the modulo, `4 % 4 == 0`, so `some_key` will now be cached on server 0, worsening the cache hit rate.
2. Consistent hashing
It works as follows:
- Fix the hash space and divide it into large contiguous buckets
- Assign one (or more) buckets to each cache host
- Keys that hash into a bucket's range are looked up on the matching host
Example
Let's say the hash space = 1,000
We divide it into 3 buckets: [0, 333), [333, 666), and [666, 1000)
We have two hosts: `host 0` is assigned two buckets, [0, 333) and [333, 666); `host 1` is assigned one bucket, [666, 1000)
We need to store `some_key` and `some_value`. We map `some_key` to a number via a hash function; let's say the number is 44. Since 44 falls in the first bucket's range, it is stored on `host 0`.
Pros
- Changing the pool size means moving a few buckets from existing servers to a new server, so only a small number of keys move.
In the above example, if we add another server `host 2` and move bucket [333, 666) to it, only 333 keys move to the new server, i.e. about 33% of keys.
We can tune the bucket sizes to further decrease key redistribution.
Etsy leverages Ketama to implement consistent hashing.
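Ketama itself is a C library with memcached client bindings; a hedged Python sketch of the same idea (a ring of many virtual points per host, with details like the exact MD5-over-"host:port" point layout simplified) might look like:

```python
import bisect
import hashlib

def _point(s: str) -> int:
    # Hash a string to a point on the ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Each host contributes many virtual points on a hash ring; a key is
    served by the owner of the first point at or after the key's hash,
    wrapping around the ring."""

    def __init__(self, hosts, points_per_host=100):
        self.ring = sorted((_point(f"{host}#{i}"), host)
                           for host in hosts for i in range(points_per_host))
        self.points = [p for p, _ in self.ring]

    def host_for(self, key: str) -> str:
        idx = bisect.bisect_right(self.points, _point(key)) % len(self.ring)
        return self.ring[idx][1]
```

Adding a host only inserts that host's points into the ring, so only the keys falling just before those new points (roughly 1/n of all keys) move; everything else stays put.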
How to serve Hot keys?
Certain cache items are read and written more than others.
Ideally, a good hash function should distribute these "hot keys" across many servers.
"Celebrity" Key
At Etsy, they've seen keys that are hit often enough, and store a large enough value, that they saturate the network interface of their cache host.
A large number of clients collectively generate a higher read rate than the cache server can provide.
To fix this, they explored the following solutions.
Solution 1: Horizontal Scaling
Adding more cache hosts doesn't help, because it only changes the distribution of keys to hosts: it would merely move the problem key and saturate a different host.
Solution 2: Faster network cards
It would come at a substantial hardware cost.
Solution 3: Cache Smearing
They used consistent hashing and added a small amount of entropy to certain hot cache keys for all reads and writes.
This effectively turns one key into several, each storing the same data and letting the hash function distribute the read and write volume for the new keys among several hosts.
Example
Let's say `celebrity_key` is a hot key.
Cache smearing appends a random number in a small range (say, [0, 5)) to the key before each read or write. So, for example, `celebrity_key` is transformed into `celebrity_key_1`, `celebrity_key_3`, or `celebrity_key_0`.
Since `celebrity_key_1`, `celebrity_key_3`, and `celebrity_key_0` are different keys, they hash to different hosts, sharing the load across the pool.
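A minimal sketch of that read/write path (my own illustration; the in-process dict stands in for the memcached pool, and all names are made up):

```python
import random

SMEAR_RANGE = 5  # number of copies of the hot key

def smeared(key: str) -> str:
    # Append a random number in [0, SMEAR_RANGE) to the hot key.
    return f"{key}_{random.randrange(SMEAR_RANGE)}"

cache = {}  # toy in-process stand-in for the memcached pool

def smeared_set(key: str, value) -> None:
    # Each write lands on one randomly chosen copy; over many writes
    # every copy ends up holding the same data.
    cache[smeared(key)] = value

def smeared_get(key: str):
    # Reads are spread across the copies the same way.
    return cache.get(smeared(key))
```

Because the suffix is re-drawn on every operation, the traffic for one hot key is spread evenly over all of its copies, and the hash function spreads those copies over the pool.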
Tradeoffs in choosing the range of the random number
A too-large range duplicates the cached data more than is necessary to reduce the load, and can lead to data inconsistencies.
A too-small range doesn't spread the load effectively across the cache pool (and could even leave all the smeared keys hashed to the same pool member).
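A back-of-the-envelope way (my own, assuming the hash spreads keys uniformly over the hosts, which consistent hashing only approximates) to reason about the too-small-range risk:

```python
def same_host_probability(n_hosts: int, n_copies: int) -> float:
    # With n_copies smeared keys hashed uniformly over n_hosts servers, the
    # chance that every copy lands on the same host is (1/n_hosts)^(n_copies-1).
    return (1 / n_hosts) ** (n_copies - 1)

print(same_host_probability(10, 2))  # 0.1: two copies collide 10% of the time
print(same_host_probability(10, 5))  # ~0.0001: five copies almost never all collide
```

This is why even a modest range like [0, 5) already makes "all copies on one host" very unlikely, while keeping duplication low.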
Conclusion
In summary, consistent hashing plus cache smearing has been a great combination of efficiency and practicality.
Allowing multiple hosts to serve requests for a hot key prevents any one host from being overloaded and increases the overall read rate of the cache pool.
At Etsy, they manually add cache smearing to the hottest keys, with entropy in a small range like 0 to 8 or 0 to 16.
Understanding the foundational principles is the key to effective learning!
Follow along to improve your system design skills.
Reference and Image Credit
Written by Kamran Ali