Cache Stampede


A cache stampede is what can happen in a system that handles a massive number of concurrent requests: when a cache entry expires, multiple requests try to re-read it from the database at the same time.
There are ways of mitigating this effect, probabilistic early expiration being the most ingenious IMO.
How Does A Cache Stampede Occur?
A common caching pattern looks like this:
Attempt to read an entry from the cache.
If the entry is missing, recompute it (e.g. fetch from a database) and cache the result.
In C# it could be something like this:
public T ThroughCache<T>(string cacheKey, Func<T> lookup) where T : class
{
    // Attempt to read the entry from the cache.
    T cached = _theCache.Get<T>(cacheKey);
    if (cached != null)
    {
        return cached;
    }

    // Cache miss: recompute and store the result.
    cached = lookup(); // fetch from the database
    if (cached != null)
    {
        _theCache.Add(cacheKey, cached);
    }
    return cached;
}
There is an obvious problem here, which becomes much more pronounced when there are multiple threads or processes executing the same logic concurrently. If the cache entry expires, each thread or process independently attempts to recompute the value, causing a flood of database queries until one process successfully updates the cache. This sudden load can overwhelm the database, leading to potential outages.
Strategies to Mitigate Cache Stampede
Locking
One almost obvious mitigation is to use pessimistic locking:
Attempt to read the cache entry.
If it’s a miss, attempt to acquire a lock for the cache entry key.
If the lock is acquired, recompute the cache entry value and store it in the cache.
Release the lock.
This way only the process that acquired the lock will go to the database to repopulate the cache.
The usual issues with pessimistic locking apply here, of course. For example: what if the process recomputing the cache entry dies before releasing the lock? (You need a timeout on the lock.)
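Here's a minimal sketch of the locking approach. The _lockProvider and its TryAcquire method are hypothetical, standing in for whatever lock you use (an in-process lock, a Redis lock, etc.); TryAcquire is assumed to return a disposable handle, or null if another process holds the lock, and to expire the lock after a timeout:

// A minimal sketch; _lockProvider and TryAcquire are hypothetical placeholders.
public T ThroughCacheWithLock<T>(string cacheKey, Func<T> lookup) where T : class
{
    T cached = _theCache.Get<T>(cacheKey);
    if (cached != null)
    {
        return cached;
    }

    // Cache miss: try to become the one process that recomputes the value.
    // The timeout ensures a crashed holder cannot block everyone forever.
    using var lockHandle = _lockProvider.TryAcquire(
        "lock:" + cacheKey, timeout: TimeSpan.FromSeconds(30));

    if (lockHandle != null)
    {
        // Re-check: another process may have repopulated the cache
        // while we were racing for the lock.
        cached = _theCache.Get<T>(cacheKey);
        if (cached != null)
        {
            return cached;
        }

        cached = lookup(); // fetch from the database
        if (cached != null)
        {
            _theCache.Add(cacheKey, cached);
        }
        return cached;
    }

    // Another process holds the lock; wait briefly and re-read the cache.
    Thread.Sleep(100);
    return _theCache.Get<T>(cacheKey) ?? lookup();
}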
Locking is the mechanism used by .NET 9’s HybridCache.
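For illustration, here's roughly what that looks like with HybridCache's GetOrCreateAsync, which coalesces concurrent misses on the same key so only one caller runs the factory (Product and FetchProductFromDatabaseAsync are hypothetical):

// Sketch assuming HybridCache is registered and injected via DI.
public async ValueTask<Product> GetProductAsync(HybridCache cache, string id)
{
    // Concurrent callers asking for the same key share one factory execution.
    return await cache.GetOrCreateAsync(
        $"product:{id}",
        async token => await FetchProductFromDatabaseAsync(id, token));
}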
Async Recomputation Process
Another way to mitigate the risk of a cache stampede is to have an asynchronous process dedicated to regenerating entries in the cache.
Instead of having each thread/process that reads from the cache potentially do the recomputation, you run a separate process that does it.
The dedicated process can do the regeneration:
when cache expiration approaches
periodically
when there's a cache miss
Each of those options has its own problems, e.g. you have to access the cache to know when expiration approaches; when you’re recomputing periodically, you might be doing it unnecessarily; when there’s a cache miss — isn’t that too late?
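As a sketch of the periodic variant, a background loop could look like this (the key list and RefreshEntry are hypothetical):

// Periodic background refresher; _keysToKeepWarm and RefreshEntry are assumed.
public async Task RunRefresherAsync(CancellationToken stop)
{
    // Recompute every warm entry on a fixed schedule, whether it needs it or not.
    using var timer = new PeriodicTimer(TimeSpan.FromSeconds(30));
    while (await timer.WaitForNextTickAsync(stop))
    {
        foreach (string key in _keysToKeepWarm)
        {
            await RefreshEntry(key); // recompute and overwrite the cache entry
        }
    }
}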
So here we get to a sort of hybrid of the two approaches, with an ingenious twist…
Probabilistic Early Expiration
We go back to the approach where each process can trigger the recomputation, but we make it so that there is only a low chance that two processes will do it at the same time.
The idea is this:
Each process may decide to recompute a cache entry value before its expiration by "rolling the dice": it generates a random number and regenerates the cache entry if the number is above a certain threshold. It’s best if the probability increases as the cache entry’s remaining TTL shortens.
This approach will favor early recomputations when the traffic rate increases (more processes rolling the dice).
From Wikipedia I borrowed this clever formula for the “dice”:
time() - delta * beta * log(rand(0, 1)) ≥ expiry

where time() is the current time, delta is how long it takes to recompute the value, beta is a tunable constant, rand(0, 1) is a uniform random number in (0, 1), and expiry is the entry’s expiration time. Since log(rand(0, 1)) is negative, the left-hand side is the current time pushed forward by a random amount proportional to delta; the entry is recomputed as soon as that pushed-forward time reaches the expiry.
With this formula:
As the entry gets closer to its expiry, the chance of recomputation increases.
Under heavy traffic, more processes are likely to “roll the dice,” ensuring early recomputation without overwhelming the database.
If you set beta to a value greater than 1, you’ll skew the probability toward earlier recomputations.
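A minimal sketch of the read path with this formula, assuming a hypothetical CacheEntry<T> wrapper that stores the value together with the measured recompute duration (delta) and the absolute expiry:

// Probabilistic early expiration; CacheEntry<T>, TtlSeconds and _theCache
// are assumptions for this sketch, not part of any particular library.
public record CacheEntry<T>(T Value, double DeltaSeconds, double ExpirySeconds);

const double TtlSeconds = 300; // assumed TTL for recomputed entries

static double Now() => DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() / 1000.0;

public T ThroughCacheProbabilistic<T>(string cacheKey, Func<T> lookup) where T : class
{
    CacheEntry<T> entry = _theCache.Get<CacheEntry<T>>(cacheKey);
    const double beta = 1.0; // > 1 skews toward earlier recomputation

    if (entry != null)
    {
        // Roll the dice: log of a uniform (0,1) number is negative, so this
        // pushes "now" forward by a random amount scaled by the recompute cost.
        double skewedNow = Now() - entry.DeltaSeconds * beta * Math.Log(Random.Shared.NextDouble());
        if (skewedNow < entry.ExpirySeconds)
        {
            return entry.Value; // lost the dice roll: serve the cached value
        }
    }

    // Cache miss, or this process won the dice roll: recompute and re-cache,
    // measuring how long the recomputation took (the next delta).
    double started = Now();
    T value = lookup(); // fetch from the database
    double delta = Now() - started;
    _theCache.Add(cacheKey, new CacheEntry<T>(value, delta, Now() + TtlSeconds));
    return value;
}

Note that the underlying cache’s own TTL should be set a bit longer than ExpirySeconds, so the entry is still readable while the dice-roll winner refreshes it.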
Why It’s Brilliant
Probabilistic early expiration is a self-regulating mechanism that aligns recomputation frequency with traffic load. Under heavy traffic, the likelihood of recomputation increases naturally, reducing the risk of a cache stampede while avoiding unnecessary recomputations during low traffic periods. This ingenious strategy is both simple to implement and highly effective, and it requires no locks!