Building a Request Coalescer from Scratch


Let’s say you’re building a backend service. Things are going great — until traffic picks up. Suddenly, you notice something odd.
You’re seeing multiple identical requests hitting your database or API. They all ask for the same thing and they all make separate, expensive calls.
Sound familiar?
If you’ve ever wondered, “Why are we doing the same thing five times in parallel?”, you’re not alone. This is a classic performance anti-pattern — and request coalescing can fix it.
So, What’s the Problem?
Imagine multiple users open the same product page at the same time. Or dozens of threads try to load the same user profile. They all hit your service with the same request—say, getUser("123")—within milliseconds of each other.
Now here’s the kicker: instead of sharing the work, each thread fires off its own request to the database or a remote API.
Why? Because your service has no idea that others are doing the same thing.
Let’s break it down:
Thread A: fetch("user123") → starts DB/API call
Thread B: fetch("user123") → starts DB/API call
Thread C: fetch("user123") → starts DB/API call
That’s three expensive calls… for the same data.
The Thundering Herd Problem
This scenario becomes even worse when a cache expires or a cold start occurs. Suddenly, thousands of requests for the same key hit your backend simultaneously. This is known as the thundering herd problem.
In systems with shared caching or batch jobs, thundering herds can:
Spike traffic to your database or upstream API
Cause rate-limiting, timeouts, or failures
Lead to cascading issues in downstream services
Why does this happen? Because each thread/client sees a cache miss and rushes to fetch the data independently—without knowing others are doing the same.
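To see the herd in code, here is a minimal sketch of a naive read-through cache (NaiveReadThroughCache and loadFromDb are illustrative names, not part of the service we build below): every thread that observes the miss performs its own expensive fetch.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A naive read-through cache: nothing coordinates concurrent misses.
public class NaiveReadThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // Every thread that reaches this line after a miss fires its own
        // expensive fetch. Under load, that is the thundering herd.
        String fresh = loadFromDb(key); // stand-in for a slow DB/API call
        cache.put(key, fresh);
        return fresh;
    }

    private String loadFromDb(String key) {
        return "value-for-" + key;
    }
}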
Can We Do Better?
What if you could say:
“Hey, I see someone is already fetching this. Let me just wait and use their result.”
That’s what request coalescing is all about.
What Is Request Coalescing?
Request coalescing is a technique where multiple concurrent requests for the same key are merged into one. Instead of all threads doing the same thing, only the first one does the work. The others just wait—and then reuse the result.
Here’s how it looks with coalescing:
Thread A: fetch("user123") → starts DB/API call
Thread B: fetch("user123") → waits for result from A
Thread C: fetch("user123") → waits for result from A
Only one call goes through. Everyone else benefits.
Where Should You Use It?
Request coalescing makes sense when:
The same key is requested often (e.g., trending topics, popular users).
The backend call is expensive.
You’re dealing with cold cache or frequent expirations.
You use TTL-based caching and care about stability.
Let's Build!
public class User {
    String name;

    public User(String name) {
        this.name = name;
    }
}
This is a simple POJO representing a User. In real systems, this might come from a database or remote API.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

import com.google.common.base.Stopwatch;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class UserDao {
    // synchronized simulates an even higher load by letting only one thread through at a time
    public synchronized User fetchByName(String name) {
        // simulate a DB fetch that takes 0.5 sec
        Stopwatch started = Stopwatch.createStarted();
        LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
        User user = new User(name);
        long elapsed = started.elapsed(TimeUnit.MILLISECONDS);
        log.info("Took {} ms to fetch user", elapsed);
        return user;
    }
}
This class simulates a costly database fetch:
Uses synchronized to throttle concurrent access (imitating heavy load).
Sleeps for 500ms to simulate latency.
This is important to see the benefit when coalescing kicks in — only one of these slow fetches should happen!
public class UserController {
    private final boolean isCoalescingEnabled;
    private final UserDao userDao;
    private final RequestCoalescer<User> requestCoalescer;

    UserController(UserDao userDao, boolean isCoalescingEnabled) {
        this.userDao = userDao;
        this.requestCoalescer = new RequestCoalescer<>();
        this.isCoalescingEnabled = isCoalescingEnabled;
    }

    public User lookupName(String name) {
        if (isCoalescingEnabled) {
            return requestCoalescer.subscribe(name, () -> userDao.fetchByName(name));
        } else {
            return userDao.fetchByName(name);
        }
    }
}
This class controls how requests are handled:
You can toggle coalescing on/off using isCoalescingEnabled.
If enabled, the controller delegates to the coalescer.
If disabled, it simply hits the DAO each time — resulting in multiple slow, redundant fetches.
This makes it easier to benchmark and demonstrate the benefits of coalescing.
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RequestCoalescer<T> {
    private final Map<String, CompletableFuture<T>> inFlightRequests = new ConcurrentHashMap<>();

    public T subscribe(String key, Supplier<T> supplier) {
        CompletableFuture<T> future = getOrCreateFuture(key, supplier);
        return future.join();
    }

    private CompletableFuture<T> getOrCreateFuture(String key, Supplier<T> supplier) {
        // Fast path: a fetch for this key is already in flight, so piggyback on it.
        CompletableFuture<T> future = inFlightRequests.get(key);
        if (future != null) {
            return future;
        }
        // Slow path: try to register our own future; putIfAbsent tells us
        // whether another thread won the race in the meantime.
        CompletableFuture<T> newFuture = new CompletableFuture<>();
        CompletableFuture<T> oldFuture = inFlightRequests.putIfAbsent(key, newFuture);
        if (oldFuture != null) {
            return oldFuture;
        }
        // We won the race: perform the fetch asynchronously and publish the outcome.
        CompletableFuture.supplyAsync(() -> {
            try {
                T result = supplier.get();
                newFuture.complete(result);
                inFlightRequests.remove(key, newFuture);
                return result;
            } catch (Exception e) {
                newFuture.completeExceptionally(e);
                inFlightRequests.remove(key, newFuture);
                // return value is unused - callers observe newFuture instead
                return null;
            }
        });
        return newFuture;
    }
}
Step 1: Check if a fetch for this key is already happening. If so, return the existing future.
Step 2: Try to insert a new future. If another thread beat us to it, we return their future instead.
Step 3: If we won the race, start the fetch in a new thread. Once it’s done, we:
Complete the future
Remove the entry from the map (This is important to prevent memory leaks and avoid returning stale data)
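As an aside, steps 1 and 2 can be collapsed into one atomic call with ConcurrentHashMap.computeIfAbsent. A sketch of that variant (not what we built above) follows; the catch is that the mapping function runs while the map holds an internal lock for that key, so it must only schedule the work, never execute it:

// Sketch: computeIfAbsent makes the check-then-insert atomic,
// so the manual putIfAbsent race handling goes away.
private CompletableFuture<T> getOrCreateFuture(String key, Supplier<T> supplier) {
    return inFlightRequests.computeIfAbsent(key, k ->
            CompletableFuture
                    .supplyAsync(supplier) // only schedules the work; the mapping function stays cheap
                    .whenComplete((result, error) -> inFlightRequests.remove(k)));
}

Because the removal happens in whenComplete, the map entry is cleaned up exactly when the fetch finishes, while waiters that already hold the future still observe its result.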
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

import com.google.common.base.Stopwatch;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class UserControllerTest {
    @ParameterizedTest
    @ValueSource(booleans = {true, false})
    public void testLookupName(boolean isCoalescingEnabled) throws InterruptedException {
        UserController userController = new UserController(new UserDao(), isCoalescingEnabled);
        CountDownLatch latch = new CountDownLatch(10);
        Stopwatch timer = Stopwatch.createStarted();
        for (int i = 0; i < 10; i++) {
            CompletableFuture.runAsync(() -> {
                userController.lookupName("test");
                latch.countDown();
            });
        }
        boolean await = latch.await(10, TimeUnit.SECONDS);
        Assertions.assertTrue(await);
        long seconds = timer.elapsed(TimeUnit.SECONDS);
        log.info("Took {} seconds", seconds);
        if (isCoalescingEnabled) {
            Assertions.assertTrue(seconds <= 1);
        } else {
            Assertions.assertTrue(seconds >= 5);
        }
    }
}
No code is complete until its tests are written.
In this test, we make 10 concurrent requests to look up the details of the user named test. When coalescing is disabled, the test takes 0.5 × 10 ≈ 5 seconds to finish. When coalescing is enabled, it finishes in ~0.5 seconds: while the result of the first request is being computed, the remaining requests are virtually short-circuited.
Gotchas
Memory leaks: Always remove entries from the in-flight map after use.
Timeouts: What if the fetch never finishes? Add appropriate timeouts (a sketch follows this list).
Error sharing: If the request fails, make sure others don’t cache a bad result.
Over-coalescing: Don’t block forever; design with concurrency limits.
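For the timeout gotcha, one possible mitigation is a timeout-aware subscribe, sketched below under the assumption of Java 9+ (for CompletableFuture.copy and orTimeout); the Duration parameter is an addition to the signature used earlier:

import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: copy() gives each caller its own view of the shared future, so
// orTimeout() fails only this caller instead of completing the shared
// future exceptionally for everyone coalesced on the same key.
public T subscribe(String key, Supplier<T> supplier, Duration timeout) {
    CompletableFuture<T> shared = getOrCreateFuture(key, supplier);
    return shared.copy()
            .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
            .join();
}

A production version would also bound the underlying fetch itself, not just the wait.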
Thank you for reading. Hope you learnt something new today.