Building a Request Coalescer from Scratch


Let’s say you’re building a backend service. Things are going great — until traffic picks up. Suddenly, you notice something odd.
You’re seeing multiple identical requests hitting your database or API. They all ask for the same thing and they all make separate, expensive calls.
Sound familiar?
If you’ve ever wondered, “Why are we doing the same thing five times in parallel?”, you’re not alone. This is a classic performance anti-pattern — and request coalescing can fix it.
So, What’s the Problem?
Imagine multiple users open the same product page at the same time. Or dozens of threads try to load the same user profile. They all hit your service with the same request—say, getUser("123")—within milliseconds of each other.
Now here’s the kicker: instead of sharing the work, each thread fires off its own request to the database or a remote API.
Why? Because your service has no idea that others are doing the same thing.
Let’s break it down:
Thread A: fetch("user123") → starts DB/API call
Thread B: fetch("user123") → starts DB/API call
Thread C: fetch("user123") → starts DB/API call
That’s three expensive calls… for the same data.
The Thundering Herd Problem
This scenario becomes even worse when a cache expires or a cold start occurs. Suddenly, thousands of requests for the same key hit your backend simultaneously. This is known as the thundering herd problem.
In systems with shared caching or batch jobs, thundering herds can:
Spike traffic to your database or upstream API
Cause rate-limiting, timeouts, or failures
Lead to cascading issues in downstream services
Why does this happen? Because each thread/client sees a cache miss and rushes to fetch the data independently—without knowing others are doing the same.
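To see the herd in code, here is a minimal sketch of a naive read-through cache (NaiveReadThroughCache and loadFromDb are illustrative names, not part of the service we build below): every thread that observes the miss performs its own expensive fetch.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A naive read-through cache: nothing coordinates concurrent misses.
public class NaiveReadThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // Every thread that reaches this line after a miss fires its own
        // expensive fetch. Under load, that is the thundering herd.
        String fresh = loadFromDb(key); // stand-in for a slow DB/API call
        cache.put(key, fresh);
        return fresh;
    }

    private String loadFromDb(String key) {
        return "value-for-" + key;
    }
}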
Can We Do Better?
What if you could say:
“Hey, I see someone is already fetching this. Let me just wait and use their result.”
That’s what request coalescing is all about.
What Is Request Coalescing?
Request coalescing is a technique where multiple concurrent requests for the same key are merged into one. Instead of all threads doing the same thing, only the first one does the work. The others just wait—and then reuse the result.
Here’s how it looks with coalescing:
Thread A: fetch("user123") → starts DB/API call
Thread B: fetch("user123") → waits for result from A
Thread C: fetch("user123") → waits for result from A
Only one call goes through. Everyone else benefits.
Where Should You Use It?
Request coalescing makes sense when:
The same key is requested often (e.g., trending topics, popular users).
The backend call is expensive.
You’re dealing with cold cache or frequent expirations.
You use TTL-based caching and care about stability.
Let's Build!
public class User {
    String name;

    public User(String name) {
        this.name = name;
    }
}
This is a simple POJO representing a User. In real systems, this might come from a database or remote API.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

import com.google.common.base.Stopwatch;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class UserDao {
    // synchronized simulates an even higher load by letting only one thread through at a time
    public synchronized User fetchByName(String name) {
        // simulate a DB fetch that takes 0.5 sec
        Stopwatch started = Stopwatch.createStarted();
        LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(500));
        User user = new User(name);
        long elapsed = started.elapsed(TimeUnit.MILLISECONDS);
        log.info("Took {} ms to fetch user", elapsed);
        return user;
    }
}
This class simulates a costly database fetch:
Uses synchronized to throttle concurrent access (imitating heavy load).
Sleeps for 500ms to simulate latency.
This is important to see the benefit when coalescing kicks in — only one of these slow fetches should happen!
public class UserController {
    private final boolean isCoalescingEnabled;
    private final UserDao userDao;
    private final RequestCoalescer<User> requestCoalescer;

    UserController(UserDao userDao, boolean isCoalescingEnabled) {
        this.userDao = userDao;
        this.requestCoalescer = new RequestCoalescer<>();
        this.isCoalescingEnabled = isCoalescingEnabled;
    }

    public User lookupName(String name) {
        if (isCoalescingEnabled) {
            return requestCoalescer.subscribe(name, () -> userDao.fetchByName(name));
        } else {
            return userDao.fetchByName(name);
        }
    }
}
This class controls how requests are handled:
You can toggle coalescing on/off using isCoalescingEnabled.
If enabled, the controller delegates to the coalescer.
If disabled, it simply hits the DAO each time — resulting in multiple slow, redundant fetches.
This makes it easier to benchmark and demonstrate the benefits of coalescing.
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class RequestCoalescer<T> {
    private final Map<String, CompletableFuture<T>> inFlightRequests = new ConcurrentHashMap<>();

    public T subscribe(String key, Supplier<T> supplier) {
        CompletableFuture<T> future = getOrCreateFuture(key, supplier);
        return future.join();
    }

    private CompletableFuture<T> getOrCreateFuture(String key, Supplier<T> supplier) {
        // Fast path: a fetch for this key is already in flight, so piggyback on it.
        CompletableFuture<T> future = inFlightRequests.get(key);
        if (future != null) {
            return future;
        }
        // Slow path: try to register our own future; putIfAbsent tells us
        // whether another thread won the race in the meantime.
        CompletableFuture<T> newFuture = new CompletableFuture<>();
        CompletableFuture<T> oldFuture = inFlightRequests.putIfAbsent(key, newFuture);
        if (oldFuture != null) {
            return oldFuture;
        }
        // We won the race: perform the fetch asynchronously and publish the outcome.
        CompletableFuture.supplyAsync(() -> {
            try {
                T result = supplier.get();
                newFuture.complete(result);
                inFlightRequests.remove(key, newFuture);
                return result;
            } catch (Exception e) {
                newFuture.completeExceptionally(e);
                inFlightRequests.remove(key, newFuture);
                // return value is unused - callers observe newFuture instead
                return null;
            }
        });
        return newFuture;
    }
}
Step 1: Check if a fetch for this key is already happening. If so, return the existing future.
Step 2: Try to insert a new future. If another thread beat us to it, we return their future instead.
Step 3: If we won the race, start the fetch in a new thread. Once it’s done, we:
Complete the future
Remove the entry from the map (This is important to prevent memory leaks and avoid returning stale data)
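As an aside, steps 1 and 2 can be collapsed into one atomic call with ConcurrentHashMap.computeIfAbsent. A sketch of that variant (not what we built above) follows; the catch is that the mapping function runs while the map holds an internal lock for that key, so it must only schedule the work, never execute it:

// Sketch: computeIfAbsent makes the check-then-insert atomic,
// so the manual putIfAbsent race handling goes away.
private CompletableFuture<T> getOrCreateFuture(String key, Supplier<T> supplier) {
    return inFlightRequests.computeIfAbsent(key, k ->
            CompletableFuture
                    .supplyAsync(supplier) // only schedules the work; the mapping function stays cheap
                    .whenComplete((result, error) -> inFlightRequests.remove(k)));
}

Because the removal happens in whenComplete, the map entry is cleaned up exactly when the fetch finishes, while waiters that already hold the future still observe its result.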
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

import com.google.common.base.Stopwatch;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class UserControllerTest {
    @ParameterizedTest
    @ValueSource(booleans = {true, false})
    public void testLookupName(boolean isCoalescingEnabled) throws InterruptedException {
        UserController userController = new UserController(new UserDao(), isCoalescingEnabled);
        CountDownLatch latch = new CountDownLatch(10);
        Stopwatch timer = Stopwatch.createStarted();
        for (int i = 0; i < 10; i++) {
            CompletableFuture.runAsync(() -> {
                userController.lookupName("test");
                latch.countDown();
            });
        }
        boolean await = latch.await(10, TimeUnit.SECONDS);
        Assertions.assertTrue(await);
        long seconds = timer.elapsed(TimeUnit.SECONDS);
        log.info("Took {} seconds", seconds);
        if (isCoalescingEnabled) {
            Assertions.assertTrue(seconds <= 1);
        } else {
            Assertions.assertTrue(seconds >= 5);
        }
    }
}
No code is complete until its tests are written.
In this test, we make 10 concurrent requests to look up the details of the user named test. When coalescing is disabled, the test takes 0.5 × 10 ≈ 5 seconds to finish. When coalescing is enabled, it finishes in ~0.5 seconds: while the result of the first request is being computed, the remaining requests are virtually short-circuited.
Gotchas
Memory leaks: Always remove entries from the in-flight map after use.
Timeouts: What if the fetch never finishes? Add appropriate timeouts (a sketch follows this list).
Error sharing: If the request fails, make sure others don’t cache a bad result.
Over-coalescing: Don’t block forever; design with concurrency limits.
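For the timeout gotcha, one possible mitigation is a timeout-aware subscribe, sketched below under the assumption of Java 9+ (for CompletableFuture.copy and orTimeout); the Duration parameter is an addition to the signature used earlier:

import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: copy() gives each caller its own view of the shared future, so
// orTimeout() fails only this caller instead of completing the shared
// future exceptionally for everyone coalesced on the same key.
public T subscribe(String key, Supplier<T> supplier, Duration timeout) {
    CompletableFuture<T> shared = getOrCreateFuture(key, supplier);
    return shared.copy()
            .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
            .join();
}

A production version would also bound the underlying fetch itself, not just the wait.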
Thank you for reading. Hope you learnt something new today.