Thread Wars: Episode 1 – The Thread Menace


You’ve been there.
That one late night, logs flooding in, thread count shooting past 2,000. CPU barely touched, but the app’s crawling. GC’s gasping. Your service dashboard looks like a heart monitor in flatline mode. And there it is—java.util.concurrent.RejectedExecutionException.
You stare. You sigh. And you mutter what every Java engineer has, at some point, whispered under their breath:
"Why the hell does Java need so many threads to do so little?"
1> What This Episode Is About
For two decades, we’ve built high-concurrency systems on top of OS-managed threads, pretending they were cheap. They weren’t. So we compensated:
Thread pools with timeouts
Reactive frameworks to dodge blocking
Custom queue backpressure hacks
And prayers. Lots of them.
This episode is about understanding the original sin of Java concurrency: the heavyweight nature of platform threads — and the web of complexity it forced us to build around them.
Virtual threads might be the solution, but before we can celebrate them in Part 2, we need to know what exactly they’re saving us from.
2> Why Java Threads Were Never Lightweight
Let’s clear something up: Java threads were never cheap. We just got used to paying the cost and calling it “normal.”
Every time you did:
new Thread(() -> {}).start();
you weren’t creating some magical lightweight thing. You were asking the operating system for a native thread. That’s a heavyweight resource — and the JVM made no attempt to hide it.
What did you really get?
A 1:1 mapped OS-level thread
Roughly 1 MB of stack memory reserved by default
An expensive context switch every time the CPU scheduler juggled between threads
Zero awareness of whether your thread was doing real work or just sitting around waiting for I/O
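To see that cost for yourself, here's a minimal sketch that takes "just start a thread" literally. It's not a benchmark: the 100,000 loop bound is arbitrary, and where it falls over (or doesn't) depends on your OS limits, available memory, and -Xss.

import java.util.concurrent.CountDownLatch;

public class PlatformThreadCost {
    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(1);
        int created = 0;
        try {
            for (int i = 0; i < 100_000; i++) {
                // Each iteration asks the OS for a real native thread with its own stack.
                new Thread(() -> {
                    try { done.await(); } catch (InterruptedException ignored) { }
                }).start();
                created++;
            }
        } catch (OutOfMemoryError e) {
            // Typically "unable to create native thread": you ran out of threads, not CPU.
            System.out.println("OS said no: " + e.getMessage());
        } finally {
            done.countDown(); // release every parked thread so the JVM can exit
        }
        System.out.println("Threads created: " + created);
    }
}

The CPU stays idle the whole time; the resource you exhaust is the thread itself.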
Now, if your service handled a few dozen users, no big deal. But the moment you needed to serve thousands of concurrent requests — most of which spent their time waiting on a database, remote API, or disk — you hit a wall. Fast.
The illusion of "scalable" Java
Here’s the trap most of us fell into:
Requests come in.
Each one gets a thread.
Some threads wait.
You add a thread pool.
You queue requests.
The queue fills.
You get RejectedExecutionException.
And suddenly, you're tuning your corePoolSize at 3 AM like it's a sacred number from a Mayan prophecy.
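If you've never met that prophecy in person, it lives in the ThreadPoolExecutor constructor. Here's a sketch of the knobs you end up turning, with deliberately made-up numbers:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ThreadPoolExecutor pool = new ThreadPoolExecutor(
        20,                              // corePoolSize: the "sacred number"
        200,                             // maximumPoolSize: the other sacred number
        60, TimeUnit.SECONDS,            // how long spare threads may sit idle
        new ArrayBlockingQueue<>(500));  // bounded queue: 500 waiting requests, then rejection

With tasks that block for a couple of seconds, that setup absorbs roughly 200 + 500 in-flight requests; the next one gets a RejectedExecutionException. None of those numbers come from first principles; they come from the last incident.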
So why didn’t we feel the pain earlier?
Because CPUs were fast. Servers were big. And honestly, we weren’t dealing with the scale that exposed how much of a lie “just use a thread” really was.
But as traffic scaled and latency expectations dropped, the cost became impossible to ignore. We weren’t bottlenecked on CPU — we were bottlenecked on threads that weren’t even doing anything.
That’s when things started to get reactive… in all senses of the word.
3> The Scalability Wall
You never forget the first time your app collapsed under load because the threads simply ran out.
It starts subtle:
A few slow requests
Some GC activity
Maybe a harmless-looking spike in I/O
Then boom:
java.util.concurrent.RejectedExecutionException
Your thread pool is saturated. Your queues are full.
And your users? They're staring at spinning loaders while you scramble through dashboards.
Why did this happen?
Because we were using platform threads like currency, spending one per request — even when most of those requests were just waiting.
Waiting on:
A database call (SELECT * FROM users WHERE patience > 0)
A REST call to another microservice
A file read, or worse, a synchronous HTTP client
Each of those actions blocked an entire thread.
Now imagine:
You’ve got 10,000 users.
Each holds a connection for 2 seconds.
You need at least 10,000 threads to handle them concurrently.
Oops.
JVM dies. Context switching goes wild. CPU does more thread juggling than actual work.
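The arithmetic is worth spelling out, because with blocking I/O your throughput is capped at pool size divided by wait time. A small sketch makes it visible; Thread.sleep stands in for the database call and every number is illustrative:

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingThroughput {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(200); // 200 platform threads
        int requests = 1_000;
        CountDownLatch done = new CountDownLatch(requests);
        Instant start = Instant.now();
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try { Thread.sleep(2_000); } catch (InterruptedException ignored) { } // "waiting on the database"
                done.countDown();
            });
        }
        done.await();
        // 200 threads / 2s per request ≈ 100 requests per second,
        // so 1,000 requests take ~10 seconds of wall-clock time with the CPU near idle.
        System.out.println("Took " + Duration.between(start, Instant.now()).toSeconds() + "s");
        pool.shutdown();
    }
}

Scale that up and 10,000 truly concurrent users need roughly 10,000 parked threads, or they wait in line.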
The Thread Pool Band-Aid
So we invented thread pools.
You know the drill:
ExecutorService pool = Executors.newFixedThreadPool(200);
200 threads. Nice and safe.
Except… what happens when the 201st request comes in?
You queue it.
Then you limit the queue.
Then that fills up.
And now you reject incoming requests with a custom error message that says:
"We value your business, please try again later."
(while your logs silently cry inside.)
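If you're curious what that polite failure looks like in code, it's usually a custom RejectedExecutionHandler. A sketch, assuming the exception gets mapped to an HTTP 429 somewhere upstream; RetryLaterException is a made-up name:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoliteRejection {
    // Hypothetical exception type; your web layer turns it into a 429 / "try again later".
    static class RetryLaterException extends RuntimeException {
        RetryLaterException(String message) { super(message); }
    }

    static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
            200, 200, 0, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(1_000),
            (task, executor) -> {
                // Pool saturated and queue full: fail loudly instead of dropping the task.
                throw new RetryLaterException("We value your business, please try again later.");
            });
}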
But wait — aren’t threads supposed to help us scale?
Yes — if you're doing CPU-bound work.
But for I/O-heavy workloads (which most backend services are), platform threads become expensive babysitters — just sitting idle, waiting for something to respond, while holding onto precious memory and scheduling overhead.
So we pooled. We tuned. We hacked.
And in the process, we turned “scalable Java” into a thread micromanagement nightmare.
4> The Reactive Spiral
So we gave up.
We looked at our thread pools, our max queue sizes, our rejected tasks — and we finally said:
“Fine. If blocking is the problem, let’s just never block.”
And that’s how we entered the reactive spiral.
The Promise
Reactive frameworks offered us a way out.
No threads idling. No blocking calls. Just non-blocking everything, end-to-end.
Enter:
CompletableFuture
Project Reactor
RxJava
Netty and its infamous event loop model
You stopped writing this:
String response = restTemplate.getForObject(url, String.class);
And started writing this:
Mono<String> response = webClient.get().uri(url).retrieve().bodyToMono(String.class);
On paper, it looked clean. Under the hood, it was context-switching gymnastics.
The Reality
You lost something valuable: linearity.
You lost the ability to step through a request like a story.
Now, everything was callbacks, chained lambdas, and error branches.
.map()
.flatMap()
.thenCompose()
.onErrorResume()
.doOnNext()
.subscribe()
.block() (wait, what?)
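Strung together, the request that used to read top-to-bottom now reads like this. A sketch only: webClient, parseName, audit (assumed to return a Mono) and log are stand-ins for things you'd already have, and the endpoint is invented.

import reactor.core.publisher.Mono;

Mono<String> name = webClient.get()
        .uri("/users/{id}", userId)                  // hypothetical endpoint
        .retrieve()
        .bodyToMono(String.class)
        .map(body -> parseName(body))                // synchronous transform
        .flatMap(n -> audit(n).thenReturn(n))        // async side call, keep the value
        .onErrorResume(e -> Mono.just("anonymous"))  // the catch block, relocated
        .doOnNext(n -> log.info("resolved {}", n));  // the logging, also relocated

name.subscribe();  // and nothing runs until something subscribes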
Debugging this wasn’t “hard” — it was existential.
Stack traces? Gone.
Breakpoints? Hopeless.
Context? Maybe… if you passed it around manually like a cursed talisman.
You wanted throughput. You got cognitive overload.
It wasn’t all bad...
To be fair, reactive systems scaled.
If you were building low-latency, high-throughput gateways or stream processing engines, reactive was the only way to survive.
But here’s the dirty secret:
Most services didn’t need full-blown reactive pipelines.
They just needed to wait without burning a thread.
The Trade You Didn’t Realize You Were Making
We built an entire new paradigm just to avoid the cost of blocking — not because we loved reactive, but because threads were too expensive to use naively.
And that’s the tragedy.
We gave up:
Stack traces
Readability
Simplicity
Onboarding sanity
All to escape the monster Java itself had created.
5> And Still… We Blocked
Here’s the twist in this saga:
Even after going fully reactive, we couldn’t stop blocking.
Despite all the Mono, Flux, CompletableFuture, and the emotional damage caused by .flatMap(), you eventually hit a wall of truth:
“Some libraries just don’t care about your non-blocking dreams.”
The Usual Suspects
Let’s name names:
JDBC drivers → blocking by default.
Legacy HTTP clients → still blocking under the hood.
XML parsers, logging libraries, file I/O → all designed for classic threads.
You’d wire up a reactive flow, and then somewhere inside, a rogue .get() or .executeQuery() would stall your event loop — and with it, the entire reactor thread.
One blocking call. One frozen system.
And guess what? Debugging that?
Yeah — good luck tracing it through onNext chains and scheduler hops.
The Hybrid Hell
To deal with this, teams started mixing paradigms:
Block where you must, go reactive where you can
Use dedicated thread pools to quarantine the blocking stuff
Pass around Schedulers.elastic() like it’s holy water
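In code, that quarantine almost always boils down to one idiom: wrap the blocking call and push it onto a scheduler built for it. A sketch; findUserBlocking and User are stand-ins, and boundedElastic() is the modern name for what elastic() used to be:

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

Mono<User> user = Mono
        .fromCallable(() -> findUserBlocking(userId))  // the blocking JDBC call lives here
        .subscribeOn(Schedulers.boundedElastic());     // ...and runs on the quarantine pool, not the event loop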
Now you’ve got:
Reactive in the controller
Thread pools in the DAO
And no one on the team fully understands how context flows anymore
Congratulations — you’ve achieved accidental complexity at scale.
You Know It’s Bad When...
You create @Async wrappers around blocking code just to avoid freezing your event loop.
Your observability stack starts warning about blocked Netty threads.
New joiners ask where the business logic is and you send them a sequence diagram instead of code.
We didn’t fix the problem — we redecorated it.
So here we are:
Platform threads are too heavy.
Reactive is too complex.
Blocking is still necessary.
Is there a middle ground?
Yes. And it’s not a workaround — it’s a new primitive.
6> Wrap-Up: The Cost of Pretending
For over two decades, we convinced ourselves that platform threads were “just fine.”
We patched them with pools.
We outsmarted them with callbacks.
We tolerated their cost, their complexity, and their refusal to scale with the times.
And every time we tried to fix the problem, we ended up rewriting the way we wrote Java itself.
But here’s the hard truth:
Thread-per-request wasn’t the mistake. The mistake was assuming platform threads could handle it.
What we needed was never “more abstractions.”
We needed a better foundation.
In Episode 2: A New Hope, we’ll meet virtual threads — the comeback Java desperately needed.