Deep Dive: How a Node.js HTTP Server Handles Many Concurrent Requests

Utpal Pati

Node.js achieves concurrency by decoupling “doing work” from “waiting for I/O.” Even though JavaScript runs on a single call stack, the runtime orchestrates thousands of overlapping request lifecycles via the event loop, libuv, and a set of specialized queues.

The Complete Lifecycle of Many Requests

Consider an Express or native HTTP server with many concurrent requests arriving.

  1. Kernel Networking and libuv I/O readiness
  • The OS kernel receives packets for the listening TCP socket.

  • Node’s libuv watches that socket using the platform’s I/O APIs (e.g., epoll/kqueue/IOCP).

  • When a connection is accepted and data is readable, libuv marks an event as ready.

  2. Event loop poll phase → your JS callback
  • The event loop enters the poll phase and notices “readable/connection” events.

  • Node wires these to the JavaScript layer by enqueuing the user-facing callback (the HTTP request handler) to run soon.

  • When the call stack is free, the event loop invokes your handlers, one at a time, on the single JS thread.

  3. Synchronous section of the handler
  • Your route’s synchronous code runs immediately: parsing, small computations, quick conditionals, “log 1 / log 2”, etc.

  • If you schedule timers (setTimeout/setInterval), those are registered with libuv’s timers system.

  • If you initiate async I/O (DB calls, HTTP calls, file reads) through nonblocking drivers, those are registered with the underlying async I/O and will signal completion later.

  4. Yielding on await or callbacks
  • If you use async/await, the function “pauses” at await without blocking the thread. The remainder of the function is captured as a continuation (state machine). Control returns to the event loop immediately, freeing it to run other requests.

  • If you use callbacks or promises, the same principle applies: you schedule work; the main thread moves on.

  5. Completions come back via queues
  • When timers expire, their callbacks enter the timers macrotask queue.

  • When network/file/DB I/O completes, their callbacks go to appropriate macrotask queues.

  • When promises resolve, their continuations go to the microtask queue; in Node, process.nextTick runs even before microtasks from promises.

  6. Priority and execution ordering
  • The JS call stack must be empty to run queued work.

  • Ordering rules after every chunk of JS:

    • First: nextTick queue (Node-specific) runs to empty.

    • Then: microtask queue (Promises/queueMicrotask) runs to empty.

    • Then: proceed with the current loop phase (e.g., timers, I/O callbacks, check/setImmediate, close).

  • This priority means await continuations typically resume as soon as possible after their promise resolves, often before timers and other macrotasks.

  7. Compose results and respond
  • When the required async pieces complete, your code resumes (e.g., the code after await) and you write the consolidated response (res.json/res.end).

  • Each request finishes independently; response order depends on completion timing, not arrival order.

Key insight: At any instant, only one JavaScript callback runs. Concurrency emerges from quick handoffs: schedule work, yield control, resume later.
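
To make those steps concrete, here is a minimal sketch using Node's built-in http module. fetchUser is a hypothetical stand-in for any non-blocking I/O call (a DB query, an outbound HTTP request, a file read).

```js
const http = require('http');

// Hypothetical async call standing in for any non-blocking I/O; it resolves
// later via the event loop rather than blocking the JS thread.
function fetchUser(id) {
  return new Promise((resolve) =>
    setTimeout(() => resolve({ id, name: 'demo' }), 50)
  );
}

const server = http.createServer(async (req, res) => {
  console.log('1');                     // step 3: synchronous work runs immediately
  const user = await fetchUser(42);     // step 4: yield; the loop serves other requests
  console.log('2');                     // steps 5 and 6: continuation resumes via the microtask queue
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify(user));        // step 7: compose and respond
});

server.listen(3000);
```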


Event Loop Phases: What runs where

A single loop tick visits these major phases (simplified for clarity):

  • Timers: Expired setTimeout/setInterval callbacks.

  • Pending callbacks: Certain I/O callbacks deferred from previous cycles.

  • Idle/prepare: Internal use.

  • Poll: The main I/O phase; pulls in new I/O events and may run their callbacks.

  • Check: setImmediate callbacks.

  • Close: Close events (e.g., socket “close”).

Between every phase transition (and after any JS execution), Node drains:

  • process.nextTick queue (highest priority in Node)

  • then the microtask queue (Promises, queueMicrotask)

Implications:

  • Await continuations (promises) resume before most macrotasks.

  • setImmediate runs in the check phase; when both are scheduled from inside an I/O callback, it always runs before a 0ms timer, whereas at the top level the order is not guaranteed.

  • Timers are not precise; they fire “no earlier than” their delay, and actual run time depends on loop load.
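
A quick way to see these rules is to schedule one task of each kind from inside an I/O callback, where the setImmediate-before-timer ordering is deterministic:

```js
const fs = require('fs');

fs.readFile(__filename, () => {
  setTimeout(() => console.log('timeout'), 0);          // timers phase, next loop iteration
  setImmediate(() => console.log('immediate'));         // check phase: before the 0ms timer here
  Promise.resolve().then(() => console.log('promise')); // microtask queue
  process.nextTick(() => console.log('nextTick'));      // nextTick queue: highest priority
});
// Prints: nextTick, promise, immediate, timeout
```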


Request State: Where is partial synchronous work stored while waiting for a timer or promise to resolve?

  • Local variables in your handler scope (closures) hold intermediate results.

  • You may attach data to req/res (e.g., res.locals) for passing between middleware.

  • When awaiting, the async function’s local state is preserved in an internal state machine until it resumes; you don’t need to manually store it.
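
A small sketch of that in practice; lookup is a hypothetical async call standing in for a DB or cache lookup:

```js
// Hypothetical async lookup; resolves after a short delay.
const lookup = (id) =>
  new Promise((resolve) => setTimeout(() => resolve({ id, ok: true }), 20));

async function handler(req, res) {
  const startedAt = Date.now();                     // local state, captured by the async state machine
  const userId = Number(req.url.split('/').pop());  // partial synchronous work
  const record = await lookup(userId);              // suspend here; nothing is stored manually
  // Resumes later: startedAt and userId are still in scope.
  res.end(JSON.stringify({ record, tookMs: Date.now() - startedAt }));
}

// Quick check without a server: fake req/res objects.
handler({ url: '/users/7' }, { end: (body) => console.log(body) });
```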


DB calls inside an API call: Worker Thread or Not?

  • Most modern DB drivers use async non-blocking sockets; they do not run JS in worker threads. They rely on the event loop and OS I/O to notify completion.

  • The libuv threadpool may be used by some native modules (fs, crypto, zlib) for blocking operations, but your DB query’s JavaScript logic resumes on the main thread when results arrive.
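
One way to see that split is with crypto.pbkdf2: the hashing happens on the libuv threadpool, yet the callback (like a DB driver's completion) runs back on the main JS thread:

```js
const crypto = require('crypto');
const { isMainThread } = require('worker_threads');

// Key derivation runs on the libuv threadpool...
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  // ...but this callback executes on the main thread.
  console.log('callback on main thread?', isMainThread); // true
  console.log('derived key bytes:', key.length);         // 64
});

console.log('main thread keeps running while the threadpool works');
```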


If I hit 10, 100, or 10,000 simultaneous requests, what happens?

  • Arrival: The kernel and libuv note many sockets with readable data.

  • Dispatch: Each request’s handler runs its synchronous part swiftly, schedules async work, and yields.

  • Interleaving: As async work completes per request, its continuations run in priority order (nextTick, microtasks, then macrotasks).

  • Throughput scales because the main thread spends very little time waiting; it constantly alternates between ready-to-run callbacks across many requests.

What could reduce throughput?

  • CPU-bound synchronous code (e.g., large loops, heavy JSON parsing, compression) blocks the loop.

  • Excessive microtask chains can starve macrotasks; use judiciously.

  • Big payload parsing on the main thread—consider streams or offloading.


CPU-Bound Work: Three strategies

  1. Keep it tiny and synchronous
  • If it’s micro-fast (sub-millisecond), do it inline.
  2. Yield with batched chunks (stay single-threaded)
  • Split big loops into chunks and schedule the next chunk with setImmediate to let the loop process other work in-between.

  • Great for moderate CPU tasks when latency matters and you want to avoid thread management.

  3. True parallelism with Worker Threads
  • For substantial CPU work (image/video processing, massive transforms, crypto, ML inference on CPU), offload to worker threads, as sketched after this list.

  • Use a pool sized ~ number of cores; reuse threads to avoid creation overhead.

  • Pass data via message passing or transferables; consider SharedArrayBuffer/Atomics for low-latency coordination.

  • Keep the main thread focused on I/O and lightweight logic; respond when workers post results.
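
A minimal single-file sketch of strategy 3; a production setup would reuse a pool of roughly os.cpus().length workers instead of spawning one per task:

```js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// CPU-heavy work we do not want on the main thread.
function heavySum(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i;
  return sum;
}

if (isMainThread) {
  // Main thread: hand the task to a worker and await its message.
  const runInWorker = (n) =>
    new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: n });
      worker.once('message', resolve);
      worker.once('error', reject);
    });

  runInWorker(1e8).then((sum) => console.log('sum from worker:', sum));
  console.log('event loop stays free for I/O while the worker computes');
} else {
  // Worker thread: compute and post the result back.
  parentPort.postMessage(heavySum(workerData));
}
```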


Practical Patterns and Pitfalls

  • Keep handlers fast: Do synchronous lightweight work only.

  • Prefer async/await over nested callbacks for clarity; error handling with try/catch is straightforward.

  • Use streaming (HTTP and file I/O) to avoid buffering huge payloads in memory.

  • Backpressure and timeouts:

    • Apply per-request timeouts.

    • Implement queue depth limits if you hand tasks to a worker pool.

  • Avoid blocking libraries:

    • Use async equivalents (e.g., fs.promises) or streams.

    • Beware of synchronous JSON operations on extremely large objects—consider streaming parsers.

  • Observability:

    • Monitor event loop lag and utilization.

    • Track worker pool saturation and queueing time.

    • Log timers vs microtasks behavior if you observe starvation.
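
As a sketch of the streaming and timeout advice (the file path and timeout value are illustrative):

```js
const http = require('http');
const fs = require('fs');
const { pipeline } = require('stream');

const server = http.createServer((req, res) => {
  res.setTimeout(10_000, () => res.destroy());          // per-request timeout
  // Stream the file instead of buffering it all in memory.
  pipeline(fs.createReadStream('./big-report.csv'), res, (err) => {
    if (err) console.error('stream failed:', err.message);
  });
});

server.listen(3000);
```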


Detailed Example Scenarios

A) Timer + Logs per Request (10 concurrent)

  • For each request:

    • “1” logs immediately.

    • setTimeout(…, 1000) scheduled.

    • “2” logs immediately.

  • About 1s later, each timer callback enters the timers queue; the event loop executes them one by one, printing “3”.

  • Order of “3” logs depends on when callbacks are dequeued; not guaranteed to match arrival order.
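
Scenario A as code, simulated with a plain loop instead of a real server:

```js
for (let id = 1; id <= 10; id++) {
  console.log(`req ${id}: 1`);                          // synchronous, logs immediately
  setTimeout(() => console.log(`req ${id}: 3`), 1000);  // timers queue, about 1s later
  console.log(`req ${id}: 2`);                          // synchronous, logs before any "3"
}
// All "1"/"2" pairs print first; the ten "3"s follow roughly a second later.
```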

B) Await a DB Query per Request (10 concurrent)

  • Each request runs “1”, awaits db.query(), then runs “2” and responds.

  • While 10 DB requests are in-flight, the event loop runs other callbacks (e.g., from other requests).

  • As each query resolves, its promise continuation lands in the microtask queue and resumes quickly, typically responding with low latency.
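
Scenario B as code; db.query here is a hypothetical promise-returning driver call with randomized latency:

```js
const db = {
  query: (sql) =>
    new Promise((resolve) =>
      setTimeout(() => resolve([{ sql }]), 30 + Math.random() * 70)
    ),
};

async function handleRequest(id) {
  console.log(`req ${id}: 1`);
  const rows = await db.query('SELECT 1'); // in flight: the loop serves other requests
  console.log(`req ${id}: 2`);             // resumes via the microtask queue
  return rows;
}

for (let id = 1; id <= 10; id++) handleRequest(id);
// All "1"s print first; the "2"s arrive in completion order, not arrival order.
```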

C) CPU-heavy Loop per Request: Batching

  • For each request, instead of a 10M-iteration synchronous loop, break into batches of, say, 100k and schedule the next batch with setImmediate.

  • This keeps the loop responsive; other requests get time slices between batches.

  • If even batched work is too slow, move to worker threads.
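
Scenario C as code, a minimal batching sketch using setImmediate:

```js
function batchedSum(total, batchSize, done) {
  let i = 0;
  let sum = 0;
  function runBatch() {
    const end = Math.min(i + batchSize, total);
    for (; i < end; i++) sum += i;   // one bounded chunk of CPU work
    if (i < total) {
      setImmediate(runBatch);        // yield so other callbacks can run in between
    } else {
      done(sum);
    }
  }
  runBatch();
}

batchedSum(10_000_000, 100_000, (sum) => console.log('sum:', sum));
setTimeout(() => console.log('other work still gets time slices'), 5);
```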
