How to make Kubernetes & Node.js microservices a perfect match?


It’s peak traffic time. Your Node.js application is handling thousands of requests per second.
Suddenly, your Kubernetes cluster starts randomly killing pods. Latency spikes. Requests pile up. Your dashboard lights up like a Christmas tree—but nobody knows why.
Your microservices are choking on their own interdependencies.
Sound familiar?
If you’ve ever run a Node.js service in Kubernetes, you’ve likely encountered some version of this nightmare. The truth is, Kubernetes and Node.js don’t always get along by default.
In our latest webinar, we discussed this issue and how to work past it.
*We answered Q&A from the webinar at the end of this blog!*
The Fundamental Mismatch Between Kubernetes and Node.js
Kubernetes is an infrastructure-first platform, designed to manage workloads by allocating physical resources like CPU and memory. It assumes applications will scale predictably based on resource consumption, treating workloads as isolated units that can be replicated as demand increases. Node.js, however, follows a completely different execution model, prioritizing event-driven, non-blocking operations within a single-threaded environment.
One of the most common problems we see is how Kubernetes scales applications. Traditionally, Kubernetes relies on CPU and memory usage to determine when to scale up or down. However, Node.js doesn’t work like a typical CPU-bound process. It can handle thousands of concurrent requests efficiently within a single thread, meaning CPU usage is often a misleading indicator of when a Node.js service actually needs to scale.
Instead of relying solely on CPU metrics, we discussed the importance of using event loop utilization as a better measure for scaling. This helps capture when Node.js is truly overloaded before performance degrades. CPU spikes can often be misleading—garbage collection, async operations, or a temporary burst of activity may increase CPU usage without actually signaling that more pods are needed.
By monitoring event loop utilization, teams can gain a clearer picture of when a Node.js service is truly struggling and should be scaled. This helps avoid unnecessary pod creations, reduces infrastructure costs, and prevents performance degradation caused by misleading scaling triggers.
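As a concrete starting point, here is a minimal sketch that samples event loop utilization with Node's built-in perf_hooks module; the 5-second interval and 0.9 threshold are illustrative values, not universal recommendations:

```js
import { performance } from 'node:perf_hooks';

let last = performance.eventLoopUtilization();

setInterval(() => {
  // Passing the previous reading yields the delta for just this interval.
  const elu = performance.eventLoopUtilization(last);
  last = performance.eventLoopUtilization();

  console.log(`ELU: ${(elu.utilization * 100).toFixed(1)}%`);
  if (elu.utilization > 0.9) {
    // Illustrative threshold: sustained readings this high suggest the
    // event loop is saturated and the service should scale out.
    console.warn('Event loop saturated; consider scaling out');
  }
}, 5000);
```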
The Problem with Assigning Fractional CPUs to Node.js Processes
Another mistake we see frequently is assigning fractional CPUs to Node.js processes. Kubernetes allows teams to allocate fractions of a CPU to a pod (e.g., 0.2 CPUs), but this often results in suboptimal performance for Node.js applications. Node.js and its underlying V8 engine rely on multiple background threads for garbage collection, optimization, and async I/O. Restricting CPU availability can cause unpredictable behavior and degraded performance.
A better approach is to allocate at least 1 to 1.5 CPUs per Node.js pod, allowing for smoother execution of background tasks.
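For illustration, one way this could look in a pod spec (the values are examples, not drop-in recommendations):

```yaml
resources:
  requests:
    cpu: "1"          # a full core so V8's background threads aren't starved
    memory: "512Mi"
  limits:
    cpu: "1500m"      # headroom up to 1.5 CPUs for GC, JIT, and async I/O
    memory: "768Mi"
```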
How Microservices Architectures Impact Node.js Performance
Microservices and Kubernetes often go hand-in-hand, but they introduce additional challenges for Node.js applications. Many teams assume breaking down an application into dozens (or hundreds) of small services will lead to better scalability and resilience. However, this often results in:
Excessive inter-service communication: Each microservice call introduces network overhead, adding latency and increasing the risk of failures.
Over-provisioning of resources: When each microservice runs in its own Kubernetes pod with pre-allocated memory and CPU, resource usage skyrockets.
Complex debugging and observability: With so many moving parts, understanding bottlenecks becomes far more challenging.
While microservices are useful in some cases, we emphasized that blindly following this architecture isn’t always the best approach for Node.js applications.
Real-World Examples of Kubernetes Setups That Work (And Ones That Don’t)
Throughout our conversation, we discussed real-world Kubernetes misconfigurations that have caused major issues for teams running Node.js applications—and how to fix them.
DNS overload issues
Matteo shared a recurring issue he has seen across multiple consulting engagements: Node.js does not cache DNS lookups by default. Every new HTTP request between microservices therefore triggers a fresh DNS resolution, which can overload Kubernetes' internal DNS resolver, particularly under high traffic. Worse, Alpine-based containers lack a built-in DNS cache, making the problem even more severe.
👉 The Fix: Use a local DNS cache in Kubernetes nodes to reduce the number of unnecessary lookups and improve performance.
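Alongside a node-local cache, an in-process cache can help too. Here is a minimal sketch using the cacheable-lookup npm package (one option among several) to cache resolutions for the global HTTP agent:

```js
import http from 'node:http';
import CacheableLookup from 'cacheable-lookup';

const cacheable = new CacheableLookup();

// Every lookup made through the global agent now hits the in-process
// cache first, so repeated requests to the same service name skip the
// cluster DNS resolver entirely.
cacheable.install(http.globalAgent);
```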
Inefficient scaling strategies
Many teams scale their Node.js applications based on CPU and memory thresholds, but as we discussed, Node.js can handle high concurrency without consuming excessive CPU. Kubernetes' autoscaler might react to temporary CPU spikes—such as garbage collection running—by spinning up new pods that aren’t actually needed. This leads to unnecessary pod churn, higher infrastructure costs, and unpredictable scaling.
👉 The Fix: Instead of relying on CPU and memory alone, teams should monitor event loop utilization to determine when a service is actually struggling under load.
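One way to make that signal available to Kubernetes is to export it in Prometheus format. Here is a sketch using the prom-client npm package; the metric name and port are illustrative:

```js
import { createServer } from 'node:http';
import { performance } from 'node:perf_hooks';
import client from 'prom-client';

let last = performance.eventLoopUtilization();

// Gauge whose collect() callback runs on every scrape.
new client.Gauge({
  name: 'nodejs_event_loop_utilization',   // illustrative metric name
  help: 'Fraction of time the event loop was busy (0-1)',
  collect() {
    const elu = performance.eventLoopUtilization(last);
    last = performance.eventLoopUtilization();
    this.set(elu.utilization);
  },
});

// Minimal /metrics endpoint for Prometheus to scrape.
createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
    return;
  }
  res.writeHead(404).end();
}).listen(9090);
```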
Kubernetes killing pods due to memory mismanagement
One of the biggest mistakes teams make is setting aggressive memory limits on Node.js pods. Because Node.js and V8 manage memory lazily, the garbage collector may hold onto memory even when it’s no longer needed. Kubernetes, seeing high memory usage, might kill a pod preemptively, even though it could have recovered without intervention. This cycle leads to excessive pod restarts and degraded performance.
👉 The Fix: Be cautious when setting memory limits. Avoid low kill thresholds and allow Node.js to manage memory dynamically within reasonable constraints.
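One concrete guard is to cap V8's heap below the container's memory limit so the kernel OOM killer doesn't fire during normal garbage collection behavior. A sketch of how the two settings could relate in a pod spec (values are illustrative):

```yaml
env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768"   # V8 old-generation cap, in MB
resources:
  limits:
    memory: "1Gi"   # container limit with headroom above the heap cap
```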
Wrapping up
Kubernetes and Node.js can work together effectively, but only when configured with an understanding of how Node.js handles concurrency, memory, and CPU allocation.
If you’re struggling with running Node.js in Kubernetes, start by reconsidering how you scale, allocate resources, and handle inter-service communication. And if you missed the webinar, be sure to check out the full recording here.
Q&A from the webinar
We answered several questions during the webinar, but unfortunately ran out of time to cover them all. Below are the questions we received, along with our answers.
1. How do you configure max old space size and max semi space size? You can configure the max old space and max semi space sizes with a few Node.js CLI flags (you can browse all V8 options by running node --v8-options):

```
--min-semi-space-size (min size of a semi-space (in MBytes), the new space consists of two semi-spaces)
      type: size_t  default: --min-semi-space-size=0
--max-semi-space-size (max size of a semi-space (in MBytes), the new space consists of two semi-spaces)
      type: size_t  default: --max-semi-space-size=0
--semi-space-growth-factor (factor by which to grow the new space)
      type: int  default: --semi-space-growth-factor=2
--max-old-space-size (max size of the old space (in Mbytes))
      type: size_t  default: --max-old-space-size=0
--max-heap-size (max size of the heap (in Mbytes) both max_semi_space_size and max_old_space_size take precedence. All three flags cannot be specified at the same time.)
      type: size_t  default: --max-heap-size=0
```
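For example, to cap the old generation at 1.5 GB and the semi-spaces at 64 MB (illustrative values; app.js is a placeholder):

```
node --max-old-space-size=1536 --max-semi-space-size=64 app.js
```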
2. Why is it bad if Kubernetes spins up a new pod due to CPU use, even if Node.js could handle it? Spinning up new pods based on CPU can lead to unnecessary resource allocation and increased infrastructure costs. Node.js can handle high traffic without consuming excess CPU due to its event-driven nature. Instead of CPU-based scaling, event loop utilization provides a better indication of when to scale.
3. Is there a Platformatic tool for developers to build an app with Next.js? Yes! Our Watt framework is designed to optimize Node.js applications, including those built with Next.js. It helps improve performance, auto-scales dynamically, and simplifies deployment.
4. What’s the best practical configuration for scaling based on event loop utilization? Event loop utilization metrics should be collected and fed into Kubernetes' Horizontal Pod Autoscaler (HPA). Using tools like Platformatic autoscaler, you can dynamically adjust scaling policies based on event loop workload rather than CPU.
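For illustration, assuming the event loop utilization gauge is surfaced through a metrics adapter such as prometheus-adapter, an HPA definition could look like this (names and targets are examples, not recommendations):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: node-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: node-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: nodejs_event_loop_utilization
        target:
          type: AverageValue
          averageValue: "800m"   # scale out when average ELU exceeds 0.8
```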
5. How can I configure autoscaling to prevent latency issues at peak traffic? Instead of CPU-based scaling, track event loop utilization and incoming request rates. Combining these metrics ensures that scaling occurs only when Node.js is actually struggling to keep up, rather than due to temporary CPU spikes.
6. When do I actually need a buffer in Node.js? Buffers are needed when working with binary data, such as file streams, network protocols, or image processing. If you’re dealing with structured JSON data, standard JavaScript objects are preferable.
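A small sketch of where Buffers show up naturally; 'image.png' is a placeholder path:

```js
import { createReadStream } from 'node:fs';

// Without an encoding set, a file stream yields raw Buffer chunks.
const stream = createReadStream('image.png');

stream.on('data', (chunk) => {
  console.log(Buffer.isBuffer(chunk), chunk.length, 'bytes');
});
```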
7. How do I prevent socket hangups with Keep-Alive set to true in Node 19? We recommend switching to Undici, a high-performance HTTP client that efficiently handles Keep-Alive connections. Additionally, ensure connection pooling and retry mechanisms are in place to mitigate transient failures.
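A minimal sketch of making a request with Undici, which pools and reuses keep-alive connections by default; the URL is a placeholder:

```js
import { request } from 'undici';

const { statusCode, body } = await request('http://service.internal/health');
console.log(statusCode, await body.text());
```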
Next week’s webinar
Join us next week as we discuss whether serverless is an illusion. We’ll be touching on:
✅ Hidden costs & complexities of serverless architectures
✅ Why serverless doesn't actually mean no servers
✅ When serverless makes sense
✅ How to balance simplicity, cost & performance