Building a High-Performance Streaming Service in Kubernetes: WebSockets at Scale


WebSockets have been around for years, yet they remain one of the trickiest technologies to scale effectively in distributed systems. Initially designed for real-time applications like chat, WebSockets now power everything from live dashboards to gaming and recommendation engines.
But how do you build a high-performance WebSocket service in Kubernetes that scales to thousands—or even millions—of users?
We recently tackled this topic in a webinar. Here's a recap of the key takeaways.
The Fundamentals: Why WebSockets?
WebSockets provide a persistent, bidirectional communication channel between clients and servers, making them an excellent choice for real-time applications. Unlike traditional request-response models, where clients must repeatedly poll for updates, WebSockets allow data to be pushed instantly as events occur.
However, WebSockets introduce scalability challenges, particularly in distributed cloud environments, where managing connections and message delivery becomes complex.
The Challenges of Scaling WebSockets in a Distributed Environment
1. Connection Management and Memory Overhead
A WebSocket connection isn’t just an open pipe—it consumes memory at multiple levels.
- Each connection creates a small object in the Node.js process (~100–200 bytes).
- TCP socket buffers allocate ~16 KB per connection (8 KB for send + 8 KB for receive).
- At 100,000 concurrent connections, memory usage can exceed 1.6 GB before application logic is even considered.
Since WebSocket buffers are managed by the operating system’s network stack, default kernel settings often limit TCP memory allocation (e.g., 200–300 MB in many cloud environments). Without proper kernel tuning, WebSocket connections may be throttled or dropped prematurely.
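The arithmetic above can be checked with a quick back-of-envelope helper. This is a sketch; `estimateMemoryBytes` and its per-connection constants simply mirror the estimates in this section:

```javascript
// Back-of-envelope memory estimate for N concurrent WebSocket connections.
// The per-connection figures mirror this section's estimates: ~16 KB of
// kernel TCP buffers plus a small per-connection object in Node.js.
function estimateMemoryBytes(connections, {
  tcpBuffersPerConn = 16 * 1024, // 8 KB send + 8 KB receive
  jsObjectPerConn = 200,         // rough upper bound for the JS object
} = {}) {
  return connections * (tcpBuffersPerConn + jsObjectPerConn);
}

const gb = estimateMemoryBytes(100000) / 1e9;
console.log(gb.toFixed(2)); // → "1.66" GB, before any application state
```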
2. The Problem of Distributed State
In a monolithic WebSocket setup, things are simple: a client connects to a single server, and all messages flow through that connection. However, in a distributed system with multiple WebSocket servers, things get complicated.
If a client connects to Server A but later reconnects to Server B, Server B doesn’t automatically know about messages that were sent while the client was offline.
Without a state-sharing mechanism, messages can be lost or duplicated.
To solve this, systems typically use:
- Consistent Hashing for Load Balancing – Ensures that a reconnecting client is routed to the correct server.
- Cross-Server Communication – Messages need to be passed between WebSocket servers to ensure continuity for users, regardless of which server they land on.
Without one of these approaches, a distributed WebSocket service risks losing messages, duplicating updates, or failing to reconnect users properly.
3. Handling Reconnections and Message Delivery Guarantees
WebSocket connections will drop—whether due to network failures, cloud autoscaling, or the client closing the app. The challenge isn’t just maintaining connections but ensuring no messages are lost when a client reconnects.
A common mistake is assuming WebSockets guarantee message delivery—they don’t. If a connection drops, messages sent during that time vanish unless the system accounts for them.
To prevent message loss, a WebSocket service should:
1. Track the last message received by each client
- When a client reconnects, it sends its last known message ID to the server.
- The server can then determine which messages were missed.
2. Replay missing messages from a historical data store
- Messages sent while the client was offline should be stored in a database or persistent event queue.
- The server retrieves and resends missed messages before resuming live updates.
3. Implement an acknowledgment system
- Borrowing from MQTT’s QoS 1 (at-least-once) model, clients and servers can track unacknowledged messages and retry them as needed.
This approach ensures an at-least-once delivery guarantee while keeping the WebSocket service stateless and scalable.
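The replay side of this can be sketched as follows, assuming the server keeps a per-channel log with monotonically increasing message IDs. `MessageLog` and `resumeFrom` are hypothetical names for illustration; in production the log would live in a database or event queue rather than in memory:

```javascript
// Sketch of resumable delivery: the server appends every message to a log
// with increasing IDs, and replays everything after the client's last
// acknowledged ID on reconnect, before resuming live updates.
class MessageLog {
  constructor() {
    this.messages = [];
    this.nextId = 1;
  }
  append(payload) {
    const msg = { id: this.nextId++, payload };
    this.messages.push(msg);
    return msg;
  }
  // Client reconnects and reports the last ID it acknowledged.
  resumeFrom(lastAckedId) {
    return this.messages.filter((m) => m.id > lastAckedId);
  }
}

const log = new MessageLog();
log.append('price: 100');
log.append('price: 101');
log.append('price: 102');
// The client saw message 1, then dropped; it reconnects with lastAckedId = 1.
console.log(log.resumeFrom(1).map((m) => m.payload));
// → [ 'price: 101', 'price: 102' ]
```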
Best Practices for Running WebSockets in Kubernetes
1. Stateless WebSocket Servers with Consistent Hashing
WebSockets are stateful, making them challenging to scale in distributed systems. A common mistake is using sticky sessions, where a client must always connect to the same server. This doesn’t scale well because:
- If a server goes down, all WebSocket connections on that server are lost.
- Clients may reconnect to a different server, which doesn’t have their previous state.
Instead, consistent hashing should be used to distribute WebSocket connections dynamically across multiple servers, without requiring sticky sessions.
This approach is how large-scale systems like Uber handle real-time tracking—by ensuring state is distributed efficiently across multiple servers.
2. Handling Cross-Node Communication
When WebSockets run across multiple servers, messages must be shared across all relevant clients, even if they reconnect to a different server.
If a client reconnects to a different node, it won’t automatically receive updates unless the system is designed to pass messages between WebSocket servers.
Without cross-server communication, WebSockets become unreliable at scale.
3. Tuning Kernel and Network Settings
WebSockets rely on TCP connections, which are managed by the operating system’s network stack—not the application itself. By default, most operating systems limit TCP memory allocation, which can bottleneck WebSocket scalability if not tuned properly.
Tuning OS-level settings is critical—without it, WebSockets will hit system limits long before reaching application bottlenecks.
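As an illustration, Linux deployments typically raise limits along these lines. The values below are examples only; the right numbers depend on instance size and should be validated under load before use:

```shell
# Illustrative kernel tuning for high connection counts (example values).
sysctl -w net.ipv4.tcp_mem="1048576 1572864 2097152"  # TCP memory pages: min / pressure / max
sysctl -w net.ipv4.tcp_rmem="4096 16384 4194304"      # receive buffer bytes: min / default / max
sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"      # send buffer bytes: min / default / max
sysctl -w net.core.somaxconn=65535                    # listen backlog for accept bursts
ulimit -n 1048576                                     # per-process file descriptor limit
```

In Kubernetes, `net.*` sysctls must additionally be allowed via the kubelet's `--allowed-unsafe-sysctls` or set from a privileged init container, since pods don't inherit host tuning by default.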
Comparing WebSockets with Other Real-Time Protocols
Not all real-time applications require WebSockets. In some cases, Server-Sent Events (SSE) or MQTT might be a better fit.
- WebSockets: Best for full-duplex communication where clients also send frequent updates.
- SSE: A simpler option for unidirectional updates (e.g., stock tickers, live news feeds).
- MQTT over WebSockets: A lightweight alternative with built-in message delivery guarantees—useful in IoT and messaging applications.
Q&A: Practical WebSocket Scaling Challenges
1. How can I ensure WebSocket messages are acknowledged?
WebSockets don’t have built-in acknowledgments like REST APIs, so you often need to implement a custom acknowledgment system. One approach is:
- Assigning each message a unique ID.
- Requiring the client to acknowledge receipt.
- Resending the message if no acknowledgment is received within a timeout period.
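The steps above can be sketched as a small server-side tracker. The clock is passed in explicitly so the logic stays deterministic; in real code `sweep` would run on a timer, and `AckTracker` is an illustrative name, not an existing API:

```javascript
// Sketch of server-side acknowledgment tracking with timeout-based resend.
class AckTracker {
  constructor(send, timeoutMs = 5000) {
    this.send = send;         // function that actually writes to the socket
    this.pending = new Map(); // id -> { payload, deadline }
    this.timeoutMs = timeoutMs;
  }
  deliver(id, payload, now) {
    this.pending.set(id, { payload, deadline: now + this.timeoutMs });
    this.send(id, payload);
  }
  ack(id) {
    this.pending.delete(id); // client confirmed receipt
  }
  sweep(now) {
    // Resend anything past its deadline and push the deadline forward.
    for (const [id, msg] of this.pending) {
      if (now >= msg.deadline) {
        msg.deadline = now + this.timeoutMs;
        this.send(id, msg.payload);
      }
    }
  }
}

const sent = [];
const tracker = new AckTracker((id) => sent.push(id), 5000);
tracker.deliver(1, 'a', 0);
tracker.deliver(2, 'b', 0);
tracker.ack(1);      // client confirmed message 1
tracker.sweep(6000); // message 2 timed out and is resent
console.log(sent);   // → [ 1, 2, 2 ]
```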
For a more structured approach, MQTT supports built-in message acknowledgment levels (QoS 1, QoS 2) that can be adapted for WebSockets.
2. Can WebSockets scale without a message queue?
For loss-tolerant applications like fast-paced multiplayer games (e.g., Fortnite), where a missed update is quickly superseded by the next state snapshot, you can skip a queue and rely on consistent hashing to route messages to the right server.
However, if message consistency matters (e.g., chat applications, financial transactions), a queue is essential for delivering messages reliably across distributed nodes.
3. When should I stop using WebSockets for chat applications?
Never: WebSockets are the best tool for the job. The real question is how to structure them properly to avoid scalability pitfalls. If polling-based alternatives (e.g., long polling) cause unnecessary overhead, stick with WebSockets.
4. When should I use WebSockets vs. Server-Sent Events (SSE)?
Use WebSockets when you need bidirectional communication (e.g., collaborative editing, gaming, chat).
Use SSE when the server only pushes updates (e.g., news feeds, live notifications).
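Part of SSE's appeal is how thin the wire format is: each event is plain text on a long-lived HTTP response, terminated by a blank line. A tiny illustrative formatter (`sseFrame` is a hypothetical helper, not a library function):

```javascript
// Formats one event in the SSE wire format: optional "id:" and "event:"
// fields, a "data:" line, and a blank line that terminates the event.
function sseFrame(data, { event, id } = {}) {
  let frame = '';
  if (id !== undefined) frame += `id: ${id}\n`;
  if (event !== undefined) frame += `event: ${event}\n`;
  frame += `data: ${data}\n\n`;
  return frame;
}

console.log(JSON.stringify(sseFrame('AAPL 182.50', { event: 'tick', id: 7 })));
// → "id: 7\nevent: tick\ndata: AAPL 182.50\n\n"
```

Setting an `id` lets the browser's `EventSource` resume automatically: on reconnect it sends the last seen ID in the `Last-Event-ID` header, giving SSE a built-in version of the replay logic WebSockets require you to build by hand.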
5. Does SSE work like ChatGPT, where you can’t type until a response is received?
Yes, SSE is a great fit for response-streaming applications, such as ChatGPT. Since the server is just pushing data, the client doesn’t need to send frequent messages, making SSE more lightweight than WebSockets for this use case.
Wrapping up
WebSockets are a powerful tool for real-time applications, but they require careful scalability planning in a distributed system. By using message queues, consistent hashing, and proper reconnection strategies, you can build a high-performance WebSocket service that handles thousands (or millions) of concurrent connections in Kubernetes.
While WebSockets may seem simple at first glance, getting them right at scale is an entirely different challenge—one that requires tuning every layer, from kernel settings to distributed event queues.
Got a WebSocket challenge of your own? Share your experience, and let's keep the conversation going! Send us an email (hello@platformatic.dev) with any feedback or questions you have.
Written by Luca Maraschi
