Pub-sub pattern
In my previous articles, we built a messaging system using RabbitMQ. One limitation of that approach was that each message was delivered to only one consumer. For instance, if we had two consumers, promotional.Email and promotional.Sms, there would be two separate queues. That means that if we add n promotional consumers, there will be n queues. Another drawback of the message queue was that the publisher requires explicit routing to specific consumers: it must know the exact queue or exchange a consumer is bound to. Instead of this setup, we'll now build a system where the publisher can send a message to multiple consumers simultaneously. This is known as the pub-sub (publish-subscribe) pattern. Here's how it can be visualized:
Diagram source: AWS
Introduction
As the name suggests, pub-sub is an architectural pattern where a publisher sends out messages and subscribers receive them. The cool part is that the publisher doesn't need to know who the subscribers are or how many of them are receiving the message. There are some terms to understand in pub-sub:
Publisher: The entity that publishes messages.
Message: The data published (sent) by a publisher.
Topic: A central hub that groups related messages. Subscribers of a topic receive the messages published to that topic.
Subscribers: The entities that consume and process messages.
Publishers send messages to a central hub (the topic) without any knowledge of who will receive them. Any service interested in a publisher's messages subscribes to the topic that the publisher publishes to. Subscribers express interest in one or more topics and receive the messages related to those topics.
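To make the terms above concrete, here is a minimal in-memory sketch of a topic-based broker. It is only an illustration of the idea, not how RabbitMQ, SNS, or any real broker works internally; the `Broker` class, the `promotions` topic name, and the sample message are all made up for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Broker:
    """Toy in-memory broker (hypothetical): maps each topic to its subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A subscriber expresses interest in a topic by registering a callback.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber of the topic; return how many got it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
email_inbox: List[dict] = []  # stands in for a promotional.Email consumer
sms_inbox: List[dict] = []    # stands in for a promotional.Sms consumer

broker.subscribe("promotions", email_inbox.append)
broker.subscribe("promotions", sms_inbox.append)

# The publisher only names the topic; it never references the subscribers.
delivered = broker.publish("promotions", {"text": "20% off today"})
```

Adding a third consumer here is one more `subscribe` call; the publishing code does not change, which is the decoupling the pattern is after.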
Implementation
There are several ways to implement the pub-sub pattern in a distributed system. Popular options include AWS services like SQS and SNS, which are commonly used for decoupling services and handling asynchronous messaging. SNS (Simple Notification Service) broadcasts messages to multiple subscribers, while SQS (Simple Queue Service) queues messages for processing. Another option is Apache Kafka, a distributed streaming platform renowned for its durability and fault tolerance, making it ideal for high-throughput scenarios. Kafka is well-documented and widely used in large-scale, data-heavy systems. NATS is a lightweight, high-performance messaging system that is well-suited for cloud-native applications. It supports the pub-sub pattern with low latency and minimal overhead, making it an excellent choice for distributed systems where fast, reliable messaging is crucial. Azure offers similar solutions, such as Azure Service Bus for reliable messaging between services and Azure Event Grid for event-based pub-sub communication across various Azure services.
Implementing pub-sub with WebSockets
Here, I will be using SignalR to implement the pub-sub pattern. SignalR simplifies real-time communication between the server and clients, allowing you to send messages to multiple connected clients simultaneously. About the system:
publisher.api: The service that acts as the publisher, broadcasting weather data to its subscribers.
weathersub: A service that consumes the weather data to perform specific processing tasks.
weathersub01: Another service that also consumes the weather data for its own processing needs.
When I publish a weather message:
Subscribers receive the message:
Source code: GitHub
You might notice that weathersub01 receives the message twice. This happens because whenever a new subscriber connects, it triggers publisher.api to send out the weather data to all connected clients. As a result, every time a subscriber connects, weathersub01 and the other subscribers receive a copy of the broadcast message.
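The duplicate can be reproduced with a small simulation. This sketch is not SignalR code and does not mirror the project's source; `ToyHub`, its `broadcast_on_connect` flag, and the sample weather values are hypothetical. It only contrasts two behaviours on connect: sending the latest data to everyone, versus sending it only to the client that just connected.

```python
from typing import Dict, List


class ToyHub:
    """Hypothetical stand-in for a real-time hub; this is not the SignalR API."""

    def __init__(self, broadcast_on_connect: bool) -> None:
        self.broadcast_on_connect = broadcast_on_connect
        self.latest = {"city": "Sample City", "temp_c": 21}  # made-up weather data
        self.inboxes: Dict[str, List[dict]] = {}

    def connect(self, client: str) -> None:
        self.inboxes[client] = []
        if self.broadcast_on_connect:
            targets = list(self.inboxes)  # everyone, including earlier clients
        else:
            targets = [client]            # only the client that just connected
        for name in targets:
            self.inboxes[name].append(self.latest)


# Behaviour described above: each new connection re-broadcasts to all clients.
noisy = ToyHub(broadcast_on_connect=True)
noisy.connect("weathersub01")
noisy.connect("weathersub")

# Alternative: greet only the newcomer, so earlier clients see no repeat.
quiet = ToyHub(broadcast_on_connect=False)
quiet.connect("weathersub01")
quiet.connect("weathersub")
```

In the first hub, weathersub01 ends up with two copies because it was already connected when weathersub joined; in the second, each client holds exactly one.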
Using SignalR to introduce pub-sub in a distributed system is straightforward and fast, because broadcasting a message to every connected client takes very little code. Whether it is the right fit depends on your requirements. If your system is just adopting pub-sub, SignalR is a good choice due to its ease of integration and real-time messaging capabilities. It’s particularly useful for scenarios where real-time updates are needed and message persistence is not critical.
Disadvantages of implementing pub-sub using SignalR
One disadvantage of using SignalR for pub-sub is the potential for message duplication. When a new connection is established, it can trigger the publisher to resend data, causing multiple copies of the same message to be received by subscribers.
Another drawback is that if a subscriber is offline or unavailable when a message is sent, it will miss that message entirely. Unlike some other pub-sub systems that provide message persistence or retry mechanisms, SignalR doesn’t inherently store messages for offline subscribers, so any missed messages are not retried or delivered once the subscriber reconnects.
Handling disadvantages
To handle these issues, you can add a retry mechanism and idempotency to every message. Idempotency ensures that each message is processed only once, even if it is received multiple times, while retries redeliver messages to a subscriber that was temporarily offline. However, implementing these solutions from scratch can be complex and time-consuming.
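As a rough sketch of what that involves, the example below tags every message with a unique ID, has the subscriber skip IDs it has already processed, and has the publisher keep unacknowledged messages for a later retry. All names here (`IdempotentSubscriber`, `RetryingPublisher`, `flush`) are hypothetical, everything lives in memory, and a real system would also need persistence, timeouts, and backoff.

```python
import uuid
from typing import List, Set


class IdempotentSubscriber:
    """Hypothetical subscriber that processes each message ID at most once."""

    def __init__(self) -> None:
        self.online = True
        self.seen_ids: Set[str] = set()
        self.processed: List[dict] = []

    def receive(self, message: dict) -> bool:
        """Return True when the message is acknowledged, False if offline."""
        if not self.online:
            return False
        if message["id"] not in self.seen_ids:
            self.seen_ids.add(message["id"])
            self.processed.append(message)
        return True  # duplicates are acknowledged but not processed again


class RetryingPublisher:
    """Hypothetical publisher that keeps messages until they are acknowledged."""

    def __init__(self, subscriber: IdempotentSubscriber) -> None:
        self.subscriber = subscriber
        self.pending: List[dict] = []

    def publish(self, payload: dict) -> None:
        self.pending.append({"id": str(uuid.uuid4()), "payload": payload})
        self.flush()

    def flush(self) -> None:
        """Attempt delivery of all pending messages; keep the unacknowledged ones."""
        self.pending = [m for m in self.pending if not self.subscriber.receive(m)]


sub = IdempotentSubscriber()
pub = RetryingPublisher(sub)

sub.online = False
pub.publish({"temp_c": 21})   # subscriber is offline, so the message stays pending
pending_while_offline = len(pub.pending)

sub.online = True
pub.flush()                   # retry succeeds once the subscriber is back
sub.receive(sub.processed[0])  # a duplicate delivery is ignored
```

Even this toy version needs two cooperating pieces of state (the pending list and the seen-ID set), which hints at why doing it properly from scratch gets complicated.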
Instead of reinventing the wheel, it’s often more practical to use existing tools and systems designed to handle these challenges efficiently. For larger or more complex distributed systems where message durability, high throughput, and guaranteed delivery are crucial, specialized tools like Apache Kafka, NATS, AWS SQS and SNS, or Azure Service Bus might be more appropriate. These tools offer robust solutions for handling large volumes of messages, ensuring message delivery through retry mechanisms, and providing fault tolerance.
Summary
This article explored the pub-sub (publish-subscribe) pattern, which allows a publisher to send messages to multiple subscribers without needing to know who they are. I explained various tools for implementing pub-sub, such as AWS SNS for broadcasting messages, Apache Kafka for durable, high-throughput streaming, NATS for lightweight, high-performance messaging, and Azure services like Service Bus and Event Grid for reliable messaging and event-based communication.
In the implementation part, I used SignalR to implement pub-sub with WebSockets, where publisher.api broadcasts weather data to multiple services like weathersub and weathersub01. A noted disadvantage of SignalR is the potential for message duplication and the risk of missed messages if a subscriber is offline. To address these issues, you could implement custom retry mechanisms and idempotency to handle duplicates and missed messages. However, using specialized tools and systems that already handle these challenges efficiently is often more practical and less complex.
Written by Sushant Pant