Message Queue Patterns and Anti-Patterns

Felipe Rodrigues

It was a Tuesday afternoon when the pager went off. Not the gentle, informational alert that a pod had restarted, but the screaming, all-hands-on-deck siren that meant revenue was actively being lost. The team had launched a new video processing service for their flagship product a month prior. For "V1," they'd kept it simple: an API endpoint accepted video uploads, dropped a job message into a single RabbitMQ queue, and a fleet of workers picked up jobs and encoded them.

It scaled beautifully in staging. During the beta, it was flawless. But today, their first enterprise customer had just signed on and was running a bulk import of thousands of tiny, five-second "preview" clips. Simultaneously, a handful of their free-tier users were uploading their two-hour-long 4K drone footage. The result? The enterprise customer's critical, time-sensitive jobs were stuck in a massive backlog behind a handful of enormous, low-value jobs. The system hadn't fallen over; it had ground to a halt in a way that was far more insidious.

The team's first instinct was a classic one: "We need more consumers! Let's scale the worker deployment to 50 replicas!" It's the engineering equivalent of hitting a machine to make it work. It feels productive, but it rarely addresses the root cause. This knee-jerk reaction stems from a common but deeply flawed belief that I see in teams time and time again.

My thesis is this: Most message queue failures are not problems of throughput; they are problems of classification. We treat queues like simple pipes, focusing only on how much we can push through them. The real leverage, however, comes from treating them like intelligent sorting systems. Throwing more undifferentiated resources at an unclassified workload is the most expensive and least effective way to scale.

Unpacking the Hidden Complexity

The team's "quick fix" of scaling out the consumers did, in a way, work. The queue eventually drained. But the victory was hollow, and the cost was hidden. Let's dissect why this approach is a trap.

First, the economic cost was obvious. They were now paying for a massive fleet of workers that were only necessary during pathological workload spikes. For 95% of the day, most of these expensive pods would sit idle, consuming resources and contributing to cloud bill bloat.

Second, there was the "thundering herd" problem. When the long jobs finally finished, dozens of newly freed workers simultaneously tried to grab the next message, putting a sharp, unnecessary load spike on the message broker itself. More insidiously, this pattern can extend downstream. Imagine if each video encoding job required writing several records to a central database. Scaling from 10 to 50 workers without considering the database's connection pool limits or write capacity is a recipe for cascading failure. You've simply moved the bottleneck, not solved it.

The most critical failure, however, is one of architecture and product. The system had no concept of "fairness" or "priority." It was a first-in, first-out (FIFO) lottery. By failing to classify the work, they had created a system where a low-value user could, by pure chance, inflict a terrible user experience on a high-value one. This is not just a technical failure; it is a business failure.

The Airport Security Analogy

Think of a message queue like the security checkpoint at an airport. The naive approach is to have one single, massive line for every passenger. When the line gets too long, the solution is to open more identical screening lanes. This is scaling the consumers.

Does it help? A little. But what happens when someone at the front of the line has packed poorly and needs a full bag search? The entire line behind them waits. It doesn't matter if you have 100 lanes open; the head-of-line blocking problem persists within each lane.

A well-architected system, however, looks like a modern airport. There are separate, dedicated lines: one for first-class passengers, one for TSA PreCheck, and one for general boarding. This is classification. It ensures that high-priority, quick-to-process passengers (your enterprise customer's short videos) are not stuck behind the family of five checking six oversized bags (your free-tier user's 4K drone footage). It’s not just about speed; it's about providing a predictable and appropriate level of service for different classes of work.

This table compares the naive "one big queue" approach with a more sophisticated, classification-first mindset.

| Architectural Concern | Naive Approach (One Big Queue) | Classification-First Approach |
| --- | --- | --- |
| Performance | Subject to head-of-line blocking. A single slow job can stall all others. | High-priority work is isolated and processed quickly. Predictable latency for critical tasks. |
| Scalability | Scales poorly. Requires over-provisioning all consumers for the worst-case scenario. | Allows for targeted scaling. Can scale the "high priority" consumer pool independently. |
| Cost | High operational cost due to over-provisioning compute resources for the entire workload. | Cost-efficient. Resources are allocated based on the value and requirements of the work. |
| Resilience | A "poison pill" message (a malformed message causing a consumer to crash) can halt the entire system. | A poison pill in a low-priority queue only affects that specific workload, not critical operations. |
| Cognitive Load | Simple to set up initially, but extremely difficult to debug and reason about under load. | Requires more upfront design, but is far easier to monitor, debug, and manage in production. |

The lesson is clear. The initial simplicity of a single queue is a siren song that leads to operational chaos. True simplicity is a system you can reason about, and that means starting with classification.

The Pragmatic Solution: Core Queueing Patterns

Let's move from theory to practice. Building a robust, scalable asynchronous system isn't about finding a magical new technology. It's about applying a few battle-tested patterns correctly. These are the blueprints that avoid the traps we've discussed.

1. The Competing Consumers Pattern

This is the most fundamental pattern and the one the team in our story started with. Multiple consumers listen on the same queue, and the message broker distributes messages among them. When one consumer receives a message, it's locked and hidden from the others.

This pattern is the workhorse for horizontal scalability. If you have a thousand independent tasks to process, you can spin up a thousand consumers to process them in parallel. It's simple, effective, and supported by every major messaging system: RabbitMQ, SQS, and Kafka (via consumer groups) all provide it out of the box.

But its power is also its weakness. It provides no guarantee of processing order, and as we saw, it's vulnerable to head-of-line blocking if the work items vary dramatically in processing time or importance. It's the right pattern for workloads of homogeneous, low-variance tasks. Think thumbnail generation for images of a similar size. It's the wrong pattern for a mix of critical and non-critical work.
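To make the mechanics concrete, here is a minimal competing-consumer worker sketched against RabbitMQ with the Python pika client. The queue name video_jobs and the job-handling logic are illustrative placeholders, not anything from the story above.

```python
# A minimal competing-consumer worker (RabbitMQ via pika).
# The queue name and handle_job body are illustrative placeholders.
import json
import pika

def handle_job(body: bytes) -> None:
    job = json.loads(body)
    print(f"Encoding video {job['video_id']}...")  # stand-in for the real work

def main() -> None:
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="video_jobs", durable=True)

    # prefetch_count=1: each worker holds at most one unacknowledged message,
    # so the broker hands the next job to whichever consumer is free.
    channel.basic_qos(prefetch_count=1)

    def on_message(ch, method, properties, body):
        handle_job(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

    channel.basic_consume(queue="video_jobs", on_message_callback=on_message)
    channel.start_consuming()

if __name__ == "__main__":
    main()
```

Run as many copies of this process as you like; the broker distributes messages among them, which is the whole pattern.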

2. The Priority Queue Pattern

This is the direct solution to our opening story's dilemma. Instead of one queue, you create multiple queues, each representing a different priority level.

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#e3f2fd", "primaryBorderColor": "#1976d2", "lineColor": "#333"}}}%%
flowchart TD
    subgraph Producers
        P1[API Endpoint]
    end

    subgraph "Message Broker"
        direction LR
        Q1[Queue high_priority]
        Q2[Queue default_priority]
        Q3[Queue low_priority]
    end

    subgraph "Consumer Pools"
        direction TB
        subgraph "High Priority Pool"
            C1[Worker]
            C2[Worker]
        end
        subgraph "Default Priority Pool"
            C3[Worker]
            C4[Worker]
            C5[Worker]
        end
        subgraph "Low Priority Pool"
            C6[Worker]
        end
    end

    P1 -- "Job Type Enterprise" --> Q1
    P1 -- "Job Type Pro" --> Q2
    P1 -- "Job Type Free" --> Q3

    Q1 --> C1
    Q1 --> C2

    Q2 --> C3
    Q2 --> C4
    Q2 --> C5

    Q3 --> C6

This diagram illustrates the Priority Queue pattern. The API Endpoint acts as a producer, but it's now intelligent. It inspects the incoming request (e.g., based on the user's subscription tier) and routes the message to the appropriate queue: high_priority, default_priority, or low_priority. We then have separate pools of consumers, each dedicated to a specific queue. Notice we can provision resources intelligently: a small, dedicated pool for the high-priority queue ensures immediate processing, a larger pool for default traffic, and perhaps only a single, opportunistic worker for the low-priority queue. This architecture guarantees that a thousand "free tier" jobs can never block a single "enterprise" job.

Some brokers (like RabbitMQ) support a limited number of priority levels within a single queue, but I've found that using separate, dedicated queues is often a cleaner and more flexible architectural choice. It makes monitoring, alerting, and scaling far more explicit.
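Here is a rough sketch of the producer-side routing, again assuming RabbitMQ and pika. The queue names match the diagram, and the tier-to-queue mapping is an illustrative stand-in for whatever your billing or account service exposes.

```python
# Producer-side routing for the Priority Queue pattern (RabbitMQ via pika).
# Queue names mirror the diagram; the tier mapping is illustrative.
import json
import pika

TIER_TO_QUEUE = {
    "enterprise": "high_priority",
    "pro": "default_priority",
    "free": "low_priority",
}

def enqueue_job(channel, user_tier: str, job: dict) -> None:
    queue = TIER_TO_QUEUE.get(user_tier, "low_priority")
    channel.queue_declare(queue=queue, durable=True)
    channel.basic_publish(
        exchange="",                # default exchange routes by queue name
        routing_key=queue,
        body=json.dumps(job),
        properties=pika.BasicProperties(delivery_mode=2),  # persist the message
    )

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
enqueue_job(channel, "enterprise", {"video_id": "abc123", "preset": "preview"})
connection.close()
```

The consumer pools are then just the competing-consumer worker from earlier, pointed at different queue names and scaled independently.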

3. The Claim Check Pattern

What about the content of the messages themselves? In our video encoding example, the job message might contain the user ID, encoding settings, and other metadata. But what about the video file itself? A common mistake is to try to stuff large payloads directly into the message. Most message brokers are optimized for small, fast messages, typically under 256KB. Pushing multi-megabyte or gigabyte payloads through them is inefficient and can bring the broker to its knees.

This is where the Claim Check pattern comes in. Instead of putting the data in the message, you put a reference to the data in the message and store the data itself somewhere built for it.

flowchart TD
    classDef storage fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px
    classDef broker fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    classDef service fill:#e3f2fd,stroke:#1565c0,stroke-width:2px

    Producer[Producer Service]
    S3[Object Storage S3]
    Queue[Message Queue]
    Consumer[Consumer Service]

    Producer -- "1 Uploads large file" --> S3
    S3 -- "2 Returns file_url" --> Producer
    Producer -- "3 Enqueues message {file_url}" --> Queue
    Queue -- "4 Delivers message" --> Consumer
    Consumer -- "5 Downloads file using url" --> S3
    Consumer -- "6 Process file" --> Consumer

    class Producer,Consumer service
    class S3 storage
    class Queue broker

This diagram shows the Claim Check pattern in action.

  1. The Producer Service first uploads the large payload (the video file) to a robust, high-throughput storage system like Amazon S3 or Google Cloud Storage.
  2. The storage system returns a stable identifier or URL for that object. This is the "claim check."
  3. The producer then enqueues a very small, lightweight message containing this claim check and any other necessary metadata.
  4. The Consumer Service receives this small message almost instantly.
  5. It reads the claim check (the URL) from the message.
  6. It then uses that URL to download the large payload directly from object storage for processing.

This pattern keeps your message broker lean, fast, and focused on what it does best: signaling. It delegates the job of storing large blobs to systems designed for that exact purpose. This is a perfect example of applying the Single Responsibility Principle to your infrastructure components.
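A minimal sketch of the flow, assuming S3 for the blob and SQS for the signal via boto3; the bucket name and queue URL are placeholders, not real resources.

```python
# Claim Check sketch: large payload in S3, tiny message in SQS (boto3).
# Bucket name and queue URL are placeholders.
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "video-uploads-example"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/encode-jobs"

def enqueue_video_job(local_path: str, user_id: str) -> None:
    # Steps 1-3: upload the payload, then enqueue only the claim check.
    key = f"uploads/{uuid.uuid4()}.mp4"
    s3.upload_file(local_path, BUCKET, key)
    message = {"s3_bucket": BUCKET, "s3_key": key, "user_id": user_id}
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))

def process_video_job(message_body: str) -> None:
    # Steps 4-6: redeem the claim check and fetch the payload directly from S3.
    job = json.loads(message_body)
    local_copy = "/tmp/input.mp4"
    s3.download_file(job["s3_bucket"], job["s3_key"], local_copy)
    # ... run the encoder against local_copy ...
```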

Traps the Hype Cycle Sets for You: Common Anti-Patterns

For every powerful pattern, there is an equally tempting anti-pattern. These are the architectural shortcuts that feel clever at the time but create brittle, unmaintainable systems down the road.

1. The Monolithic Queue

This is the anti-pattern from our opening story. A single queue is used for disparate types of work: sending emails, processing videos, generating reports, etc. It’s the asynchronous equivalent of a monolith's "God object." It violates the principle of separation of concerns and guarantees that unrelated workloads will interfere with each other. A spike in report generation jobs can suddenly delay critical password reset emails. The fix is always classification and separation, as seen in the Priority Queue pattern.

2. Request-Reply over Queues (RPC over MQ)

This is one of the most dangerous and seductive anti-patterns. An engineer, wanting to make a service call "resilient," decides to replace a simple synchronous HTTP call with a message queue. The flow looks like this: Service A puts a message on a request queue, then immediately starts listening on a reply queue for the response from Service B.

Why is this so bad? You've taken the simplicity and immediate feedback of a synchronous call and replaced it with the complexity of asynchronous messaging, all without gaining any of the real benefits of decoupling.

sequenceDiagram
    participant A as Service A
    participant MQ as Message Broker
    participant B as Service B

    A ->> MQ: 1. Publish to request_queue {correlation_id}
    Note right of A: Now I wait... and wait...
    MQ ->> B: 2. Deliver message
    B ->> B: 3. Process work
    B ->> MQ: 4. Publish to reply_queue_{correlation_id}
    MQ ->> A: 5. Deliver reply
    Note left of A: Finally! Now I can proceed.

This sequence diagram reveals the awkward reality of RPC over a message queue.

  1. Service A has to generate a unique correlation_id and publish a message.
  2. It then has to block or implement a complex state machine while it waits for a response on a dedicated reply queue (which itself can be a management nightmare; do you use one reply queue or one per request?).
  3. Service B processes the request.
  4. It then has to publish a response to the correct reply queue using the correlation_id.
  5. Service A receives the message, matches the correlation_id to its original request, and finally continues.

You've introduced massive latency, two points of failure (the two queue interactions), and significant code complexity for correlation. All for what? If Service A needs an immediate response to continue its work, it is, by definition, synchronously coupled to Service B. A simple, well-instrumented HTTP call with a proper retry mechanism and circuit breaker is a far superior, simpler, and more honest architecture for this use case. Use queues for fire-and-forget events and decoupling workflows, not for faking synchronous calls.
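For contrast, here is what that "more honest" synchronous call might look like in Python: requests with urllib3's Retry handling transient failures and a hard timeout instead of an open-ended wait on a reply queue. The service URL is a placeholder, and a real circuit breaker (a library or a hand-rolled failure counter) would wrap the call.

```python
# The honest alternative: a direct HTTP call with retries, backoff, and a timeout.
# The URL is a placeholder; retrying POST assumes the endpoint is idempotent.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,                          # retry transient failures a few times
    backoff_factor=0.5,               # exponential backoff between attempts
    status_forcelist=[502, 503, 504],
    allowed_methods=["POST"],
)
session.mount("https://", HTTPAdapter(max_retries=retries))

def call_service_b(payload: dict) -> dict:
    response = session.post(
        "https://service-b.internal/api/v1/process",  # placeholder URL
        json=payload,
        timeout=2.0,                  # fail fast; no waiting on a reply queue
    )
    response.raise_for_status()
    return response.json()
```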

3. Ignoring Idempotency

"Exactly-once" delivery is the holy grail of messaging systems, and for most practical purposes in distributed systems, it's a myth. Most brokers can only realistically guarantee "at-least-once" delivery. This means that under certain failure conditions (a consumer crashes after processing but before acknowledging, a network partition, etc.), your consumer will receive the same message more than once.

An anti-pattern is to build your consumers with the optimistic assumption that this will never happen. A consumer that is not idempotent might, for example, charge a customer's credit card twice or send the same welcome email five times.

The only robust solution is to design for idempotency from day one. This means ensuring that processing the same message multiple times has the exact same effect as processing it once. Common techniques include:

  • Using a unique transaction ID from the message to check if the work has already been done (e.g., INSERT ... ON CONFLICT DO NOTHING).
  • Designing your business logic to be naturally idempotent (e.g., setting a user's status to active is an idempotent operation).
  • Using distributed locks or database transactions to gate the critical operation.

Never trust the network. Assume every message will arrive more than once.
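As a concrete illustration of the first technique, here is a minimal sketch of a deduplication table keyed by message ID. It uses sqlite3 to stay self-contained; in Postgres the same idea is the INSERT ... ON CONFLICT DO NOTHING mentioned above, and every name here is illustrative.

```python
# Idempotent consumer sketch: a dedup table keyed by message ID.
# sqlite3 keeps it self-contained; names and the side effect are illustrative.
import sqlite3

db = sqlite3.connect("processed_messages.db")
db.execute("CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)")

def charge_customer(payload: dict) -> None:
    print(f"Charging {payload['amount']} once and only once")

def handle_message(message_id: str, payload: dict) -> None:
    with db:  # one transaction: claim the ID and perform the work together
        cursor = db.execute(
            "INSERT OR IGNORE INTO processed (message_id) VALUES (?)",
            (message_id,),
        )
        if cursor.rowcount == 0:
            return  # already processed: this is a redelivered duplicate
        charge_customer(payload)  # the side effect we must not repeat

handle_message("msg-42", {"amount": "19.99"})
handle_message("msg-42", {"amount": "19.99"})  # duplicate delivery: a no-op
```

The ordering matters: claiming the ID and doing the work inside one transaction means a crash before commit releases the claim, so a redelivered message can still be retried cleanly.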

Architecting for the Future

We've journeyed from a simple, failing system to a set of robust, principle-guided patterns. The core lesson is not about RabbitMQ versus Kafka, or SQS versus Pub/Sub. The specific technology is secondary to the architectural intent. The most elegant solutions are not complex; they are thoughtfully simple. They don't fight the nature of distributed systems; they embrace it.

Your message queues are not just pipes. They are a critical tool for controlling the flow, pace, and priority of work in your entire system. They are the shock absorbers that decouple services, allowing them to scale and fail independently. When you fail to classify the work you put into them, you are not using a shock absorber; you are using a megaphone to amplify chaos.

Your First Move on Monday Morning

Go look at your dashboards. Find your busiest message queue. Ask yourself and your team three questions:

  1. What different types of work are flowing through this single queue?
  2. Is it possible for a low-priority, long-running task to delay a high-priority, short task?
  3. If we had to triple the throughput of just one of those work types, could we do it without tripling the resources for all of them?

The answers to these questions will tell you if your queues are strategic assets or ticking time bombs. They will show you where a simple act of classification can save you from the next all-hands-on-deck pager alert.

So, I'll ask you directly: are your queues a well-organized airport, or are they a single, chaotic line where everyone is waiting on the slowest person in front?


TL;DR

  • Core Idea: Stop treating message queues like simple pipes. Most queueing problems are from a lack of classification, not a lack of throughput.
  • The Problem: A single queue for all work (a "monolithic queue") leads to head-of-line blocking, where low-priority, long jobs make high-priority, short jobs wait. Scaling consumers is an expensive, ineffective fix.
  • Pattern 1: Priority Queue. The solution. Use multiple queues for different priority levels (e.g., enterprise_queue, free_tier_queue). This isolates workloads and allows for targeted, cost-effective scaling.
  • Pattern 2: Claim Check. Don't put large payloads (files, big JSON blobs) in messages. Store the payload in object storage (like S3) and put a reference (the "claim check") in the message. This keeps your broker fast and lean.
  • Anti-Pattern 1: RPC over MQ. Don't use queues for synchronous request-reply communication. It's slow, complex, and brittle. If you need a response now, use a direct HTTP call with a circuit breaker.
  • Anti-Pattern 2: Ignoring Idempotency. Your consumers will receive duplicate messages. Design them to be idempotent (processing a message twice has the same result as processing it once) to avoid dangerous side effects like double-billing.