Setting Up Celery for Production: Isolated Queues, Autoscaling, and Smarter Task Management

When deploying Celery in production, using the default configuration can quickly become a bottleneck. It’s tempting to spin up a few workers with a fixed concurrency and assume the job is done. But as the number of background tasks grows, so do the challenges: blocked queues, resource spikes, and unpredictable performance. In this article, I’ll walk you through how I structured Celery for production, focusing on queue isolation, autoscaling, and smarter load handling.

The Default Celery Setup

Most tutorials show you how to get Celery running with a single worker and a shared queue. That works for a while. But in practice, this is what you're really getting:

  • One or two Celery workers with a fixed number of processes

  • Tasks from all queues are pulled in round-robin fashion

  • No prioritization: critical tasks and background jobs are treated the same

This might be fine at the beginning. Tasks get processed, and everything seems okay. However, once your application grows and starts generating hundreds or thousands of background jobs, problems appear fast.

The fixed pool of workers keeps pulling from every queue blindly, regardless of task importance. If you suddenly get a spike in long-running tasks like reports, it can block urgent operations like order processing or payment confirmation. Your most critical tasks now have to wait in line behind non-essential ones, and that’s a problem.

Worse, if all tasks are routed into just one or two queues and those queues feed into a single shared worker, there’s no way to prioritize one over the other. In this kind of setup, your users will feel the impact. Delays become visible, responsiveness drops, and user satisfaction, one of the top priorities for any software engineer, starts to suffer.

This is exactly the kind of setup I started with, and exactly what I wanted to improve.

What I Changed and Why

Before diving into the worker configuration, I focused first on getting the queues set up properly. Since all tasks are pushed into queues and workers simply pull from them, it’s essential to ensure that each task lands in the right queue. Think of it this way: we have multiple queues, let’s call them Q1, Q2, Q3, and Q4, and it’s the application’s responsibility to assign tasks to the appropriate ones.

Each queue should be dedicated to a specific type of workload. For example, report-related tasks should go to Q2, background or low-priority tasks to Q3 and Q4, and all other general-purpose tasks to Q1 by default.
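In Celery, this kind of routing is usually expressed in the app configuration with task_routes, plus a default queue for anything that doesn’t match a rule. Below is a minimal sketch of what that could look like; the app name, broker URL, and task module names (myapp.tasks.*) are placeholders for illustration, not the actual project code:

# celery.py -- illustrative configuration, names are hypothetical
from celery import Celery
from kombu import Queue

app = Celery("myapp", broker="redis://localhost:6379/0")  # example broker URL

# Declare the four queues and make Q1 the default for anything unrouted
app.conf.task_default_queue = "Q1"
app.conf.task_queues = (
    Queue("Q1"),
    Queue("Q2"),
    Queue("Q3"),
    Queue("Q4"),
)

# Route task names (glob patterns are supported) to their dedicated queues
app.conf.task_routes = {
    "myapp.tasks.reports.*": {"queue": "Q2"},  # report generation
    "myapp.tasks.sync.*": {"queue": "Q3"},     # background syncing
    "myapp.tasks.logs.*": {"queue": "Q4"},     # log/cleanup jobs
    # everything else (orders, payments, ...) falls back to Q1
}

With this in place, the routing table becomes the single source of truth, and call sites don’t need to know which queue a task belongs to.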

With task routing in place, I then moved to the worker configuration. The goal was to ensure that each group of related tasks is processed by a dedicated worker, optimized for the nature and expected volume of tasks in that queue.

I created three separate workers:

  • default: Handles high-priority operational tasks such as orders and payments (tasks in Q1).

  • reports: Reserved for generating large or time-consuming reports (tasks in Q2).

  • others: Manages low-priority or background jobs such as syncing and logs (tasks in Q3 and Q4).

This separation ensures that no queue can interfere with another. For example, a burst in report generation won’t delay time-sensitive tasks in the default queue, which keeps latency low exactly where it matters most.

So here is the entire architecture. Each worker is assigned only the queues it’s responsible for:

  • default handles only Q1

  • reports handles only Q2

  • others handles Q3 and Q4

This explicit routing prevents low-priority queues from blocking more urgent ones and keeps the system predictable.
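From the application side nothing changes once routing is configured: tasks are enqueued as usual and land on the right queue, which is what makes the worker assignments above safe. A small sketch, reusing the hypothetical task names from the routing example:

# Enqueued from a view, signal handler, etc. -- no route matches, so it goes to Q1
from myapp.tasks.orders import process_payment
process_payment.delay(order_id=42)

# Matches the "myapp.tasks.reports.*" route, so only the reports worker picks it up
from myapp.tasks.reports import generate_sales_report
generate_sales_report.delay(month="2024-01")

# A per-call override is still possible when you really need it
generate_sales_report.apply_async(kwargs={"month": "2024-01"}, queue="Q4")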

Now that each worker is assigned to specific queues, it only processes tasks from those queues, and nothing else. But what happens when there are no tasks available for a given worker? Simply put: nothing.

The worker simply becomes idle; it doesn’t consume unnecessary CPU or memory beyond what’s needed to stay alive. This is a huge advantage of queue isolation: each worker is scoped to a known workload and never competes for or grabs tasks from unrelated queues. This keeps the system organized and minimizes unnecessary load.
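If you ever want to verify these bindings at runtime, Celery’s inspect API reports which queues each worker consumes and what it is currently doing; a sketch (the import path is hypothetical):

# Quick runtime check of worker/queue bindings
from myapp.celery import app  # hypothetical import path

inspector = app.control.inspect()
print(inspector.active_queues())  # e.g. {"reports@host": [{"name": "Q2", ...}], ...}
print(inspector.active())         # running tasks per worker; idle workers show an empty list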

But there’s more to it. What if you suddenly get a spike in one queue, say a flood of report generation requests? You’d want the system to respond dynamically, without you manually increasing concurrency or restarting workers. That’s exactly where the --autoscale flag becomes valuable.

The --autoscale option allows each worker to scale its concurrency between a minimum and maximum number of worker processes, depending on task volume. In practice, it means Celery will spin up more child processes when there's a queue buildup and scale them back down when demand drops.

Here’s how I configured autoscaling for each worker:

  • default: 1 to 4 concurrent processes

  • reports: 1 to 4 concurrent processes

  • others: 1 to 2 concurrent processes

This gives each worker enough headroom to handle short-term spikes without wasting resources during quiet periods. Instead of running four child processes all the time, we start with one and let Celery increase concurrency only when needed. It’s a balance between responsiveness and efficiency.
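If you want to see the effect of --autoscale on a single worker before wiring everything into systemd, you can pass the same flags to a worker started from Python. This is only a local-experimentation sketch (the import path is hypothetical); the production setup below uses celery multi instead:

# Start one autoscaling "reports" worker from Python for local testing
from myapp.celery import app  # hypothetical import path

app.worker_main([
    "worker",
    "-n", "reports@%h",   # node name
    "-Q", "Q2",           # consume only the reports queue
    "--autoscale=4,1",    # at most 4, at least 1 child process
    "--loglevel=INFO",
])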

Managing Multiple Workers with celery multi

With queue isolation and autoscaling in place, the next challenge is: how do you manage multiple workers efficiently?

You could run each worker with a separate systemd service file, but that quickly becomes tedious and error-prone. Instead, I opted to use celery multi, which is built specifically for running and controlling multiple named Celery workers as a group. It gives you a way to start, stop, and monitor all your workers together using a single service.

Each worker is defined by name and configured independently through celery multi, including its queues, concurrency, autoscaling settings, and individual PID/log files.

Here’s how I configured this using systemd:

celery.service

[Unit]
Description=Celery Multi-Worker Service
After=network.target

[Service]
Type=forking
User=ubuntu
Group=ubuntu
EnvironmentFile=/etc/conf.d/celery
WorkingDirectory=/<Path to your project>
ExecStart=/bin/bash -c "${CELERY_BIN} -A ${CELERY_APP} multi start default reports others \
  --pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} \
  -Q:default Q1 --autoscale:default=4,1 --time-limit:default=3000 \
  -Q:reports Q2 --autoscale:reports=4,1 --time-limit:reports=3000 \
  -Q:others Q3,Q4 --autoscale:others=2,1 --time-limit:others=10000"
ExecStop=/bin/bash -c "${CELERY_BIN} multi stopwait default reports others --pidfile=${CELERYD_PID_FILE}"
ExecReload=/bin/bash -c "${CELERY_BIN} -A ${CELERY_APP} multi restart default reports others \
  --pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} --loglevel=${CELERYD_LOG_LEVEL} \
  -Q:default Q1 --autoscale:default=4,1 --time-limit:default=3000 \
  -Q:reports Q2 --autoscale:reports=4,1 --time-limit:reports=3000 \
  -Q:others Q3,Q4 --autoscale:others=2,1 --time-limit:others=10000"
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

This starts three named workers: default, reports, and others. Each worker is isolated and independently configured:

  • --pidfile / --logfile: Uses %n in the config file /etc/conf.d/celery to assign a unique file per worker (a sample of that file is shown right after this list).

  • -Q:<worker name> <queue name>: Each worker is assigned to specific queues.

  • --autoscale:<worker name>=<max>,<min>: Enables dynamic scaling of that worker’s concurrency between the minimum and maximum number of child processes.

  • --time-limit: Sets a hard timeout (in seconds) for task execution. If a task exceeds this limit, the process running it is killed and replaced, and the task is marked as failed.
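For completeness, here is roughly what the EnvironmentFile referenced above (/etc/conf.d/celery) could contain; the paths and the app name are example values, so adjust them to your project. %n expands to the worker’s node name and %I to the child process index, which is what gives every worker its own PID and log files:

# /etc/conf.d/celery -- example values only
CELERY_BIN="/home/ubuntu/venv/bin/celery"
CELERY_APP="myapp"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
CELERYD_LOG_FILE="/var/log/celery/%n%I.log"
CELERYD_LOG_LEVEL="INFO"
# Make sure /var/run/celery and /var/log/celery exist and are writable by the service user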

Why This Works Well in Production

With this setup:

  • Each worker has its own name, queue bindings, autoscale policy, and log/PID files.

  • You can manage all workers together using simple systemd commands (start, stop, restart).

  • You don’t need multiple .service files, which keeps your deployment clean.

  • It scales seamlessly with your queue design; you can always add another named worker later.

  • Because everything runs under a single service, “self-healing” with Restart=always and RestartSec=5 works out of the box: if Celery stops for any reason, systemd brings it back automatically.

This structure not only improves reliability and performance, but also simplifies operational overhead when you need to update or monitor your task workers.

I didn’t want to write yet another article on how to install Celery. Instead, this post dives into the real improvements that make a difference when your system starts scaling. By isolating queues, autoscaling intelligently, and assigning proper limits, Celery becomes a stable and predictable part of your infrastructure.

That's all for now. If you're interested in more advanced content about WSGI, Apache, or other DevOps topics, there are many articles in the DevOps series that you might find interesting.

Mostafa-DE Fayyad

Software Engineer
