Scaling Sidekiq Workers on Kubernetes with KEDA


Note: This article was originally published in April 2020.

Sidekiq + k8s

Running Rails applications with Sidekiq in Kubernetes allows for the decoupling of background and web processes to take advantage of Kubernetes’ inherent scalability. A typical implementation would look something like this:

apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
type: Opaque
data:
  REDIS_PASSWORD: Zm9vYmFy
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cool-rails-app-web
  labels:
    app.kubernetes.io/name: cool-rails-app-web
    workload-type: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: cool-rails-app-web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cool-rails-app-web
    spec:
      containers:
      - name: cool-rails-app
        image: bgroupe/cool-rails-app:latest
        command: ["bundle"]
        args:
          - "exec"
          - "puma"
          - "-b"
          - "tcp://0.0.0.0:3000"
          - "-t"
          - "1:1"
          - "-w"
          - "12"
          - "--preload"
        env:
          - name: REDIS_HOST
            value: redis
          - name: REDIS_ADDRESS
            value: redis:6379
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: redis-secret
                key: REDIS_PASSWORD          
        ports:
        - name: http
          containerPort: 3000
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /lbcheck
            port: http
        readinessProbe:
          httpGet:
            path: /lbcheck
            port: http
      imagePullSecrets:
        - name: mysecretkey
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cool-rails-app-sidekiq
  labels:
    app.kubernetes.io/name: cool-rails-app-sidekiq
    workload-type: sidekiq
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: cool-rails-app-sidekiq
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cool-rails-app-sidekiq
    spec:
      containers:
      - name: cool-rails-app
        image: bgroupe/cool-rails-app:latest
        command: ["bundle"]
        args:
          - "exec"
          - "sidekiq"
          - "-q"
          - "cool_work_queue"
          - "-i"
          - "0"
        env:
          - name: REDIS_HOST
            value: redis
          - name: REDIS_ADDRESS
            value: redis:6379
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: redis-secret
                key: REDIS_PASSWORD
      imagePullSecrets:
        - name: mysecretkey

Sidekiq is multi-threaded and the default number of threads is 25, which is ample for most use cases. However, when throughput increases, we may need to scale the number of processes out horizontally so that currently processing jobs are left undisturbed. Kubernetes provides the Horizontal Pod Autoscaler (HPA) controller and resource out of the box to scale pods based on resource metrics collected by the metrics-server add-on. It also supports custom metrics by way of adapters, the most popular being the Prometheus adapter. Using this technique, if we wanted to scale our Sidekiq workers based on aggregated queue size, we would typically need to:

  1. Install and configure Prometheus somewhere.

  2. Install and configure a Prometheus Redis exporter pointed to our Sidekiq Redis instance, and make sure it exposes the key and list length for each Sidekiq queue we want to monitor.

  3. Install the k8s-prometheus-adapter in our cluster and configure it to adapt metrics from our Prometheus server.

  4. Deploy an HPA spec with a custom metric targeted at our Prometheus metrics adapter (a rough sketch of which follows this list).
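
For reference, that final HPA would end up looking something like the sketch below. This assumes the autoscaling/v2beta2 API, and the metric name and label (redis_key_size, key) are placeholders that depend entirely on how the Redis exporter and Prometheus adapter are configured:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: cool-rails-app-sidekiq
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cool-rails-app-sidekiq
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        # Placeholder metric name exposed by a Redis exporter and mapped by the adapter
        name: redis_key_size
        selector:
          matchLabels:
            key: queue:cool_work_queue
      target:
        type: AverageValue
        averageValue: "500"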

As a general rule, it’s wise to set up Prometheus monitoring, but this is a considerable amount of work, with many moving pieces to maintain, just to periodically check the length of a Redis list.

Enter: KEDA

KEDA, or Kubernetes-based Event Driven Autoscaling, is a lightweight operator for HPAs that acts as a metrics server, adapting events from a whole host of data sources. Sidekiq stores enqueued jobs in a Redis list and, luckily, there is a KEDA scaler specifically for scaling based on the length of a Redis list. To use KEDA, you create a custom resource called a ScaledObject with a dead-simple spec. When the ScaledObject is registered, the KEDA operator generates an HPA targeting your deployment. These are considerably fewer pieces to achieve the same effect.

KEDA is fairly straightforward to install, and there is very little customization required. I prefer to install with Helm, but you can also install via the manifest examples provided in the KEDA GitHub repo:

git clone https://github.com/kedacore/keda && cd keda

kubectl apply -f deploy/crds/keda.k8s.io_scaledobjects_crd.yaml
kubectl apply -f deploy/crds/keda.k8s.io_triggerauthentications_crd.yaml
kubectl apply -f deploy/

This will install the operator, register the ScaledObject CRD, and an additional CRD called TriggerAuthentication, which is used for providing auth mechanisms to the operator and reusing credentials between ScaledObjects.
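
If you prefer Helm, the equivalent install is roughly the following (a sketch assuming Helm 3 and the kedacore charts repository; chart versions may have changed since this was written):

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda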

The Setup

Creating the scalers

apiVersion: keda.k8s.io/v1alpha1
kind: TriggerAuthentication
metadata:
  name: redis-auth
spec:
  secretTargetRef:
  - parameter: password
    name: redis-secret
    key: REDIS_PASSWORD
---
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: sidekiq-worker  
  labels:
    app: cool-rails-app-sidekiq
    deploymentName: cool-rails-app-sidekiq
spec:
  pollingInterval: 30
  cooldownPeriod:  300
  minReplicaCount: 1
  maxReplicaCount: 10
  scaleTargetRef:
    deploymentName: cool-rails-app-sidekiq
  triggers:
  - type: redis
    metadata:
      address: REDIS_ADDRESS      
      listName: queue:cool_work_queue
      listLength: "500"
    authenticationRef:
      name: redis-auth

Let’s say we have a shared Redis database that multiple applications connect to, and this database is protected with a password. Authentication can be provided directly on the ScaledObject, but if we store our credentials in a Kubernetes secret, we can use a TriggerAuthentication object to delegate auth and share the same auth mechanism between multiple scaling resources. Here, our TriggerAuthentication resource references a secret called redis-secret containing a REDIS_PASSWORD key, which is all we need to authenticate to Redis. In the ScaledObject, we reference the TriggerAuthentication resource with the authenticationRef key.

Now for the ScaledObject: KEDA supports scaling both Kubernetes deployment and job resources. Since Sidekiq is run as a deployment, our ScaledObject configuration is very simple:

# The amount of time between each conditional check of the data source.
pollingInterval: 30
# The amount of time to wait after the trigger last fired before scaling back down to the minimum replica count.
cooldownPeriod: 300
# The minimum number of replicas desired for the deployment (note: KEDA supports scaling to/from 0 replicas).
minReplicaCount: 1
# The maximum number of replicas to scale out to.
maxReplicaCount: 10
# References the deployment we want to scale by name.
scaleTargetRef:
  deploymentName: cool-rails-app-sidekiq

The trigger portion contains our data source and scaler type. Here is where you would also be able to add a Redis password for authentication. This is the only tricky part: these sensitive values must be env vars referenced by the container of the target deployment.

triggers:  
  - type: redis    
    metadata:      
      address: REDIS_ADDRESS # host:port format             
      listName: queue:cool_work_queue      
      listLength: "500"    
    authenticationRef:      
      name: redis-auth

The key that Sidekiq writes for the queue list is prefixed with queue: and the queue name is declared when the Sidekiq process is started. Let’s say our jobs are relatively fast, so we only need to start scaling when our queue hits 500. List length must be declared as a quoted string or the CRD validations will fail on creation.
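
You can sanity-check the list name and its current length directly against Redis (assuming redis-cli is available and the same credentials as above):

redis-cli -h redis -a "$REDIS_PASSWORD" llen queue:cool_work_queue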

Let’s create the CRDs and watch the KEDA operator generate an HPA on our behalf:

kubectl apply -f scaled-object-with-trigger-auth.yaml
kubectl get hpa
NAME                      REFERENCE                           TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
keda-hpa-sidekiq-worker   Deployment/cool-rails-app-sidekiq   0/500 (avg)   1         10        1          10s

That’s it. We now have an HPA managed by KEDA which will scale our Sidekiq worker on queue length. Any changes to the HPA, like editing the list length, are done by applying the ScaledObject — it’s that simple.

Testing

To see it in action, we can generate load on our Sidekiq instance using a fake job. The job will print an acknowledgement, sleep a random amount of time, and then print a message.

class AckTest
  include Sidekiq::Worker
  sidekiq_options queue: :cool_work_queue

  def perform(msg)
    puts "ACK:"
    # Simulate a slow job: sleep up to 3 minutes
    sleep rand(180)
    puts "MSG: #{msg}"
  end
end

To run this, open a Rails console in your web pod and paste the class definition. Then enqueue a large number of them:

1000.times { AckTest.perform_async("Let's scale up") }

Within the 30-second polling interval provided, you should see the HPA fire up a handful of extra Sidekiq pods, which will start pulling work off of the queue. If the work is not finished by the end of the cooldown period (in this case, 5 minutes), the additional pods will remain for another 5 minutes until KEDA polls the queue again.
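
To watch the scale-out as it happens, keep an eye on the HPA and the worker pods:

kubectl get hpa keda-hpa-sidekiq-worker -w
kubectl get pods -l app.kubernetes.io/name=cool-rails-app-sidekiq -w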

Tuning

Now that we can spin up Sidekiq workers roughly based on throughput, our worker pods will be spinning up and tearing down dynamically. The algorithm used by the HPA for scaling is as follows:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
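
For example, with 1 current replica, a reported queue length of 3,000, and our target listLength of 500, the HPA computes ceil(1 * 3000 / 500) = 6 and scales the deployment to 6 replicas (never exceeding maxReplicaCount).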

It stands to reason that, depending on how long the average job takes to complete, the HPA will begin scaling workers back down as the queue drains. To ensure that we are not terminating processes in the middle of performing work, we need to add some buffer time to the shutdown. We have a couple of options:

  1. We can give the worker an arbitrary amount of time to finish processing after it receives a SIGTERM and before it is forcibly killed. This is achieved by adding a terminationGracePeriodSeconds field to the pod spec, using our best guess to determine how long to allow before the kill.

  2. The preferable option is to delegate shutdown to Sidekiq’s internal mechanism. In Kubernetes, this is done by adding a pre-stop hook: we tell the Sidekiq process to stop accepting jobs from the queue and to spend a given amount of time attempting to complete only the jobs it is currently performing, honoring the shutdown timeout passed to the process at startup.

Our deployment previously started Sidekiq like this:

spec:      
  containers:      
    - name: cool-rails-app        
      image: bgroupe/cool-rails-app:latest        
      command: ["bundle"]        
      args:          
        - "exec"          
        - "sidekiq"          
        - "-q"          
        - "cool_work_queue"          
        - "-i"          
        - "0"

We need to add a few more options. The first is the timeout option, which specifies how long workers are given to finish their current jobs when shutting down. Let’s set it to 60 seconds. The second is a pidfile path. Since we only run one Sidekiq process per container, specifying the name and path of the pidfile lets us reference it later in the shutdown process without having to search the filesystem.

...
- "-P"
- "/tmp/sidekiq.pid"
- "-t"
- "60"

Let’s add the pre-stop hook, under the lifecycle options of the container spec:

spec:      
  containers:      
    - name: cool-rails-app        
      image: bgroupe/cool-rails-app:latest
      lifecycle:
        preStop:
          exec:
            command: 
              - "sidekiqctl" 
              - "stop"
              - "/tmp/sidekiq.pid"
              - "120"

The final argument supplied to the sidekiqctl stop command is the kill timeout, the overall deadline after which the Sidekiq process is stopped. This obviously needs to be longer than the timeout option supplied at startup, or else the process will be killed while jobs are still running. In this example, we’ve set it to twice the startup timeout. Note that the default Kubernetes termination grace period is only 30 seconds and the pre-stop hook counts against it, so terminationGracePeriodSeconds on the pod spec should be raised to cover this window (a sketch follows the quote below). Now we can ensure that we are allowing the maximum amount of time for work to be completed. If your app has long-running jobs, you can tweak these timeouts as you see fit. From the Sidekiq docs:

Any workers that do not finish within the timeout are forcefully terminated and their messages are pushed back to Redis to be executed again when Sidekiq starts up.
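
A minimal sketch of where these pieces fit in the worker deployment, assuming the 60/120-second values above (the 150-second grace period is an arbitrary buffer; anything that comfortably covers the pre-stop hook works):

spec:
  # Must be long enough to cover the 120-second pre-stop hook
  terminationGracePeriodSeconds: 150
  containers:
    - name: cool-rails-app
      image: bgroupe/cool-rails-app:latest
      lifecycle:
        preStop:
          exec:
            command: ["sidekiqctl", "stop", "/tmp/sidekiq.pid", "120"]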

Epilogue

Many other asynchronous work queues inspired by Sidekiq use Redis list-based queues in a similar fashion, making this scaling pattern applicable outside of a Rails context. In recent versions, Sidekiq added a more specific metric for gauging worker throughput called “queue latency”: how long the oldest job in the queue has been waiting. This gives a better idea of how far behind processing is, but computing it requires more than a simple list-length check, so the pattern we’ve just implemented isn’t sufficient for it. Luckily, KEDA supports writing custom scaler integrations, and rolling your own is fairly straightforward. I will cover building this scaler in a future article.
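
For context, Sidekiq exposes this value through its Ruby API; in a Rails console it looks something like:

require 'sidekiq/api'

# Seconds since the oldest job in the queue was enqueued
Sidekiq::Queue.new("cool_work_queue").latency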

KEDA is a wonderfully simplified framework for leveraging Kubernetes autoscaling features and supports a whole host of other event sources. Give it a try.
