What are Argo Rollouts?


Next up in my quest to Learn All The Things™ on the CNCF graduated projects page, we’re going to take a look at a lesser-known Argo project: Argo Rollouts. In a nutshell, the Rollout object is a drop-in replacement for the Deployment object that provides much more automation of common progressive deployment patterns such as canary and blue-green. Rollouts can also optionally integrate with ingress controllers and service meshes, and can even query and interpret metrics via APIs to drive their autonomous behaviour.
Deployment Strategies
The Kubernetes Deployment object is probably the resource we use most often, at least until we start building more advanced clusters that leverage service meshes. It’s a rock-solid native Kubernetes object that lets us declare what a set of Pods running a workload should look like. It provides the famous Kubernetes control loop that keeps the Pods we’ve declared running, and it also offers some basic functionality for safely updating a workload.
Recall that a Deployment, under the hood, is managing ReplicaSets, which you can think of as versions of our workload configuration. When we make a change to that configuration, a new ReplicaSet is created, and the previous ReplicaSet is gradually scaled down and retired. Deployments attempt to do this safely using one of two methods:
RollingUpdate (the default): The new ReplicaSet is gradually scaled up to the desired number of Pods, as the old ReplicaSet is gradually scaled down. We can also influence this type of update by limiting how many Pods we’ll tolerate as unavailable (with maxUnavailable) and how many additional Pods we will allow during the update (with maxSurge). There’s a minimal sketch of these fields just after this list.
Recreate: With this strategy, the entire existing ReplicaSet is scaled down and terminated before the new one is created. This is sometimes helpful if you want a clean cut-off of traffic between different versions of your workload.
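To make that concrete, here’s a minimal sketch of where those fields live in a Deployment spec (the name and image below are just placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical name, for illustration only
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1       # tolerate at most 1 Pod below the desired count
      maxSurge: 1             # allow at most 1 extra Pod above the desired count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.27     # placeholder image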
Often when we’re first trying Kubernetes, we learn how to implement versions of the canary and blue-green patterns by combining multiple Deployments with a Service object.
For example, we can run a blue Deployment and a green Deployment, and switch between them easily with a Service selector. Or we can run a canary Deployment with a smaller number of Pods, and let the Service object select this along with a larger production Deployment. But these techniques don’t scale well, and they require constant manual intervention to manage. This is where Argo’s automation can help.
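As a rough sketch of that manual approach (the names here are hypothetical), a blue-green cutover really just comes down to editing the Service selector:

# Assume two Deployments exist whose Pods are labelled version=blue and version=green.
# Repoint the Service at the green Pods:
kubectl patch service my-app \
  -p '{"spec":{"selector":{"app":"my-app","version":"green"}}}'

Every switch, canary scale-up, or rollback is a manual edit like this, which is exactly the toil a Rollout automates.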
The Rollout Object
Argo provides us with a new custom resource definition (CRD): the Rollout.
Essentially this object combines everything we can declare in a Deployment object with a much more advanced strategy definition. Within the strategy we can now describe the steps required to successfully roll out updates using the canary or blue-green patterns, including traffic splitting and approval steps. Let’s walk through a basic example to see how this works!
Prerequisites
To follow along, you’ll need access to a Kubernetes cluster. I’m normally a fan of Kind, or even Minikube, but when writing this post I struggled to get local forwarding of the LoadBalancer to work reliably enough to actually demonstrate traffic splitting. You might have more success than me and you’re welcome to try! But full disclosure, I spun up a GKE cluster in the end.
Installing Argo Rollouts
To set up Argo Rollouts we’ll create a namespace for the Argo controller, and then install the controller and the CRDs we need:
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
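Before moving on, it’s worth checking that the controller Pod came up cleanly:

kubectl get pods -n argo-rollouts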
We’ll also install the Rollouts plugin for kubectl, which will give us access to the kubectl argo rollouts sub-commands. You can obtain this from the releases page, or if you’re using Homebrew just run:
brew install argoproj/tap/kubectl-argo-rollouts
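A quick way to confirm the plugin is wired up correctly is to ask it for its version:

kubectl argo rollouts version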
Creating a Rollout
We’re going to create a Rollout object that runs the rather excellent Argo Rollouts demo web app. This app gives us a really nice visualisation of what’s happening as we release or roll back updates. We’ll also create a LoadBalancer Service so we can access the app in a browser. Let’s start by creating the rollout.yaml file below:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 5
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 40
      - pause: {duration: 10}
      - setWeight: 60
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m
As you can see, most of this spec looks very much like a Deployment object. The big difference is the strategy section, which is specific to the Rollout CRD. In this section we specify the canary pattern, and then define the steps that we want for a successful rollout as a list. These are basically the automation instructions the controller will follow when we want to roll out an update.
First we set the weight of the canary to 20. In other words, we ask for 20% of the available Pod replicas to match the canary definition. Elsewhere in the spec we can see there are 5 Pod replicas, so 1 of them will match the canary. Next we have an empty pause definition, which means an indefinite pause; in other words, manual intervention will be required here to promote the rollout and continue with the next steps.
We then proceed with the rest of the steps in the canary. We set the weight to 40% (2 of 5 Pods) and wait for 10 seconds. Then we set the weight to 60% (3 of 5 Pods) and wait for 10 seconds. Then 80% and another 10 seconds, and finally the canary process will complete and all Pods in the Rollout will match the new definition.
A cognitive hurdle I had to get over here was figuring out why we only have a single Pod spec. After all, if we’re defining a canary pattern, shouldn’t there be separate production and canary deployments? And of course, this is the beauty and simplicity of the Argo Rollout.
Every rollout starts as a canary, and eventually becomes production.
We’ll see this in a moment: the first time we create this object, we skip straight to having all of our Pods running the rollouts-demo:blue container, but when we perform the first change, we’ll see the canary logic in action.
Okay, next we need a Service object so we can access the workload. This is just a plain old LoadBalancer Service we’ll save as service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: rollouts-demo
  type: LoadBalancer
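With both files saved, apply them to the cluster:

kubectl apply -f rollout.yaml
kubectl apply -f service.yaml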
Once both objects are applied to our cluster, we can watch the status of our Rollout object with this command:
kubectl argo rollouts get rollout rollouts-demo --watch
Because this is the initial creation of the object, we immediately scale up to 100% of the replicas running the rollouts-demo:blue container. Remember: the canary logic is only applied to updates, not to the initial creation.
I mentioned earlier that the demo web app supplied by Argo Rollouts is actually very good, and that’s because it provides a very nice visualisation of the requests being made by a web browser and the version of the Pod that’s serving them.
Grab the external IP of the rollouts-demo service with kubectl get svc, open it in your browser, and hopefully you’ll see something like this:
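If you want to grab that external IP in a script-friendly way, a jsonpath query along these lines should work (assuming your provider hands the Service an IP address rather than a hostname):

kubectl get svc rollouts-demo \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'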
Updating a Rollout
Now it’s time to do our first update! Just like a Deployment object, a Rollout manages versions of our Pods using ReplicaSet objects. Right now we just have a single ReplicaSet, and if we make a change, a new ReplicaSet will be created. So let’s patch our Rollout object and change the container image:
kubectl argo rollouts set image rollouts-demo \
rollouts-demo=argoproj/rollouts-demo:yellow
This is where the Rollout logic comes in. The update will be progressively applied based on the logic we specified earlier. So first we’ll get a new ReplicaSet that will represent 20% of the total Pods. And we’ll pause there, requiring some manual intervention to proceed.
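If you’d like to see the underlying mechanics, listing the ReplicaSets should show the original one scaled down to 4 replicas and the new canary ReplicaSet running just 1:

kubectl get rs -l app=rollouts-demo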
If you’re still running the previous watch command, you can see the updated state of the Rollout:
From this detail we can also see that our Rollout is at step 1 of 8, and is currently paused.
Jump back into your web browser, and you should eventually start to see the occasional request being served by a yellow Pod instead of a blue one. (Note: you may need to reload the page if it gets “stuck” making requests to the same Pods over and over again.)
Like I said, a pause step with no duration defined will just remain paused indefinitely, so we must promote the rollout for it to continue to the next step:
kubectl argo rollouts promote rollouts-demo
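As an aside, if you ever want to skip all of the remaining steps and jump straight to 100%, the same command accepts a --full flag:

kubectl argo rollouts promote rollouts-demo --full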
Now we can observe the Rollout continue through the rest of its defined canary steps, slowly increasing the weight of the update until finally all Pods are running the new version. You can follow this in the output of kubectl argo rollouts get, but it’s much prettier to watch it on the demo web app:
Aborting a Rollout
The canary pattern is of course about letting us try an update with a small subset of production traffic. So when we’re at the manual intervention stage, we can abort the rollout instead of promoting it, which will return the Rollout to its previous state.
Give this a try yourself by first updating from the yellow container to the red one:
kubectl argo rollouts set image rollouts-demo \
rollouts-demo=argoproj/rollouts-demo:red
At this point you’ll have a canary running the red version (weighted at about 20%). Run the following command to abort, rather than promote, this rollout:
kubectl argo rollouts abort rollouts-demo
Now you can watch everything roll back to the previous version.
This, however, puts our Rollout in a degraded state. That’s the difference between an abort and a rollback, and we can see this detail in the watch view:
To fix this we need to “re-declare” the state we want to match the state we currently have. If our code specified the rollouts-demo:yellow container, we could simply re-apply the object. In our case, it’s quicker to patch the object again:
kubectl argo rollouts set image rollouts-demo \
rollouts-demo=argoproj/rollouts-demo:yellow
No actual changes to Pods are required because we’re already running 100% yellow containers; we’re just reconciling what we’ve declared with what’s already running in the cluster. This means the state of the Rollout will immediately change back to Healthy.
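If you’d rather check this from the command line than the watch view, the plugin also has a status sub-command that waits for the Rollout to finish and reports whether it ended up Healthy:

kubectl argo rollouts status rollouts-demo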
Summary
This has been a very short tour of Argo Rollouts, in which we’ve really just demonstrated how the Rollout object serves as a more advanced drop-in replacement for a Deployment. But by doing this, hopefully I’ve helped demystify how this project works, and you can start to appreciate how useful it can be.
Stay tuned for the final stop on our journey through the Argo project - Argo Events!