What are Argo Rollouts?


Next up in my quest to Learn All The Things™ on the CNCF graduated projects page, we’re going to take a look at a lesser-known Argo project: Argo Rollouts. In a nutshell, the Rollout object is a drop-in replacement for the Deployment object that provides much more automation of common progressive deployment patterns such as canary and blue-green. Rollouts can also optionally integrate with ingress controllers and service meshes, and can even query and interpret metrics via APIs to drive their autonomous behaviour.
Deployment Strategies
The Kubernetes Deployment object is probably the resource we use most often, at least until we start building more advanced clusters that leverage service meshes. It’s a rock-solid native Kubernetes object that lets us declare what a set of Pods running a workload should look like. It provides the famous Kubernetes control loop that keeps the Pods we’ve declared running, and it also offers some basic functionality for safely updating a workload.
Recall that a Deployment, under the hood, is managing ReplicaSets, which you can think of as versions of our workload configuration. When we make a change to that configuration, a new ReplicaSet is created, and the previous ReplicaSet is gradually scaled down and retired. Deployments attempt to do this safely using one of two methods:
RollingUpdate (the default): The new ReplicaSet is gradually scaled up to the desired number of Pods, as the old ReplicaSet is gradually scaled down. We can also influence this type of update by limiting how many Pods we’ll tolerate as unavailable (with maxUnavailable) and how many additional Pods we will allow during the update (with maxSurge). There’s a minimal sketch of these fields just after this list.
Recreate: With this strategy, the entire existing ReplicaSet is scaled down and terminated before the new one is created. This is sometimes helpful if you want a clean cut-off of traffic between different versions of your workload.
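To make that concrete, here’s a minimal sketch of where those fields live in a Deployment spec (the name and image below are just placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical name, for illustration only
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1       # tolerate at most 1 Pod below the desired count
      maxSurge: 1             # allow at most 1 extra Pod above the desired count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx:1.27     # placeholder image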
Often when we’re first trying Kubernetes, we learn how to implement versions of the canary and blue-green patterns by combining multiple Deployments with a Service object.
For example, we can run a blue Deployment and a green Deployment, and switch between them easily with a Service selector. Or we can run a canary Deployment with a smaller number of Pods, and let the Service object select this along with a larger production Deployment. But these techniques don’t scale well, and they require constant manual intervention to manage. This is where Argo’s automation can help.
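As a rough sketch of that manual approach (the names here are hypothetical), a blue-green cutover really just comes down to editing the Service selector:

# Assume two Deployments exist whose Pods are labelled version=blue and version=green.
# Repoint the Service at the green Pods:
kubectl patch service my-app \
  -p '{"spec":{"selector":{"app":"my-app","version":"green"}}}'

Every switch, canary scale-up, or rollback is a manual edit like this, which is exactly the toil a Rollout automates.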
The Rollout Object
Argo provides us with a new custom resource definition (CRD): the Rollout.
Essentially this object combines everything we can declare in a Deployment object with a much more advanced strategy definition. Within the strategy we can now describe the steps required to successfully roll out updates using the canary or blue-green patterns, including traffic splitting and approval steps. Let’s walk through a basic example to see how this works!
Prerequisites
To follow along, you’ll need access to a Kubernetes cluster. I’m normally a fan of Kind, or even Minikube, but when writing this post I struggled to get local forwarding of the LoadBalancer to work reliably enough to actually demonstrate traffic splitting. You might have more success than me and you’re welcome to try! But full disclosure, I spun up a GKE cluster in the end.
Installing Argo Rollouts
To set up Argo Rollouts we’ll create a namespace for the Argo controller, and then install the controller and the CRDs we need:
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
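Before moving on, it’s worth checking that the controller Pod came up cleanly:

kubectl get pods -n argo-rollouts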
We’ll also install the Rollouts plugin for kubectl, which will give us access to the kubectl argo rollouts sub-commands. You can obtain this from the releases page, or if you’re using Homebrew just run:
brew install argoproj/tap/kubectl-argo-rollouts
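A quick way to confirm the plugin is wired up correctly is to ask it for its version:

kubectl argo rollouts version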
Creating a Rollout
We’re going to create a Rollout object that runs the rather excellent Argo Rollouts demo web app. This app gives us a really nice visualisation of what’s happening as we release or roll back updates. We’ll also create a LoadBalancer Service so we can access the app in a browser. Let’s start by creating the rollout.yaml file below:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  replicas: 5
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {}
      - setWeight: 40
      - pause: {duration: 10}
      - setWeight: 60
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: rollouts-demo
  template:
    metadata:
      labels:
        app: rollouts-demo
    spec:
      containers:
      - name: rollouts-demo
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          requests:
            memory: 32Mi
            cpu: 5m
As you can see, most of this spec looks very much like a Deployment object. The big difference is the strategy section, which is specific to the Rollout CRD. In this section we specify the canary pattern, and then define the steps that we want for a successful rollout as a list. These are basically the automation instructions the controller will follow when we want to roll out an update.
First we set the weight of the canary to 20. In other words, we ask for 20% of the available Pod replicas to match the canary definition. Elsewhere in the spec we can see there are 5 Pod replicas, so 1 of them will match the canary. Next we have an empty pause definition, which means an indefinite pause; in other words, manual intervention will be required here to promote the rollout and continue with the next steps.
We then proceed with the rest of the steps in the canary. We set the weight to 40% (2 of 5 Pods) and wait for 10 seconds. Then we set the weight to 60% (3 of 5 Pods) and wait for 10 seconds. Then 80% and another 10 seconds, and finally the canary process will complete and all Pods in the Rollout will match the new definition.
A cognitive hurdle I had to get over here was figuring out why we only have a single Pod spec. After all, if we’re defining a canary pattern, shouldn’t there be separate production and canary deployments? And of course, this is the beauty and simplicity of the Argo Rollout.
Every rollout starts as a canary, and eventually becomes production.
We’ll see this in a moment: the first time we create this object, we skip straight to having all of our Pods running the rollouts-demo:blue container, but when we perform the first change, we’ll see the canary logic in action.
Okay, next we need a Service object so we can access the workload. This is just a plain old LoadBalancer Service we’ll save as service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: rollouts-demo
spec:
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: rollouts-demo
  type: LoadBalancer
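With both files saved, apply them to the cluster:

kubectl apply -f rollout.yaml
kubectl apply -f service.yaml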
Once both objects are applied to our cluster, we can watch the status of our Rollout object with this command:
kubectl argo rollouts get rollout rollouts-demo --watch
Because this is the initial creation of the object, we immediately scale up to 100% of the replicas running the rollouts-demo:blue container. Remember: the canary logic is only applied to updates, not to the initial creation.
I mentioned earlier that the demo web app supplied by Argo Rollouts is actually very good, and that’s because it provides a very nice visualisation of the requests being made by a web browser and the version of the Pod that’s serving them.
Grab the external IP of the rollouts-demo service with kubectl get svc, open it in your browser, and hopefully you’ll see something like this:
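If you want to grab that external IP in a script-friendly way, a jsonpath query along these lines should work (assuming your provider hands the Service an IP address rather than a hostname):

kubectl get svc rollouts-demo \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'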
Updating a Rollout
Now it’s time to do our first update! Just like a Deployment object, a Rollout manages versions of our Pods using ReplicaSet objects. Right now we just have a single ReplicaSet, and if we make a change, a new ReplicaSet will be created. So let’s patch our Rollout object and change the container image:
kubectl argo rollouts set image rollouts-demo \
rollouts-demo=argoproj/rollouts-demo:yellow
This is where the Rollout logic comes in. The update will be progressively applied based on the logic we specified earlier. So first we’ll get a new ReplicaSet that will represent 20% of the total Pods. And we’ll pause there, requiring some manual intervention to proceed.
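If you’d like to see the underlying mechanics, listing the ReplicaSets should show the original one scaled down to 4 replicas and the new canary ReplicaSet running just 1:

kubectl get rs -l app=rollouts-demo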
If you’re still running the previous watch command, you can see the updated state of the Rollout:
From this detail we can also see that our Rollout is at step 1 of 8, and is currently paused.
Jump back into your web browser, and you should eventually start to see the occasional request being served by a yellow Pod instead of a blue one. (Note: you may need to reload the page if it gets “stuck” making requests to the same Pods over and over again.)
Like I said, a pause step with no duration defined will just remain paused indefinitely, so we must promote the rollout for it to continue to the next step:
kubectl argo rollouts promote rollouts-demo
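As an aside, if you ever want to skip all of the remaining steps and jump straight to 100%, the same command accepts a --full flag:

kubectl argo rollouts promote rollouts-demo --full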
Now we can observe the Rollout continue through the rest of its defined canary steps, slowly increasing the weight of the update until finally all Pods are running the new version. You can follow this in the output of kubectl argo rollouts get, but it’s much prettier to watch it on the demo web app:
Aborting a Rollout
The canary pattern is of course about letting us try an update with a small subset of production traffic. So when we’re at the manual intervention stage, we can abort the rollout instead of promoting it, which will return the Rollout to its previous state.
Give this a try yourself by first updating from the yellow container to the red one:
kubectl argo rollouts set image rollouts-demo \
rollouts-demo=argoproj/rollouts-demo:red
At this point you’ll have a canary running the red version (weighted at about 20%). Run the following command to abort, rather than promote, this rollout:
kubectl argo rollouts abort rollouts-demo
Now you can watch everything roll back to the previous version.
This, however, puts our Rollout in a degraded state. That’s the difference between an abort and a rollback, and we can see this detail in the watch view:
To fix this we need to “re-declare” the state we want to match the state we currently have. If our code specified the rollouts-demo:yellow container, we could simply re-apply the object. In our case, it’s quicker to patch the object again:
kubectl argo rollouts set image rollouts-demo \
rollouts-demo=argoproj/rollouts-demo:yellow
No actual changes to Pods are required because we’re already running 100% yellow containers; we’re just reconciling what we’ve declared with what’s already running in the cluster. This means the state of the Rollout will immediately change back to Healthy.
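If you’d rather check this from the command line than the watch view, the plugin also has a status sub-command that waits for the Rollout to finish and reports whether it ended up Healthy:

kubectl argo rollouts status rollouts-demo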
Summary
This has been a very short tour of Argo Rollouts, in which we’ve really just demonstrated how the Rollout object serves as a more advanced drop-in replacement for a Deployment. But by doing this, hopefully I’ve helped demystify how this project works, and you can start to appreciate how useful it can be.
Stay tuned for the final stop on our journey through the Argo project - Argo Events!