Nginx Ingress Controller Upgrade

This is the plan I came up with for upgrading the nginx ingress controllers at my current company, for our team and our infrastructure. Funnily, I was told later that we only had to upgrade standalone nginx instances and not the nginx ingress controller. But I still kept this plan in case it comes in handy some day ;) :)

Google Docs copy of this post

Things to ensure:

  • Decide whether you also want to upgrade nginx itself or not. Look at the compatibility between the nginx ingress controller versions and the nginx versions

  • Anything and everything related to upgrading nginx in general, as in the case of standalone nginx, applies here too. Refer to Nginx Upgrade or the Nginx Upgrade post

  • Everything works as before, and smoothly, with the new version

    • Metrics to confirm that everything is working as before or better, with no degradation in performance and NO errors. At least NO new errors. For any existing / old errors that we see with the older version, let’s decide whether to fix them, get rid of them or ignore them. Preferably fix them, or get rid of them (in the case of errors related to dead code)
  • Do load testing and benchmarking to check performance. Do it on both the old version and the new version and compare the performance numbers. Ensure there are no issues like memory leaks, and no unnecessary huge increase in resource usage: CPU, RAM, Disk, and Network too

  • As little downtime as possible, preferably zero downtime. Since nginx and the nginx ingress controllers are stateless services, zero downtime should be possible. More like, it MUST be possible, through the rolling update deployment strategy, as long as old pods get time to finish in-flight requests before they shut down. Automation and observability help here

    • Metrics to detect any downtime from the client’s perspective. The client can be a human or a service. So, get metrics from the client (usage) perspective too, and NOT just from the server’s perspective
  • Smooth and easy upgrade. Doing the upgrade should be simple and easy. Automation helps here

  • Everything automated. Any and all steps should be automated

  • No gaps or room for human error. Testing ensures this and automation helps here

  • Thorough testing before doing the upgrade

    • In testing environments and in staging (which is close to the production environment)
  • Understand the criticality of each of the nginx ingress controller instances

    • This will help us roll out the upgrade to the less critical ones first and check / test for any issues, errors or bugs

    • This will help us plan the order in which we do the upgrade

  • Smooth and easy downgrade. Downgrading should be simple and easy. Automation helps here

  • Have all kinds of data about the existing nginx and nginx ingress controllers, and about the new nginx and nginx ingress controller versions we are going to upgrade to

    • Find the versions of all the existing nginx ingress controllers

      • This probably involves two versions: one for the nginx ingress controller itself and one for nginx, since nginx is also available as standalone software. Understand any nuances here: is it just one version (the version of the nginx ingress controller), or can different versions of the nginx ingress controller be used with different versions of nginx? (Understand the compatibility between the nginx ingress controller and nginx)
    • Find the version of the nginx ingress controller we are going to upgrade to

      • As above, this probably involves two versions, the nginx ingress controller’s and nginx’s. Check clearly whether the target nginx ingress controller version comes with one fixed nginx version or can be used with different nginx versions (understand the compatibility between the nginx ingress controller and nginx)

      • Choose this carefully and meticulously. We need a new or the latest version, but also a stable version, and something that has some amount of long-term support

      • Ensure the stability of the new version by checking any data about it: issues raised, including security issues, performance issues and correctness issues

    • Check compatibility between the versions of the different pieces of software at play here

      • Nginx Ingress Controller version

      • Kubernetes version (API versions, control plane version, etc.)

      • Nginx version

        • Versions of the Nginx plugins

        • Versions of the Nginx Modules

      • Operating system version of the nginx ingress controller image’s base image

        • Compatibility between nginx and the base image operating system version

        • Compatibility between the ingress controller and the base image operating system version

      • Helm Chart version

    • Differences between the older versions (the existing ones we have) and the new version we want to upgrade to

      • This involves both the nginx ingress controller version and the nginx version
    • Also, check the differences between the older versions (the existing ones) and the newer versions (a few recent ones, if not all) apart from just the one we want to upgrade to, just to be aware of what kind of changes are happening

    • Data about the nginx ingress controllers

      • How many ingresses (ingress configurations) are there?

      • How many nginx ingress controllers are there?

      • Who uses the nginx ingress controllers?

      • Who manages the nginx ingress controllers?

      • How many ingresses are being managed by each of the nginx ingress controllers?

      • Are there any ingress controllers other than nginx? Are they managing any ingresses?

      • Understand how each ingress is linked to its ingress controller through configuration

        • This can be through an annotation that specifies the ingress class name (kubernetes.io/ingress.class)

        • This can be through the ingressClassName field in the spec

        • [EXTRA] Also try to change the ingresses that use the annotation to specify the ingress class name and don’t set the ingressClassName field in the spec. This is important for the future, because depending on annotations for critical information is bad, and this annotation is deprecated anyway

      • Check how many default ingress controllers there are (IngressClasses marked as the default)

        • [EXTRA] Ensure there’s only one default ingress controller, or no default ingress controller at all. Having no default may be a good idea: that way, users are required to set a particular ingress class name, or else their ingress will never be used. But for this to happen, we first need to fill in the missing field in existing ingresses that currently map to the default because they have neither the ingress class name annotation nor the ingressClassName spec field
      • Understand which features of the nginx ingress controller we use, for both the ingress controller and nginx

        • For example, nginx can be extended with plugins, such as ones written in the Lua programming language. Check if any nginx plugins are being used

        • nginx can also be extended with nginx modules, written in the C programming language. Check if any nginx modules are being used

        • Understand any customisations done to nginx, any custom features used in nginx, etc.

      • Understand which features of the ingress resource we use

      • Understand which version of the ingress resource we use. Version as in the API group and its version, that is, the apiVersion field of Kubernetes resources. For example, the networking.k8s.io API group and the v1 version, under which we have the Ingress resource (the kind field of Kubernetes resources)

  • Metrics around the nginx ingress controllers

    • How much traffic is currently coming in

      • Requests per second
    • Network usage of nginx: inbound and outbound traffic

    • Resource usage apart from network usage: CPU and RAM, and any disk usage

      • Ensure, as much as possible, that there’s NO disk usage (internal / container / ephemeral / volatile or non-volatile / external disk, etc.) proportional to the traffic. Ideally the disk usage should stay almost constant, given it’s a stateless service. Logs should go to standard output, and that should be enough. No log files etc.
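For the load testing and benchmarking bullet near the top of the list, comparing the old and new numbers can be automated, so that a regression fails loudly instead of depending on someone eyeballing dashboards. Here’s a minimal Python sketch. The `BenchmarkResult` fields, the `find_regressions` name and the 5% noise tolerance are all hypothetical; plug in whatever summary numbers your load testing tool reports for each run.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Summary numbers from one load test run of one controller version (hypothetical fields)."""
    requests_per_second: float
    p99_latency_ms: float
    error_rate: float        # failed requests / total requests, 0.0 to 1.0
    peak_memory_mib: float


def find_regressions(old: BenchmarkResult, new: BenchmarkResult,
                     tolerance: float = 0.05) -> list[str]:
    """List the ways `new` is worse than `old`.

    `tolerance` is the relative change treated as run-to-run noise
    (an assumed 5% by default). The error rate gets no tolerance,
    because the goal is NO new errors.
    """
    problems = []
    # Throughput: lower is worse.
    if new.requests_per_second < old.requests_per_second * (1 - tolerance):
        problems.append(f"throughput dropped: {old.requests_per_second:g} -> "
                        f"{new.requests_per_second:g} rps")
    # Latency and memory: higher is worse.
    if new.p99_latency_ms > old.p99_latency_ms * (1 + tolerance):
        problems.append(f"p99 latency rose: {old.p99_latency_ms:g} -> "
                        f"{new.p99_latency_ms:g} ms")
    if new.peak_memory_mib > old.peak_memory_mib * (1 + tolerance):
        problems.append(f"peak memory rose: {old.peak_memory_mib:g} -> "
                        f"{new.peak_memory_mib:g} MiB")
    # Errors: any increase at all is reported.
    if new.error_rate > old.error_rate:
        problems.append(f"error rate rose: {old.error_rate:g} -> {new.error_rate:g}")
    return problems


if __name__ == "__main__":
    old = BenchmarkResult(requests_per_second=1000, p99_latency_ms=20.0,
                          error_rate=0.0, peak_memory_mib=300)
    new = BenchmarkResult(requests_per_second=1010, p99_latency_ms=25.0,
                          error_rate=0.0, peak_memory_mib=310)
    print(find_regressions(old, new))  # ['p99 latency rose: 20 -> 25 ms']
```

An empty list means the new version is at least as good as the old one within the chosen tolerance; anything else is a reason to stop the rollout and dig in.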
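For measuring downtime from the client’s perspective during a rolling update, one simple option is a probe running outside the cluster that keeps hitting an endpoint behind the ingress controller for the whole upgrade and records every failed request. Below is a rough sketch using only the Python standard library. The function names are hypothetical, and it treats a 4xx/5xx answer, a refused or reset connection, or a timeout as a failure.

```python
import http.client
import time
import urllib.request


def probe_once(url: str, timeout_s: float = 2.0) -> bool:
    """One client-side check: True only for a 2xx/3xx answer within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # urllib's URLError and HTTPError (4xx/5xx) are OSError subclasses,
        # as are refused/reset connections, DNS failures and timeouts.
        return False


def watch(url: str, duration_s: float, interval_s: float = 0.5) -> list[tuple[float, bool]]:
    """Probe `url` every `interval_s` seconds for `duration_s` seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe_once(url)))
        time.sleep(interval_s)
    return samples


def downtime_windows(samples: list[tuple[float, bool]]) -> list[tuple[float, float]]:
    """Group consecutive failed samples into (start, end) windows.

    `samples` is a time-ordered list of (timestamp, ok) pairs. A window starts
    at the first failed sample and ends at the next successful one, or at the
    last sample if the run ends while still failing.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, samples[-1][0]))
    return windows
```

For example, run `watch("https://app.example.com/healthz", duration_s=900)` (a hypothetical URL) while the upgrade rolls out, then pass the samples to `downtime_windows`. An empty result means this client never saw a failure at that probe interval; very short blips between probes can still be missed.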
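The criticality bullet boils down to ordering: once each controller has a criticality score, the rollout plan is a list of waves, least critical first, where each wave is checked before moving on to the next. A tiny sketch, assuming a hypothetical 1 to 3 criticality scale and made-up controller names:

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Controller:
    name: str
    criticality: int  # hypothetical scale: 1 = least critical, 3 = most critical


def rollout_waves(controllers: list[Controller]) -> list[list[str]]:
    """Group controllers into upgrade waves, least critical first.

    Controllers with the same criticality share a wave; names are sorted
    inside a wave so the plan is deterministic.
    """
    ordered = sorted(controllers, key=lambda c: (c.criticality, c.name))
    return [[c.name for c in wave]
            for _, wave in groupby(ordered, key=lambda c: c.criticality)]


if __name__ == "__main__":
    plan = rollout_waves([
        Controller("payments", 3),
        Controller("internal-tools", 1),
        Controller("docs", 1),
        Controller("public-api", 2),
    ])
    for number, wave in enumerate(plan, start=1):
        # Upgrade one wave, watch the metrics, and only then move on.
        print(f"wave {number}: {', '.join(wave)}")
```

Here the plan would be docs and internal-tools first, then public-api, and payments last.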
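For finding the versions of the existing nginx ingress controllers, the image tags of the running controller pods are a quick first source. The sketch below reads the output of `kubectl get pods -A -o json` and groups controller pods by image tag. The image name substrings are assumptions (the community kubernetes/ingress-nginx controller image and NGINX’s nginx/nginx-ingress image), so adjust them to the registries you actually use. This only gives the controller version; the nginx version bundled with it still has to be looked up separately.

```python
import json
from collections import defaultdict

# Assumption: substrings that identify controller images. The first matches the
# community kubernetes/ingress-nginx image, the second NGINX's nginx/nginx-ingress
# image. Adjust them to the images and registries you actually run.
CONTROLLER_IMAGE_MARKERS = ("ingress-nginx/controller", "nginx/nginx-ingress")


def image_tag(image: str) -> str:
    """Return the tag of an image reference such as registry/repo:tag@sha256:..."""
    name = image.split("@", 1)[0]           # drop a pinned digest, if any
    last_segment = name.rsplit("/", 1)[-1]  # a ':' before the last '/' is a registry port
    # No tag at all means the runtime pulls 'latest' (unless a digest is pinned).
    return last_segment.split(":", 1)[1] if ":" in last_segment else "untagged"


def controller_versions(pods_json: str) -> dict[str, list[str]]:
    """Map image tag -> sorted 'namespace/pod' names, from `kubectl get pods -A -o json`."""
    found = defaultdict(list)
    for pod in json.loads(pods_json)["items"]:
        meta = pod["metadata"]
        for container in pod["spec"]["containers"]:
            if any(marker in container["image"] for marker in CONTROLLER_IMAGE_MARKERS):
                found[image_tag(container["image"])].append(
                    f"{meta['namespace']}/{meta['name']}")
    return {tag: sorted(pods) for tag, pods in found.items()}
```

Usage: save the pod list with `kubectl get pods -A -o json > pods.json` for each cluster, then call `controller_versions(open("pods.json").read())`. More than one key in the result means different controllers run different versions.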
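The compatibility checks are lookups against the support matrices each project publishes. As a sketch, here’s the controller vs Kubernetes check run across all clusters at once. The matrix passed in is hypothetical and has to be filled in from the official supported-versions table of the controller you run; the same shape works for the other pairs in the list, such as Helm chart vs controller version.

```python
def k8s_minor(version: str) -> str:
    """Reduce a server version like 'v1.28.3' or '1.28.3-eks-1234' to '1.28'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return f"{major}.{minor}"


def unsupported_clusters(target: str, clusters: dict[str, str],
                         supported: dict[str, set[str]]) -> list[str]:
    """Clusters whose Kubernetes minor version the `target` controller version doesn't support.

    `clusters` maps cluster name -> Kubernetes server version.
    `supported` maps controller version -> supported Kubernetes minor versions;
    a controller version missing from it is treated as supporting nothing.
    """
    allowed = supported.get(target, set())
    return sorted(name for name, version in clusters.items()
                  if k8s_minor(version) not in allowed)


if __name__ == "__main__":
    # HYPOTHETICAL data for illustration only; take the real matrix from the
    # controller project's supported-versions table.
    matrix = {"controller-new": {"1.28", "1.29"}}
    print(unsupported_clusters("controller-new",
                               {"staging": "v1.29.2", "prod": "v1.27.8"}, matrix))
    # prints ['prod']
```

Treating an unknown target version as "supports nothing" is deliberate: a typo in the version should block the upgrade, not silently pass.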
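Most of the questions under the data gathering bullets (how many ingresses each class handles, which ingresses rely on the kubernetes.io/ingress.class annotation, which ones have no class at all, which controller handles each IngressClass, and how many IngressClasses are marked as the default) can be answered from `kubectl get ingress -A -o json` and `kubectl get ingressclass -o json`. A minimal sketch, assuming both outputs are saved to files first; the result layout and the `<default>` bucket name are hypothetical. When an ingress sets both the spec field and the annotation, this sketch counts only the spec field.

```python
import json
from collections import Counter

# Deprecated annotation that older ingresses use instead of spec.ingressClassName.
CLASS_ANNOTATION = "kubernetes.io/ingress.class"
# Annotation that marks an IngressClass as the cluster default.
DEFAULT_CLASS_ANNOTATION = "ingressclass.kubernetes.io/is-default-class"


def _annotations(obj: dict) -> dict:
    return obj["metadata"].get("annotations") or {}


def ingress_inventory(ingresses_json: str, classes_json: str) -> dict:
    """Summarise `kubectl get ingress -A -o json` and `kubectl get ingressclass -o json`."""
    per_class = Counter()   # ingress class name -> number of ingresses
    annotation_only = []    # candidates for moving to spec.ingressClassName
    no_class = []           # these fall back to the default IngressClass, if any
    for ing in json.loads(ingresses_json)["items"]:
        meta, spec = ing["metadata"], ing.get("spec") or {}
        name = f"{meta['namespace']}/{meta['name']}"
        annotation = _annotations(ing).get(CLASS_ANNOTATION)
        if spec.get("ingressClassName"):
            per_class[spec["ingressClassName"]] += 1
        elif annotation:
            per_class[annotation] += 1
            annotation_only.append(name)
        else:
            per_class["<default>"] += 1
            no_class.append(name)

    classes = json.loads(classes_json)["items"]
    return {
        "per_class": dict(per_class),
        "annotation_only": annotation_only,
        "no_class": no_class,
        # Which controller implementation handles each IngressClass.
        "class_controllers": {c["metadata"]["name"]: (c.get("spec") or {}).get("controller")
                              for c in classes},
        "default_classes": [c["metadata"]["name"] for c in classes
                            if _annotations(c).get(DEFAULT_CLASS_ANNOTATION) == "true"],
    }
```

More than one entry in `default_classes` is the situation the [EXTRA] bullet warns about, and the `no_class` list is exactly the set of ingresses that would stop being served if the default were removed.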
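For the metrics bullets, request counters are usually cumulative, so requests per second comes from the difference between two samples (Prometheus’s `rate()` does a similar calculation). The sketch also checks the "disk usage should stay almost constant" rule by correlating disk usage with traffic. Everything here is an assumption for illustration: where the samples come from (for example the `nginx_ingress_controller_requests` counter that kubernetes/ingress-nginx exposes when its metrics are enabled, plus whatever disk metric you collect for the pods), the helper names, and the 0.8 threshold.

```python
import math


def per_second_rates(samples: list[tuple[float, float]]) -> list[float]:
    """Turn (timestamp_s, cumulative_request_count) samples into requests/second.

    A drop in the counter is treated as a restart (counter reset), in which
    case the new value is taken as the increase since the reset.
    """
    rates = []
    for (t1, c1), (t2, c2) in zip(samples, samples[1:]):
        increase = c2 - c1 if c2 >= c1 else c2
        rates.append(increase / (t2 - t1))
    return rates


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        return 0.0
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def disk_tracks_traffic(rates: list[float], disk_bytes: list[float],
                        threshold: float = 0.8) -> bool:
    """True if disk usage rises and falls with traffic (assumed threshold 0.8).

    `disk_bytes[i]` should be sampled at the end of the interval `rates[i]` covers.
    """
    return pearson(rates, disk_bytes) > threshold
```

A constant disk series correlates with nothing, which is the healthy case for a stateless service; a strong positive correlation hints at something like request or response buffering or file logging growing with load.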

Modern Trends

  • Understand the modern trends, for the future

    • For example, people are moving away from the Ingress resource and ingress controllers to the Gateway API

      • [EXTRA] We can see whether we need or want the Gateway API, and accordingly see whether we can / want to migrate the Ingress resources to Gateway API resources
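If we do go ahead with migrating, the core of the translation from an Ingress to Gateway API resources is mechanical: each host rule becomes an HTTPRoute attached to a Gateway, `Prefix` paths become `PathPrefix` matches, and service backends become `backendRefs`. Here’s a rough sketch of that mapping on plain dictionaries (as parsed from YAML or JSON). The `gateway_name` parameter refers to a hypothetical Gateway in the same namespace, and TLS, default backends and nginx-specific annotations are deliberately left out. The kubernetes-sigs ingress2gateway project exists to do this conversion more completely.

```python
# Ingress pathType -> Gateway API path match type. ImplementationSpecific has
# no portable equivalent, so it is rejected for manual review.
PATH_TYPES = {"Prefix": "PathPrefix", "Exact": "Exact"}


def ingress_to_httproutes(ingress: dict, gateway_name: str) -> list[dict]:
    """Translate a networking.k8s.io/v1 Ingress into one HTTPRoute per host rule.

    Only hosts, paths and service backends with numeric ports are handled;
    TLS, defaultBackend and annotations need to be migrated separately.
    """
    meta = ingress["metadata"]
    routes = []
    for i, rule in enumerate(ingress["spec"].get("rules", [])):
        route_rules = []
        for path in rule.get("http", {}).get("paths", []):
            if path["pathType"] not in PATH_TYPES:
                raise ValueError(f"pathType {path['pathType']} needs manual review")
            service = path["backend"]["service"]
            if "number" not in service["port"]:
                raise ValueError(f"named port on service {service['name']} needs manual review")
            route_rules.append({
                "matches": [{"path": {"type": PATH_TYPES[path["pathType"]],
                                      "value": path["path"]}}],
                "backendRefs": [{"name": service["name"],
                                 "port": service["port"]["number"]}],
            })
        spec = {"parentRefs": [{"name": gateway_name}], "rules": route_rules}
        if "host" in rule:
            spec["hostnames"] = [rule["host"]]
        routes.append({
            "apiVersion": "gateway.networking.k8s.io/v1",
            "kind": "HTTPRoute",
            "metadata": {"name": f"{meta['name']}-{i}", "namespace": meta["namespace"]},
            "spec": spec,
        })
    return routes
```

Raising on anything the sketch can’t translate faithfully keeps the migration honest: those ingresses get a human look instead of a silently different routing behaviour.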

Written by

Karuppiah Natarajan

I like learning new stuff - anything, including technology. I love tinkering with new tools, systems and services, especially open source projects