Kubernetes Cluster Upgrade: Achieving Zero Downtime in Production

Kubernetes ships a new minor release roughly every four months, and the project maintains only the three most recent minor versions at any given time. Staying up to date is essential for security, stability, and new features.

🔗 Latest Kubernetes Versions: Kubernetes Releases
As of March 2025, the latest Kubernetes version is 1.32, meaning Kubernetes officially supports 1.32, 1.31, and 1.30.
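The support window is simple arithmetic on the minor version. Here is a minimal sketch, assuming a three-version window and plain `major.minor` strings; the function name `supported_minor_versions` is made up for this illustration.

```python
def supported_minor_versions(latest: str, window: int = 3) -> list[str]:
    """Return the `window` most recent minor versions, newest first."""
    major, minor = (int(part) for part in latest.split("."))
    # Stop at minor 0 so early versions never produce negative numbers.
    lowest = max(minor - window, -1)
    return [f"{major}.{m}" for m in range(minor, lowest, -1)]


print(supported_minor_versions("1.32"))
```

Anything older than the last entry in that list is outside the upstream support window.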

In this article, let's explore how to upgrade an Amazon EKS cluster (AWS Managed Kubernetes) while ensuring zero downtime in production.

Key Considerations Before Upgrading

🔹 Cordon Nodes – Prevent new workloads from being scheduled on nodes during the upgrade.
🔹 Communicate with the Application Team – Ensure they avoid deployments during the upgrade.
🔹 Review Kubernetes Release Notes & API Changes – If deprecated APIs aren't updated, workloads may fail post-upgrade.
🔹 No Downgrade Option – AWS does not support rolling back an upgrade, so proceed with caution.
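🔹 Ensure Version Consistency – The Control Plane, Node Groups (kubelet), and Cluster Autoscaler should all end up on the same minor version. Kubernetes tolerates a kubelet that is slightly older than the API server, but never a newer one, so treat any skew as temporary.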
🔹 Upgrade Lower Environments First – Test the upgrade in Dev/Staging, monitor for at least a week, then proceed to Production.
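The API review can be partly automated. Below is a minimal sketch that scans already-parsed manifests (plain dictionaries) for API versions that are no longer served in the target release. The `REMOVED_IN` table is a small illustrative subset, not a complete list, and the function names are hypothetical; the Kubernetes deprecated API migration guide is the authoritative source.

```python
# (apiVersion, kind) -> first Kubernetes version that no longer serves it.
# Illustrative subset only; consult the official migration guide for the full list.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def parse_minor(version: str) -> tuple[int, int]:
    """Turn '1.32' or 'v1.32.1' into (1, 32)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def find_removed_apis(manifests: list[dict], target_version: str) -> list[str]:
    """List manifests whose apiVersion is gone in `target_version`."""
    target = parse_minor(target_version)
    findings = []
    for manifest in manifests:
        api_version, kind = manifest.get("apiVersion"), manifest.get("kind")
        removed = REMOVED_IN.get((api_version, kind))
        if removed is not None and target >= removed:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(
                f"{kind}/{name} uses {api_version}, removed in {removed[0]}.{removed[1]}"
            )
    return findings


# Hypothetical manifests, as they would look after loading YAML into dicts.
sample = [
    {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]
for finding in find_removed_apis(sample, "1.32"):
    print(finding)
```

Running a check like this in CI against the target version surfaces problems before the control plane is touched.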

Step-by-Step EKS Upgrade Process

1️⃣ Upgrade the Control Plane (⏳ ~30 mins)

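  • The EKS Control Plane does not upgrade automatically when a new Kubernetes version is released; you have to start the upgrade yourself.

  • Upgrade using the AWS Console, the AWS CLI, or eksctl.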

  • AWS manages the Control Plane HA, DR, Security, and API Requests, but the upgrade needs manual intervention.
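EKS moves the control plane one minor version at a time, so a cluster that is several versions behind needs several consecutive upgrades. The sketch below plans that path and builds the matching `eksctl upgrade cluster` command for each hop. It only prints the commands rather than running them, and the cluster name `prod-cluster` is a placeholder.

```python
def upgrade_path(current: str, target: str) -> list[str]:
    """Return every minor version to step through, in order."""
    cur_major, cur_minor = (int(p) for p in current.split("."))
    tgt_major, tgt_minor = (int(p) for p in target.split("."))
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError(f"cannot upgrade from {current} to {target}")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


def control_plane_upgrade_command(cluster: str, version: str) -> list[str]:
    """Build (but do not run) the eksctl command for one hop."""
    return [
        "eksctl", "upgrade", "cluster",
        "--name", cluster,
        "--version", version,
        "--approve",  # without this flag eksctl only shows the plan
    ]


# "prod-cluster" is a placeholder name for this example.
for version in upgrade_path("1.30", "1.32"):
    print(" ".join(control_plane_upgrade_command("prod-cluster", version)))
```

Upgrade the node groups and add-ons after each hop before starting the next one, so version skew never grows beyond one minor version.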

2️⃣ Upgrade Node Groups / Nodes / Fargate (⏳ 2-3 Hours, Depending on Node Count)

  • Managed Node Groups perform a rolling update, replacing one node at a time by default; the node group's update configuration (max unavailable) can raise that limit.

  • If using Custom Launch Templates or Custom AMIs, you must update them manually.

  • Ensure new nodes have the same labels and taints as older ones to avoid scheduling issues.
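A quick way to check the labels-and-taints point is to diff an old node against a new one. This is a minimal sketch that assumes both nodes are available as dictionaries shaped like `kubectl get node <name> -o json` output; the function name and the sample label and taint values are invented for the example. Only the label keys you pass in are compared, because labels such as the hostname always differ between nodes.

```python
def _taints(node: dict) -> set[tuple[str, str, str]]:
    """Collect a node's taints as (key, value, effect) tuples."""
    raw = node.get("spec", {}).get("taints") or []
    return {(t["key"], t.get("value", ""), t["effect"]) for t in raw}


def scheduling_diff(old_node: dict, new_node: dict, label_keys: set[str]) -> list[str]:
    """Report label and taint differences that could change scheduling."""
    problems = []
    old_labels = old_node.get("metadata", {}).get("labels", {})
    new_labels = new_node.get("metadata", {}).get("labels", {})
    for key in sorted(label_keys):
        if old_labels.get(key) != new_labels.get(key):
            problems.append(
                f"label {key}: {old_labels.get(key)!r} -> {new_labels.get(key)!r}"
            )
    old_taints, new_taints = _taints(old_node), _taints(new_node)
    for key, value, effect in sorted(old_taints - new_taints):
        problems.append(f"missing taint {key}={value}:{effect}")
    for key, value, effect in sorted(new_taints - old_taints):
        problems.append(f"unexpected taint {key}={value}:{effect}")
    return problems


# Hypothetical nodes: the new one lost both the label and the taint.
old = {
    "metadata": {"labels": {"workload": "api", "kubernetes.io/hostname": "old-1"}},
    "spec": {"taints": [{"key": "dedicated", "value": "api", "effect": "NoSchedule"}]},
}
new = {"metadata": {"labels": {"kubernetes.io/hostname": "new-1"}}, "spec": {}}

for problem in scheduling_diff(old, new, {"workload"}):
    print(problem)
```

Compare nodes before cordoning the old ones, since a cordoned node carries an extra unschedulable taint that would show up in the diff.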

3️⃣ Upgrade Kubernetes Add-ons (VPC CNI, Kube-Proxy, CoreDNS, etc.)

  • Add-ons ensure networking, DNS resolution, and API communication function properly.

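  • Use eksctl update addon to update EKS-managed add-ons; for self-managed ones, eksctl provides eksctl utils update-aws-node, update-kube-proxy, and update-coredns.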
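For EKS-managed add-ons, one way to script this step is with the `eksctl update addon` subcommand. The sketch below only builds the command lines for the three core add-ons; the cluster name and the version strings are placeholders that you would replace with versions compatible with your target Kubernetes release.

```python
CORE_ADDONS = ("vpc-cni", "kube-proxy", "coredns")


def addon_update_commands(cluster: str, versions: dict[str, str]) -> list[list[str]]:
    """Build (but do not run) one eksctl command per core add-on."""
    commands = []
    for name in CORE_ADDONS:
        if name not in versions:
            raise KeyError(f"no target version given for add-on {name}")
        commands.append([
            "eksctl", "update", "addon",
            "--name", name,
            "--cluster", cluster,
            "--version", versions[name],
        ])
    return commands


# Placeholder values: substitute real add-on versions for your cluster.
targets = {
    "vpc-cni": "<vpc-cni-version>",
    "kube-proxy": "<kube-proxy-version>",
    "coredns": "<coredns-version>",
}
for command in addon_update_commands("prod-cluster", targets):
    print(" ".join(command))
```

Requiring an explicit version for every add-on keeps the upgrade reproducible across Dev, Staging, and Production.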


How to Test the Upgrade?

✔️ Run Functional Tests – Validate application performance and stability.
✔️ Verify Logging & Monitoring – Check CloudWatch, Prometheus, and other monitoring solutions.
✔️ Confirm Autoscaling & Networking – Ensure pods are scaling correctly and network policies work as expected.
✔️ Test Rollbacks (If Needed) – The cluster version itself cannot be rolled back, so be prepared to revert application workloads to their previous release if issues arise.
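Alongside these checks, it helps to confirm that every node is Ready and running the expected kubelet version. Here is a minimal sketch that inspects a dictionary shaped like `kubectl get nodes -o json` output; the node names and kubelet version strings in the sample are invented for illustration.

```python
def find_node_problems(node_list: dict, expected_minor: str) -> list[str]:
    """Flag nodes that are not Ready or not on the expected minor version."""
    problems = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        status = node.get("status", {})
        kubelet = status.get("nodeInfo", {}).get("kubeletVersion", "")
        # "v1.32.1-eks-..." -> "1.32"
        minor = ".".join(kubelet.lstrip("v").split(".")[:2])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if minor != expected_minor:
            problems.append(
                f"{name}: kubelet {kubelet or 'unknown'}, expected {expected_minor}.x"
            )
        if not ready:
            problems.append(f"{name}: not Ready")
    return problems


# Made-up sample data in the shape of `kubectl get nodes -o json`.
sample_nodes = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {
                "nodeInfo": {"kubeletVersion": "v1.32.1-eks-example"},
                "conditions": [{"type": "Ready", "status": "True"}],
            },
        },
        {
            "metadata": {"name": "node-b"},
            "status": {
                "nodeInfo": {"kubeletVersion": "v1.31.4-eks-example"},
                "conditions": [{"type": "Ready", "status": "False"}],
            },
        },
    ]
}
for problem in find_node_problems(sample_nodes, "1.32"):
    print(problem)
```

In practice you would load the real node list with `json.loads` on the output of `kubectl get nodes -o json` and treat an empty result as a pass.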


Best Practices for EKS Upgrades

✅ Always upgrade Control Plane → Node Groups → Add-ons in order.
✅ Keep lower environments (Dev/Staging) at least one week ahead of Production.
✅ Validate ingress controllers, network policies, and storage compatibility before upgrading.
✅ Upgrade during a planned maintenance window to avoid unexpected disruptions.

Refer to the AWS documentation: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
https://gist.github.com/iam-veeramalla/7e32999189f4aa9064334d1d27bd877c


Upgrading Kubernetes is a critical but manageable process with proper planning. Have you recently upgraded your EKS cluster? Share your insights and challenges in the comments! 🚀

#AWS #Kubernetes #EKS #Cloud #DevOps #K8sUpgrade #KubernetesUpgrades


Written by

Mahesh Velicheti

With over 9 years of experience in the IT industry, I have built a career centered on driving innovation in DevOps and Cloud Engineering. My journey with Tata Consultancy Services was marked by delivering cutting-edge automation solutions and enhancing cloud service delivery, leveraging tools like Terraform and other infrastructure-as-code technologies. Through strategic process optimization, I contributed to elevating operational efficiency, streamlining workflows, and achieving consistent service excellence.