🚀 Successfully Upgraded Amazon EKS Cluster to v1.31: Lessons Learned + Identity Provider Fix 🛠️


Recently, I upgraded our Amazon EKS cluster from v1.30 to v1.31, and along the way, I encountered a critical Identity Provider issue that halted the upgrade. Here's a breakdown of the issue, how I resolved it, and the best practices I followed for upgrading the cluster and node groups.


🧩 Issue Faced: OIDC Identity Provider Conflict

After triggering the upgrade:

aws eks update-cluster-version --name <cluster-name> --kubernetes-version 1.31

I received the following error:

Cluster has incorrect Identity Provider URL configuration. 
The Identity Provider URL cannot be the same as the OpenID Connect (OIDC) issuer URL.
Please fix the Identity Provider configuration before updating the cluster.

This occurs when the OIDC Identity Provider (IdP) configuration for your EKS cluster incorrectly uses the same URL for the IdP and OIDC issuer, causing a validation conflict during upgrades.
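Before changing anything, it is worth confirming the mismatch yourself by comparing the cluster's own OIDC issuer URL with the issuer on the associated identity provider config. A minimal check (the config name cnext-staging-eksconfig is from my setup; substitute your own):

# Cluster's built-in OIDC issuer URL
aws eks describe-cluster --name <cluster-name> --query "cluster.identity.oidc.issuer" --output text

# Identity provider configs currently associated with the cluster
aws eks list-identity-provider-configs --cluster-name <cluster-name>

# Issuer URL configured on a specific OIDC identity provider config
aws eks describe-identity-provider-config \
  --cluster-name <cluster-name> \
  --identity-provider-config type=oidc,name=cnext-staging-eksconfig

If the issuer URL in the identity provider config matches the cluster's own issuer URL, you have found the conflict.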

✅ Resolution

To resolve this, I disassociated the Identity Provider using the AWS CLI:

aws eks disassociate-identity-provider-config \
  --cluster-name <cluster-name> \
  --region <region-name> \
  --identity-provider-config type=oidc,name=cnext-staging-eksconfig

This removed the conflicting IdP config. I then used this command to monitor the update process:

aws eks describe-update \
  --name <cluster-name> \
  --update-id <update-id-from-previous-output> \
  --region <region-name>
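If you only care about the status field, a --query filter keeps the output short, and aws eks list-updates can recover the update ID if you did not save the original output. A small optional sketch:

# Recover update IDs for the cluster if the original output was lost
aws eks list-updates --name <cluster-name> --region <region-name>

# Poll just the status (InProgress, Successful, Failed, or Cancelled)
aws eks describe-update \
  --name <cluster-name> \
  --update-id <update-id> \
  --region <region-name> \
  --query "update.status" --output text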

Once the control plane was successfully upgraded to 1.31, I proceeded to upgrade the worker nodes.

🧱 Node Group Upgrade Strategy (Blue/Green Approach)

Rather than upgrading existing node groups in-place, I opted for a blue/green deployment pattern:

  1. Created new managed node groups on the upgraded cluster (via the AWS Console).

    • These new node groups automatically inherited the latest Kubernetes version (v1.31).
  2. Drained the old node groups to gracefully evict pods and shift workloads:

     eksctl drain nodegroup --cluster=<cluster-name> --name=<old-nodegroup-name>
    
  3. Confirmed pod rescheduling to the new node groups, ensuring zero downtime and safe rollout.

  4. Decommissioned the old node groups once everything was stable (see the CLI sketch after this list).
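For a fully scripted version of the same flow, eksctl and kubectl cover every step. This is a hedged sketch rather than exactly what I ran (I created the new node groups in the AWS Console); the instance type, node counts, and node group names are placeholders:

# 1. Create a replacement managed node group on the upgraded cluster
eksctl create nodegroup \
  --cluster <cluster-name> \
  --name <new-nodegroup-name> \
  --node-type m5.large \
  --nodes 3 --nodes-min 3 --nodes-max 6

# 2. Drain the old node group so workloads shift to the new nodes
eksctl drain nodegroup --cluster=<cluster-name> --name=<old-nodegroup-name>

# 3. Confirm pods have rescheduled onto the new nodes
kubectl get pods --all-namespaces -o wide

# 4. Delete the old node group once everything is healthy
eksctl delete nodegroup --cluster=<cluster-name> --name=<old-nodegroup-name>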

🧠 Key Commands & Tips

# Check client, server, and node Kubernetes versions
kubectl version
kubectl get nodes

# List all clusters
aws eks list-clusters

# Describe cluster version
aws eks describe-cluster --name <cluster-name> --query "cluster.version"

# Upgrade control plane
aws eks update-cluster-version --name <cluster-name> --kubernetes-version 1.31

# Monitor cluster upgrade
aws eks describe-cluster --name <cluster-name> --query "cluster.status"

# List all node groups
aws eks list-nodegroups --cluster-name <cluster-name>

# Drain old node groups
eksctl drain nodegroup --cluster=<cluster-name> --name=<nodegroup-name>
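One extra check I like before deleting an old node group: make sure nothing important is still scheduled on its nodes. Assuming managed node groups (EKS labels their nodes with eks.amazonaws.com/nodegroup), something like this works:

# Nodes that belong to the old managed node group
kubectl get nodes -l eks.amazonaws.com/nodegroup=<old-nodegroup-name>

# Pods still running on a specific node from that list
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>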

🎯 Takeaways

  • Ensure that your OIDC Identity Provider is correctly configured before control plane upgrades.

  • Prefer blue/green node group upgrades to reduce risk and enable zero-downtime deployments.

  • Use eksctl drain for smooth pod rescheduling across node groups.

  • Always monitor the update with the AWS CLI to track progress and catch issues in real time.

Have you faced similar issues during EKS upgrades? Let's connect and share our war stories! 💬

#AWS #EKS #DevOps #Kubernetes #CloudNative #Infrastructure #Terraform #OIDC #Containers
