Volume Mount Issues in an AWS EKS Cluster Due to CSI Driver Version Incompatibility

When managing Kubernetes workloads on Amazon EKS, errors while mounting EFS volumes can disrupt applications. We recently hit an issue where an application pod failed to mount an EFS volume because of CSI driver errors. This post walks through the problem, the investigation, the resolution steps, and tips for avoiding similar issues in the future.

Problem Statement

A team reported that their application pod was failing to mount an EFS volume in their EKS cluster. The error logs pointed to the CSI drivers, with messages like "connection refused" and "no such file or directory."

Error Logs

  • Pod Scheduling: Successfully assigned extract-prod/ocr-12345678-66jg2 to ip-ip.us-west-1.compute.internal.

  • Mount Failure:

    • MountVolume.SetUp failed for volume "data-pv": Connection error to /var/lib/kubelet/plugins/efs.csi.aws.com/csi.sock with "connection refused".

    • MountVolume.SetUp failed for volume "extract-secrets": Connection error to /var/lib/kubelet/plugins/csi-secrets-store/csi.sock with "no such file or directory".
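
Errors like these surface as Kubernetes events on the pod, and both messages point at the CSI drivers' Unix sockets on the node. If you are debugging a similar failure, the pod name and namespace below are the ones from this incident; substitute your own:

    # Inspect the pod's events for MountVolume.SetUp failures
    kubectl describe pod ocr-12345678-66jg2 -n extract-prod

    # Check whether the CSI driver pods backing those sockets are running at all;
    # "connection refused" and "no such file or directory" on a csi.sock usually
    # mean the driver pod on that node is crashing or missing
    kubectl get pods -n kube-system | grep -E 'efs-csi|secrets-store'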

Investigation Findings

  1. Cluster Details:

    • The client’s EKS cluster: arn:aws:eks:us-west-1:accountid:cluster/prod_eks_master.

    • Kubernetes version: v1.29.

  2. EFS Addon Status:

    • The EFS addon was in an UPDATE_FAILED state.

  3. Driver Versions:

    • EFS CSI Driver: aws-efs-csi-driver:v2.1.4.

    • Secrets Store CSI Driver: An outdated version was installed.
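
The addon state and the installed driver versions can be confirmed from the command line; the cluster name below is taken from the ARN above:

    # Check the EFS addon's status and installed version
    aws eks describe-addon \
        --cluster-name prod_eks_master \
        --addon-name aws-efs-csi-driver \
        --query 'addon.{status: status, version: addonVersion}'

    # List the CSI drivers registered with the cluster
    kubectl get csidrivers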

How the Problem Was Solved

1. Updated CSI Drivers

  • Upgraded the aws-efs-csi-driver from v2.1.4 to v2.1.6 using the AWS Console (a CLI equivalent is sketched after this list).

  • Upgraded the secrets-store-csi-driver to v1.4.6 to restore compatibility.

  • Verified the updates using:

        kubectl get pods -n kube-system -l app=efs-csi-controller
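
The console upgrade can also be scripted with the AWS CLI. This is a sketch: the eksbuild suffix in the addon version varies by release, so list the available versions first and pick a real one:

    # List addon versions compatible with Kubernetes 1.29
    aws eks describe-addon-versions \
        --addon-name aws-efs-csi-driver \
        --kubernetes-version 1.29 \
        --query 'addons[0].addonVersions[].addonVersion'

    # Upgrade the addon (the version string here is illustrative)
    aws eks update-addon \
        --cluster-name prod_eks_master \
        --addon-name aws-efs-csi-driver \
        --addon-version v2.1.6-eksbuild.1 \
        --resolve-conflicts OVERWRITE

    # The Secrets Store CSI driver is usually installed via Helm rather than as
    # an EKS addon; if so, the upgrade looks like this (the release name
    # csi-secrets-store is an assumption)
    helm repo add secrets-store-csi-driver \
        https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts
    helm upgrade csi-secrets-store secrets-store-csi-driver/secrets-store-csi-driver \
        --namespace kube-system --version 1.4.6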

2. Checked Addon Status

  • Confirmed that the EFS addon transitioned from UPDATE_FAILED to ACTIVE after the updates.
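
Rather than refreshing the console, the transition can be watched from the CLI; aws eks wait blocks until the addon reports ACTIVE or fails if it degrades:

    # Block until the addon reaches ACTIVE
    aws eks wait addon-active \
        --cluster-name prod_eks_master \
        --addon-name aws-efs-csi-driver

    # Or read the status field directly
    aws eks describe-addon \
        --cluster-name prod_eks_master \
        --addon-name aws-efs-csi-driver \
        --query 'addon.status' --output text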

3. Validated the Fix

  • Monitored pod logs using:

        kubectl logs ocr-12345678-66jg2 -n extract-prod

  • Verified that no further mount errors occurred.
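
Beyond the application logs, it is worth confirming that the volume and pod themselves report healthy:

    # The claim bound to the data-pv volume should show STATUS Bound
    kubectl get pvc -n extract-prod

    # No new FailedMount events should appear for the namespace
    kubectl get events -n extract-prod --sort-by=.lastTimestamp | grep -i mount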

Outcome

After updating the CSI drivers, the client’s application pod successfully mounted the EFS volume without requiring a restart of worker nodes—an important consideration for their production environment.

Tips for Avoiding Similar Issues

  1. Regular Updates:

    • Keep CSI drivers at their latest stable versions, and confirm they are compatible with your cluster's Kubernetes version before upgrading.

  2. Monitor Addon Status:

    • Regularly check addon statuses in the AWS Management Console or via the CLI so an UPDATE_FAILED state is caught before it breaks workloads; a scripted check is sketched after this list.
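
A lightweight way to follow the second tip is a scheduled check that flags anything other than ACTIVE. This is a minimal sketch; wire the echo into your own alerting:

    #!/usr/bin/env bash
    # Flag any EKS addon on the cluster that is not ACTIVE
    set -euo pipefail
    CLUSTER=prod_eks_master
    for addon in $(aws eks list-addons --cluster-name "$CLUSTER" \
                       --query 'addons[]' --output text); do
        status=$(aws eks describe-addon --cluster-name "$CLUSTER" \
                     --addon-name "$addon" --query 'addon.status' --output text)
        if [ "$status" != "ACTIVE" ]; then
            echo "ALERT: addon $addon is $status"   # replace with your alerting hook
        fi
    done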

By following these best practices, you can minimize disruptions and ensure a smooth experience when using EFS volumes in your Kubernetes workloads.
