Azure AKS Troubleshooting Hands-On - Pod Failing to Pull Image from Azure Container Registry (ACR)
📝Introduction
In this hands-on lab, we will walk through troubleshooting a real scenario in Azure Kubernetes Service (AKS) for a common issue: a Pod failing to pull an image from Azure Container Registry (ACR).
Learning objectives:
In this module, you'll learn how to:
Identify the issue
Resolve the issue
📝Log in to the Azure Management Console
Log in with your credentials and make sure you're using the right Region. In my case, I am using the region uksouth in my Cloud Playground Sandbox.
📌Note: You can also connect to the Azure CLI from VS Code or from your local terminal.
More information on how to set it up is at the link.
📝Prerequisites:
Update to PowerShell 5.1, if needed.
Install .NET Framework 4.7.2 or later.
Visual Studio Code
Web Browser (Chrome, Edge)
Azure CLI installed
Azure subscription
Docker installed
📝Setting an Azure Storage Account to Load Bash or PowerShell
- Click the Cloud Shell icon (>_) at the top of the page.
- Click PowerShell.
- Click Show Advanced Settings. Use the combo box under Cloud Shell region to select the Region. Under Resource Group and Storage account, enter a name for each (the storage account name must be globally unique). In the box under File Share, enter a name. Click Create storage (if you don't have any yet).
📝Create an AKS Cluster
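Before creating the cluster, store its name in a variable named CLUSTERNAME and the name of your resource group in a variable named RG. The assignments below use Bash syntax; in a PowerShell session, write $CLUSTERNAME = "<AKSClusterName>" instead. Then create the AKS cluster using the az aks create command:
CLUSTERNAME=<AKSClusterName>
RG=<ResourceGroupName>
az aks create -n $CLUSTERNAME -g $RG --node-vm-size Standard_D2s_v3 --node-count 2 --generate-ssh-keys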
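The resource group passed with -g has to exist before az aks create runs. A minimal sketch of that step, assuming hypothetical default names (aks-acr-lab-rg, aks-acr-lab) and the uksouth region mentioned earlier; the helper name create_lab_cluster is also just an illustration:

```shell
#!/usr/bin/env bash
# Hypothetical defaults; override them by exporting RG, LOCATION, or CLUSTERNAME.
RG="${RG:-aks-acr-lab-rg}"
LOCATION="${LOCATION:-uksouth}"
CLUSTERNAME="${CLUSTERNAME:-aks-acr-lab}"

# Create the resource group first, then the cluster inside it.
create_lab_cluster() {
  az group create --name "$RG" --location "$LOCATION" &&
  az aks create -n "$CLUSTERNAME" -g "$RG" \
    --node-vm-size Standard_D2s_v3 --node-count 2 --generate-ssh-keys
}
```

After sourcing the snippet, run create_lab_cluster once. The cluster is only created if the resource group step succeeds.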
📝 Connect to AKS Cluster
Use the Azure Cloud Shell to check your AKS Cluster resources, by following the steps below:
Go to Azure Dashboard, and click on the Resource Group created for this Lab, looking for your AKS Cluster resource.
On the Overview tab, click on Connect to your AKS Cluster.
A new window will be opened, so you only need to open the Azure CLI and run the following commands:
az login
az account set --subscription <your-subscription-id>
az aks get-credentials -g <nameResourceGroup> -n <nameAKSCluster> --overwrite-existing
After that, you can run some kubectl commands to check the default AKS Cluster resources.
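As an illustration of that check, the sketch below groups three standard kubectl queries into one helper. The function name check_cluster is an assumption made for this example, not part of the lab:

```shell
#!/usr/bin/env bash
# Print a quick overview of the resources a fresh cluster already has.
check_cluster() {
  kubectl get nodes -o wide &&          # worker nodes and their status
  kubectl get namespaces &&             # default and system namespaces
  kubectl get pods --all-namespaces     # system pods such as CoreDNS
}
```

If the credentials were merged correctly, check_cluster should list the two nodes created earlier in the Ready state.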
📝Simulate the Issue:
Deploy a Sample Application: Create a deployment YAML file (nginx-deployment.yaml) with an image from ACR:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: <your-acr-name>.azurecr.io/nginx:latest
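The manifest assumes a registry that already contains nginx:latest, which the lab does not create. One possible way to prepare it, sketched under the assumption that you want a Basic-tier registry in the same resource group and that importing the public image from Docker Hub is acceptable; prepare_acr_image is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Create a registry and copy the public nginx image into it.
# Usage: prepare_acr_image <acr-name> <resource-group>
prepare_acr_image() {
  local acr="$1" rg="$2"
  if [ -z "$acr" ] || [ -z "$rg" ]; then
    echo "usage: prepare_acr_image <acr-name> <resource-group>" >&2
    return 1
  fi
  # The registry is deliberately NOT attached to the cluster here,
  # so the deployment reproduces the pull failure.
  az acr create --name "$acr" --resource-group "$rg" --sku Basic &&
  az acr import --name "$acr" \
    --source docker.io/library/nginx:latest --image nginx:latest
}
```

Using az acr import avoids a local docker pull, tag, and push, which keeps the whole lab inside Cloud Shell.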
Apply the Deployment:
kubectl apply -f nginx-deployment.yaml
📝Identify the Issue:
Check Pod Status:
kubectl get pods
Describe the Pod:
kubectl describe pod <pod-name>
Look for events indicating why the pod is not starting. You might see messages like “ErrImagePull” or “ImagePullBackOff”.
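When the namespace holds many pods, filtering can be quicker than reading the full output. The following is a rough sketch, assuming the default kubectl get pods column layout (NAME, READY, STATUS, ...); both function names are made up for this example:

```shell
#!/usr/bin/env bash
# Print the names of pods whose STATUS column shows an image pull failure.
failing_image_pulls() {
  kubectl get pods --no-headers |
    awk '$3 == "ErrImagePull" || $3 == "ImagePullBackOff" { print $1 }'
}

# Show only the events recorded for one pod, oldest first.
# Usage: pod_events <pod-name>
pod_events() {
  kubectl get events --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

For example, pod_events "$(failing_image_pulls | head -n 1)" prints the events of the first failing pod, where the message usually states why the pull was rejected.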
📝Troubleshoot the Issue:
Check ACR Authentication: Ensure the AKS cluster has access to the ACR. The command below attaches the registry to the cluster, which grants the cluster's identity the AcrPull role on the registry:
az aks update -n <AKSCluster-name> -g <ResourceGroup-name> --attach-acr <your-acr-name>
Check ACR Firewall Rules: Ensure that the ACR firewall rules allow access from the AKS cluster.
Check Image Name and Tag: Verify that the image name and tag are correct in the deployment YAML file.
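The three checks above can be sketched as one read-only helper. This assumes the registry repository is named nginx and the deployment is nginx-deployment, as in this lab; diagnose_acr_pull is a hypothetical name, and the network properties queried are the ones az acr show returns for a registry:

```shell
#!/usr/bin/env bash
# Usage: diagnose_acr_pull <aks-cluster> <resource-group> <acr-name>
diagnose_acr_pull() {
  local cluster="$1" rg="$2" acr="$3"
  if [ -z "$cluster" ] || [ -z "$rg" ] || [ -z "$acr" ]; then
    echo "usage: diagnose_acr_pull <aks-cluster> <resource-group> <acr-name>" >&2
    return 1
  fi

  # 1. Authentication: validate that the cluster can reach and pull from the registry.
  az aks check-acr --name "$cluster" --resource-group "$rg" \
    --acr "$acr.azurecr.io"

  # 2. Firewall: show public network access and the default network rule action.
  az acr show --name "$acr" \
    --query "{publicNetworkAccess:publicNetworkAccess, defaultAction:networkRuleSet.defaultAction}" \
    --output table

  # 3. Image name and tag: list the tags in the registry, then the image the deployment asks for.
  az acr repository show-tags --name "$acr" --repository nginx --output table
  kubectl get deployment nginx-deployment \
    -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'
}
```

None of these commands change anything, so the helper can be run before and after the fix to compare results.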
📝Resolve the Issue:
Reapply the Deployment:
kubectl apply -f nginx-deployment.yaml
Check Pod Status Again:
kubectl get pods
Describe the Pod:
kubectl describe pod <pod-name>
Ensure there are no error messages and the pod is running.
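If the manifest itself did not change, kubectl apply reports the deployment as unchanged and the existing pod simply keeps retrying on its back-off timer. A minimal sketch of one way to force fresh pods and wait for the result, assuming the deployment name from this lab and an arbitrary 120-second timeout; restart_and_wait is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Recreate the pods of a deployment and block until the rollout finishes.
# Usage: restart_and_wait [deployment-name] [timeout]
restart_and_wait() {
  local deployment="${1:-nginx-deployment}" timeout="${2:-120s}"
  kubectl rollout restart "deployment/$deployment" &&
  kubectl rollout status "deployment/$deployment" --timeout="$timeout"
}
```

A non-zero exit status from restart_and_wait means the rollout did not complete within the timeout, which is a cue to go back to the troubleshooting checks.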
📌Note - At the end of each hands-on Lab, always clean up all resources previously created to avoid being charged.
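A possible cleanup step, assuming every lab resource (cluster and registry) lives in a single resource group that you are free to delete; cleanup_lab is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Delete the lab resource group and everything inside it.
# Usage: cleanup_lab <resource-group>
cleanup_lab() {
  local rg="$1"
  if [ -z "$rg" ]; then
    echo "usage: cleanup_lab <resource-group>" >&2
    return 1
  fi
  # --yes skips the confirmation prompt; --no-wait returns immediately
  # while the deletion continues in the background.
  az group delete --name "$rg" --yes --no-wait
}
```

Because the deletion is irreversible, double-check the group name before running cleanup_lab "$RG".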
Congratulations — you have completed this hands-on lab covering the basics of Troubleshooting an AKS Pod Failing to Pull Image from Azure Container Registry (ACR).
Thank you for reading. I hope you understood and learned something helpful from my blog.
Please follow me on Cloud&DevOpsLearn and LinkedIn, franciscojblsouza
Written by
Francisco Souza
I have over 20 years of experience in IT Infrastructure and currently work at Microsoft as an Azure Kubernetes Support Engineer, where I support and manage the AKS, ACI, ACR, and ARO tools. Previously, I worked as a Fault Management Cloud Engineer at Nokia for 2.9 years, with expertise in OpenStack, Linux, Zabbix, Commvault, and other tools. In this role, I resolved critical technical incidents, ensured consistent uptime, and safeguarded against revenue loss from customers. Additionally, I briefly served as a Technical Team Lead for 3 months, where I distributed tasks, mentored a new team member, and managed technical requests and activities raised by our customers. Previously, I worked as an IT System Administrator at BN Paribas Cardif Portugal and other significant companies in Brazil, including an affiliate of Rede Globo Television (Rede Bahia) and Petrobras SA. In these roles, I developed a robust skill set, acquired the ability to adapt to new processes, demonstrated excellent problem-solving and analytical skills, and managed ticket systems to enhance the customer service experience. My ability to thrive in high-pressure environments and meet tight deadlines is a testament to my organizational and proactive approach. By collaborating with colleagues and other teams, I ensure robust support and incident management, contributing to the consistent satisfaction of my customers and the reliability of the entire IT Infrastructure.