📝Introduction

In this hands-on lab, we will troubleshoot a real-world scenario in Azure Kubernetes Service (AKS) built around a common issue: a Pod that fails to start because the cluster has insufficient resources.

Learning objectives:

In this module, you'll learn how to:

Identify the issue
Resolve the issue

📝Log in to the Azure Management Console

Log in with your credentials and make sure you are working in the right Region. In my case, I am using the region uksouth in my Cloud Playground Sandbox.

📌Note: You can also connect with the Azure CLI from VS Code or from your local terminal.

More information on how to set it up is at the link.

📝Prerequisites:

Update to PowerShell 5.1, if needed.
Install .NET Framework 4.7.2 or later.
Visual Studio Code
Web Browser (Chrome, Edge)
Azure CLI installed
Azure subscription
Docker installed

📝Setting an Azure Storage Account to Load Bash or PowerShell

Click the Cloud Shell icon (>_) at the top of the page.

Click PowerShell.

Click Show Advanced Settings. Use the combo box under Cloud Shell region to select the Region. Under Resource Group and Storage account, enter a name for each (the storage account name must be globally unique). In the box under File Share, enter a name. Click Create storage (if you don't have one yet).

📝Create an AKS Cluster

Before creating the cluster, store the resource group name and the cluster name in variables named RG and CLUSTERNAME. Then create the AKS cluster using the az aks create command.

  RG=<ResourceGroupName>
  CLUSTERNAME=<AKSClusterName>
  az aks create -n $CLUSTERNAME -g $RG --node-vm-size Standard_D2s_v3 --node-count 2 --generate-ssh-keys
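
Once the create command returns, it can be reassuring to confirm that the cluster really finished provisioning before moving on. The following is a minimal sketch, not part of the original lab: the helper names aks_provisioning_state and aks_is_ready are hypothetical, and it assumes the Azure CLI is already logged in.

```shell
# Print the provisioning state of an AKS cluster (for example "Succeeded").
# Arguments: resource group name, cluster name.
aks_provisioning_state() {
  az aks show -g "$1" -n "$2" --query provisioningState -o tsv
}

# Return success only when the reported state is exactly "Succeeded".
aks_is_ready() {
  [ "$(aks_provisioning_state "$1" "$2")" = "Succeeded" ]
}

# Usage, with the variables defined above:
#   aks_is_ready "$RG" "$CLUSTERNAME" && echo "cluster is ready"
```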

📝 Connect to AKS Cluster

Use the Azure Cloud Shell to check your AKS Cluster resources, by following the steps below:

Go to Azure Dashboard, and click on the Resource Group created for this Lab, looking for your AKS Cluster resource.
On the Overview tab, click on Connect to your AKS Cluster.
A new window will be opened, so you only need to open the Azure CLI and run the following commands:

az login
az account set --subscription <your-subscription-id>
az aks get-credentials -g <nameResourceGroup> -n <nameAKSCluster> --overwrite-existing

After that, you can run some kubectl commands to check the default AKS Cluster resources.
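
As a rough idea of what such a first look could include, here is a small sketch that lists the nodes, the namespaces, and the system pods. The function name show_cluster_overview is a hypothetical helper of mine; the kubectl commands inside it are standard ones.

```shell
# Print a quick overview of a freshly created cluster.
show_cluster_overview() {
  kubectl get nodes -o wide          # node names, status, versions, IPs
  kubectl get namespaces             # default, kube-system, and so on
  kubectl get pods -n kube-system    # system pods that AKS deploys for you
}

# Usage:
#   show_cluster_overview
```

The system pods in kube-system matter later in this lab, because their resource requests also count against each node's capacity.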

📝Deploy the Application to AKS

Simulate the Issue:

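Deploy a Sample Application: Create a deployment YAML file (nginx-deployment.yaml) whose resource requests exceed what a single node can offer. Each Standard_D2s_v3 node has 2 vCPUs, and part of that is reserved for system components, so a request for 2 full CPUs cannot be scheduled: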

 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: nginx-deployment
 spec:
   replicas: 1
   selector:
     matchLabels:
       app: nginx
   template:
     metadata:
       labels:
         app: nginx
     spec:
       containers:
       - name: nginx
         image: nginx:latest
         resources:
           requests:
             memory: "2Gi"
             cpu: "2"
           limits:
             memory: "2Gi"
             cpu: "2"

Apply the Deployment:

 kubectl apply -f nginx-deployment.yaml

Identify the Issue:

Check Pod Status:

    kubectl get pods

Describe the Pod:

    kubectl describe pod <pod-name>

Look at the Events section for the reason the pod is not starting. The pod stays in the Pending state, and a FailedScheduling event typically carries a message like “Insufficient cpu” or “Insufficient memory”.
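
When a namespace holds many pods, it can be quicker to filter for the problem directly. The sketch below lists Pending pods, lists FailedScheduling events, and pulls the name of the short resource out of the event text. The three function names are hypothetical helpers, and the sample message in the usage comment is only illustrative of the usual wording.

```shell
# List pods stuck in Pending in the current namespace.
list_pending_pods() {
  kubectl get pods --field-selector=status.phase=Pending
}

# List events whose reason is FailedScheduling, oldest first.
list_failed_scheduling() {
  kubectl get events --field-selector reason=FailedScheduling \
    --sort-by=.metadata.creationTimestamp
}

# Read event text on stdin and print each distinct resource that is
# reported as insufficient (for example "cpu" or "memory"), one per line.
insufficient_resources() {
  grep -o 'Insufficient [a-z-]*' | awk '{print $2}' | sort -u
}

# Usage:
#   list_failed_scheduling | insufficient_resources
#   echo "0/2 nodes are available: 2 Insufficient cpu." | insufficient_resources
```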

Troubleshoot the Issue:

Check Node Resources:

kubectl top nodes

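This shows the current CPU and memory usage on each node. Keep in mind that the scheduler places pods based on resource requests rather than live usage, so also compare the pod's request with the Allocatable and Allocated resources sections of kubectl describe nodes.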
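
To see why a request does or does not fit, it helps to put the request and the node's allocatable CPU in the same unit. This is a minimal sketch under my own assumptions: show_allocatable, to_millicores, and cpu_fits are hypothetical helpers, and the 1900m value in the usage comment is only an example of what a 2-vCPU node might report, so check your own nodes.

```shell
# Print each node's allocatable CPU and memory as reported by the API.
show_allocatable() {
  kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
}

# Convert a Kubernetes CPU quantity ("2", "0.5", "1900m") to millicores.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;                                       # already millicores
    *)  awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 1000 }' ;;  # whole or fractional cores
  esac
}

# Succeed when a CPU request (first argument) is no larger than the
# allocatable CPU (second argument). Requests of pods already running on
# the node are NOT subtracted here, so this is only an upper bound.
cpu_fits() {
  [ "$(to_millicores "$1")" -le "$(to_millicores "$2")" ]
}

# Usage:
#   show_allocatable
#   cpu_fits 2 1900m || echo "a 2-CPU request cannot fit on this node"
```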

Check Resource Quotas (if any):

kubectl get resourcequotas

Check Cluster Autoscaler: Ensure the cluster autoscaler is enabled and configured correctly:

  az aks show -g <nameResourceGroup> -n <nameAKSCluster> --query "agentPoolProfiles[].enableAutoScaling"
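
If the query shows that autoscaling is off, one possible follow-up is to turn it on. The sketch below assumes a cluster with a single node pool (clusters with several pools are usually configured per pool with az aks nodepool update instead). The function names and the min/max values in the usage comment are my own illustrative choices.

```shell
# Succeed when at least one node pool reports autoscaling as enabled.
# Arguments: resource group name, cluster name.
autoscaler_enabled() {
  az aks show -g "$1" -n "$2" \
    --query "agentPoolProfiles[].enableAutoScaling" -o tsv | grep -qi '^true$'
}

# Enable the cluster autoscaler with the given node count bounds.
# Arguments: resource group name, cluster name, min nodes, max nodes.
enable_autoscaler() {
  az aks update -g "$1" -n "$2" \
    --enable-cluster-autoscaler --min-count "$3" --max-count "$4"
}

# Usage:
#   autoscaler_enabled "<nameResourceGroup>" "<nameAKSCluster>" \
#     || enable_autoscaler "<nameResourceGroup>" "<nameAKSCluster>" 2 4
```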

Resolve the Issue:

Scale Up the Cluster: If the cluster autoscaler is not enabled, or cannot add enough capacity, you can scale the cluster manually. Note that adding nodes of the same size only helps when the pod's request fits on a single node:
```
  az aks scale -g <nameResourceGroup> -n <nameAKSCluster> --node-count <new-node-count>
```
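
When a single pod asks for more than any one node can provide, extra nodes of the same size will not help, and a node pool with a larger VM size is one alternative. This is a sketch only: add_larger_pool is a hypothetical helper, the pool name bigpool and the size Standard_D4s_v3 are example values, and a sandbox subscription may not have the quota for it.

```shell
# Add a one-node pool with a larger VM size to an existing cluster.
# Arguments: resource group, cluster name, new pool name, VM size.
add_larger_pool() {
  az aks nodepool add -g "$1" --cluster-name "$2" -n "$3" \
    --node-vm-size "$4" --node-count 1
}

# Usage:
#   add_larger_pool "<nameResourceGroup>" "<nameAKSCluster>" bigpool Standard_D4s_v3
```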

Adjust Resource Requests: Modify the deployment YAML file to request fewer resources:

  resources:
    requests:
      memory: "1Gi"
      cpu: "1"
    limits:
      memory: "1Gi"
      cpu: "1"

Reapply the Deployment:

  kubectl apply -f nginx-deployment.yaml
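
After reapplying, you may prefer to wait for the rollout instead of polling by hand. A minimal sketch follows; the helper names and the default timeout of 120s are my own assumptions, while the deployment name and the app=nginx label come from the YAML above.

```shell
# Block until the deployment has rolled out, or fail after the timeout.
# Arguments: deployment name, optional timeout (defaults to 120s).
wait_for_rollout() {
  kubectl rollout status "deployment/$1" --timeout="${2:-120s}"
}

# Show the pods that carry the given app label, with the node they run on.
pods_for_app() {
  kubectl get pods -l "app=$1" -o wide
}

# Usage:
#   wait_for_rollout nginx-deployment && pods_for_app nginx
```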

Verify the Resolution:

Check Pod Status Again:

kubectl get pods

Describe the Pod:
```
  kubectl describe pod <pod-name>
```

Ensure there are no error messages and the pod is running.

Check Node Resources:
```
  kubectl top nodes
```

Verify that the nodes have sufficient resources and the pod is running smoothly.

📌Note - At the end of each hands-on Lab, always clean up all resources previously created to avoid being charged.
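
For this lab, the cleanup could look roughly like the sketch below. The function name cleanup_lab is hypothetical, and it assumes nginx-deployment.yaml is still in the current directory. Be careful: deleting the cluster cannot be undone.

```shell
# Remove the sample deployment, then delete the AKS cluster.
# Arguments: resource group name, cluster name.
cleanup_lab() {
  # Delete what the YAML file created; do not fail if it is already gone.
  kubectl delete -f nginx-deployment.yaml --ignore-not-found
  # Delete the cluster without prompting and without waiting for completion.
  az aks delete -g "$1" -n "$2" --yes --no-wait
}

# Usage:
#   cleanup_lab "<nameResourceGroup>" "<nameAKSCluster>"
```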

Congratulations! You have completed this hands-on lab covering the basics of troubleshooting an AKS Pod that fails to start due to insufficient resources.

Thank you for reading. I hope you understood and learned something helpful from my blog.

Please follow me on Cloud&DevOpsLearn and LinkedIn, franciscojblsouza

Azure AKS Troubleshooting Hands-On - Pod Failing to Insufficient Resources

Francisco Souza
