🚀 Kubernetes Debugging: Solve Cluster Issues Like a Pro

Kubernetes is great—until something goes wrong. Then, it turns into a black box of cryptic failures, disappearing logs, and misbehaving workloads that refuse to explain themselves.

Most debugging attempts follow the same cycle:

Run kubectl get pods and squint at the output.
Try kubectl describe pod and pretend you understand what’s happening.
Start tailing logs, praying for an obvious error message.
Give up and restart the pod, hoping it magically fixes itself.

But Kubernetes forensics doesn’t have to be an unsolvable crime scene. If you know the right kubectl commands, you can trace, diagnose, and fix problems like a cluster detective—without resorting to a blind kubectl delete pod.

1. Getting Inside a Running Pod: The "SSH Equivalent"

Need to poke around inside a container? kubectl exec is your backdoor into the running workload.

kubectl exec -it my-pod -- /bin/sh

-i → Keeps the session interactive.
-t → Allocates a TTY (so it doesn’t look like garbage).
-- /bin/sh → Opens a shell inside the container.

Alpine-based images ship without bash. /bin/sh works there, and so does the BusyBox ash shell it points to:

kubectl exec -it my-pod -- /bin/ash

For BusyBox-based containers:

kubectl exec -it my-pod -- /bin/busybox sh

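If you don't know which shell the image ships, ask the container (this assumes some sh is on its PATH):

kubectl exec my-pod -- sh -c 'command -v bash || command -v sh || command -v ash || command -v busybox'

The path of the first match is printed. If even sh is missing, as in distroless or scratch-based images, kubectl exec has nothing to run and you will need a separate debug container instead.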
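The probe above still depends on sh being present. A slightly more defensive variant tries each candidate shell directly and reports the first one that actually starts. This is only a sketch: find_pod_shell is a hypothetical helper name, and the candidate list is an assumption you may want to extend.

```shell
# find_pod_shell POD
# Hypothetical helper: prints the first shell that can be executed in POD.
# Returns non-zero if none of the candidates starts (e.g. distroless images).
find_pod_shell() {
  pod="$1"
  for candidate in /bin/bash /bin/sh /bin/ash; do
    # Run a no-op in the candidate shell; success means the binary exists.
    if kubectl exec "$pod" -- "$candidate" -c 'exit 0' >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no usable shell found in $pod" >&2
  return 1
}

# Usage (not run here):
#   shell="$(find_pod_shell my-pod)" && kubectl exec -it my-pod -- "$shell"
```

Probing with a no-op command keeps the check non-interactive, so it is safe to use in scripts.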

And if your pod has more than one container, name the one you want with -c:

kubectl exec -it my-pod -c my-container -- /bin/sh

Now you're inside the running container, ready to explore.

2. Debugging with Logs: Reading the Cluster’s Diary

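If a pod is failing, logs are your first clue. Instead of checking logs one pod at a time, you can stream logs from every pod behind a label:

kubectl logs -f -l app=my-app

This follows (-f) the logs of all pods matching the label app=my-app in real time. With a label selector, kubectl starts from only the last 10 lines per pod and follows at most 5 pods at once; raise those defaults with --tail and --max-log-requests, and add --prefix to tag each line with the pod it came from.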

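Need the logs from a container that already crashed and restarted?

kubectl logs my-pod --previous

This retrieves logs from the previous instance of the container, the one that exited before the current restart. It only works while the pod object still exists; once the pod is deleted, its logs go with it.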
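When you are not sure whether the container has restarted yet, the two log commands can be combined. A minimal sketch, assuming you want the previous instance first and the current one only as a fallback; crash_logs is a made-up helper name and the default of 50 lines is arbitrary.

```shell
# crash_logs POD [LINES]
# Hypothetical helper: show the tail of the previous container instance,
# falling back to the current instance if there has been no restart yet.
crash_logs() {
  pod="$1"
  lines="${2:-50}"
  if ! kubectl logs "$pod" --previous --tail="$lines" 2>/dev/null; then
    echo "(no previous container instance; showing current logs)" >&2
    kubectl logs "$pod" --tail="$lines"
  fi
}

# Usage (not run here):
#   crash_logs my-pod 100
```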

3. Finding the Root Cause with `kubectl describe`

If a pod won’t start, kubectl describe can reveal why Kubernetes is mad at you.

kubectl describe pod my-pod

Look for the Events section at the bottom of the output, and for the State and Last State fields of each container. Some common failure reasons:

CrashLoopBackOff → The container keeps exiting, and the kubelet waits longer and longer between restarts.
ErrImagePull → The image pull failed (wrong name or tag, or missing registry credentials).
ImagePullBackOff → Same cause as ErrImagePull; the kubelet is now backing off between retries.
OOMKilled → A container exceeded its memory limit and was killed.

If you see OOMKilled, your container probably needs more memory:

kubectl set resources deployment my-app --limits=memory=512Mi

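This raises the memory limit on the deployment's containers and rolls out new pods. It is the kernel's OOM killer, enforcing the container's limit, that kills the process, not the Kubernetes scheduler. A higher limit also only postpones the kill if the application is leaking memory.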
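If you'd rather not scroll through describe output, the last termination reason can be read straight from the pod status with a jsonpath query. The sketch below pairs that with a small lookup of next steps. Both function names are hypothetical, and the hints simply restate the advice from this section.

```shell
# last_exit_reason POD
# Hypothetical helper: prints why the container(s) in POD last terminated,
# e.g. "OOMKilled" or "Error". Prints nothing if there was no restart.
last_exit_reason() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# explain_reason REASON
# Hypothetical helper: maps a status reason to a suggested next step.
explain_reason() {
  case "$1" in
    OOMKilled)
      echo "memory limit exceeded: check usage, then raise the limit" ;;
    CrashLoopBackOff)
      echo "container keeps exiting: read 'kubectl logs --previous'" ;;
    ErrImagePull|ImagePullBackOff)
      echo "image pull failed: check image name, tag, and pull secret" ;;
    *)
      echo "unrecognized reason: read the Events in 'kubectl describe pod'" ;;
  esac
}

# Usage (not run here):
#   explain_reason "$(last_exit_reason my-pod)"
```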

4. Investigating the Cluster’s Health

Pods are just the tip of the iceberg. If your cluster itself is struggling, these commands help diagnose deeper issues.

Checking the Cluster's Overall Health

kubectl cluster-info

This prints the addresses of the control plane and of cluster services such as DNS. If the command hangs or errors out, the API server itself is unreachable.

Inspecting Node Health

kubectl get nodes
kubectl describe node my-node

If a node is NotReady, it could mean:

The node is out of memory or disk space.
The kubelet process has crashed.
The node has lost network connectivity.

Check if the node is running out of resources (this requires the metrics-server add-on to be installed):

kubectl top node

If CPU or memory is maxed out, you may need to scale up your cluster.
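On a large cluster it helps to filter the node list down to the ones that need attention. A minimal sketch, assuming the default column layout of kubectl get nodes (name first, status second); not_ready_nodes is a made-up helper name.

```shell
# not_ready_nodes
# Hypothetical helper: prints the name of every node whose STATUS column
# does not start with "Ready" (so "Ready,SchedulingDisabled" is not listed).
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (not run here): describe each unhealthy node in turn.
#   for node in $(not_ready_nodes); do kubectl describe node "$node"; done
```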

Checking for Failing System Components

kubectl get componentstatuses

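This shows the health of core Kubernetes services like the scheduler, controller manager, and etcd. Note that the componentstatuses API has been deprecated since Kubernetes v1.19 and can report misleading results on newer clusters; there, query the API server's own health endpoint with kubectl get --raw='/readyz?verbose' instead.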

5. Debugging a Pod Before It Even Starts

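If a pod never even reaches the "Running" state, you can spin up a temporary debug pod next to it to investigate its environment.

kubectl run debug-shell --rm -it --image=ubuntu -n my-namespace -- /bin/sh

This gives you a throwaway Ubuntu pod in the same namespace as your app (set with -n), so you can test networking and DNS resolution from where the app would run. The --rm flag deletes the pod when you exit the shell. Keep in mind that it is a separate pod: it does not inherit the failing pod's environment variables, volumes, or service account unless you set them yourself.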
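For a quick DNS check you don't even need an interactive shell. The sketch below runs a single nslookup from a one-shot pod and cleans up afterwards. The helper name dns_check and the busybox:1.36 image are assumptions; any image that ships nslookup would do, and BusyBox's nslookup output can differ from the full-featured tool.

```shell
# dns_check NAME [NAMESPACE]
# Hypothetical helper: resolves NAME from inside the cluster using a
# temporary pod, which is deleted again when the lookup finishes.
# NAMESPACE defaults to "default" (an assumption; pass your app's namespace).
dns_check() {
  name="$1"
  namespace="${2:-default}"
  kubectl run "dns-check-$$" --rm -i --restart=Never \
    --image=busybox:1.36 -n "$namespace" -- nslookup "$name"
}

# Usage (not run here):
#   dns_check my-service my-namespace
```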

6. Port Forwarding: Expose Internal Services for Debugging

Some services are only accessible inside the cluster. If you need to debug a database or API that’s locked down, you can use port forwarding:

kubectl port-forward svc/my-service 8080:80 -n my-namespace

This forwards port 8080 on your local machine to port 80 of the service. You can reach the service at http://localhost:8080 for as long as the command keeps running.

Need to connect to a database inside the cluster?

kubectl port-forward pod/my-db-pod 5432:5432 -n my-namespace

Now you can connect to localhost:5432 as if the database were running locally.

7. Killing a Stuck Pod the Right Way

Sometimes, a pod refuses to die. If kubectl delete pod just sits there doing nothing, try force deletion:

kubectl delete pod my-pod --force --grace-period=0

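This bypasses graceful termination and removes the pod from the API server immediately, without waiting for the kubelet to confirm that the containers have stopped. They may keep running on the node for a while, so be careful with pods that must never run twice, such as StatefulSet members.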

Still stuck? The issue might be on the node itself. Find the node running the pod:

kubectl get pod my-pod -o wide

SSH into the node, then stop and remove the pod's sandbox with crictl:

ssh my-node
sudo crictl pods | grep my-pod
sudo crictl stopp <pod-id>
sudo crictl rmp <pod-id>

This forcefully removes the pod at the container runtime level.
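Before reaching for SSH, two quick lookups can save a trip to the node: which node the pod is on, and whether a finalizer is what keeps it from being deleted (a pod with finalizers stays in Terminating until they are cleared, and force deletion does not change that). A minimal sketch; both helper names are hypothetical.

```shell
# pod_node POD
# Hypothetical helper: prints the name of the node the pod is scheduled on.
pod_node() {
  kubectl get pod "$1" -o jsonpath='{.spec.nodeName}'
}

# pod_finalizers POD
# Hypothetical helper: prints the pod's finalizers as a JSON list,
# or nothing if the pod has none.
pod_finalizers() {
  kubectl get pod "$1" -o jsonpath='{.metadata.finalizers}'
}

# Usage (not run here):
#   pod_finalizers my-pod
#   ssh "$(pod_node my-pod)"
```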

Final Thoughts

Debugging Kubernetes isn’t about guessing—it’s about methodically uncovering the truth.

Need to inspect a running container? kubectl exec
Logs disappeared too fast? kubectl logs --previous
Pod refuses to start? kubectl describe pod
Cluster acting weird? kubectl cluster-info
Service unreachable? kubectl port-forward

Instead of randomly restarting things and hoping for the best, use the right kubectl tools to trace the problem to its root cause.

Kubernetes isn’t a mystery. It just hides its secrets well—but with the right forensic skills, you can make your cluster tell you exactly what’s wrong.

Now go forth and debug like a Kubernetes detective. 🚀


Jaakko Leskinen
