Debugging and Forensics: CSI Mode for Your Cluster


Kubernetes is great—until something goes wrong. Then, it turns into a black box of cryptic failures, disappearing logs, and misbehaving workloads that refuse to explain themselves.
Most debugging attempts follow the same cycle:
Run kubectl get pods and squint at the output.
Try kubectl describe pod and pretend you understand what’s happening.
Start tailing logs, praying for an obvious error message.
Give up and restart the pod, hoping it magically fixes itself.
But Kubernetes forensics doesn’t have to be an unsolvable crime scene. If you know the right kubectl commands, you can trace, diagnose, and fix problems like a cluster detective, without resorting to a blind kubectl delete pod.
1. Getting Inside a Running Pod: The "SSH Equivalent"
Need to poke around inside a container? kubectl exec is your backdoor into the running workload.
kubectl exec -it my-pod -- /bin/sh
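-i → Keeps stdin open, so the session is interactive.
-t → Allocates a TTY (so prompts and output render properly instead of looking like garbage).
-- /bin/sh → Everything after -- is the command to run inside the container; here, a shell.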
Minimal images often don’t ship bash. On Alpine Linux, /bin/sh is BusyBox’s ash, so plain /bin/sh works, but you can also call ash directly:
kubectl exec -it my-pod -- /bin/ash
For BusyBox-based containers:
kubectl exec -it my-pod -- /bin/busybox sh
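If you don’t know which shell the container has, ask it. command -v is a shell builtin, so it works even in images that lack which:
kubectl exec my-pod -- sh -c 'command -v bash || command -v sh || command -v ash || command -v busybox'
This probe itself needs sh. If the image has no shell at all, as with distroless images, none of these will work, and you’ll need an ephemeral debug container (kubectl debug) instead.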
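If you do this often, a small shell function can do the probing for you. The sketch below assumes a POSIX shell on your workstation. The name kshell is made up for illustration and is not a kubectl feature. It runs each candidate shell with a harmless exit 0 and opens an interactive session with the first one that succeeds.

```shell
# kshell: hypothetical helper, not part of kubectl.
# Usage: kshell <pod> [container]
# Probes common shell paths in order and opens the first one that exists.
kshell() {
  pod="$1"
  for candidate in /bin/bash /bin/sh /bin/ash; do
    # ${2:+-c "$2"} expands to "-c <container>" only when a container is given.
    if kubectl exec "$pod" ${2:+-c "$2"} -- "$candidate" -c 'exit 0' >/dev/null 2>&1; then
      kubectl exec -it "$pod" ${2:+-c "$2"} -- "$candidate"
      return
    fi
  done
  echo "kshell: no shell found in $pod (distroless image?)" >&2
  return 1
}
```

Call it as kshell my-pod, or kshell my-pod my-container for a pod with several containers.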
If the pod runs more than one container, specify which one with -c:
kubectl exec -it my-pod -c my-container -- /bin/sh
Now you’re inside the running pod, ready to explore.
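If you’re not sure what the containers are called, the names are listed in the pod spec. Here is a sketch that prints them; kcontainers is a hypothetical helper name. Init containers live under .spec.initContainers instead and are not included here.

```shell
# kcontainers: hypothetical helper, not a kubectl built-in.
# Prints the names of a pod's (regular) containers, one per line,
# so you know what to pass to -c.
# Usage: kcontainers <pod> [namespace]
kcontainers() {
  kubectl get pod "$1" ${2:+-n "$2"} \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\n"}{end}'
}
```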
2. Debugging with Logs: Reading the Cluster’s Diary
If a pod is failing, logs are your first clue. Instead of checking logs one pod at a time, you can stream logs across multiple pods at once:
kubectl logs -f -l app=my-app
This pulls logs from all pods matching the label app=my-app, updating in real time (-f for follow).
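By default, a label selector makes kubectl logs show only the last 10 lines per container and follow at most 5 log streams at once. Below is a hedged wrapper that lifts those limits and tags every line with the pod and container it came from. The name klogs and the values 100 and 20 are arbitrary choices for illustration, and --prefix needs a reasonably recent kubectl.

```shell
# klogs: hypothetical wrapper around kubectl logs for a label selector.
# Usage: klogs <app-label-value> [namespace]
#   --prefix               prefix each line with its pod/container name
#   --all-containers       include every container in each matching pod
#   --tail=100             override the 10-line default used with selectors
#   --max-log-requests=20  follow up to 20 streams instead of the default 5
klogs() {
  kubectl logs -f -l "app=$1" ${2:+-n "$2"} \
    --prefix --all-containers --tail=100 --max-log-requests=20
}
```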
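Need the logs from before a container crashed and restarted?
kubectl logs my-pod --previous
This retrieves logs from the previous instance of the container, the one that exited. It only works while the pod object still exists; once the pod is deleted, its logs go with it.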
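Before reaching for --previous, it helps to know which container restarted and why. The pod status records each container’s restart count and how its last instance terminated. Here is a sketch that prints those fields; krestarts is a hypothetical helper name.

```shell
# krestarts: hypothetical helper, not a kubectl built-in.
# Prints one line per container: name, restart count, and the reason and
# exit code of its previous (terminated) instance, tab-separated.
# Usage: krestarts <pod> [namespace]
krestarts() {
  kubectl get pod "$1" ${2:+-n "$2"} -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Exit code 137 means the process received SIGKILL (128 + 9), which is what you typically see next to OOMKilled. For a multi-container pod, add -c <name> to kubectl logs --previous to pick the container that restarted.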
3. Finding the Root Cause with kubectl describe
If a pod won’t start, kubectl describe can reveal why Kubernetes is mad at you.
kubectl describe pod my-pod
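Look at the Events section at the bottom of the output, and at each container’s State and Last State. Some common failure reasons:
CrashLoopBackOff → The container keeps crashing, and Kubernetes waits progressively longer before each restart.
ImagePullBackOff → Kubernetes repeatedly failed to pull the image (wrong name or tag, or missing registry credentials) and is backing off between retries.
ErrImagePull → The initial pull attempt failed; after repeated failures the status turns into ImagePullBackOff.
OOMKilled → A container exceeded its memory limit and was killed.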
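The describe output mixes events with everything else. You can also query a pod’s events directly, oldest first. The sketch below uses a hypothetical helper name, kevents. Events are short-lived (the API server keeps them for one hour by default), so check soon after the failure.

```shell
# kevents: hypothetical helper, not a kubectl built-in.
# Lists only the events whose involved object has the given name,
# sorted by when they last occurred.
# Usage: kevents <name> [namespace]
kevents() {
  kubectl get events ${2:+-n "$2"} \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```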
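If you see OOMKilled, your container either needs more memory or is leaking it. To raise the limit (add -c my-container to target a single container):
kubectl set resources deployment my-app --limits=memory=512Mi
This updates the Deployment’s pod template with a higher memory limit and rolls out new pods. The new containers are less likely to be killed by the kernel’s out-of-memory killer, which is what actually enforces the limit.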
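Because changing resources triggers a rollout, it is worth waiting for that rollout to finish before declaring victory. Here is a sketch combining both steps, assuming a hypothetical helper bump_memory and that you want to set the memory request alongside the limit. Values such as 256Mi and 512Mi are placeholders, not recommendations.

```shell
# bump_memory: hypothetical helper, not a kubectl built-in.
# Raises a Deployment's memory request and limit, then waits for the
# rolling update that the pod-template change triggers.
# Usage: bump_memory <deployment> <request> <limit>
bump_memory() {
  kubectl set resources deployment "$1" \
    --requests=memory="$2" --limits=memory="$3" || return
  # Block until the new pods are rolled out (or give up after 2 minutes).
  kubectl rollout status deployment/"$1" --timeout=120s
}
```

For example: bump_memory my-app 256Mi 512Mi.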
4. Investigating the Cluster’s Health
Pods are just the tip of the iceberg. If your cluster itself is struggling, these commands help diagnose deeper issues.
Checking the Cluster's Overall Health
kubectl cluster-info
This prints the addresses of the control plane and core add-ons such as CoreDNS. If it fails or hangs, your kubeconfig can’t reach the API server at all.
Inspecting Node Health
kubectl get nodes
kubectl describe node my-node
If a node is NotReady, it could mean:
The node is out of memory or disk space.
The kubelet process has crashed.
The node has lost network connectivity.
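In a large cluster, you can filter the node list on its STATUS column instead of scanning the whole table. The sketch below uses a hypothetical helper name. It deliberately skips nodes that are merely cordoned (Ready,SchedulingDisabled).

```shell
# notready_nodes: hypothetical helper, not a kubectl built-in.
# Prints name and status of every node whose STATUS does not start with
# "Ready" (for example NotReady or Unknown).
notready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1, $2 }'
}
```

For each node it reports, the Conditions section of kubectl describe node (Ready, MemoryPressure, DiskPressure, PIDPressure) usually narrows down which of the causes above applies.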
Check whether the nodes are running out of resources (this requires the metrics-server add-on):
kubectl top node
If CPU or memory is consistently maxed out, you may need to add nodes or move workloads elsewhere.
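When a node is saturated, the next question is which pods are using the resources. kubectl top pod can sort by memory across all namespaces. The sketch below uses a hypothetical helper name and an arbitrary default of 10 rows, and it also needs metrics-server.

```shell
# top_pods_by_memory: hypothetical helper, not a kubectl built-in.
# Shows the heaviest pods by memory across all namespaces.
# Usage: top_pods_by_memory [count]   (defaults to 10)
top_pods_by_memory() {
  kubectl top pod -A --sort-by=memory --no-headers | head -n "${1:-10}"
}
```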
Checking for Failing System Components
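kubectl get componentstatuses
This shows the health of core Kubernetes services like the scheduler, controller manager, and etcd. Be aware that componentstatuses has been deprecated since Kubernetes 1.19 and can report misleading results on newer or managed clusters. Prefer the API server’s own /readyz and /livez health endpoints there.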
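The API server reports its health at /readyz and /livez, and adding ?verbose lists each individual check (etcd among them) as [+] or [-]. Here is a minimal sketch, assuming a hypothetical helper name, apiserver_ready, and that your credentials are allowed to read that endpoint.

```shell
# apiserver_ready: hypothetical helper, not a kubectl built-in.
# /readyz answers HTTP 200 only when every readiness check passes; with
# ?verbose the body lists each check as "[+]name ok" or "[-]name failed".
apiserver_ready() {
  if out=$(kubectl get --raw='/readyz?verbose' 2>&1); then
    echo "API server ready"
    return 0
  fi
  # kubectl exits non-zero on failure; the captured error text usually
  # names the failing checks.
  echo "API server NOT ready:" >&2
  printf '%s\n' "$out" >&2
  return 1
}
```

Swapping /readyz for /livez gives the liveness view of the same checks.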
5. Debugging a Pod Before It Even Starts
If a pod never even reaches the "Running" state, you can spin up a temporary debug container to investigate its environment.
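kubectl run debug-shell --rm -it --image=ubuntu -- /bin/sh
This gives you a temporary Ubuntu pod in your current namespace (add -n to pick your app’s namespace), deleted when you exit the shell. From there you can check networking and DNS resolution from inside the cluster before the real pod launches. Keep in mind that it is a separate pod: it does not see the failing pod’s environment variables, volumes, or service account. The stock Ubuntu image also has few network tools, so you may need apt-get update && apt-get install -y dnsutils curl first.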
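Here are two hedged helpers for this situation; both names are hypothetical.

dns_check runs a throwaway busybox pod that only resolves a Service name. It assumes the default cluster domain, cluster.local.

debug_copy uses kubectl debug --copy-to to start a copy of the failing pod with one container’s command replaced by a shell. Unlike a separate pod, this lets you see that container’s real environment and volumes. It needs a kubectl version with the debug subcommand and an image that contains sh.

```shell
# dns_check: hypothetical helper. Resolves <service>.<namespace> from a
# throwaway busybox pod; --rm needs -i so kubectl can wait for it to finish.
# Assumes the cluster domain is the default "cluster.local".
# Usage: dns_check <service> <namespace>
dns_check() {
  kubectl run dns-check --rm -i --restart=Never -n "$2" \
    --image=busybox:1.36 -- nslookup "$1.$2.svc.cluster.local"
}

# debug_copy: hypothetical helper wrapping kubectl debug --copy-to.
# Creates "<pod>-debug", a copy of the pod whose <container> runs sh instead
# of its normal command, and attaches to it.
# Usage: debug_copy <pod> <container>
debug_copy() {
  kubectl debug "$1" -it --copy-to="$1-debug" --container="$2" -- sh
}
```

Delete the copy when you are done, for example with kubectl delete pod my-pod-debug.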
6. Port Forwarding: Expose Internal Services for Debugging
Some services are only accessible inside the cluster. If you need to debug a database or API that’s locked down, you can use port forwarding:
kubectl port-forward svc/my-service 8080:80 -n my-namespace
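This forwards port 8080 on your local machine to port 80 of the service. Under the hood, kubectl picks a single pod behind the service and tunnels to it, so there is no load balancing across replicas. Now you can access the service by visiting http://localhost:8080.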
Need to connect to a database inside the cluster?
kubectl port-forward pod/my-db-pod 5432:5432 -n my-namespace
Now you can connect to localhost:5432 as if the database were running locally.
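When you only want to run a single check through the tunnel, it helps to background the port-forward and clean it up afterwards. This sketch rests on a few assumptions. The helper name is hypothetical. The fixed two-second sleep is a crude stand-in for a proper readiness check. The pg_isready example below assumes the database is PostgreSQL, which port 5432 suggests.

```shell
# with_port_forward: hypothetical helper, not a kubectl built-in.
# Opens a port-forward in the background, runs one command against it,
# closes the tunnel, and returns the command's exit status.
# Usage: with_port_forward <namespace> <target> <local:remote> <command...>
with_port_forward() {
  ns="$1"; target="$2"; ports="$3"; shift 3
  kubectl port-forward "$target" "$ports" -n "$ns" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # crude: give the tunnel a moment to come up
  "$@"
  status=$?
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null   # reap the background job quietly
  return "$status"
}
```

For example: with_port_forward my-namespace svc/my-service 8080:80 curl -s http://localhost:8080/ or with_port_forward my-namespace pod/my-db-pod 5432:5432 pg_isready -h localhost -p 5432.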
7. Killing a Stuck Pod the Right Way
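Sometimes, a pod refuses to die. If kubectl delete pod just sits there doing nothing, try force deletion:
kubectl delete pod my-pod --force --grace-period=0
This removes the pod object from the API server immediately, without waiting for the kubelet to confirm that the containers have stopped, so they may keep running on the node for a while. It also cannot bypass finalizers: a pod with pending finalizers stays in Terminating until they are removed.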
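To see what is actually holding a pod in Terminating, check three fields: when deletion was requested, which node the pod is on, and which finalizers remain. Here is a sketch with a hypothetical helper name.

```shell
# why_stuck: hypothetical helper, not a kubectl built-in.
# Prints the deletion timestamp, node, and finalizers of a pod.
# Usage: why_stuck <pod> [namespace]
why_stuck() {
  kubectl get pod "$1" ${2:+-n "$2"} -o jsonpath='deletionTimestamp: {.metadata.deletionTimestamp}{"\n"}node: {.spec.nodeName}{"\n"}finalizers: {.metadata.finalizers}{"\n"}'
}
```

If finalizers are listed, the controller that owns them is expected to remove them. Clearing them by hand with kubectl patch pod my-pod --type=merge -p '{"metadata":{"finalizers":null}}' works, but it skips whatever cleanup they guard.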
Still stuck? The issue might be on the node itself. Find the node running the pod:
kubectl get pod my-pod -o wide
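SSH into the node and remove the pod sandbox through the container runtime CLI, crictl:
ssh my-node
sudo crictl pods --name my-pod -q
sudo crictl stopp <pod-id>
sudo crictl rmp <pod-id>
This forcefully removes the pod at the container runtime level. Treat it as a last resort, since you are working behind the kubelet’s back.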
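On the node, you can wrap these steps so that they only act on exact name matches. crictl pods --name takes a regular expression, so an unanchored my-pod would also match my-pod-2. Here is a sketch with a hypothetical helper name. It assumes crictl is installed and configured for the node’s runtime, and that you run it on the node itself.

```shell
# node_rm_pod: hypothetical helper, to be run ON THE NODE after ssh.
# Finds pod sandboxes whose name is exactly <pod>, then stops and removes them.
# Usage: node_rm_pod <pod>
node_rm_pod() {
  # Anchor the regex so only the exact name matches; -q prints only IDs.
  ids=$(sudo crictl pods --name "^$1\$" -q) || return
  if [ -z "$ids" ]; then
    echo "no pod sandbox named '$1'" >&2
    return 1
  fi
  for id in $ids; do   # unquoted on purpose: one ID per word
    sudo crictl stopp "$id" && sudo crictl rmp "$id"
  done
}
```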
Final Thoughts
Debugging Kubernetes isn’t about guessing—it’s about methodically uncovering the truth.
Need to inspect a running container? → kubectl exec
Logs disappeared too fast? → kubectl logs --previous
Pod refuses to start? → kubectl describe pod
Cluster acting weird? → kubectl cluster-info
Service unreachable? → kubectl port-forward
Instead of randomly restarting things and hoping for the best, use the right kubectl tools to trace the problem to its root cause.
Kubernetes isn’t a mystery. It just hides its secrets well—but with the right forensic skills, you can make your cluster tell you exactly what’s wrong.
Now go forth and debug like a Kubernetes detective. 🚀
Written by Jaakko Leskinen
