My Pod Got OOMKilled

TheTansih
3 min read

Introduction: What Triggered This Post?

I was working with Kubernetes and encountered a real-world issue:
🚨 My pod kept crashing with an OOMKilled error even though the node had free memory.

So, I decided to debug the problem step-by-step — and what I found helped me better understand Requests, Limits, and Metrics Server in Kubernetes.


🔍 What Are Requests and Limits in K8s?

  • Request = Minimum resources a pod is guaranteed (see the example right after this list).

  • Limit = Maximum resources a pod can use.

  • If a pod uses more than its memory limit, it's terminated with OOMKilled.

  • This protects the node from crashing by isolating the faulty pod.
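
For example, a container declares both of these in the resources section of its spec (the values below are purely illustrative):

resources:
  requests:
    cpu: "250m"
    memory: "64Mi"
  limits:
    cpu: "500m"
    memory: "128Mi"

The scheduler places the pod based on the requests; the kubelet and container runtime enforce the limits.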


❗ Real-World Scenarios I Faced

Pod couldn't be scheduled due to insufficient resources

I had 2 nodes that were completely full — all of their allocatable CPU was already claimed by existing pod requests. When I deployed a new pod, Kubernetes couldn't schedule it and it stayed in Pending.
It showed:
0/2 nodes are available: Insufficient cpu.
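
When this happens, the scheduler records a FailedScheduling event on the pod. Two quick ways to see it (the pod name here is a placeholder):

kubectl describe pod my-app
kubectl get events --sort-by=.metadata.creationTimestamp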

Pod crashed when memory exceeded

A pod used more memory than its limit. Result:
OOMKilled – Container was terminated because it tried to use more than allowed.
The node stayed healthy; only the pod failed.
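
The reason stays visible on the pod after the crash; kubectl describe shows it under the container's Last State (pod name again a placeholder):

kubectl describe pod my-app
# look for: Last State: Terminated, Reason: OOMKilled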

Request vs Limit Difference

  • Request: Tells scheduler “I need at least this much”.

  • Limit: Tells kubelet “I can’t cross this much”.

➡️ Going beyond the memory limit = the container is killed (CPU above the limit is only throttled, not killed)
➡️ Going beyond the request but within the limit = allowed, as long as the node has spare capacity

Why only the pod is killed, not the node

Memory limits are enforced through Linux cgroups, so when a container exceeds its limit the kernel's OOM killer terminates just that container. One misbehaving pod can't bring down the entire node.
Smart design 💡


📊 Metrics Server & Monitoring

What is Metrics Server?

A lightweight cluster add-on that exposes live CPU and memory usage for pods and nodes.
It's what powers kubectl top and the Horizontal Pod Autoscaler.

How I Installed It

kubectl apply -f metrics-server.yaml
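
If you don't have a local manifest handy, the metrics-server project publishes a components.yaml with each release that can be applied directly:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml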

Check if it’s running

kubectl get po -n kube-system
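
kube-system has a lot of pods, so it can be easier to check the metrics-server deployment directly (this assumes the default install into kube-system):

kubectl get deployment metrics-server -n kube-system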

Check Node Metrics

kubectl top node

🧪 Memory Stress Testing – Live Experiment

I wanted to test what happens if I push a pod beyond its memory limit.

Step 1: Create a namespace

kubectl create ns mem-example

Step 2: Deploy a pod with memory request/limit

In my YAML:

resources:
  requests:
    memory: "50Mi"
  limits:
    memory: "100Mi"

Then I ran a container that tried to consume 250Mi.
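
The full manifest looked roughly like the memory-stress example from the official Kubernetes docs; here is a sketch of that shape (the polinux/stress image and its args come from the docs' example, not necessarily my exact file):

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]   # try to allocate 250M, well past the 100Mi limit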

Step 3: Watch behavior

kubectl get po -n mem-example
kubectl top pod memory-demo -n mem-example

Result: OOMKilled

The pod was terminated. Reason: OOMKilled.
Node stayed healthy. That’s exactly how Kubernetes should behave.
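
You can confirm the reason straight from the pod's status; this jsonpath reads the last terminated state of the first container:

kubectl get pod memory-demo -n mem-example -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'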


📌 Lesson Learned

  • Don't leave limits undefined: it's risky. A namespace-level LimitRange (sketched after this list) can set sane defaults.

  • Use metrics-server for live debugging.

  • Always monitor behavior after deploying resource-intensive apps.
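
One way to stop undefined limits from slipping through is a LimitRange that fills in defaults for containers that don't declare their own. A minimal sketch (the name and values are just an example):

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults
  namespace: mem-example
spec:
  limits:
  - type: Container
    defaultRequest:      # used when a container omits requests.memory
      memory: "64Mi"
    default:             # used when a container omits limits.memory
      memory: "256Mi"

Any container created in that namespace without explicit memory settings picks these values up automatically.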


💡 My Thought

This was a small failure, but a huge learning experience.
I’ll keep facing and documenting these bugs as I work toward becoming a better cloud-native engineer ☁️👨‍💻
