My Pod Got OOMKilled

TheTansih
3 min read

Introduction: What Triggered This Post?

I was working with Kubernetes and encountered a real-world issue:
🚨 My pod kept crashing with an OOMKilled error even though the node had free memory.

So, I decided to debug the problem step-by-step — and what I found helped me better understand Requests, Limits, and Metrics Server in Kubernetes.


🔍 What Are Requests and Limits in K8s?

  • Request = Minimum resources a pod is guaranteed (see the example right after this list).

  • Limit = Maximum resources a pod can use.

  • If a pod uses more than its memory limit, it's terminated with OOMKilled.

  • This protects the node from crashing by isolating the faulty pod.
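
For example, a container declares both of these in the resources section of its spec (the values below are purely illustrative):

resources:
  requests:
    cpu: "250m"
    memory: "64Mi"
  limits:
    cpu: "500m"
    memory: "128Mi"

The scheduler places the pod based on the requests; the kubelet and container runtime enforce the limits.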


❗ Real-World Scenarios I Faced

Pod couldn't be scheduled due to insufficient resources

I had 2 nodes that were completely full — all of their allocatable CPU was already claimed by existing pod requests. When I deployed a new pod, Kubernetes couldn't schedule it and it stayed in Pending.
It showed:
0/2 nodes are available: Insufficient cpu.
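
When this happens, the scheduler records a FailedScheduling event on the pod. Two quick ways to see it (the pod name here is a placeholder):

kubectl describe pod my-app
kubectl get events --sort-by=.metadata.creationTimestamp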

Pod crashed when memory exceeded

A pod used more memory than its limit. Result:
OOMKilled – Container was terminated because it tried to use more than allowed.
The node stayed healthy; only the pod failed.
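
The reason stays visible on the pod after the crash; kubectl describe shows it under the container's Last State (pod name again a placeholder):

kubectl describe pod my-app
# look for: Last State: Terminated, Reason: OOMKilled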

Request vs Limit Difference

  • Request: Tells scheduler “I need at least this much”.

  • Limit: Tells kubelet “I can’t cross this much”.

➡️ Going beyond the memory limit = the container is killed (CPU above the limit is only throttled, not killed)
➡️ Going beyond the request but within the limit = allowed, as long as the node has spare capacity

Why only the pod is killed, not the node

Memory limits are enforced through Linux cgroups, so when a container exceeds its limit the kernel's OOM killer terminates just that container. One misbehaving pod can't bring down the entire node.
Smart design 💡


📊 Metrics Server & Monitoring

What is Metrics Server?

A lightweight cluster add-on that exposes live CPU and memory usage for pods and nodes.
It's what powers kubectl top and the Horizontal Pod Autoscaler.

How I Installed It

kubectl apply -f metrics-server.yaml
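
If you don't have a local manifest handy, the metrics-server project publishes a components.yaml with each release that can be applied directly:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml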

Check if it’s running

kubectl get po -n kube-system
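
kube-system has a lot of pods, so it can be easier to check the metrics-server deployment directly (this assumes the default install into kube-system):

kubectl get deployment metrics-server -n kube-system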

Check Node Metrics

kubectl top node

🧪 Memory Stress Testing – Live Experiment

I wanted to test what happens if I push a pod beyond its memory limit.

Step 1: Create a namespace

kubectl create ns mem-example

Step 2: Deploy a pod with memory request/limit

In my YAML:

resources:
  requests:
    memory: "50Mi"
  limits:
    memory: "100Mi"

Then I ran a container that tried to consume 250Mi.
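
The full manifest looked roughly like the memory-stress example from the official Kubernetes docs; here is a sketch of that shape (the polinux/stress image and its args come from the docs' example, not necessarily my exact file):

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]   # try to allocate 250M, well past the 100Mi limit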

Step 3: Watch behavior

kubectl get po -n mem-example
kubectl top pod memory-demo -n mem-example

Result: OOMKilled

The pod was terminated. Reason: OOMKilled.
Node stayed healthy. That’s exactly how Kubernetes should behave.
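
You can confirm the reason straight from the pod's status; this jsonpath reads the last terminated state of the first container:

kubectl get pod memory-demo -n mem-example -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'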


📌 Lesson Learned

  • Don't leave limits undefined: it's risky. A namespace-level LimitRange (sketched after this list) can set sane defaults.

  • Use metrics-server for live debugging.

  • Always monitor behavior after deploying resource-intensive apps.
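
One way to stop undefined limits from slipping through is a LimitRange that fills in defaults for containers that don't declare their own. A minimal sketch (the name and values are just an example):

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults
  namespace: mem-example
spec:
  limits:
  - type: Container
    defaultRequest:      # used when a container omits requests.memory
      memory: "64Mi"
    default:             # used when a container omits limits.memory
      memory: "256Mi"

Any container created in that namespace without explicit memory settings picks these values up automatically.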


💡 My Thought

This was a small failure, but a huge learning experience.
I’ll keep facing and documenting these bugs as I work toward becoming a better cloud-native engineer ☁️👨‍💻
