Day 5 - Kubernetes Without The Tears


Yesterday, we taught our pods how to keep secrets like little spies and load their config without breaking a sweat.
Today, we’re cranking things up.
We’re giving our apps survival skills — so they can check if they’re alive, stay healthy, and even clone themselves when things get busy.
Because let’s face it: launching an app is easy — we can do it in any web server, really. Keeping it alive when things get crazy? That’s where Kubernetes shines.
In Kubernetes, apps don’t just sit there hoping for the best. They know when they’re in trouble — and clusters grow when they need to.
In Kubernetes language: today’s about probes and autoscaling.
Let’s dive in - before Kubernetes restarts these pods. 😉
Under the Hood
Let’s say your app in one of the pods crashes. As we have seen before, Kubernetes will try to restart it. That’s cool. But what if it’s running but stuck? Or too slow to respond?
Or maybe your service is doing fine—until Friday night traffic hits and suddenly you need five copies instead of one. You don’t want to babysit it every time that happens. Sure, you could set up alerts on your phone so you can get up in the middle of the night and fire up five more pods by hand, but in reality that is not sustainable.
K8s solves both problems with:
Probes: For checking if a container is healthy or ready
Horizontal Pod Autoscaler (HPA): For scaling your app up and down based on load
Let’s see how they work.
Probes: Give me a Sign
Kubernetes uses HTTP requests or commands to ask your pod: "Are you alive? Ready? Still booting? Helllooo?"
Kubernetes supports three main types of probes:
Liveness Probe – checks if the pod is still alive. If it fails, Kubernetes restarts the pod.
Readiness Probe – checks if the pod is ready to serve traffic. If a readiness probe fails, Kubernetes will stop routing traffic to that pod — but it keeps checking periodically, and once the pod is ready again, it’s automatically brought back into the rotation.
Startup Probe – used for slow-starting apps. It gives the app extra time before liveness and readiness probes kick in. It is good to know that if a startup probe is defined, it disables liveness and readiness checks until it succeeds.
Example of HTTP-based probes:
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
These probes hit your app endpoints periodically and act accordingly.
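By the way, we haven’t shown a startup probe in YAML yet. It follows the same shape; here’s a minimal sketch, assuming a /healthz endpoint and a generous failure budget for a slow-booting app (the path and the numbers are placeholders, not part of our nginx setup):
startupProbe:
  httpGet:
    path: /healthz
    port: 80
  failureThreshold: 30   # allow up to 30 failed checks...
  periodSeconds: 10      # ...10 seconds apart, i.e. roughly 5 minutes to boot
With this in place, Kubernetes holds off on liveness and readiness checks until the startup probe succeeds (or the 30 x 10 = 300 second budget runs out and the container is restarted).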
Autoscaling: Let the Cluster Do the Work
Now that your app knows how to check itself, we need to talk about when it is healthy but needs to do more - aka needs to scale and increase its capacity.
You can definitely do that manually with:
kubectl scale deployment nginx-deployment --replicas=3
The code above tells Kubernetes to scale the nginx-deployment to exactly 3 pods. It's a one-time instruction—Kubernetes will make sure 3 pods are running, no more, no less, until you tell it otherwise or autoscaling takes over.
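If you want to confirm it worked, a quick check (assuming the same deployment name we’ve been using):
kubectl get deployment nginx-deployment
The READY column should show 3/3 once all the pods are up.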
But the real magic comes with something called Horizontal Pod Autoscaler (HPA):
kubectl autoscale deployment nginx-deployment \
--cpu-percent=50 --min=2 --max=5
This tells K8s:
Watch CPU usage on the pods and scale up if average CPU > 50%
Keep between 2 and 5 replicas at all times
The HPA controller checks every 15 seconds by default. You can also autoscale based on memory or custom metrics, but CPU is the most common.
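The kubectl autoscale one-liner is handy, but you can express the same thing declaratively so it lives in version control with the rest of your YAML. Here’s a minimal sketch using the autoscaling/v2 API (the file name is up to you):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-deployment
spec:
  scaleTargetRef:              # which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # same 50% CPU target as the command above
Apply it with kubectl apply -f and you get the same behavior as the command-line version.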
To see it in action:
kubectl get hpa
You’ll get something like:
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   35%/50%   2         5         2          1m
Lab: Add Probes and Autoscaling
In this lab, we’re going to teach Kubernetes how to monitor and expand our pods automatically.
Step 1: Add Probes to Your Deployment
First, we need to tell Kubernetes how to check if a pod is alive and ready. Edit your nginx-deployment.yaml, and under the container spec, add these two sections:
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
So, we just defined two probes. Let’s take a closer look at them and see what they are doing.
livenessProbe: Kubernetes will ping the pod’s root URL (/) on port 80 every 10 seconds after an initial 5-second wait. If it fails (timeout, bad response, crash, etc.), Kubernetes restarts the pod.
readinessProbe: Kubernetes will check if the pod is ready to accept traffic. If this check fails, it stops sending new traffic to that pod but keeps it alive to recover. It checks more frequently compared to the livenessProbe (every 5 seconds).
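If you’re not sure where exactly these blocks go, here’s a sketch of the relevant slice of the deployment (the container name and image are assumptions based on the deployment we built on the earlier days):
spec:
  template:
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5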
Then let’s apply our deployment:
kubectl apply -f nginx-deployment.yaml
And just like that, Kubernetes now knows how to monitor your app health automatically.
Step 2: Enable Autoscaling
Now that our app can check if it is healthy or not, let’s let Kubernetes decide how many copies of it we need based on load.
Run this command to enable autoscaling:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=2 --max=4
When you run this command, Kubernetes starts watching CPU usage. If average CPU usage across the pods goes over 50%, it adds pods. If CPU usage drops, it shrinks back down toward the minimum, always keeping at least 2 pods running and never more than 4.
It’s basically dynamic scaling without you lifting a finger. Pretty cool, huh?
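One detail worth calling out, because it bites almost everyone: the HPA computes that 50% against the CPU requests declared on your containers. If your nginx deployment doesn’t declare any resource requests, the HPA has nothing to compare against and the target can show up as <unknown>. A minimal sketch of what you could add under the container spec (the numbers are just placeholders for a tiny nginx pod):
resources:
  requests:
    cpu: 100m        # the HPA's CPU percentage is relative to this value
    memory: 64Mi
  limits:
    cpu: 250m
    memory: 128Mi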
⚠️ Important: Metrics Server Needed
Autoscaling needs real-time CPU stats from your pods.
If you’re using Docker Desktop or a local cluster, you might not have metrics enabled by default.
To install the Metrics Server, run this command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Now you need to do one more little thing: create another file called components.yaml and make sure its contents are as follows:
apiVersion: v1
kind: Namespace
metadata:
  name: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
        - name: metrics-server
          image: k8s.gcr.io/metrics-server/metrics-server:v0.6.3
          args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            - --kubelet-insecure-tls # <--- important
          ports:
            - containerPort: 4443
              name: main-port
              protocol: TCP
          volumeMounts:
            - mountPath: /tmp
              name: tmp-dir
      volumes:
        - emptyDir: {}
          name: tmp-dir
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
    - port: 443
      protocol: TCP
      targetPort: main-port
  selector:
    k8s-app: metrics-server
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
After that, just apply it:
kubectl apply -f components.yaml
The important bit is this line, which does not exist in the original manifest you would download from GitHub:
- --kubelet-insecure-tls # <--- important
Why doesn’t it exist, you may ask? Well, for security reasons: this flag tells the Metrics Server to skip TLS verification when talking to the kubelets. That is OK for dev/test but a big no-no for production.
Let’s make sure that it is running and healthy:
kubectl get apiservices | findstr metrics
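(findstr is the Windows flavor of grep; if you’re on macOS or Linux, the equivalent check would be:)
kubectl get apiservices | grep metrics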
You should see the metrics API service (v1beta1.metrics.k8s.io) listed. After that, go grab yourself a cup of coffee or tea, or just wait a minute or two so the Metrics Server can gather some data.
Then check your HPA:
kubectl get hpa
Example output:
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   35%/50%   2         4         2          2m
Here, 35%/50% means your app is currently at 35% CPU against a 50% threshold, and the MINPODS/MAXPODS columns show the scaling limits. Kubernetes will scale up or down automatically as needed.
This is what we expect:
[ Load Low ] → 2 pods
[ Load Medium ] → 3 pods
[ Load High ] → 4 pods
When I ran this code on my computer, this is what I got (and you may get this too):
NAME               REFERENCE                     TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   cpu: <unknown>/50%   2         4         2          22m
As you can see, my CPU is showing <unknown>/50%, which means Kubernetes cannot get CPU info for the pods. This usually means one of two things: either the Metrics Server isn’t installed or working yet, or it hasn’t collected enough metrics yet.
Here is exactly why it happens:
Reason | What’s happening |
Metrics Server not installed | Kubernetes can't collect CPU/memory usage stats for pods |
Metrics Server installed but not ready yet | It takes some time (~30-60 seconds) after installation to start scraping pod metrics |
Your pod isn’t generating any meaningful CPU load | If your app is just idling (like nginx doing nothing), CPU usage can be so low that it shows as <unknown> |
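A quick sanity check that the Metrics Server is actually serving data is kubectl top; if this returns numbers instead of an error, the HPA should be able to read them too:
kubectl top nodes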
Also check that the Metrics Server deployment itself is running:
kubectl get deployment metrics-server -n kube-system
If it is not running, you may need to reboot your computer, as Docker Desktop clusters sometimes need it after installing the Metrics Server. Or you may need to patch the deployment to allow insecure TLS with the command:
kubectl patch deployment metrics-server -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
Only do this for dev/test clusters, never production. For production clusters, you would configure proper certificates.
Since my pod has been running for 22 minutes, I suspect that I need to simulate some load on it - which means it is a good time to test our autoscaling.
Let’s exec into our pod and run the following code:
kubectl exec -n dev-space -it <your-pod-name> -- /bin/sh
In the code above, we’re telling Kubernetes to find the <your-pod-name> pod inside the dev-space namespace, start a shell session (/bin/sh), and stay interactive (-it) so we can type stuff and see the output live.
So, exec into your pod. If all goes well, you should now see the # prompt, which means congrats, you are inside your pod. It’s not fancy, but it works. Now run the following command to get our CPU busy - this is a simple infinite loop:
while true; do :; done
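While the loop is running, you can watch the effect from a second terminal (assuming the Metrics Server is reporting by now):
kubectl top pods -n dev-space   # live CPU/memory per pod
kubectl get hpa -w              # -w keeps watching and prints updates as they happen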
Use Ctrl+C to stop it and type exit to leave the shell. Or, worst case, just nuke the pod with kubectl delete pod <your-pod-name> -n dev-space and let the Deployment recreate it.
But, wait! Let’s do this like the cooler kids or super duper K8s Pros do. Let’s create a container for CPU stress and run it.
Bonus: Super Duper Pro Tip: CPU Stress Container
When we manually ran while true; do :; done, it created a CPU spike inside that one pod, but metrics-server sometimes doesn’t catch it properly unless everything lines up. Even then, the container might not report the loop’s CPU usage to the kubelet accurately if the loop is too "tight" (especially inside an nginx container, which isn’t busy to begin with).
OK, then. Now what?
Well, since we are a K8s pro now (we have been playing with it for 5 days after all), we can pull a tiny container that generates CPU stress automatically.
Instead of trying to break our nginx manually, let's deploy a tiny pod whose only job is to eat CPU for breakfast.
Create a new YAML file (let’s call it cpu-stress-deployment.yaml) and add this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
  namespace: dev-space
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-stress
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      containers:
        - name: stress
          image: polinux/stress
          command: ["stress"]
          args: ["--cpu", "2", "--timeout", "300s"]
What this does:
Pulls a small Linux container pre-installed with the stress tool
Fires up 2 CPU workers (--cpu 2)
Runs them for 5 minutes (--timeout 300s)
Let’s apply it:
kubectl apply -f cpu-stress-deployment.yaml
Cool, now let’s run our handy command again to see the effects:
kubectl get hpa
And you should see your CPU usage climbing. On some local setups, though, the target may still show <unknown> even if the pods are properly stressed. Don’t panic—this is a known limitation when running Kubernetes inside WSL2-based Docker setups. In real cloud clusters (like EKS, AKS, or GKE), metrics work normally and autoscaling behaves as expected. In this case, it’s Kubernetes throwing a tantrum inside Docker Desktop. Maybe it’s time to switch to Linux - sometimes, you just need to accept defeat. It’s not you, it’s just the OS. 🤷♂️
What’s Next
If you’re still here, congratulations—you’re not just poking around Kubernetes anymore. You’re engineering it now.
Today, we taught our apps how to survive, heal, and grow without babysitting. Not bad for a few YAML edits, right?
Tomorrow, we’re switching gears a little. We’ll dig into how Kubernetes actually connects things behind the scenes—routing traffic, exposing services, and making sure your users don’t have to send smoke signals to reach your apps.
Day 6 is all about Services, Cluster IPs, and LoadBalancers. (Yes, it sounds complicated, but it’s not. We’ll make it make sense.)
See you then. Let’s keep this cluster flying high.