Day 5 - Kubernetes Without The Tears


Yesterday, we taught our pods how to keep secrets like little spies and load their config without breaking a sweat.
Today, we’re cranking things up.
We’re giving our apps survival skills — so they can check if they’re alive, stay healthy, and even clone themselves when things get busy.
Because let’s face it: launching an app is easy — we can do it in any web server, really. Keeping it alive when things get crazy? That’s where Kubernetes shines.
In Kubernetes, apps don’t just sit there hoping for the best. They know when they’re in trouble — and clusters grow when they need to.
In Kubernetes language: today’s about probes and autoscaling.
Let’s dive in - before Kubernetes restarts these pods. 😉
Under the Hood
Let’s say your app in one of the pods crashes. As we have seen before, Kubernetes will try to restart it. That’s cool. But what if it’s running but stuck? Or too slow to respond?
Or maybe your service is doing fine—until Friday night traffic hits and suddenly you need five copies instead of one. You don’t want to babysit it every time that happens. Sure, you could set up alerts on your phone so you can get up in the middle of the night and fire up five more pods by hand, but in reality that is not sustainable.
K8s solves both problems with:
Probes: For checking if a container is healthy or ready
Horizontal Pod Autoscaler (HPA): For scaling your app up and down based on load
Let’s see how they work.
Probes: Give me a Sign
Kubernetes uses HTTP requests or commands to ask your pod: "Are you alive? Ready? Still booting? Helllooo?"
Kubernetes supports three main types of probes:
Liveness Probe – checks if the pod is still alive. If it fails, Kubernetes restarts the pod.
Readiness Probe – checks if the pod is ready to serve traffic. If a readiness probe fails, Kubernetes will stop routing traffic to that pod — but it keeps checking periodically, and once the pod is ready again, it’s automatically brought back into the rotation.
Startup Probe – used for slow-starting apps. It gives the app extra time before liveness and readiness probes kick in. It is good to know that if a startup probe is defined, it disables liveness and readiness checks until it succeeds.
Example of HTTP-based probes:
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /ready
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
These probes hit your app endpoints periodically and act accordingly.
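By the way, we haven’t shown a startup probe in YAML yet. It follows the same shape; here’s a minimal sketch, assuming a /healthz endpoint and a generous failure budget for a slow-booting app (the path and the numbers are placeholders, not part of our nginx setup):
startupProbe:
  httpGet:
    path: /healthz
    port: 80
  failureThreshold: 30   # allow up to 30 failed checks...
  periodSeconds: 10      # ...10 seconds apart, i.e. roughly 5 minutes to boot
With this in place, Kubernetes holds off on liveness and readiness checks until the startup probe succeeds (or the 30 x 10 = 300 second budget runs out and the container is restarted).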
Autoscaling: Let the Cluster Do the Work
Now that your app knows how to check itself, we need to talk about when it is healthy but needs to do more - aka needs to scale and increase its capacity.
You can definitely do that manually with:
kubectl scale deployment nginx-deployment --replicas=3
The code above tells Kubernetes to scale the nginx-deployment to exactly 3 pods. It's a one-time instruction—Kubernetes will make sure 3 pods are running, no more, no less, until you tell it otherwise or autoscaling takes over.
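If you want to confirm it worked, a quick check (assuming the same deployment name we’ve been using):
kubectl get deployment nginx-deployment
The READY column should show 3/3 once all the pods are up.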
But the real magic comes with something called Horizontal Pod Autoscaler (HPA):
kubectl autoscale deployment nginx-deployment \
--cpu-percent=50 --min=2 --max=5
This tells K8s:
Watch CPU usage on the pods and scale up if average CPU > 50%
Keep between 2 and 5 replicas at all times
The HPA controller checks every 15 seconds by default. You can also autoscale based on memory or custom metrics, but CPU is the most common.
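The kubectl autoscale one-liner is handy, but you can express the same thing declaratively so it lives in version control with the rest of your YAML. Here’s a minimal sketch using the autoscaling/v2 API (the file name is up to you):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-deployment
spec:
  scaleTargetRef:              # which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # same 50% CPU target as the command above
Apply it with kubectl apply -f and you get the same behavior as the command-line version.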
To see it in action:
kubectl get hpa
You’ll get something like:
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   35%/50%   2         5         2          1m
Lab: Add Probes and Autoscaling
In this lab, we’re going to teach Kubernetes how to monitor and expand our pods automatically.
Step 1: Add Probes to Your Deployment
First, we need to tell Kubernetes how to check if a pod is alive and ready. Edit your nginx-deployment.yaml, and under the container spec, add these two sections:
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 5
So, we just defined two probes. Let’s take a closer look at them and see what they are doing.
livenessProbe: Kubernetes will ping the pod’s root URL (/) on port 80 every 10 seconds after an initial 5-second wait. If it fails (timeout, bad response, crash, etc.), Kubernetes restarts the pod.
readinessProbe: Kubernetes will check if the pod is ready to accept traffic. If this check fails, it stops sending new traffic to that pod but keeps it alive to recover. It checks more frequently compared to the livenessProbe (every 5 seconds).
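If you’re not sure where exactly these blocks go, here’s a sketch of the relevant slice of the deployment (the container name and image are assumptions based on the deployment we built on the earlier days):
spec:
  template:
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5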
Then let’s apply our deployment:
kubectl apply -f nginx-deployment.yaml
And just like that, Kubernetes now knows how to monitor your app health automatically.
Step 2: Enable Autoscaling
Now that our app can check if it is healthy or not, let’s let Kubernetes decide how many copies of it we need based on load.
Run this command to enable autoscaling:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=2 --max=4
When you run this command, Kubernetes starts watching CPU usage. If average CPU usage across the pods goes over 50%, it adds pods. If CPU usage drops, it shrinks back down toward the minimum, always keeping at least 2 pods running and never more than 4.
It’s basically dynamic scaling without you lifting a finger. Pretty cool, huh?
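One detail worth calling out, because it bites almost everyone: the HPA computes that 50% against the CPU requests declared on your containers. If your nginx deployment doesn’t declare any resource requests, the HPA has nothing to compare against and the target can show up as <unknown>. A minimal sketch of what you could add under the container spec (the numbers are just placeholders for a tiny nginx pod):
resources:
  requests:
    cpu: 100m        # the HPA's CPU percentage is relative to this value
    memory: 64Mi
  limits:
    cpu: 250m
    memory: 128Mi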
⚠️ Important: Metrics Server Needed
Autoscaling needs real-time CPU stats from your pods.
If you’re using Docker Desktop or a local cluster, you might not have metrics enabled by default.
To install the Metrics Server, run this command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Now you need to do one more little thing: create another file called components.yaml and make sure its contents are as follows:
apiVersion: v1
kind: Namespace
metadata:
  name: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
        - name: metrics-server
          image: k8s.gcr.io/metrics-server/metrics-server:v0.6.3
          args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            - --kubelet-insecure-tls # <--- important
          ports:
            - containerPort: 4443
              name: main-port
              protocol: TCP
          volumeMounts:
            - mountPath: /tmp
              name: tmp-dir
      volumes:
        - emptyDir: {}
          name: tmp-dir
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
    - port: 443
      protocol: TCP
      targetPort: main-port
  selector:
    k8s-app: metrics-server
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
After that, just apply it:
kubectl apply -f components.yaml
The important bit is this line, which does not exist in the original manifest you would download from GitHub:
- --kubelet-insecure-tls # <--- important
Why doesn’t it exist, you may ask? Well, for security reasons: this flag tells the Metrics Server to skip TLS verification when talking to the kubelets. That is OK for dev/test but a big no-no for production.
Let’s make sure that it is running and healthy:
kubectl get apiservices | findstr metrics
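(findstr is the Windows flavor of grep; if you’re on macOS or Linux, the equivalent check would be:)
kubectl get apiservices | grep metrics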
You should see the metrics API service (v1beta1.metrics.k8s.io) listed. After that, go grab yourself a cup of coffee or tea, or just wait a minute or two so the Metrics Server can gather some data.
Then check your HPA:
kubectl get hpa
Example output:
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   35%/50%   2         4         2          2m
Here, 35%/50% means your app is currently at 35% CPU against a 50% threshold, and the MINPODS/MAXPODS columns show the scaling limits. Kubernetes will scale up or down automatically as needed.
This is what we expect:
[ Load Low ] → 2 pods
[ Load Medium ] → 3 pods
[ Load High ] → 4 pods
When I ran this code on my computer, this is what I got (and you may get this too):
NAME               REFERENCE                     TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   cpu: <unknown>/50%   2         4         2          22m
As you can see, my CPU is showing <unknown>/50%, which means Kubernetes cannot get CPU info for the pods. This usually means one of two things: either the Metrics Server isn’t installed or working yet, or it hasn’t collected enough metrics yet.
Here is exactly why it happens:
Reason | What’s happening |
Metrics Server not installed | Kubernetes can't collect CPU/memory usage stats for pods |
Metrics Server installed but not ready yet | It takes some time (~30-60 seconds) after installation to start scraping pod metrics |
Your pod isn’t generating any meaningful CPU load | If your app is just idling (like nginx doing nothing), CPU usage can be so low that it shows as <unknown> |
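A quick sanity check that the Metrics Server is actually serving data is kubectl top; if this returns numbers instead of an error, the HPA should be able to read them too:
kubectl top nodes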
Also check that the Metrics Server deployment itself is running:
kubectl get deployment metrics-server -n kube-system
If it is not running, you may need to reboot your computer, as Docker Desktop clusters sometimes need it after installing the Metrics Server. Or you may need to patch the deployment to allow insecure TLS with the command:
kubectl patch deployment metrics-server -n kube-system --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
Only do this for dev/test clusters, never production. For production clusters, you would configure proper certificates.
Since my pod has been running for 22 minutes, I suspect that I need to simulate some load on it - which means it is a good time to test our autoscaling.
Let’s exec into our pod and run the following code:
kubectl exec -n dev-space -it <your-pod-name> -- /bin/sh
In the code above, we’re telling Kubernetes to find the <your-pod-name> pod inside the dev-space namespace, start a shell session (/bin/sh), and stay interactive (-it) so we can type stuff and see the output live.
So, exec into your pod. If all goes well, you should now see the # prompt, which means congrats, you are inside your pod. It’s not fancy, but it works. Now run the following command to get our CPU busy - this is a simple infinite loop:
while true; do :; done
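While the loop is running, you can watch the effect from a second terminal (assuming the Metrics Server is reporting by now):
kubectl top pods -n dev-space   # live CPU/memory per pod
kubectl get hpa -w              # -w keeps watching and prints updates as they happen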
Use Ctrl+C to stop it and type exit to leave the shell. Or, worst case, just nuke the pod with kubectl delete pod <your-pod-name> -n dev-space and let the Deployment recreate it.
But, wait! Let’s do this like the cooler kids or super duper K8s Pros do. Let’s create a container for CPU stress and run it.
Bonus: Super Duper Pro Tip: CPU Stress Container
When we manually ran while true; do :; done, it created a CPU spike inside that one pod, but metrics-server sometimes doesn’t catch it properly unless everything lines up. Even then, the container might not report the loop’s CPU usage to the kubelet accurately if the loop is too "tight" (especially inside an nginx container, which isn’t busy to begin with).
OK, then. Now what?
Well, since we are a K8s pro now (we have been playing with it for 5 days after all), we can pull a tiny container that generates CPU stress automatically.
Instead of trying to break our nginx manually, let's deploy a tiny pod whose only job is to eat CPU for breakfast.
Create a new YAML file (let’s call it cpu-stress-deployment.yaml) and add this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
  namespace: dev-space
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-stress
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      containers:
        - name: stress
          image: polinux/stress
          command: ["stress"]
          args: ["--cpu", "2", "--timeout", "300s"]
What this does:
Pulls a small Linux container pre-installed with the stress tool
Fires up 2 CPU workers (--cpu 2)
Runs them for 5 minutes (--timeout 300s)
Let’s apply it:
kubectl apply -f cpu-stress-deployment.yaml
Cool, now let’s run our handy command again to see the effects:
kubectl get hpa
And you should see your CPU usage climbing. On some local setups, though, the target may still show <unknown> even if the pods are properly stressed. Don’t panic—this is a known limitation when running Kubernetes inside WSL2-based Docker setups. In real cloud clusters (like EKS, AKS, or GKE), metrics work normally and autoscaling behaves as expected. In this case, it’s Kubernetes throwing a tantrum inside Docker Desktop. Maybe it’s time to switch to Linux - sometimes, you just need to accept defeat. It’s not you, it’s just the OS. 🤷♂️
What’s Next
If you’re still here, congratulations—you’re not just poking around Kubernetes anymore. You’re engineering it now.
Today, we taught our apps how to survive, heal, and grow without babysitting. Not bad for a few YAML edits, right?
Tomorrow, we’re switching gears a little. We’ll dig into how Kubernetes actually connects things behind the scenes—routing traffic, exposing services, and making sure your users don’t have to send smoke signals to reach your apps.
Day 6 is all about Services, Cluster IPs, and LoadBalancers. (Yes, it sounds complicated, but it’s not. We’ll make it make sense.)
See you then. Let’s keep this cluster flying high.