Optimizing Kubernetes Auto-Scaling: How Readiness and Liveness Probes Enhance Application Stability
Application
Consider a case where a modern microservices application, with complex external integrations, large datasets, and extensive configuration, is deployed on GKE (Google Kubernetes Engine). This application should auto-scale horizontally based on increases or decreases in traffic.
Problems
New pods cannot handle traffic during auto-scaling.
Pods are failing to start and are stuck in a deadlock state.
Gathering Facts
It is time to dive deeper and gather information to diagnose the issue. Since the problem appears only during auto-scaling, we consulted the SRE team for a better picture. Their observation: during a scale-out, the existing healthy pods are overwhelmed with traffic, while part of the load is routed to newly created, not-yet-healthy pods that cannot serve requests effectively.
Reproducing the Issue Locally
We replicated the production environment in a non-production setting. Similar data sizes were set up in the database, and live traffic was gradually increased to trigger auto-scaling. As the traffic crossed the target CPU utilization threshold, pods began to auto-scale, and, as anticipated, similar issues started appearing in the non-production environment.
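For context, CPU-driven auto-scaling of this kind is typically configured with a HorizontalPodAutoscaler. A minimal sketch is shown below; the deployment name, replica bounds, and the 70% utilization target are illustrative, not the values from the actual setup:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa              # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # illustrative deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # example target CPU threshold
```

Once average CPU utilization across the pods crosses the target, the HPA adds replicas, which is exactly the moment the unready-pod problem surfaced.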
This issue wasn’t detected in the QA environment due to the absence of live traffic and the use of limited data for testing.
Root Cause
The engineering team discovered that the issue arose because the pods were not ready to manage traffic while auto-scaling as traffic increased. Therefore, a mechanism was needed to ensure the pods reached a healthy and ready state before serving traffic. The SRE team also observed this behavior and identified that readiness and liveness probes were missing, which are necessary to validate the health of containers and pods in Kubernetes.
Solution
The readiness and liveness probes were configured in the deployment YAML file, and the same steps (outlined in the “Reproducing the Issue Locally” section) were followed to test the application’s behavior. We fine-tuned the readiness and liveness probes by adjusting various fields (explained below) and conducted multiple tests to determine the optimal values for these fields.
Kubernetes offers different types of probes:
Readiness
Liveness
Startup
In this article, we’ll focus on the readiness and liveness probes that resolved the issue.
Readiness Probe
The readiness probe determines when a container is ready to serve traffic. This probe is useful for complex applications that perform time-consuming initialization, such as establishing network connections or loading files and caches. While the readiness probe fails, Kubernetes removes the pod from the endpoints of all matching Services, so no traffic is routed to it.
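A minimal readiness probe in the container spec might look like the sketch below; the /ready path and port 8080 are assumptions, so substitute whatever health endpoint your application exposes:

```yaml
readinessProbe:
  httpGet:
    path: /ready          # assumed application health endpoint
    port: 8080            # assumed container port
  initialDelaySeconds: 10 # allow time for initialization
  periodSeconds: 5        # check every 5 seconds
```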
Liveness Probe
Consider a scenario where the application is running but unable to progress. The liveness probe detects such deadlock conditions. If the container repeatedly fails the liveness probe, the kubelet restarts it.
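A corresponding liveness probe sketch is shown below; again, the /healthz endpoint and port are illustrative assumptions:

```yaml
livenessProbe:
  httpGet:
    path: /healthz        # assumed application health endpoint
    port: 8080            # assumed container port
  initialDelaySeconds: 15 # wait before the first liveness check
  periodSeconds: 10       # check every 10 seconds
  failureThreshold: 3     # restart after 3 consecutive failures
```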
These probes need to be configured with several fields to control their behavior:
initialDelaySeconds: Specifies the number of seconds the kubelet should wait before performing the first probe after the container has started.
periodSeconds: The frequency at which the probe is performed.
timeoutSeconds: The number of seconds after which the probe times out.
successThreshold: The minimum number of consecutive successes required for the probe to be considered successful after a failure. Must be 1 for liveness and startup probes.
failureThreshold: The number of consecutive probe failures after which Kubernetes considers the probe failed. For a liveness probe, this triggers a container restart; for a readiness probe, it marks the pod as not ready and removes it from Service endpoints.
terminationGracePeriodSeconds: The grace period between the kubelet initiating the shutdown of a failed container and forcibly stopping it. When set on a probe, it overrides the pod-level grace period for probe-triggered shutdowns.
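Putting these fields together, the pod template of the deployment might carry both probes as in the sketch below. The endpoints, port, and timing values are illustrative; in practice, the optimal values came from the repeated load tests described earlier:

```yaml
# Fragment of a Deployment's pod template; names and values are illustrative.
spec:
  containers:
    - name: my-app
      image: my-app:1.0
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /ready          # assumed health endpoint
          port: 8080
        initialDelaySeconds: 20
        periodSeconds: 5
        timeoutSeconds: 2
        successThreshold: 1
        failureThreshold: 3
      livenessProbe:
        httpGet:
          path: /healthz        # assumed health endpoint
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 2
        failureThreshold: 3
        terminationGracePeriodSeconds: 30
```

Note that the liveness delay is deliberately longer than the readiness delay here, so a slow-starting container is kept out of rotation rather than being restarted prematurely.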
Conclusion
By implementing readiness and liveness probes in our Kubernetes deployment, we were able to effectively manage traffic during auto-scaling and prevent pods from entering a deadlock state. The readiness probe ensured that only healthy pods could serve requests, while the liveness probe allowed the system to automatically recover from any potential failures or deadlocks. Configuring these probes with the right parameters allowed us to create a resilient, self-healing system capable of handling traffic fluctuations seamlessly. This solution highlights the importance of understanding Kubernetes health checks for optimizing application availability and reliability in dynamic cloud environments.
With these adjustments, our application now scales smoothly on GKE, ready to handle traffic spikes without compromising on stability.
Written by Himanshu Soni