Day 17 of Kubernetes Series: Understanding Auto Scaling in Kubernetes

Kubernetes has become a cornerstone for modern containerized application deployment, providing robust mechanisms to manage, deploy, and scale applications. One of the key features that make Kubernetes so powerful is its ability to auto-scale. This blog delves into auto-scaling in Kubernetes, explaining what scaling is, why it's needed, and how Kubernetes handles both manual and automatic scaling. We'll also explore the different types of auto-scaling in Kubernetes, focusing on the Horizontal Pod Autoscaler (HPA) and providing a practical example.

What is Scaling?

Scaling in the context of computing refers to adjusting the capacity of a system to handle varying workloads efficiently. There are two main types of scaling:

  1. Vertical Scaling: Adding more resources (CPU, RAM) to an existing node or server.

  2. Horizontal Scaling: Adding more instances of a resource, such as creating additional nodes or pods, to distribute the load.

The Need for Autoscaling

Applications experience fluctuating workloads due to various factors, such as time of day, user activity, and external events. Manually adjusting resources to meet these demands can be inefficient and error-prone. Autoscaling addresses this by dynamically adjusting the number of instances or resources allocated to an application, ensuring optimal performance and cost-efficiency.

Manual vs. Automatic Scaling

  • Manual Scaling: Involves human intervention to adjust the number of instances or resources. This approach is straightforward but can be slow and less responsive to sudden changes in workload.

  • Automatic Scaling: Uses predefined rules or metrics to automatically adjust resources without human intervention. This method is more responsive and ensures that applications can handle varying loads seamlessly.
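In Kubernetes, manual scaling usually means running kubectl scale by hand. A quick sketch (the deployment name my-app is illustrative):

```shell
# Manually set the replica count of a deployment to 5
kubectl scale deployment my-app --replicas=5

# Verify the new replica count
kubectl get deployment my-app
```

Every traffic spike then requires an operator to notice the load and rerun the command, which is exactly the gap autoscaling closes.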

Types of Auto Scaling in Kubernetes

Kubernetes offers several mechanisms for auto-scaling:

  1. Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas based on CPU utilization or other custom metrics.

  2. Vertical Pod Autoscaler (VPA): Adjusts the resource requests and limits of containers within pods based on usage patterns.

  3. Cluster Autoscaler: Adjusts the number of nodes in a Kubernetes cluster based on the resource requirements of the pods.
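To give a feel for the declarative side, here is a minimal VerticalPodAutoscaler manifest as a sketch. Note that the VPA controller is a separate add-on that must be installed in the cluster, and the target deployment name is illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # VPA may evict pods to apply new resource requests
```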

How HPA Works

The Horizontal Pod Autoscaler (HPA) is a key component of Kubernetes' auto-scaling capabilities. It automatically adjusts the number of pod replicas in a deployment, replication controller, or replica set based on observed CPU utilization or other select metrics.

HPA Workflow:

  1. Metrics Collection: HPA collects metrics from the Metrics Server (e.g., CPU, memory).

  2. Calculation: It calculates the desired number of replicas based on the target utilization specified.

  3. Adjustment: HPA updates the deployment or replica set to scale up or down the number of pods.
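The calculation in step 2 follows the formula documented for the HPA: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal sketch of that arithmetic in Python:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """HPA core formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# Example: 2 replicas averaging 90% CPU against a 50% target
print(desired_replicas(2, 90, 50))  # -> 4
```

If the observed utilization matches the target, the ratio is 1 and the replica count is left unchanged; the real controller also applies tolerances and stabilization windows on top of this formula.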

Practical Example: Horizontal Pod Autoscaling

Let's walk through a practical example of setting up Horizontal Pod Autoscaling in a Kubernetes cluster.

Prerequisites:

  • A running Kubernetes cluster.

  • kubectl configured to interact with the cluster.

  • Metrics Server installed in the cluster.
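If the Metrics Server is not yet installed, one common way to deploy it is from the official components manifest (check the metrics-server releases for a version compatible with your cluster):

```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm metrics are being collected
kubectl top nodes
```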

Step 1: Deploy an Application

Deploy a sample application, such as a simple HTTP server.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx
        resources:
          requests:
            cpu: "100m"
          limits:
            cpu: "200m"

Apply the deployment:

kubectl apply -f my-app-deployment.yaml

Step 2: Create a Horizontal Pod Autoscaler

Create an HPA for the deployed application. The following example sets a target CPU utilization of 50%.

kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
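The same autoscaler can also be expressed declaratively with an autoscaling/v2 manifest, which is easier to keep in version control:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```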

Step 3: Generate Load

Generate load to see the autoscaler in action. This can be done with a load-testing tool such as hey or ab (Apache Bench), pointed at the Service that exposes the deployment.

hey -z 1m -c 10 http://<external-ip>
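If the deployment is not exposed externally, an in-cluster load generator works as well. This sketch assumes a Service named my-app exposes the deployment on port 80:

```shell
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://my-app; done"
```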

Step 4: Monitor Scaling

Monitor the HPA to see how it adjusts the number of replicas based on the load.

kubectl get hpa my-app --watch
kubectl get pods -l app=my-app

Conclusion

Auto-scaling in Kubernetes ensures that your applications can handle varying loads efficiently without manual intervention. By leveraging tools like the Horizontal Pod Autoscaler, you can create robust, self-healing, and highly available applications. Understanding and implementing auto-scaling in your Kubernetes clusters is crucial for optimizing performance and cost-effectiveness.

Subscribe to my newsletter

Read articles from Rahul Vadakkiniyil directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Rahul Vadakkiniyil