Autoscaling Kubernetes Pods Based on HTTP Traffic

rishita t
5 min read

Kubernetes (K8s) is a powerful tool for deploying and managing containerized applications. One of the key features that makes K8s stand out is autoscaling: the ability to automatically adjust the number of pods running our application to handle traffic efficiently.

In this article we are going to dive into the world of autoscaling based on HTTP requests, so by the end we will know how to configure Kubernetes to scale our application up or down automatically depending on incoming HTTP traffic.

What is Autoscaling?

Before we get into the specifics of autoscaling in K8s, let’s see what autoscaling is.

Autoscaling is the process of automatically adjusting the number of running instances (or pods) of an application to meet demand. If there's a surge in traffic, autoscaling will scale up (add more pods); if traffic decreases, it will scale down (remove pods).

For example, imagine our web application is receiving hundreds of requests per second. Kubernetes will automatically spin up more pods to handle the load. Once the traffic drops, Kubernetes will scale back the pods to save resources and cost.

Key Concepts in Kubernetes Autoscaling

In Kubernetes, there are a few types of autoscaling we often hear about:

  1. Horizontal Pod Autoscaler (HPA): This is the most commonly used autoscaler. It automatically adjusts the number of pods in a deployment or replica set based on CPU usage, memory or custom metrics like HTTP request counts.

  2. Cluster Autoscaler: This adjusts the number of nodes in the cluster based on resource demands. It operates at a higher level than the HPA, scaling nodes rather than individual pods.

  3. Vertical Pod Autoscaler (VPA): This adjusts the CPU and memory requests and limits for individual pods based on their usage. It doesn't scale the number of pods but adjusts the resources assigned to them.
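As a point of reference, a resource-based HPA (the default behaviour we will later swap for HTTP metrics) can be as simple as the following sketch; the deployment name myapp and the 70% threshold are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-cpu-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU crosses 70%
```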

For this article, let's focus on the Horizontal Pod Autoscaler and scale our application based on HTTP requests, which is a natural fit for web-based applications.

How Does Autoscaling Based on HTTP Requests Work?

In most cases, the K8s Horizontal Pod Autoscaler (HPA) uses CPU or memory usage to trigger scaling actions. But what if we want to scale based on HTTP requests instead? This is where custom metrics come into play.

With custom metrics we can tell K8s to scale based on metrics like the number of HTTP requests per second or any other custom metric relevant to our application.

Step 1: Set Up Metrics Server

Kubernetes needs a metrics server to gather and expose resource usage data (like CPU and memory) that the HPA can use. Strictly speaking, HTTP-request metrics will come from a separate adapter (Step 3), but the metrics server is the standard baseline for the HPA and for commands like kubectl top, so we install it first.

To install the metrics server, run the following command:


kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Once the metrics server is set up, we can verify that it's running properly by executing:


kubectl get deployment metrics-server -n kube-system

Step 2: Deploy Application

Let's assume we have a simple application, like a basic web server or API, that handles HTTP requests. We can deploy it as a Kubernetes Deployment:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myappcontainer
          image: myappimage
          ports:
            - containerPort: 80

This is a simple deployment configuration with three replicas (three pods). We will scale this deployment based on HTTP requests.
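To actually receive HTTP traffic, the pods need to be exposed. A minimal Service sketch for the deployment above might look like this (the name myapp is a placeholder matching the deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp        # routes traffic to the pods labelled above
  ports:
    - port: 80
      targetPort: 80
```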

Step 3: Expose Custom Metrics for HTTP Requests

To scale based on HTTP requests, we need custom metrics that Kubernetes can consume. There are multiple ways to do this, but a common approach is to use an external metrics provider like Prometheus together with the Prometheus Adapter.

Prometheus is a monitoring and alerting toolkit that can gather and expose metrics, including HTTP request counts. The Prometheus Adapter makes these metrics available for the HPA to use.

How to set it up:

  1. Install Prometheus in the Kubernetes cluster using Helm (or manually).

  2. Install Prometheus Adapter to expose Prometheus metrics to Kubernetes.
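Assuming Helm is available, the two installs can be done roughly as follows; the chart names come from the prometheus-community repository, and the release names are placeholders:

```shell
# Add the community chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install Prometheus itself, then the adapter that bridges its
# metrics into the Kubernetes custom/external metrics APIs
helm install prometheus prometheus-community/prometheus
helm install prometheus-adapter prometheus-community/prometheus-adapter
```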

Once the Prometheus Adapter is installed, we can expose metrics like HTTP requests per second.
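The adapter needs a rule telling it which Prometheus series to expose and how to turn a raw counter into a per-second rate. A sketch of such a rule (in the adapter's rules configuration; the series name http_requests_total is assumed to be what our app exports) might be:

```yaml
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"      # exposed as http_requests_per_second
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```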

Step 4: Configure Horizontal Pod Autoscaler

Now comes the best part: configuring the Horizontal Pod Autoscaler (HPA) to scale based on the custom metric (HTTP requests).

We can create an HPA that scales our pods based on the number of HTTP requests. First, we need to make sure that our custom metric (http_requests_total) is exposed and available.

Here is an example HPA configuration:


apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapphpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: http_requests_total
          selector:
            matchLabels:
              app: myapp
        target:
          type: Value
          value: "100"

This HPA configuration scales our myapp deployment based on the number of HTTP requests. If the metric exceeds the defined target value (here, 100), the HPA will automatically add pods, up to maxReplicas.

  • minReplicas: The minimum number of pods running.

  • maxReplicas: The maximum number of pods running.

  • metrics: Here we specify the custom metric (http_requests_total) and define a target value for autoscaling.
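A note on metric types: http_requests_total is a cumulative counter, so in practice it is usually converted to a per-second rate by the metrics pipeline and consumed as a Pods metric with an AverageValue target, which divides the load across replicas. A hedged sketch of that variant (the metric name and the 100 req/s threshold are assumptions) for the metrics section above:

```yaml
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # rate metric exposed by the adapter
        target:
          type: AverageValue
          averageValue: "100"              # ~100 req/s per pod before scaling out
```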

Step 5: Test the Autoscaling

Now let's test that everything works. We will send some HTTP requests to our application and watch how Kubernetes scales the number of pods based on traffic. We can use a load-testing tool like Apache Bench (ab) to simulate traffic:


ab -n 1000 -c 10 http://your-app-url/

We can check the scaling with:


kubectl get hpa

We should see that the number of pods adjusts automatically based on the traffic.
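If a load-testing tool is not at hand, a throwaway pod can generate continuous traffic from inside the cluster; this sketch assumes the app is reachable through a Service named myapp:

```shell
# Loop requests against the service until interrupted; --rm cleans the pod up
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://myapp; done"
```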

Final Words:

Autoscaling based on HTTP requests in Kubernetes is a powerful way to ensure that our application can handle traffic spikes while saving resources when demand is low. By leveraging the Horizontal Pod Autoscaler and custom metrics like HTTP requests, we can create a dynamic, efficient, and scalable application that adjusts automatically to traffic patterns.
