Resource Quotas & Horizontal Pod Autoscaling Guide

Let's explore how resource quotas and horizontal pod autoscaling (HPA) are implemented in Kubernetes, along with their key concepts, configurations, and interactions.

Resource Quotas

Resource quotas in Kubernetes limit aggregate resource consumption within a namespace, ensuring fair allocation among namespaces and preventing any one of them from exhausting the cluster. A quota is defined as a ResourceQuota object, typically in a YAML file, and applied to a specific namespace.

Key Components of Resource Quotas

Pods: Limits the number of pods that can be created in a namespace.
CPU and Memory Requests: Caps the sum of CPU and memory requests across all pods in the namespace.
CPU and Memory Limits: Caps the sum of CPU and memory limits across all pods in the namespace.
Persistent Volume Claims (PVCs): Limits the number of PVCs and the total storage they request.

Example Resource Quota Configuration

apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: example-namespace
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    persistentvolumeclaims: "5"
    requests.storage: "100Gi"

In this example:

The namespace example-namespace can have up to 10 pods.
The total CPU requests can be up to 4 CPUs, and memory requests can be up to 8Gi.
The total CPU limits can be up to 8 CPUs, and memory limits can be up to 16Gi.
Up to 5 PVCs can be created, and the total storage requested can be up to 100Gi.
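
To see how a single pod counts against this quota, here is a minimal sketch. The pod name, image, and resource values are assumptions chosen for illustration and are not part of the original example. Note that once a quota covers CPU or memory requests and limits, a pod that omits those fields is rejected unless a LimitRange in the namespace supplies defaults.

```yaml
# Hypothetical pod for illustration; name, image, and values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo
  namespace: example-namespace
spec:
  containers:
    - name: app
      image: nginx:1.25     # placeholder image
      resources:
        requests:
          cpu: 500m         # counts toward requests.cpu (4 total)
          memory: 1Gi       # counts toward requests.memory (8Gi total)
        limits:
          cpu: "1"          # counts toward limits.cpu (8 total)
          memory: 2Gi       # counts toward limits.memory (16Gi total)
```

With these values, eight such pods would use up all four compute entries of the quota (8 x 500m = 4 CPU and 8 x 1Gi = 8Gi requested, 8 x 1 = 8 CPU and 8 x 2Gi = 16Gi in limits), so a ninth would be rejected even though the pod count cap of 10 has not been reached.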

Horizontal Pod Autoscaler (HPA)

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a deployment, replica set, or stateful set based on observed metrics like CPU usage, memory usage, or custom metrics. This helps applications handle different loads efficiently.

Key Components of HPA

Target Resource: The deployment, replica set, or stateful set that the HPA will scale.
Metrics: Metrics used to decide scaling actions, such as CPU usage, memory usage, or custom metrics.
Scaling Policy: Defines the minimum and maximum number of replicas and the target metric value.

Example HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: example-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

In this example:

The HPA targets a deployment named example-deployment.
It maintains between 1 and 10 replicas.
It scales based on CPU utilization, aiming for an average of 50% across pods. Utilization is measured relative to each pod's CPU request, so the target containers must set one.
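
The way the HPA turns a metric reading into a replica count can be summarized by the formula given in the Kubernetes documentation. The worked numbers below it are an assumed scenario, not measurements from a real cluster.

```latex
\text{desiredReplicas} =
  \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% Assumed scenario: 4 replicas averaging 80% CPU utilization, target 50%
\left\lceil 4 \times \frac{80}{50} \right\rceil = \lceil 6.4 \rceil = 7
```

The result is then clamped to the range set by minReplicas and maxReplicas, which is 1 to 10 in this example.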

Interaction Between Resource Quotas and HPA

When using resource quotas and HPA together, it's important to ensure that the resource quota limits are compatible with the scaling requirements of the HPA. Here are some considerations:

Quota Limits and Scaling: Ensure that the quota is high enough to accommodate the maximum number of replicas the HPA may scale to. If the quota is exhausted, the HPA still raises the desired replica count, but the API server rejects the new pods, so the workload stays below the size the HPA asked for.
Resource Requests and Limits: Set resource requests and limits on every container. A quota counts the declared requests and limits of pods, not their measured usage, and the HPA computes utilization relative to the requests, so both features depend on these values being realistic.
Namespace Constraints: Resource quotas apply at the namespace level, so if multiple HPA-enabled deployments exist within the same namespace, their combined requests and limits at maximum scale must not exceed the quota.
Preventing Resource Starvation: Properly configured resource quotas prevent any single namespace from consuming all the cluster's resources. Within a namespace the quota is shared on a first-come, first-served basis, so one aggressively scaling deployment can still crowd out its neighbors.
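
These considerations boil down to a sizing check that can be run for each quota entry (requests.cpu, requests.memory, limits.cpu, limits.memory). The notation below is an assumption introduced for illustration: d ranges over the autoscaled workloads in the namespace, and request_d is the per-pod value that workload declares.

```latex
\sum_{d} \text{maxReplicas}_d \times \text{request}_d \;\le\; \text{quota}

% With example-quota (requests.cpu: 4) and example-hpa (maxReplicas: 10)
% as the only workload, the per-pod CPU request must satisfy:
\text{request} \;\le\; \frac{4}{10} = 0.4\ \text{CPU} = 400\text{m}
```

The same reasoning applies to the pod count: the sum of maxReplicas across workloads should not exceed the pods entry. In the earlier example the two are equal at 10, which leaves no room for any other pod in the namespace.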

Practical Example

Imagine you have a namespace production with a resource quota and a deployment with HPA:

Resource Quota for `production` Namespace

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    pods: "20"
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    persistentvolumeclaims: "10"
    requests.storage: "200Gi"

HPA for a Deployment in `production` Namespace

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
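
To make the fit between this quota and HPA concrete, here is a sketch of what the webapp Deployment could look like. The image, labels, and per-pod resource values are assumptions picked so that 15 replicas stay inside production-quota; they are not taken from a real workload.

```yaml
# Sketch of the Deployment targeted by webapp-hpa.
# Image, labels, and resource values are assumptions for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: production
spec:
  # replicas is omitted so that the HPA manages the replica count
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: example.com/webapp:1.0   # placeholder image
          resources:
            requests:
              cpu: 500m      # 15 replicas x 500m = 7.5 CPU (quota: 10)
              memory: 1Gi    # 15 replicas x 1Gi  = 15Gi    (quota: 20Gi)
            limits:
              cpu: "1"       # 15 replicas x 1    = 15 CPU  (quota: 20)
              memory: 2Gi    # 15 replicas x 2Gi  = 30Gi    (quota: 40Gi)
```

At the HPA maximum of 15 replicas, every entry stays under the quota and under the cap of 20 pods, which leaves some headroom for the extra pods a rolling update creates. Any other workload in the production namespace draws from the same quota, so the remaining margin is what it has to fit into.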

Summary

Used together, resource quotas and horizontal pod autoscaling let applications scale dynamically to meet demand while respecting the limits set at the namespace level. The combination works only when the quota is sized for the HPA's maximum replica count and every pod declares realistic requests and limits. Configured that way, it keeps cluster usage balanced and prevents any single namespace from taking more than its share.


Mohmmad Saif
