Understanding Pod Topology Spread Constraints in Kubernetes

Aayush Bisht

Kubernetes is a powerful orchestration platform for managing containerized workloads, but ensuring high availability and efficient resource utilization across a cluster requires careful planning. One key feature for achieving this is Pod Topology Spread Constraints, which distribute pods evenly across nodes, zones, or other topological domains. This article explains why pod spreading matters, how topology spread constraints work, and walks through practical examples with code.

Why Spread Pods Across a Cluster?

Imagine you have a Kubernetes cluster with 20 nodes, but all your application pods are scheduled on a single node. This setup is a recipe for disaster when traffic spikes. The single node could hit CPU or memory limits, leading to performance degradation or outright failure. By spreading pods across multiple nodes or zones, you ensure:

  • High Availability: If one node or zone fails, other pods in different nodes/zones keep your application running.

  • Resource Efficiency: Distributing pods prevents overloading specific nodes, balancing CPU, memory, and network usage.

  • Scalability: As your application scales, evenly distributed pods can handle increased traffic without bottlenecks.

How Do Topology Spread Constraints Work?

These constraints let you define rules for pod placement using four main fields, shown together in the snippet after this list:

  1. topologyKey: The label that defines your “domain” (e.g., kubernetes.io/hostname for nodes or topology.kubernetes.io/zone for zones).

  2. maxSkew: The max difference in pod count between any two domains. Smaller maxSkew = more evenly spread.

  3. whenUnsatisfiable: What happens if the rule can’t be met:

    • DoNotSchedule: Pods won’t be scheduled unless the constraint is satisfied (strict).

    • ScheduleAnyway: Pods are scheduled, but the constraint is treated as a preference (flexible).

  4. labelSelector: Identifies which pods the constraint applies to, based on their labels.
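
Here is a minimal sketch of how the four fields fit together inside a pod spec (the app: my-app label is just a placeholder; the full Deployment manifests later in the article follow the same structure):

spec:
  topologySpreadConstraints:
  - maxSkew: 1                                 # at most 1 pod of difference between any two domains
    topologyKey: topology.kubernetes.io/zone   # the "domain" here is an availability zone
    whenUnsatisfiable: DoNotSchedule           # hard constraint: leave pods Pending rather than violate it
    labelSelector:
      matchLabels:
        app: my-app                            # only pods with this label are counted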

Understanding maxSkew with an Example

The maxSkew parameter controls how evenly pods are distributed across topological domains. It represents the maximum difference in the number of pods between any two domains.

Example: Distributing 8 Pods Across 3 Nodes

Suppose you have a cluster with 3 nodes, and you want to deploy 8 pods of an application. You set maxSkew: 1, meaning the difference in the number of pods between any two nodes should not exceed 1.

  • Ideal Distribution: With maxSkew: 1, Kubernetes aims to distribute pods as evenly as possible. For 8 pods across 3 nodes, the distribution would be 3 pods, 3 pods, 2 pods (since 8 isn’t evenly divisible by 3).

  • How It Works:

    • Node A: 3 pods

    • Node B: 3 pods

    • Node C: 2 pods

    • The skew between Node A (3 pods) and Node C (2 pods) is 1, which satisfies maxSkew: 1.

If you had set maxSkew: 2, a distribution like 4 pods, 2 pods, 2 pods would also be valid, as the maximum difference (4 - 2 = 2) is within the allowed skew.

Here’s a sample Kubernetes manifest to achieve this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 8
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: my-app
      containers:
      - name: my-app
        image: nginx:1.14.2
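
Once the Deployment is applied, a quick way to confirm the spread is to list each pod together with the node it landed on (a sketch using the label from the manifest above):

kubectl get pods -l app=my-app -o wide
# or just the pod-to-node mapping:
kubectl get pods -l app=my-app -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName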

Understanding whenUnsatisfiable with an Example

The whenUnsatisfiable field determines the scheduler’s behavior when it cannot satisfy the topology spread constraint.

1. DoNotSchedule (Hard Constraint)

With DoNotSchedule, Kubernetes will not schedule pods unless the topology spread constraint can be fully satisfied. This is useful for critical applications where even distribution is non-negotiable.

Example: Enforcing Zone-Based Distribution

Suppose you have a cluster with 3 availability zones (zone-a, zone-b, zone-c), and you want to deploy 6 pods with maxSkew: 1 across zones. If only two zones can accept pods (e.g., zone-c is down) and whenUnsatisfiable: DoNotSchedule, the scheduler leaves pods that would violate the constraint in the Pending state rather than schedule them unevenly.

Here’s the manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zone-app
  labels:
    app: zone-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: zone-app
  template:
    metadata:
      labels:
        app: zone-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: zone-app
      containers:
      - name: zone-app
        image: nginx:1.14.2

Outcome:

  • If all 3 zones are available, pods are distributed as 2 pods per zone.

  • If only 2 zones are available, pods that would violate the constraint stay Pending until the third zone is back (you can confirm this with the commands below).
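
If scheduling is blocked, the affected pods sit in the Pending state. A few commands to confirm this and see why (a sketch, assuming the zone-app Deployment above; the pod name is a placeholder):

kubectl get pods -l app=zone-app                      # Pending pods indicate the constraint blocked scheduling
kubectl describe pod <pending-pod-name>               # the Events section explains why the pod was not scheduled
kubectl get nodes -L topology.kubernetes.io/zone      # check which zones currently have Ready nodes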

2. ScheduleAnyway (Soft Constraint)

With ScheduleAnyway, Kubernetes treats the constraint as a preference. If the constraint cannot be satisfied, pods are still scheduled, but the scheduler tries to minimize the skew.

Example: Flexible Zone-Based Distribution

Using the same setup as above, but with whenUnsatisfiable: ScheduleAnyway, the scheduler will place pods even if one zone is unavailable, aiming to keep the skew as low as possible.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flexible-app
  labels:
    app: flexible-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: flexible-app
  template:
    metadata:
      labels:
        app: flexible-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: flexible-app
      containers:
      - name: flexible-app
        image: nginx:1.14.2

Outcome:

  • If all 3 zones are available, pods are distributed as 2 pods per zone.

  • If only 2 zones are available, pods might be distributed as 3 pods in zone-a and 3 pods in zone-b, minimizing skew but proceeding with scheduling.
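
To see where the pods actually landed, you can map pods to nodes and nodes to zones (a quick check, assuming the flexible-app Deployment above):

kubectl get pods -l app=flexible-app -o wide          # shows the node each pod was scheduled on
kubectl get nodes -L topology.kubernetes.io/zone      # shows each node's zone label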

Understanding topologyKey for Custom Topologies with an Example

The topologyKey defines the scope of the topological domain. Common keys include:

  • kubernetes.io/hostname: Spreads pods across individual nodes.

  • topology.kubernetes.io/zone: Spreads pods across availability zones.

  • topology.kubernetes.io/region: Spreads pods across regions.

You can also use custom labels to define your own topological domains, such as rack or data-center.

Example: Custom Topology with Rack Labels

Suppose your cluster nodes are labeled with a custom key topology.kubernetes.io/rack (e.g., rack-1, rack-2). You want to deploy 10 pods across racks with maxSkew: 2.
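
The manifest below assumes this rack label is already present on your nodes; if it is not, you can add it yourself (node and rack names here are placeholders):

kubectl label node node-1 topology.kubernetes.io/rack=rack-1
kubectl label node node-2 topology.kubernetes.io/rack=rack-2
kubectl label node node-3 topology.kubernetes.io/rack=rack-3

With the racks labeled, the following Deployment spreads 10 replicas across them: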

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rack-app
  labels:
    app: rack-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: rack-app
  template:
    metadata:
      labels:
        app: rack-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 2
        topologyKey: topology.kubernetes.io/rack
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: rack-app
      containers:
      - name: rack-app
        image: nginx:1.14.2

Outcome:

  • If you have 3 racks, pods might be distributed as 4, 3, 3 (skew of 4 - 3 = 1, which is within maxSkew: 2).

  • If a rack is unavailable, ScheduleAnyway ensures pods are still scheduled on available racks.

Best Practices for Topology Spread Constraints

  1. Start with ScheduleAnyway for Flexibility: Use DoNotSchedule only for critical applications where uneven distribution is unacceptable.

  2. Combine with Affinity Rules: Use node or pod affinity/anti-affinity rules alongside topology spread constraints for more granular control (see the sketch after this list).

  3. Monitor Resource Usage: Ensure nodes have sufficient resources to handle distributed pods, especially when scaling.

  4. Test with Small Clusters First: Experiment with constraints in a test environment to understand their impact before applying to production.

  5. Use Appropriate maxSkew: Set maxSkew based on your cluster size and desired distribution. A smaller maxSkew enforces stricter balancing but may limit scheduling flexibility.
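
As an illustration of point 2, here is a sketch (not part of the manifests above; all names are placeholders) of a pod spec that combines a zone-level spread constraint with a soft pod anti-affinity rule, so replicas are balanced across zones and also prefer different nodes within a zone:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone      # keep zones balanced
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-app
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-app
          topologyKey: kubernetes.io/hostname     # also prefer spreading across individual nodes
  containers:
  - name: my-app
    image: nginx:1.14.2

The spread constraint keeps zones balanced, while the soft anti-affinity nudges the scheduler away from stacking replicas on the same node within a zone.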
