Understanding Pod Topology Spread Constraints in Kubernetes

Kubernetes is a powerful orchestration platform for managing containerized workloads, but ensuring high availability and efficient resource utilization across a cluster requires careful planning. One key feature for achieving this is Pod Topology Spread Constraints, which distribute pods evenly across nodes, zones, or other topological domains. This article explains why spreading pods is essential, how topology spread constraints work, and walks through practical examples with manifests you can adapt.
Why Spread Pods Across a Cluster?
Imagine you have a Kubernetes cluster with 20 nodes, but all your application pods are scheduled on a single node. This setup is a recipe for disaster when traffic spikes: that one node can run out of CPU or memory, leading to performance degradation or outright failure. By spreading pods across multiple nodes or zones, you ensure:
High Availability: If one node or zone fails, other pods in different nodes/zones keep your application running.
Resource Efficiency: Distributing pods prevents overloading specific nodes, balancing CPU, memory, and network usage.
Scalability: As your application scales, evenly distributed pods can handle increased traffic without bottlenecks.
How Do Topology Spread Constraints Work?
These constraints let you define rules for pod placement using four main settings:
topologyKey: The node label that defines your “domain” (e.g., kubernetes.io/hostname for nodes or topology.kubernetes.io/zone for zones).
maxSkew: The max difference in pod count between any two domains. Smaller maxSkew = more evenly spread.
whenUnsatisfiable: What happens if the rule can’t be met:
DoNotSchedule: Pods won’t be scheduled unless the constraint is satisfied (strict).
ScheduleAnyway: Pods are scheduled, but the constraint is treated as a preference (flexible).
labelSelector: Identifies which pods the constraint applies to, based on their labels.
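Putting the four fields together, here is a minimal sketch of a standalone Pod that asks to be spread across zones; the name and label (demo, app: demo) are placeholders, and the Deployment examples below use the same block inside a pod template:
apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    app: demo
spec:
  topologySpreadConstraints:
    - maxSkew: 1                                # at most 1 pod difference between zones
      topologyKey: topology.kubernetes.io/zone  # spread across availability zones
      whenUnsatisfiable: DoNotSchedule          # hard constraint: stay Pending if it can't be met
      labelSelector:
        matchLabels:
          app: demo                             # count pods with this label when computing skew
  containers:
    - name: demo
      image: nginx:1.14.2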
Understanding maxSkew with an Example
The maxSkew parameter controls how evenly pods are distributed across topological domains. It represents the maximum difference in the number of pods between any two domains.
Example: Distributing 8 Pods Across 3 Nodes
Suppose you have a cluster with 3 nodes, and you want to deploy 8 pods of an application. You set maxSkew: 1, meaning the difference in the number of pods between any two nodes should not exceed 1.
Ideal Distribution: With maxSkew: 1, Kubernetes aims to distribute pods as evenly as possible. For 8 pods across 3 nodes, the distribution would be 3 pods, 3 pods, 2 pods (since 8 doesn’t divide evenly by 3).
How It Works:
Node A: 3 pods
Node B: 3 pods
Node C: 2 pods
The skew between Node A (3 pods) and Node C (2 pods) is 1, which satisfies maxSkew: 1.
If you had set maxSkew: 2, a distribution like 4 pods, 2 pods, 2 pods would also be valid, as the maximum difference (4 - 2 = 2) is within the allowed skew.
Here’s a sample Kubernetes manifest to achieve this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 8
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: nginx:1.14.2
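After applying the Deployment, one way to check the actual spread (assuming the app: my-app label from the manifest above) is to list the pods together with the nodes they landed on:
# show each pod and its node
kubectl get pods -l app=my-app -o wide

# count pods per node
kubectl get pods -l app=my-app \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c
With maxSkew: 1, the per-node counts should differ by at most one.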
Understanding whenUnsatisfiable with an Example
The whenUnsatisfiable field determines the scheduler’s behavior when it cannot satisfy the topology spread constraint.
1. DoNotSchedule (Hard Constraint)
With DoNotSchedule, Kubernetes will not schedule pods unless the topology spread constraint can be fully satisfied. This is useful for critical applications where even distribution is non-negotiable.
Example: Enforcing Zone-Based Distribution
Suppose you have a cluster with 3 availability zones (zone-a, zone-b, zone-c), and you want to deploy 6 pods with maxSkew: 1 across zones. If one zone cannot accept pods (e.g., zone-c’s nodes are cordoned or full) and whenUnsatisfiable is DoNotSchedule, the scheduler leaves any pods that would exceed the allowed skew Pending rather than place them unevenly.
Here’s the manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zone-app
  labels:
    app: zone-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: zone-app
  template:
    metadata:
      labels:
        app: zone-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: zone-app
      containers:
        - name: zone-app
          image: nginx:1.14.2
Outcome:
If all 3 zones are available, pods are distributed as 2 pods per zone.
If only 2 zones can accept pods, the replicas that would push the skew above 1 stay Pending until the third zone can take its share.
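If replicas do get stuck, a quick way to find them and see the scheduler’s reasoning (using the app: zone-app label from this example; the pod name is a placeholder) is:
# list pods that are still Pending
kubectl get pods -l app=zone-app --field-selector=status.phase=Pending

# inspect one of them; the Events section typically shows a FailedScheduling
# message about the unsatisfied topology spread constraint
kubectl describe pod <pending-pod-name>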
2. ScheduleAnyway (Soft Constraint)
With ScheduleAnyway, Kubernetes treats the constraint as a preference. If the constraint cannot be satisfied, pods are still scheduled, but the scheduler tries to minimize the skew.
Example: Flexible Zone-Based Distribution
Using the same setup as above, but with whenUnsatisfiable: ScheduleAnyway, the scheduler will place pods even if one zone is unavailable, aiming to keep the skew as low as possible.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flexible-app
  labels:
    app: flexible-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: flexible-app
  template:
    metadata:
      labels:
        app: flexible-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: flexible-app
      containers:
        - name: flexible-app
          image: nginx:1.14.2
Outcome:
If all 3 zones are available, pods are distributed as 2 pods per zone.
If only 2 zones are available, pods might be distributed as 3 pods in zone-a and 3 pods in zone-b, minimizing skew but proceeding with scheduling.
Understanding topologyKey for Custom Topologies with an Example
The topologyKey defines the scope of the topological domain. Common keys include:
kubernetes.io/hostname: Spreads pods across individual nodes.
topology.kubernetes.io/zone: Spreads pods across availability zones.
topology.kubernetes.io/region: Spreads pods across regions.
You can also use custom labels to define your own topological domains, such as rack or data-center.
Example: Custom Topology with Rack Labels
Suppose your cluster nodes are labeled with a custom key topology.kubernetes.io/rack (e.g., rack-1, rack-2). You want to deploy 10 pods across racks with maxSkew: 2.
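Before this constraint can do anything, the nodes must actually carry that label. A hedged example of adding and verifying it with kubectl (the node names and rack values are placeholders for your own cluster):
# label nodes with their physical rack
kubectl label node node-1 topology.kubernetes.io/rack=rack-1
kubectl label node node-2 topology.kubernetes.io/rack=rack-2

# confirm the label is set on every node
kubectl get nodes -L topology.kubernetes.io/rack
With the labels in place, the Deployment looks like this: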
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rack-app
  labels:
    app: rack-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: rack-app
  template:
    metadata:
      labels:
        app: rack-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 2
          topologyKey: topology.kubernetes.io/rack
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: rack-app
      containers:
        - name: rack-app
          image: nginx:1.14.2
Outcome:
If you have 3 racks, pods might be distributed as 4, 3, 3 (skew of 4 - 3 = 1, which is within maxSkew: 2).
If a rack is unavailable, ScheduleAnyway ensures pods are still scheduled on available racks.
Best Practices for Topology Spread Constraints
Start with ScheduleAnyway for Flexibility: Use DoNotSchedule only for critical applications where uneven distribution is unacceptable.
Combine with Affinity Rules: Use node or pod affinity/anti-affinity rules alongside topology spread constraints for more granular control (see the sketch after this list).
Monitor Resource Usage: Ensure nodes have sufficient resources to handle distributed pods, especially when scaling.
Test with Small Clusters First: Experiment with constraints in a test environment to understand their impact before applying to production.
Use Appropriate maxSkew: Set maxSkew based on your cluster size and desired distribution. A smaller maxSkew enforces stricter balancing but may limit scheduling flexibility.
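To illustrate the second point above, here is a hedged sketch (not a drop-in recommendation) of a Deployment that combines a hard zone spread constraint with a soft pod anti-affinity rule; the names, labels, and weight are placeholders:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: combined-app
  labels:
    app: combined-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: combined-app
  template:
    metadata:
      labels:
        app: combined-app
    spec:
      # Hard rule: spread evenly across zones.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: combined-app
      # Soft rule: prefer not to co-locate replicas on the same node.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: combined-app
                topologyKey: kubernetes.io/hostname
      containers:
        - name: combined-app
          image: nginx:1.14.2
The spread constraint keeps zones balanced, while the anti-affinity term nudges the scheduler to also avoid stacking replicas on a single node within a zone.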