Kubernetes Scheduling Algorithm: How the Scheduler Chooses the Best Node for Your Pods
Introduction
A key part of Kubernetes' functionality is the scheduler, which decides where each Pod should run. The scheduler considers various factors, balancing workloads and maximizing resource efficiency across the cluster. In this post, we’ll explore the scheduling algorithm used by Kubernetes, breaking down the steps involved in selecting the best node for each Pod.
High-Level Scheduling Algorithm Explained
The core logic of the scheduling algorithm can be described in the following steps:
1. Gather All Healthy Nodes
The scheduler begins by retrieving a list of all nodes in the cluster that are both known and healthy. Only these nodes are viable options for Pod placement.
2. Apply Predicates
Each node is then evaluated against a set of conditions known as predicates (the current scheduling framework calls them filter plugins). Predicates check that a node has the resources and configuration the Pod requires. Examples include:
Checking if a node has enough CPU and memory.
Ensuring that required node labels or constraints are met.
If a node satisfies all predicate checks for a given Pod, it’s added to a list of viable nodes.
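The filtering step can be sketched in a few lines of Python. This is a minimal illustration rather than the scheduler's actual code: the `Node` and `Pod` classes, the two predicate functions, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float          # CPUs not yet requested by other Pods
    free_memory_gb: float    # GB of RAM not yet requested by other Pods
    labels: dict = field(default_factory=dict)


@dataclass
class Pod:
    cpu: float
    memory_gb: float
    node_selector: dict = field(default_factory=dict)


def has_enough_resources(node, pod):
    return node.free_cpu >= pod.cpu and node.free_memory_gb >= pod.memory_gb


def matches_node_selector(node, pod):
    # Every label the Pod asks for must be on the node with the same value.
    return all(node.labels.get(k) == v for k, v in pod.node_selector.items())


PREDICATES = [has_enough_resources, matches_node_selector]


def filter_nodes(nodes, pod, predicates=PREDICATES):
    # A node is viable only if it passes every predicate, not just one.
    return [n for n in nodes if all(p(n, pod) for p in predicates)]


nodes = [
    Node("small", free_cpu=1, free_memory_gb=2, labels={"disk": "ssd"}),
    Node("big-hdd", free_cpu=8, free_memory_gb=16, labels={"disk": "hdd"}),
    Node("big-ssd", free_cpu=8, free_memory_gb=16, labels={"disk": "ssd"}),
]
pod = Pod(cpu=2, memory_gb=4, node_selector={"disk": "ssd"})
viable = filter_nodes(nodes, pod)
print([n.name for n in viable])  # ['big-ssd']
```

Here `small` fails the resource check and `big-hdd` fails the label check, so only `big-ssd` survives filtering.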
3. Calculate Priority Scores
Once viable nodes are identified, the scheduler evaluates each one using priority functions (score plugins in the current scheduling framework). Priority functions assign a score to each node based on criteria like:
Resource availability (favoring nodes with more spare capacity).
Affinity rules (e.g., preferring certain zones or proximity to other Pods).
The per-function scores are combined into a single score for each node, and the node with the highest combined score is preferred. The scores are pushed into a priority queue ordered by score, with the highest at the top.
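As a rough sketch of how several priority functions might be combined into one score, the snippet below sums weighted results. The two functions, their weights, and the `Node` and `Pod` fields are assumptions made up for this illustration; they are not the scheduler's real scoring rules.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int
    zone: str


@dataclass
class Pod:
    cpu: int
    preferred_zone: str


def spare_cpu(node, pod):
    # More CPU left over after placing the Pod gives a higher score.
    return node.free_cpu - pod.cpu


def zone_preference(node, pod):
    # Flat bonus when the node is in the zone the Pod prefers.
    return 10 if node.zone == pod.preferred_zone else 0


# (function, weight) pairs; the weights are arbitrary illustration values.
PRIORITIES = [(spare_cpu, 1), (zone_preference, 2)]


def combined_score(node, pod, priorities=PRIORITIES):
    return sum(weight * fn(node, pod) for fn, weight in priorities)


pod = Pod(cpu=2, preferred_zone="eu-west")
nodes = [Node("a", 4, "eu-west"), Node("b", 16, "us-east")]
scores = {n.name: combined_score(n, pod) for n in nodes}
print(scores)  # {'a': 22, 'b': 14}
```

Node `a` wins despite having less spare CPU, because the weighted zone bonus outweighs the capacity difference. Changing the weights changes the outcome, which is the point of making them explicit.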
4. Identify Nodes with the Best Scores
The node at the top of the queue represents the best-scoring node. To ensure fairness, the scheduler selects all nodes that have the same top score, treating them as equal options.
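One way to collect every node tied for the top score is to pop from a max-heap until the score changes. The sketch below uses Python's `heapq` module; the `top_scoring` helper and the node names are hypothetical.

```python
import heapq


def top_scoring(scored):
    """Return the names of all nodes tied for the highest score.

    `scored` is an iterable of (score, node_name) pairs.
    """
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-score, name) for score, name in scored]
    heapq.heapify(heap)
    if not heap:
        return []
    best = heap[0][0]
    tied = []
    # Guard against an empty heap in case every node shares the top score.
    while heap and heap[0][0] == best:
        tied.append(heapq.heappop(heap)[1])
    return tied


result = top_scoring([(12, "node-a"), (24, "node-b"), (24, "node-c"), (6, "node-d")])
print(result)  # ['node-b', 'node-c']
```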
5. Select a Node Using Round-Robin Selection
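When several nodes tie for the top score, the scheduler picks among them in round-robin order rather than always taking the first one. This spreads Pods evenly across equally scored nodes, promoting balance and resource utilization across the cluster. (Recent kube-scheduler releases instead pick one of the tied nodes at random, which has a similar balancing effect.)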
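Round-robin tie-breaking needs only a counter that survives between scheduling calls. The `RoundRobinSelector` class below is a hypothetical sketch of that idea, not code taken from the scheduler.

```python
class RoundRobinSelector:
    """Cycles through tied candidates across successive scheduling calls."""

    def __init__(self):
        self._counter = 0

    def select(self, candidates):
        if not candidates:
            raise ValueError("no candidates to choose from")
        # The counter keeps growing; the modulo maps it onto the current list.
        choice = candidates[self._counter % len(candidates)]
        self._counter += 1
        return choice


selector = RoundRobinSelector()
tied = ["node-a", "node-b", "node-c"]
picks = [selector.select(tied) for _ in range(5)]
print(picks)  # ['node-a', 'node-b', 'node-c', 'node-a', 'node-b']
```

Because the counter is shared across calls, consecutive Pods with the same set of tied nodes land on different nodes in turn.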
Algorithm Summary
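def schedule(pod):
    nodes = getAllHealthyNodes()
    # A node is viable only if it passes every predicate.
    viableNodes = []
    for node in nodes:
        if all(predicate(node, pod) for predicate in predicates):
            viableNodes.append(node)
    if not viableNodes:
        return None  # no node can run this Pod
    # Score each viable node; the queue keeps the highest score on top.
    scoredNodes = PriorityQueue()
    priorities = GetPriorityFunctions()
    for node in viableNodes:
        score = CalculateCombinedPriority(node, pod, priorities)
        scoredNodes.push((score, node))
    # Collect every node tied for the best score.
    bestScore = scoredNodes.top().score
    selectedNodes = []
    while not scoredNodes.empty() and scoredNodes.top().score == bestScore:
        selectedNodes.append(scoredNodes.pop().node)
    # Break ties in round-robin order (step 5).
    node = selectRoundRobin(selectedNodes)
    return node.name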
Let’s walk through this scheduling algorithm with a practical example, assuming we have a Kubernetes cluster with three nodes and a new Pod that needs to be scheduled.
Example Scenario
Suppose we have the following nodes in our Kubernetes cluster and a Pod with specific resource requirements:
Cluster Setup:
Node X: 6 CPUs, 12 GB RAM, healthy
Node Y: 4 CPUs, 8 GB RAM, healthy
Node Z: 10 CPUs, 20 GB RAM, healthy
Pod Requirements:
CPU: 2 CPUs
Memory: 4 GB RAM
Our goal is to place this Pod on the best node according to the Kubernetes scheduling algorithm, following each step to ensure accuracy.
Step-by-Step Execution of the Scheduling Algorithm
1. Gather All Healthy Nodes
First, the scheduler retrieves a list of all nodes that are healthy:
- Nodes X, Y, and Z are all healthy and thus are potential options for scheduling.
2. Apply Predicates to Filter Viable Nodes
The scheduler applies predicates to check if each node can meet the resource requirements of the Pod.
Node X:
CPU: 6 CPUs ≥ 2 CPUs (Pod requirement) – ✅
Memory: 12 GB ≥ 4 GB (Pod requirement) – ✅
Result: Node X is viable.
Node Y:
CPU: 4 CPUs ≥ 2 CPUs (Pod requirement) – ✅
Memory: 8 GB ≥ 4 GB (Pod requirement) – ✅
Result: Node Y is also viable.
Node Z:
CPU: 10 CPUs ≥ 2 CPUs (Pod requirement) – ✅
Memory: 20 GB ≥ 4 GB (Pod requirement) – ✅
Result: Node Z is viable.
All three nodes meet the Pod’s requirements and are included in the viable nodes list: Node X, Node Y, Node Z.
3. Calculate Priority Scores
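Next, each viable node is scored by the resources it would have left after scheduling the Pod. For illustration, assume the nodes are otherwise empty and award 1 point per remaining CPU and 1 point per remaining GB of RAM. This rule is deliberately simplified; the real scheduler combines several weighted scoring functions.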
Node X:
Remaining CPU: 6 - 2 = 4 CPUs
Remaining Memory: 12 GB - 4 GB = 8 GB
Score: 4 (CPU) + 8 (Memory) = 12 points
Node Y:
Remaining CPU: 4 - 2 = 2 CPUs
Remaining Memory: 8 GB - 4 GB = 4 GB
Score: 2 (CPU) + 4 (Memory) = 6 points
Node Z:
Remaining CPU: 10 - 2 = 8 CPUs
Remaining Memory: 20 GB - 4 GB = 16 GB
Score: 8 (CPU) + 16 (Memory) = 24 points
After scoring, the priority queue of nodes by score would look like this:
Node Z (24 points)
Node X (12 points)
Node Y (6 points)
4. Identify Nodes with the Best Scores
The scheduler identifies Node Z as having the highest score (24 points). Since no other nodes share this score, no tie-breaking is necessary.
5. Select Node with the Highest Score
Since Node Z has the highest score and is the best match for the Pod’s requirements, the scheduler selects Node Z as the target node for the Pod.
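The walkthrough above can be reproduced end to end with a short script. This is a sketch under the example's assumptions (empty nodes, 1 point per spare CPU and per spare GB); the `Node` and `Pod` classes and the `fits`, `score`, and `schedule` functions are hypothetical names, not scheduler APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu: int
    memory_gb: int
    healthy: bool = True


@dataclass(frozen=True)
class Pod:
    cpu: int
    memory_gb: int


def fits(node, pod):
    return node.cpu >= pod.cpu and node.memory_gb >= pod.memory_gb


def score(node, pod):
    # 1 point per CPU and per GB of RAM left after placing the Pod.
    return (node.cpu - pod.cpu) + (node.memory_gb - pod.memory_gb)


_next_index = 0  # round-robin counter shared across schedule() calls


def schedule(pod, nodes):
    """Return (chosen node name or None, scores of the viable nodes)."""
    global _next_index
    healthy = [n for n in nodes if n.healthy]            # step 1
    viable = [n for n in healthy if fits(n, pod)]        # step 2
    if not viable:
        return None, {}
    scored = {n.name: score(n, pod) for n in viable}     # step 3
    best = max(scored.values())                          # step 4
    tied = [name for name, s in scored.items() if s == best]
    choice = tied[_next_index % len(tied)]               # step 5
    _next_index += 1
    return choice, scored


cluster = [Node("Node X", 6, 12), Node("Node Y", 4, 8), Node("Node Z", 10, 20)]
chosen, scores = schedule(Pod(cpu=2, memory_gb=4), cluster)
print(scores)  # {'Node X': 12, 'Node Y': 6, 'Node Z': 24}
print(chosen)  # Node Z
```

The printed scores match the hand calculation in step 3, and Node Z is selected without any tie-breaking because no other node reaches 24 points.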
Written by
Rahul Bansod
Kubernetes Consultant and DevOps Enthusiast, passionate about simplifying cloud-native technologies for developers and businesses. With a focus on Kubernetes, I dive deep into topics like API server processing, authentication, RBAC, and container orchestration. Sharing insights, best practices, and real-world examples to empower teams in building scalable, resilient infrastructure. Let's unlock the full potential of cloud-native together!