Kubernetes Scheduling Algorithm: How the Scheduler Chooses the Best Node for Your Pods

Rahul Bansod

Introduction

A key part of Kubernetes' functionality is the scheduler, which decides where each Pod should run. The scheduler considers various factors, balancing workloads and maximizing resource efficiency across the cluster. In this post, we’ll explore the scheduling algorithm used by Kubernetes, breaking down the steps involved in selecting the best node for each Pod.

High-Level Scheduling Algorithm Explained

The core logic of the scheduling algorithm can be described in the following steps:

1. Gather All Healthy Nodes

The scheduler begins by retrieving a list of all nodes in the cluster that are both known and healthy. Only these nodes are viable options for Pod placement.

2. Apply Predicates

Each node is then evaluated based on a set of conditions known as predicates. Predicates ensure that a node has the necessary resources and configurations required by the Pod. Examples include:

  • Checking if a node has enough CPU and memory.

  • Ensuring that required node labels or constraints are met.

If a node satisfies all predicate checks for a given Pod, it’s added to a list of viable nodes.
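As a sketch, each predicate can be modeled as a function that returns `True` only when the node can host the Pod. The `Node` and `Pod` shapes below are simplified assumptions for illustration, not the real Kubernetes API objects:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpus: int                              # allocatable CPUs
    mem_gb: int                            # allocatable memory in GB
    labels: dict = field(default_factory=dict)

@dataclass
class Pod:
    cpu_request: int
    mem_request_gb: int
    node_selector: dict = field(default_factory=dict)

def enough_resources(node, pod):
    """Predicate: the node has spare CPU and memory for the Pod."""
    return node.cpus >= pod.cpu_request and node.mem_gb >= pod.mem_request_gb

def matches_selector(node, pod):
    """Predicate: the node carries every label the Pod's selector asks for."""
    return all(node.labels.get(k) == v for k, v in pod.node_selector.items())

predicates = [enough_resources, matches_selector]

node = Node("node-x", cpus=6, mem_gb=12, labels={"zone": "us-east-1a"})
pod = Pod(cpu_request=2, mem_request_gb=4, node_selector={"zone": "us-east-1a"})
print(all(p(node, pod) for p in predicates))  # True: node-x is viable
```

A node is viable only if every predicate in the list passes; a single failure is enough to filter it out.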

3. Calculate Priority Scores

Once viable nodes are identified, the scheduler evaluates each one using priority functions. Priority functions assign a score to each node based on criteria like:

  • Resource availability (favoring nodes with more spare capacity).

  • Affinity rules (e.g., preferring certain zones or proximity to other Pods).

Nodes with higher scores are more likely to be chosen for scheduling. The scores are pushed into a priority queue ordered by the score, with the highest scores at the top.
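Using the simple spare-capacity scoring adopted in the worked example later in this post (1 point per remaining CPU and per remaining GB of RAM), a priority function might look like this sketch:

```python
def spare_capacity_score(node, pod):
    # Favor nodes with more resources left after placing the Pod.
    spare_cpu = node["cpus"] - pod["cpu"]
    spare_mem = node["mem_gb"] - pod["mem_gb"]
    return spare_cpu + spare_mem

node = {"name": "node-z", "cpus": 10, "mem_gb": 20}
pod = {"cpu": 2, "mem_gb": 4}
print(spare_capacity_score(node, pod))  # 24
```

The real scheduler combines several such functions (often with weights) into one score per node; this sketch shows only the spare-capacity part.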

4. Identify Nodes with the Best Scores

The node at the top of the queue represents the best-scoring node. To ensure fairness, the scheduler selects all nodes that have the same top score, treating them as equal options.

5. Select a Node Using Round-Robin Selection

In the case of identical scores, a round-robin selection is used rather than a random choice. This approach distributes Pods evenly among nodes with equal scores, promoting balanced resource utilization across the cluster.
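A minimal round-robin tie-breaker can be sketched as a selector that remembers the last index it used; the class name and state here are illustrative assumptions:

```python
class RoundRobinSelector:
    def __init__(self):
        self.last_index = -1

    def select(self, tied_nodes):
        # Advance one position per call, wrapping around the tied set,
        # so repeated decisions rotate through the equally scored nodes.
        self.last_index = (self.last_index + 1) % len(tied_nodes)
        return tied_nodes[self.last_index]

selector = RoundRobinSelector()
tied = ["node-a", "node-b"]
print([selector.select(tied) for _ in range(4)])
# ['node-a', 'node-b', 'node-a', 'node-b']
```

Unlike a random pick, this rotation guarantees that two nodes with equal scores receive alternating Pods rather than, by chance, the same node repeatedly.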

Algorithm Summary

def schedule(pod):
    nodes = getAllHealthyNodes()
    viableNodes = []

    # A node is viable only if it passes *every* predicate check.
    for node in nodes:
        if all(predicate(node, pod) for predicate in predicates):
            viableNodes.append(node)

    # Score each viable node; the queue keeps the highest score on top.
    scoredNodes = PriorityQueue()
    priorities = GetPriorityFunctions()

    for node in viableNodes:
        score = CalculateCombinedPriority(node, pod, priorities)
        scoredNodes.push((score, node))

    # Collect every node that shares the best (top) score.
    bestScore = scoredNodes.top().score
    selectedNodes = []

    while not scoredNodes.empty() and scoredNodes.top().score == bestScore:
        selectedNodes.append(scoredNodes.pop())

    # Break the tie round-robin so equally scored nodes share the load.
    node = selectRoundRobin(selectedNodes)
    return node.name

Let’s walk through this scheduling algorithm with a practical example, assuming we have a Kubernetes cluster with three nodes and a new Pod that needs to be scheduled.


Example Scenario

Suppose we have the following nodes in our Kubernetes cluster and a Pod with specific resource requirements:

Cluster Setup:

  • Node X: 6 CPUs, 12 GB RAM, healthy

  • Node Y: 4 CPUs, 8 GB RAM, healthy

  • Node Z: 10 CPUs, 20 GB RAM, healthy

Pod Requirements:

  • CPU: 2 CPUs

  • Memory: 4 GB RAM

Our goal is to place this Pod on the best node according to the Kubernetes scheduling algorithm, following each step to ensure accuracy.


Step-by-Step Execution of the Scheduling Algorithm

1. Gather All Healthy Nodes

First, the scheduler retrieves a list of all nodes that are healthy:

  • Nodes X, Y, and Z are all healthy and thus are potential options for scheduling.

2. Apply Predicates to Filter Viable Nodes

The scheduler applies predicates to check if each node can meet the resource requirements of the Pod.

  1. Node X:

    • CPU: 6 CPUs ≥ 2 CPUs (Pod requirement) – ✅

    • Memory: 12 GB ≥ 4 GB (Pod requirement) – ✅

    • Result: Node X is viable.

  2. Node Y:

    • CPU: 4 CPUs ≥ 2 CPUs (Pod requirement) – ✅

    • Memory: 8 GB ≥ 4 GB (Pod requirement) – ✅

    • Result: Node Y is also viable.

  3. Node Z:

    • CPU: 10 CPUs ≥ 2 CPUs (Pod requirement) – ✅

    • Memory: 20 GB ≥ 4 GB (Pod requirement) – ✅

    • Result: Node Z is viable.

All three nodes meet the Pod’s requirements and are included in the viable nodes list: Node X, Node Y, Node Z.

3. Calculate Priority Scores

Now, each viable node is scored based on available resources left after scheduling the Pod. Let’s assume 1 point per remaining CPU and 1 point per remaining GB of RAM after placing the Pod.

  1. Node X:

    • Remaining CPU: 6 - 2 = 4 CPUs

    • Remaining Memory: 12 GB - 4 GB = 8 GB

    • Score: 4 (CPU) + 8 (Memory) = 12 points

  2. Node Y:

    • Remaining CPU: 4 - 2 = 2 CPUs

    • Remaining Memory: 8 GB - 4 GB = 4 GB

    • Score: 2 (CPU) + 4 (Memory) = 6 points

  3. Node Z:

    • Remaining CPU: 10 - 2 = 8 CPUs

    • Remaining Memory: 20 GB - 4 GB = 16 GB

    • Score: 8 (CPU) + 16 (Memory) = 24 points

After scoring, the priority queue of nodes by score would look like this:

  • Node Z (24 points)

  • Node X (12 points)

  • Node Y (6 points)

4. Identify Nodes with the Best Scores

The scheduler identifies Node Z as having the highest score (24 points). Since no other nodes share this score, no tie-breaking is necessary.

5. Select the Node with the Highest Score

Since Node Z has the highest score and is the best match for the Pod’s requirements, the scheduler selects Node Z as the target node for the Pod.
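The whole walkthrough can be reproduced with a short script, sketched here under the same simplifying assumptions as the example (all nodes are healthy, labels are ignored, and scoring is 1 point per spare CPU and per spare GB of RAM):

```python
nodes = [
    {"name": "Node X", "cpus": 6,  "mem_gb": 12},
    {"name": "Node Y", "cpus": 4,  "mem_gb": 8},
    {"name": "Node Z", "cpus": 10, "mem_gb": 20},
]
pod = {"cpu": 2, "mem_gb": 4}

# Step 2: predicates – keep only nodes that can fit the Pod at all.
viable = [n for n in nodes
          if n["cpus"] >= pod["cpu"] and n["mem_gb"] >= pod["mem_gb"]]

# Step 3: score each viable node by spare capacity after placing the Pod.
def score(n):
    return (n["cpus"] - pod["cpu"]) + (n["mem_gb"] - pod["mem_gb"])

scored = sorted(viable, key=score, reverse=True)
for n in scored:
    print(n["name"], score(n))
# Node Z 24
# Node X 12
# Node Y 6

# Steps 4–5: all three scores differ, so the single best node wins
# and no round-robin tie-breaking is needed.
best = scored[0]
print("Scheduled on:", best["name"])  # Scheduled on: Node Z
```

The printed scores match the hand-calculated ones above, and Node Z is selected, just as the walkthrough concluded.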

Written by

Rahul Bansod

Kubernetes Consultant and DevOps Enthusiast, passionate about simplifying cloud-native technologies for developers and businesses. With a focus on Kubernetes, I dive deep into topics like API server processing, authentication, RBAC, and container orchestration. I share insights, best practices, and real-world examples to empower teams in building scalable, resilient infrastructure. Let's unlock the full potential of cloud-native together!