Demystifying Kubernetes Scheduling and Pod Placement

sneh srivastava

As Kubernetes becomes the de facto standard for container orchestration, understanding how the scheduler makes placement decisions is essential. In this article, we walk through the key components that influence pod scheduling and placement, including labels, selectors, quotas, taints, topology rules, and more. Whether you're just getting started or brushing up your fundamentals, this guide breaks down the concepts like a Kubernetes expert would explain them to a beginner.


1. Kubernetes Scheduling Fundamentals

The scheduler is a control plane component responsible for selecting the most suitable node for a pod. Its decision-making involves a multi-step pipeline:

  • Queue: Pods waiting for scheduling sit in a queue.

  • Filter: Nodes that don't meet basic requirements (like insufficient resources or taint conflicts) are filtered out.

  • Score: Remaining nodes are scored based on policies (like resource balance).

  • Binding: The scheduler records its decision by creating a Binding object, which sets the chosen node in the pod's spec (spec.nodeName).

This pipeline is foundational, and understanding it is key to advanced scheduling decisions.
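The pipeline is easiest to see from a pod's resource requests, which are exactly what the Filter step compares against each node's allocatable capacity. A minimal sketch (the pod name, image, and request values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                  # illustrative name
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:            # Filter: nodes with less free CPU/memory are excluded
          cpu: "500m"
          memory: "256Mi"
```

Once Binding completes, the chosen node shows up in the pod's spec.nodeName.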

Note: The scheduler operates independently of namespaces. That is, while namespaces logically separate resources, the scheduler doesn't consider namespace as part of its decision-making.


2. Labels, Selectors, and Annotations

Labels and Selectors:

Labels are key-value pairs attached to objects like pods or nodes. They're used to organize and select subsets of objects.

  • Equality-based selectors:
matchLabels:
  app: frontend

This selects objects where app=frontend.

  • Set-based selectors:
matchExpressions:
  - key: env
    operator: In
    values:
      - prod
      - staging

This selects objects where env is either prod or staging.
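Both styles can be combined in a single selector, for example in a Deployment; a minimal sketch (the label keys and values are illustrative):

```yaml
selector:
  matchLabels:
    app: frontend            # equality-based: app=frontend
  matchExpressions:
    - key: env
      operator: In           # set-based: env in (prod, staging)
      values:
        - prod
        - staging
```

The kubectl equivalent of this selector is `kubectl get pods -l 'app=frontend,env in (prod,staging)'`.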

Annotations:

Labels are used to identify and select objects, and to find collections of objects that satisfy certain conditions. Annotations, by contrast, are not used for identification or selection. Annotation values can be small or large, structured or unstructured, and may include characters not permitted in label values. The same object can carry both labels and annotations in its metadata. Some common annotation patterns:

1. Tracking the Last Applied Configuration

metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: <json blob>

Used by kubectl apply to track the last applied configuration. Helps with intelligent merges and diffs.

2. Linking to Internal Documentation


metadata:
  annotations:
    documentation-url: "https://internal.docs.company.com/my-app"

Useful in enterprises where each service has a Confluence or documentation page.


3. Telling Ingress Controller How to Handle Requests

metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /

Ingress controllers like NGINX or Traefik rely heavily on annotations to define behaviors like SSL redirect, path rewrites, etc.


4. Service Mesh Integration (e.g., Istio)

metadata:
  annotations:
    sidecar.istio.io/inject: "true"

Tells Istio to automatically inject its Envoy sidecar into the pod.


5. Custom Monitoring Tags

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"

Prometheus uses these to decide whether to scrape metrics from a pod/service.


6. Backup Instructions for Velero

metadata:
  annotations:
    backup.velero.io/backup-volumes: "data-volume"

Velero, a backup/restore tool, uses this to identify which volumes to snapshot.


7. Adding Owner or Team Info (Internal Tracking)

metadata:
  annotations:
    owner: "devops-team"
    contact-email: "devops@company.com"

Helps with ownership traceability — especially useful for internal policies.


8. Assigning AppArmor Profiles (annotation-based; deprecated in newer versions, still valid for legacy)

metadata:
  annotations:
    apparmor.security.beta.kubernetes.io/nginx: localhost/nginx-apparmor-profile

Used to assign AppArmor profiles to pods.


3. Namespaces: Logical Resource Boundaries

Namespaces are a way to logically group resources in a cluster. They're useful for multi-tenancy, separating dev/staging/prod, and applying policies.

  • They span across nodes (VMs): a namespace is a logical boundary, not a physical one.

  • You can set resource quotas on namespaces to limit how much CPU/memory they can consume.

Pro Tip: The scheduler is namespace-agnostic, meaning it schedules based on node resources and policies, not namespace boundaries.
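Namespaces themselves are ordinary API objects; a minimal sketch (the name team-a and its label are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a               # illustrative
  labels:
    env: dev
```

Resources in it are then addressed with the -n flag, e.g. `kubectl get pods -n team-a`.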


4. Resource Quotas and Pod Overhead

Resource Quotas:

You can apply a ResourceQuota on a namespace to limit total CPU/memory usage:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "4"
    limits.memory: "8Gi"

Pod Overhead:

Introduced to account for the resources consumed by pod infrastructure beyond the containers themselves, such as the sandbox in VM-based runtimes (e.g., Kata Containers). It's declared in a RuntimeClass and added on top of a pod's resource requests when scheduling and enforcing quotas.
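Pod overhead is declared in a RuntimeClass that pods opt into; a minimal sketch, assuming a Kata Containers handler named kata is installed on the nodes (the overhead values are illustrative):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata                 # assumes a matching runtime handler on the node
handler: kata
overhead:
  podFixed:                  # added on top of the pod's own requests
    cpu: "250m"
    memory: "120Mi"
```

A pod opts in by setting `runtimeClassName: kata` in its spec.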


5. Advanced Scheduling Rules

Taints and Tolerations:

  • Taint: Applied on nodes to repel pods unless tolerated.
kubectl taint nodes node1 key=value:NoSchedule
  • Effects:

    • NoSchedule: New pods without a matching toleration won't be scheduled.

    • PreferNoSchedule: The scheduler tries to avoid the node but may still schedule there.

    • NoExecute: Evicts already-running pods that don't tolerate the taint, and blocks new ones.

  • Tolerations: Pods declare which taints they can tolerate.
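A pod tolerates the taint from the kubectl command above with a matching toleration in its spec; sketch:

```yaml
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
```

Note that a toleration only allows scheduling onto the tainted node; it doesn't require it (combine with node affinity for that).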

Node Affinity:

Used to express node preferences using labels:

  • requiredDuringSchedulingIgnoredDuringExecution

  • preferredDuringSchedulingIgnoredDuringExecution

Example:

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd

6. Topology Spread Constraints

Used to ensure pods are evenly distributed across zones, nodes, or other topologies.

  • Max Skew: Max difference in pod counts across topologies.

  • Topology Key: The label used to define the domain (e.g., topology.kubernetes.io/zone).

  • When Unsatisfiable:

    • DoNotSchedule: Reject pod.

    • ScheduleAnyway: Allow, but prefer to honor constraints.

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: frontend

7. Priority Classes and Preemption

Used to influence which pods get scheduled first during resource scarcity.

  • Higher priority pods can preempt (evict) lower priority ones.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000

8. Node Lease (Heartbeat Mechanism)

Each node periodically renews its lease in the kube-node-lease namespace. This reduces API server load while maintaining accurate node status.

If the lease isn't renewed in time, the node controller marks the node as unhealthy, and after a timeout its pods are evicted.
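For a sense of what the kubelet maintains, a node's Lease object looks roughly like this (the node name and timestamp are illustrative; by default the kubelet renews about every 10 seconds with a 40-second lease duration):

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: node1                      # matches the node's name
  namespace: kube-node-lease
spec:
  holderIdentity: node1
  leaseDurationSeconds: 40
  renewTime: "2024-01-01T00:00:00.000000Z"
```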


Final Thoughts

Kubernetes scheduling is both powerful and flexible. By understanding how labels, namespaces, taints, and priorities influence pod placement, you're well on your way to designing production-ready workloads. Whether you're building resilient systems or fine-tuning resource usage, the key lies in mastering these building blocks.

Ready to level up your workloads? Start small, experiment with taints, priorities, and quotas — and you'll soon think like the scheduler!
