Scheduling in Kubernetes
Manual Scheduling
Every pod has a field named nodeName, which is not set by default. Normally you do not specify it yourself; Kubernetes adds it automatically when the pod is scheduled.
The scheduler goes through all the pods and looks for those that do not have this property set. Those are the candidates for scheduling.
It then identifies the right node for the pod by running a scheduling algorithm.
Once identified, it schedules the pod on the node by creating a binding object, which sets the nodeName property to the name of the node.
If there is no scheduler to schedule pods, the pods remain in the Pending state.
In such cases, you can manually assign pods to nodes yourself.
Without a scheduler, the easiest way to schedule a pod is to simply set the nodeName field to the name of the node in your pod specification file while creating the pod.
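As a minimal sketch of this approach, the pod definition below sets nodeName directly. The pod name, image, and node name (node01) are illustrative assumptions, not values from a real cluster.

```yaml
# Pod pinned to a node without involving the scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: nginx            # hypothetical pod name
spec:
  nodeName: node01       # hypothetical node name; must match an existing node
  containers:
    - name: nginx
      image: nginx
```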
Kubernetes won't allow you to modify the nodeName property of an existing pod. So to assign a node to an existing pod, create a binding object and send a POST request to the pod's binding API.
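A binding object for this purpose might look like the sketch below. The pod name (nginx) and target node (node02) are assumptions for illustration. Note that the body of the POST request has to be sent as JSON, so this YAML would need to be converted to its JSON equivalent first.

```yaml
# Binding that assigns an existing, unscheduled pod to a node.
apiVersion: v1
kind: Binding
metadata:
  name: nginx            # hypothetical: name of the pod to bind
target:
  apiVersion: v1
  kind: Node
  name: node02           # hypothetical: node the pod should run on
```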
Labels and Selectors
Labels
Labels are properties attached to each item.
In a pod definition file, under metadata, create a section called labels.
Under that, add the labels in a key-value format.
You can add as many labels as you like.
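For illustration, a pod definition with two labels could look like this. The label keys and values (app, function) and the pod name are made-up examples.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp    # hypothetical pod name
  labels:                # any number of key-value pairs
    app: app1
    function: front-end
spec:
  containers:
    - name: simple-webapp
      image: nginx
```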
Selectors
- Once the pod is created, select it by its labels using the kubectl get pods command with the --selector option and the label:
kubectl get pods --selector <label_key>=<label_value>
Use case of Labels and Selectors
Kubernetes objects use labels and selectors internally to connect different objects.
Ex. for ReplicaSet, to connect the ReplicaSet to the pod, we configure the selector field under the ReplicaSet specification to match the labels defined on the pod.
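The ReplicaSet case can be sketched as follows. The names and the app: app1 label are assumptions; what matters is that spec.selector.matchLabels matches the labels in the pod template.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: simple-webapp        # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app1              # must match the pod template labels below
  template:
    metadata:
      labels:
        app: app1            # labels defined on the pod
    spec:
      containers:
        - name: simple-webapp
          image: nginx
```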
Annotations
While labels and selectors are used to group and select objects, annotations are used to record other details for informational purposes.
Ex. tool details like name, version, build information, contact details etc.
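Annotations sit under metadata, next to labels. The fragment below is only a sketch; the annotation keys and values are invented for illustration.

```yaml
# Fragment of an object's metadata section.
metadata:
  name: simple-webapp              # hypothetical name
  annotations:
    buildVersion: "1.34"           # hypothetical build information; values must be strings
    contact: "team@example.com"    # hypothetical contact detail
```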
Taints and Tolerations
Taints and Tolerations have nothing to do with security or intrusion on the cluster.
Taints and Tolerations are used to set restrictions on which pods can be scheduled on a node.
Example:
Suppose we have 1 worker node and 3 pods (A, B, C) that need to be scheduled on the given node.
First, we prevent all pods from being placed on the node by placing a taint on the node.
By default, pods have no tolerations, which means unless specified otherwise, none of the pods can tolerate any taint. So in this case, none of the pods can be placed on the node, as none of them tolerates the taint.
Next, we want to place pod C on the given node, so we add a toleration to pod C.
Now when the scheduler tries to place pod C on the node, it goes through.
Taints are set on nodes and Tolerations are set on pods.
Taints and Tolerations do not tell a pod to go to a particular node. Instead, they tell the node to only accept pods with certain tolerations.
When the Kubernetes cluster is first set up, a taint is set on the master node automatically that prevents any pods from being scheduled on this node.
Taint Effects
The taint effect defines what would happen to the pods if they do not tolerate the taint.
There are 3 taint effects.
NoSchedule: The pods will not be scheduled on the node.
PreferNoSchedule: The system will try to avoid placing a pod on the node, but that is not guaranteed.
NoExecute: New pods will not be scheduled on the node and existing pods on the node, if any, will be evicted if they do not tolerate the taint. These pods may have been scheduled on the node before the taint was applied on the node.
Taint Commands
kubectl taint nodes <node_name> <key=value:taint-effect>
To taint a node.
kubectl taint nodes node01 app=blue:NoSchedule
kubectl taint nodes <node_name> <key=value:taint-effect->
To remove taint from a node.
kubectl taint nodes node01 app=blue:NoSchedule-
Add Tolerations
In the spec section of the pod definition file, add a section called tolerations and fill in the same values used while creating the taint (key, value, and effect).
All of these values need to be enclosed in double quotes.
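A toleration matching the app=blue:NoSchedule taint from the commands above could be written as in this sketch. The pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-c              # hypothetical pod name
spec:
  containers:
    - name: nginx
      image: nginx
  tolerations:
    - key: "app"           # taint key
      operator: "Equal"    # key and value must both match
      value: "blue"        # taint value
      effect: "NoSchedule" # taint effect
```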
Node Selectors
This is a simple Pod scheduling feature that allows scheduling a Pod onto a node whose labels match the nodeSelector labels specified in the Pod definition file.
To use labels in a nodeSelector, you must first label your nodes before creating the pod.
Node Selectors have limitations: you cannot express advanced conditions such as OR or NOT with them.
Label Nodes
kubectl label nodes <node_name> <label_key>=<label_value>
Ex.
kubectl label nodes node01 size=large
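With a node labelled size=large as in the command above, a pod can target it through nodeSelector. This is a minimal sketch; the pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-processor     # hypothetical pod name
spec:
  containers:
    - name: data-processor
      image: nginx         # placeholder image
  nodeSelector:
    size: large            # must match the label set on the node
```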
Node Affinity
- This is an enhanced version of nodeSelector that offers a more expressive syntax for fine-grained control over how Pods are scheduled to specific nodes.
Node Affinity Types
Available
requiredDuringSchedulingIgnoredDuringExecution
preferredDuringSchedulingIgnoredDuringExecution
Planned (may come in future)
- requiredDuringSchedulingRequiredDuringExecution
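As an illustration of the more expressive syntax, the sketch below requires a node whose size label is either large or medium, something a plain nodeSelector cannot express. The label key, its values, and the pod name are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-processor     # hypothetical pod name
spec:
  containers:
    - name: data-processor
      image: nginx         # placeholder image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In       # other operators include NotIn and Exists
                values:
                  - large
                  - medium
```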
Resource Requirements and Limits
Resource Requests
Kubernetes defines requests as a guaranteed minimum amount of a resource to be used by a container.
The scheduler uses the request to pick a node that has at least that much of the resource available for the container.
Resource Limits
Kubernetes defines limits as a maximum amount of a resource to be used by a container.
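For CPU, this means the container can never consume more than the amount indicated; it is throttled instead.
Memory behaves differently: a container can try to use more memory than its limit, and if it does so it is ultimately terminated with an OOM (Out Of Memory) error. This is what happens when a container exceeds its limits.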
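Requests and limits are set per container under resources. The sketch below uses arbitrary example amounts and an assumed pod name.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp      # hypothetical pod name
spec:
  containers:
    - name: simple-webapp
      image: nginx
      resources:
        requests:          # guaranteed minimum, used for scheduling
          memory: "256Mi"
          cpu: "250m"
        limits:            # upper bound for the container
          memory: "512Mi"
          cpu: "500m"
```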
Default Behavior
By default, Kubernetes does not have a CPU or memory request or limit set.
This means any pod can consume as many resources as it wants on any node and starve other pods or processes running on that node of resources.
The ideal setup is usually to set requests but no limits for all pods/containers in a cluster, since containers that need extra resources can then use what has been set aside for other containers when those containers are not using it.
Limit Range
- Limit Ranges let you define default values to be set for containers in pods that are created without a request or limit specified in their pod definition files.
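A LimitRange along these lines would supply defaults for containers that specify nothing themselves. The object name and the amounts are illustrative assumptions.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources  # hypothetical name
spec:
  limits:
    - type: Container
      default:             # default limits applied to containers
        cpu: 500m
        memory: 512Mi
      defaultRequest:      # default requests applied to containers
        cpu: 250m
        memory: 256Mi
```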
Resource Quotas
- Resource Quota is a namespace-level object that can be created to set hard limits for requests and limits.
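As a sketch, a ResourceQuota that caps the total requests and limits in one namespace could look like this. The quota name, the namespace (dev), and the amounts are assumptions.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota      # hypothetical name
  namespace: dev           # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"      # sum of CPU requests across all pods
    requests.memory: 4Gi
    limits.cpu: "10"       # sum of CPU limits across all pods
    limits.memory: 10Gi
```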
Daemon Sets
Daemon Sets are like ReplicaSets in that they help you deploy multiple instances of a pod, but a DaemonSet runs one copy of your pod on each node in your cluster.
Whenever a new node is added to the cluster, a replica of the pod is automatically added to that node. And when a node is removed, the pod is automatically removed.
A DaemonSet ensures that one copy of the pod is always present on every node in the cluster.
Ex. Say you need to deploy a monitoring agent or logger on each node in the cluster; a DaemonSet is perfect for that.
A DaemonSet definition file has almost exactly the same structure as a ReplicaSet definition file, except that the kind is DaemonSet.
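For comparison with a ReplicaSet, a DaemonSet definition might look like the following sketch. The names, label, and image are assumptions. Note there is no replicas field, since the number of pods follows the number of nodes.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-daemon        # hypothetical name
spec:
  selector:
    matchLabels:
      app: monitoring-agent      # must match the pod template labels
  template:
    metadata:
      labels:
        app: monitoring-agent
    spec:
      containers:
        - name: monitoring-agent
          image: monitoring-agent   # hypothetical image name
```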
Daemon Set Commands
kubectl create -f daemonset-definition.yml
- To create a daemon set.
kubectl get daemonsets
- To get the list of created daemon sets (daemonset and ds are accepted as shorter names).
kubectl delete daemonset <daemonset_name>
- To delete the defined daemon set with all the underlying pods.
kubectl describe daemonset <daemonset_name>
- To describe the given daemon set.
Static Pods
Pods that the kubelet creates on its own, without intervention from the API server or the rest of the Kubernetes cluster components, are known as Static Pods.
For this, we place the pod definition files in a designated directory. The kubelet periodically checks this directory, creates the pods defined there, and manages them.
The kubelet is responsible for watching each static pod and restarting it if it crashes.
You can only create pods this way. You cannot create ReplicaSets, Deployments, or Services.
The kubelet works at a pod level and can only understand pods, which is why it can create static pods this way.
Designated Folder: It can be any directory on the host. Its location is passed to the kubelet either as an option (--pod-manifest-path) while running the service, or by passing the path of a config file in the --config option and setting the location in that file with the key staticPodPath.
We can view the created static pods using the docker ps command.
/var/lib/kubelet/config.yaml - inside this config file, we get to see the static pod folder path, staticPodPath: /etc/kubernetes/manifests.
Also, static pod names are suffixed with the node name, for example controlplane.
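The relevant part of a kubelet config file can be sketched as below. Only the staticPodPath key and its value come from the text above; a real config file contains many other settings.

```yaml
# Fragment of a kubelet configuration file such as /var/lib/kubelet/config.yaml.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests   # directory the kubelet watches for pod definitions
```

Any pod definition file placed in that directory is picked up by the kubelet and run as a static pod.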
Static Pods VS Daemon Sets
Static PODs
Created by the kubelet.
Deploy Control Plane components as Static Pods.
Ignored by the Kube-Scheduler.
DaemonSets
Created by Kube-API server (DaemonSet-Controller).
Deploying monitoring agents, and logging agents on nodes.
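Ignored by the Kube-Scheduler in older releases; since Kubernetes v1.12, DaemonSet pods are placed by the default scheduler using node affinity.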
Written by
Rohit Pagote