02 - Scheduling in Kubernetes
![Rohit Pagote](https://cdn.hashnode.com/res/hashnode/image/upload/v1687262399134/4c8e2338-7b23-4ad0-9e71-d71f2cc73de5.png)
Manual Scheduling
Every pod has a field named nodeName, which is not set by default; Kubernetes (the scheduler) fills it in automatically. The scheduler goes through all the pods and looks for those that do not have this property set. Those are the candidates for scheduling.
It then identifies the right node for the pod by running a scheduling algorithm.
Once identified, it schedules the pod on the node by setting the nodeName property to the name of the node, which it does by creating a binding object.
If there is no scheduler to schedule pods, the pods remain in a pending state.
In such cases, you can manually assign pods to nodes yourself.
Without a scheduler, the easiest way to schedule a pod is to simply set the nodeName field to the name of the node in your pod specification file while creating the pod.
Kubernetes won't allow you to modify the nodeName property of an existing pod. So another way to assign a node to an existing, still-pending pod is to create a binding object and send a POST request to the pod's binding API.
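A minimal sketch of both approaches, assuming a pod named nginx and nodes named node01 and node02 (hypothetical names). The two documents are alternatives, not meant to be applied together: the first sets nodeName at creation time, the second is a Binding object that would be converted to JSON and sent as the body of the POST request to the pod's binding subresource.

```yaml
# Option 1: set nodeName when creating the pod (the scheduler then ignores it)
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeName: node01        # hypothetical node name
  containers:
    - name: nginx
      image: nginx
---
# Option 2: bind an existing, still-pending pod to a node.
# Sent (as JSON) in a POST request to
# /api/v1/namespaces/default/pods/nginx/binding
apiVersion: v1
kind: Binding
metadata:
  name: nginx             # name of the pending pod
target:
  apiVersion: v1
  kind: Node
  name: node02            # hypothetical node to bind the pod to
```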
Labels and Selectors
Labels
Labels are properties attached to each Kubernetes object.
In a pod definition file, under metadata, create a section called labels.
Under that, add the labels in a key-value format.
You can add as many labels as you like.
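As a sketch, a pod definition with two labels under metadata might look like this; the pod name, image and the label keys and values (app: App1, function: Front-end) are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
  labels:                  # any number of key-value pairs
    app: App1
    function: Front-end
spec:
  containers:
    - name: simple-webapp
      image: nginx
```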
Selectors
- Once the pod is created, to select it by its labels, use the kubectl get pods command with the --selector option and specify the label.
Use case of Labels and Selectors
Kubernetes objects use labels and selectors internally to connect different objects.
Ex. To connect a ReplicaSet to its pods, we configure the selector field under the ReplicaSet specification to match the labels defined on the pods.
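A minimal ReplicaSet sketch, reusing the hypothetical app: App1 label: selector.matchLabels has to match the labels in the pod template, otherwise the API server rejects the ReplicaSet.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: simple-webapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: App1            # pods carrying this label belong to the ReplicaSet
  template:
    metadata:
      labels:
        app: App1          # must match the selector above
    spec:
      containers:
        - name: simple-webapp
          image: nginx
```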
Annotations
While labels and selectors are used to group and select objects, annotations are used to record other details for informatory purposes.
Ex. tool details like name, version, build information, contact details etc.
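For illustration, annotations sit next to labels under metadata. The buildversion key and its value here are hypothetical; annotation values must be strings, which is why the number is quoted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
  labels:
    app: App1              # used for grouping and selection
  annotations:
    buildversion: "1.34"   # informational only; quoted because values must be strings
spec:
  containers:
    - name: simple-webapp
      image: nginx
```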
Taints and Tolerations
Taints and Tolerations have nothing to do with security or intrusion on the cluster.
Taints and Tolerations are used to set restrictions on which pods can be scheduled on a node.
Example:
Suppose we have 1 worker node and 3 pods (A, B, C) that need to be scheduled on the given node.
First, we prevent all pods from being placed on the node by placing a taint on the node.
By default, pods have no tolerations, which means unless specified otherwise, none of the pods can tolerate any taint. So in this case, none of the pods can be placed on the node, as none of them tolerates the taint.
Next, we want to schedule pod C on the given node, so we add a toleration to pod C.
Now, when the scheduler tries to place pod C on the node, it goes through.
Taints are set on nodes and Tolerations are set on pods.
Taints and Tolerations do not tell the pod to go to a particular node. Instead, they tell the node to only accept pods with certain tolerations.
When the Kubernetes cluster is first set up, a taint is automatically set on the master (control plane) node that prevents any pods from being scheduled on it.
Taint Effects
The taint effect defines what would happen to the pods if they do not tolerate the taint.
There are 3 taint effects.
NoSchedule: The pods will not be scheduled on the node.
PreferNoSchedule: The system will try to avoid placing a pod on the node, but that is not guaranteed.
NoExecute: New pods will not be scheduled on the node and existing pods on the node, if any, will be evicted if they do not tolerate the taint. These pods may have been scheduled on the node before the taint was applied on the node.
Taint Commands
kubectl taint nodes <node_name> <key=value:taint-effect>
To taint a node.
kubectl taint nodes node01 app=blue:NoSchedule
kubectl taint nodes <node_name> <key=value:taint-effect->
To remove a taint from a node.
kubectl taint nodes node01 app=blue:NoSchedule-
Add Tolerations
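In the spec section of the pod definition file, add a section called tolerations and use the same values that were used when creating the taint.
These values are usually written in double quotes; quoting is required when a value could otherwise be read as a number or a boolean (for example "true").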
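A sketch of a pod that tolerates the app=blue:NoSchedule taint from the example command above; the pod and container names are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  tolerations:
    - key: "app"            # taint key
      operator: "Equal"     # key and value must both match the taint
      value: "blue"         # taint value
      effect: "NoSchedule"  # taint effect
```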
Node Selectors
This is a simple Pod scheduling feature that allows scheduling a Pod onto a node whose labels match the nodeSelector labels specified in the Pod definition file.
To use labels in a nodeSelector, you must have labelled your nodes before creating the pod.
Node Selectors have limitations: you cannot provide advanced expressions such as OR or NOT with them.
Label Nodes
kubectl label nodes <node_name> <label_key>=<label_value>
Ex.
kubectl label nodes node01 size=large
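Once node01 carries the size=large label, a pod can target it with nodeSelector, as in this minimal sketch (the pod and container names are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: nginx
  nodeSelector:
    size: large     # must match the label set with kubectl label nodes
```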
Node Affinity
- This is the enhanced version of nodeSelector, which offers a more expressive syntax for fine-grained control of how Pods are scheduled to specific nodes.
Node Affinity Types
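Available
- requiredDuringSchedulingIgnoredDuringExecution: the pod is scheduled only on a node that matches the affinity rules; if no node matches, the pod stays pending.
- preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to find a matching node, but places the pod on another node if none matches.
For both types, IgnoredDuringExecution means that pods already running on a node are not affected if the node's labels change later.
Planned (may come in future)
- requiredDuringSchedulingRequiredDuringExecution: pods already running on a node would be evicted if the node no longer matches the rules.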
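A sketch combining both available types, assuming nodes are labelled with a size key as in the nodeSelector example: the pod must land on a large or medium node and, among those, the scheduler prefers a large one. The weight (1 to 100) ranks preferred rules against each other.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: nginx
  affinity:
    nodeAffinity:
      # Hard rule: only nodes labelled size=large or size=medium qualify
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In
                values:
                  - large
                  - medium
      # Soft rule: among qualifying nodes, prefer size=large
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: size
                operator: In
                values:
                  - large
```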
Resource Requirements and Limits
Resource Requests
Kubernetes defines requests as a guaranteed minimum amount of a resource to be used by a container.
The scheduler uses these requests to find a node with enough free capacity for the container.
Resource Limits
Kubernetes defines limits as a maximum amount of a resource to be used by a container.
A container can never use more CPU than its limit; it is throttled instead.
It can, however, try to use more memory than its limit; if it keeps doing so, the container is terminated with an OOM (Out Of Memory) error, reported as OOMKilled.
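A minimal sketch of requests and limits on a single container; the values are arbitrary examples. CPU is expressed in cores (500m is half a core) and memory in bytes with suffixes such as Mi or Gi.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
spec:
  containers:
    - name: simple-webapp
      image: nginx
      resources:
        requests:          # used by the scheduler to pick a node
          memory: "1Gi"
          cpu: "500m"      # half a CPU core
        limits:            # CPU above this is throttled; memory above this leads to OOMKilled
          memory: "2Gi"
          cpu: "1"
```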
Default Behavior
By default, Kubernetes does not have a CPU or memory request or limit set.
This means any pod can consume as many resources as it needs on any node and starve other pods or processes running on that node.
The ideal setup, especially for CPU, is usually to set requests but no limits for all pods/containers in a cluster, because it lets containers with extra resource requirements use the resources reserved for other containers when those containers are not using them.
Limit Range
- Limit Ranges help you define default values to be set for containers in pods that are created without a request or limit specified in the pod-definition files.
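A sketch of a LimitRange that gives containers a default CPU request and limit; the object name and values are assumptions. A LimitRange is namespace-scoped and only affects pods created after it exists.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
    - type: Container
      defaultRequest:   # request applied when a container specifies none
        cpu: 500m
      default:          # limit applied when a container specifies none
        cpu: 500m
      max:              # upper bound a container may specify
        cpu: "1"
      min:              # lower bound a container may specify
        cpu: 100m
```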
Resource Quotas
- Resource Quota is a namespace-level object that can be created to set hard limits on the total requests and limits of all pods in that namespace.
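A sketch of a ResourceQuota, assuming a hypothetical namespace named dev and arbitrary totals; once it exists, new pods in that namespace are rejected if they would push the totals past these values.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-resource-quota
  namespace: dev            # hypothetical namespace
spec:
  hard:                     # totals across all pods in the namespace
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "10"
    limits.memory: 10Gi
```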
Daemon Sets
Daemon Sets are like ReplicaSets in that they help you deploy multiple instances of a pod, but a DaemonSet runs one copy of your pod on each node in your cluster.
Whenever a new node is added to the cluster, a replica of the pod is automatically added to that node. And when a node is removed, the pod is automatically removed.
The Daemon Sets ensure that one copy of the pod is always present in all nodes in the cluster.
Ex. Say you need to deploy a monitoring agent or logger on each node in the cluster: a DaemonSet is perfect for that.
A DaemonSet definition file has almost exactly the same structure as a ReplicaSet definition file, except that the kind is DaemonSet.
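A minimal sketch of such a file; the names and the monitoring-agent image are hypothetical. Compared with a ReplicaSet, the kind changes and there is no replicas field.

```yaml
apiVersion: apps/v1
kind: DaemonSet                 # the main difference from a ReplicaSet
metadata:
  name: monitoring-daemon
spec:
  # no replicas field: one pod runs on each node
  selector:
    matchLabels:
      app: monitoring-agent
  template:
    metadata:
      labels:
        app: monitoring-agent   # must match the selector above
    spec:
      containers:
        - name: monitoring-agent
          image: monitoring-agent   # hypothetical image name
```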
Daemon Set Commands
kubectl create -f daemonset-definition.yml
- To create a daemon set.
kubectl get daemonsets
- To get the list of created daemon sets (the short name ds also works: kubectl get ds).
kubectl delete daemonset <daemonset_name>
- To delete the defined daemon set with all the underlying pods.
kubectl describe daemonset <daemonset_name>
- To describe the given daemon set.
Static Pods
The pods that are created by the kubelet on their own, without intervention from the API server or the rest of the Kubernetes cluster components, are known as Static Pods.
For this, we have to place the pod-definition files in a designated directory. The kubelet periodically checks the directory, creates the pods, and manages them as well.
The kubelet agent is responsible for watching each static Pod and restarting it if it crashes.
You can only create pods this way. You cannot create ReplicaSets, Deployments or Services.
The kubelet works at a pod level and can only understand pods, which is why it can create static pods this way.
Designated Folder: It can be any directory on the host. The location of that directory is passed to the kubelet either with the --pod-manifest-path option when running the service, or by passing a config file with the --config option and setting the directory path in that file with the key staticPodPath.
We can view the created static pods using the docker ps command (or crictl ps on nodes that use containerd).
In kubeadm-based clusters, the kubelet config file is /var/lib/kubelet/config.yaml; inside it, the static pod folder path is set as staticPodPath: /etc/kubernetes/manifests.
Also, static pod names are suffixed with the name of the node they run on, for example -controlplane for pods on the controlplane node.
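For reference, a minimal excerpt of a kubelet config file of the kind passed with --config. The path is the kubeadm default mentioned above, and all other settings such a file normally contains are omitted here.

```yaml
# Excerpt of a kubelet config file such as /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Every pod manifest placed in this directory is created and managed by the kubelet
staticPodPath: /etc/kubernetes/manifests
```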
Static Pods VS Daemon Sets
Static PODs
Created by the kubelet.
Use case: deploying control plane components as static pods.
Ignored by the Kube-Scheduler.
DaemonSets
Created by Kube-API server (DaemonSet-Controller).
Use case: deploying monitoring agents and logging agents on nodes.
Scheduled by the default Kube-Scheduler since Kubernetes 1.12 (older versions bypassed the scheduler).
Admission Controllers
When we run a kubectl command, the request goes to the KubeAPI Server, the pod is created, and the information is finally persisted in the ETCD database. When a request hits the KubeAPI Server, it first goes through an authentication process, which is usually done through certificates.
Then the request goes through an authorization process, where we check whether the user has permission to perform that operation; we achieve this using RBAC.
With RBAC, we can:
list/create/delete pods/deployments/services…
restrict access to a specific namespace or resource name.
Most of the rules that we can create with RBAC are at the Kubernetes API level (which user is allowed to perform which kind of API operation), and they do not go beyond that.
For example, when a pod creation request comes in, we might want to review the configuration file and decide to:
Only allow images from a specific internal registry.
Never use the latest tag for any image.
Not permit running as the root user.
Only permit certain capabilities.
Require that the pod always has labels.
These are some of the many things that we can't achieve using RBAC alone, and that is where admission controllers come in.
Admission Controller
Admission controllers help implement better security measures by enforcing how a cluster is used.
Apart from simply validating configuration, admission controllers can do a lot more, such as change the request itself, perform additional operations before the pod gets created, etc.
There are a number of admission controllers that come pre-built with Kubernetes, such as:
AlwaysPullImages: It ensures that every time a pod is created, the images are always pulled.
DefaultStorageClass: It observes the creation of PVCs and automatically adds a default storage class to them, if not specified.
EventRateLimit: It sets a limit on the requests that the API server can handle at a time, to prevent the API server from being flooded with requests.
NamespaceExists: It rejects requests to namespaces that do not exist.
- Ex: If we try to create a pod in a namespace that doesn't exist, the NamespaceExists admission controller throws an error saying the namespace was not found.
NamespaceAutoProvision: It automatically creates the namespace if it does not exist.
- Ex: If we try to create a pod in a namespace that doesn't exist, the NamespaceAutoProvision admission controller first creates that namespace and then creates the pod inside it without throwing any error.
To see a list of admission controllers enabled by default, run
kubectl exec -it kube-apiserver-controlplane -n kube-system -- kube-apiserver -h | grep 'enable-admission-plugins'
To enable an admission controller, add it to the --enable-admission-plugins flag on the KubeAPI server service. To disable an admission controller, add it to the --disable-admission-plugins flag on the KubeAPI server service.
Admission controllers can not only validate and reject requests from users, but can also perform operations in the back end or change the request itself.
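In a kubeadm cluster, the KubeAPI server runs as a static pod, so these flags are edited in its manifest (typically /etc/kubernetes/manifests/kube-apiserver.yaml), and the kubelet restarts the API server after the file is saved. An excerpt as a sketch; the particular plugins enabled and disabled are only examples.

```yaml
# Excerpt of /etc/kubernetes/manifests/kube-apiserver.yaml (kubeadm layout)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --enable-admission-plugins=NodeRestriction,NamespaceAutoProvision
        - --disable-admission-plugins=DefaultStorageClass
        # ...all other existing flags stay as they are
```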
Note
The NamespaceExists and NamespaceAutoProvision admission controllers are deprecated and have been replaced by the NamespaceLifecycle admission controller. The NamespaceLifecycle admission controller makes sure that requests to a non-existent namespace are rejected and that the default namespaces such as default, kube-system and kube-public cannot be deleted.
Types of Admission Controllers
There are 2 types of admission controllers:
Validating admission controllers:
These controllers validate the request and allow or deny it.
Ex: The NamespaceExists or NamespaceLifecycle admission controller validates whether a namespace exists and rejects the request if it doesn't.
Mutating admission controllers:
These controllers change the request.
Ex: DefaultStorageClass (enabled by default): It observes the creation of PVCs and automatically adds a default storage class to them, if not specified. We can see this by running describe on the PVC.
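A minimal PVC sketch with no storageClassName, so the DefaultStorageClass admission controller has something to fill in; the claim name and size are assumptions.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  # storageClassName is deliberately omitted; the DefaultStorageClass
  # admission controller sets it to the cluster's default StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
```

After creating it, kubectl describe pvc myclaim should show the default class in the StorageClass field, provided the cluster has a StorageClass marked as default.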
Some admission controllers can do both: mutate a request and validate it.
Generally, mutating admission controllers are invoked first followed by validating admission controllers. This is so that any change made by the mutating controller can be considered during the validation process.
External Admission Controllers
We can also have our own admission controllers with our own mutation and validation logic.
To support external admission controllers, there are two special admission controllers available: MutatingAdmissionWebhook and ValidatingAdmissionWebhook.
We can configure these webhooks to point to a server that is hosted either within the Kubernetes cluster or outside it.
Our server will have our own admission webhook service running its own code and logic.
After a request goes through all the built-in admission controllers, it hits the webhook that’s configured.
Once the request hits the webhook, the KubeAPI Server calls the admission webhook server, passing in an AdmissionReview object in JSON format.
This object has all the details about the request, such as the user who made the request, the type of operation the user is trying to perform, the object it targets, and details about the object itself.
On receiving the request, the admission webhook server responds with an admission review object with a result of whether the request is allowed or not.
If the allowed field in the response object is set to true, then the request is allowed. And if it’s set to false, it is rejected.
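A sketch of a rejecting response, shown in YAML for readability although the webhook server actually returns it as JSON. The uid must be copied from request.uid of the incoming AdmissionReview, and the message text is a hypothetical example.

```yaml
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
response:
  uid: "<value of request.uid from the incoming AdmissionReview>"
  allowed: false        # true would let the request through
  status:
    code: 403
    message: "object name must not match the requesting user's name"
```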
How to set all this up?
We must deploy our admission webhook server, which will have our own logic.
Then we configure the webhook on Kubernetes by creating a WebhookConfiguration object.
Steps to setup External Admission Controllers:
Develop Admission Webhook Server
The first step is to deploy our own webhook server.
This could be an API server built on any platform.
We can develop the server in any programming language; the only requirement is that it must accept the mutate and validate API calls and respond with the JSON object that the KubeAPI Server expects.
Example logic for such a server:
- The validate call receives the validating webhook request, compares the name of the object with the name of the user who sent the request, and rejects the request if they are the same.
- The mutate call receives the mutating webhook request, gets the username, and responds with a JSON patch operation that adds the username as a label to every request.
Host/Deploy Admission Webhook Server
Once we have developed our own admission webhook server, the next step is to host it.
So we can either run it as a server somewhere or containerize it and deploy it within the Kubernetes cluster itself as a deployment (webhook-deployment).
If deployed as a deployment in a cluster, then it needs a service for it to be accessed (webhook-service).
Configure Admission Webhook
The next step is to configure our cluster to reach out to the service and validate or mutate the requests.
For this, we create a ValidatingWebhookConfiguration object in Kubernetes. If we are configuring a mutating webhook, this would be a MutatingWebhookConfiguration object.
And that should be it. Once this object is created, every time we create a pod, a call is made to the webhook service and, depending on the response, the request is allowed or rejected.
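A minimal sketch, assuming the webhook server runs behind the webhook-service Service in a hypothetical webhook-namespace namespace and serves TLS; caBundle is a placeholder for the base64-encoded CA certificate that signed the server's certificate. The rules limit the webhook to pod CREATE requests.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com
webhooks:
  - name: pod-policy.example.com       # must be a fully qualified name
    clientConfig:
      service:
        namespace: webhook-namespace   # hypothetical namespace
        name: webhook-service
      caBundle: "<base64-encoded CA certificate>"   # placeholder
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
        scope: "Namespaced"
    admissionReviewVersions: ["v1"]
    sideEffects: None
```

For a mutating webhook the structure is the same, with kind set to MutatingWebhookConfiguration.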
Written by Rohit Pagote