Manual Scheduling

Every pod has a field named nodeName, which is not set by default. You usually do not specify it yourself; Kubernetes adds it automatically.
The scheduler goes through all the pods and looks for those that do not have this property set. Those are the candidates for scheduling.
It then identifies the right node for the pod by running a scheduling algorithm.
Once identified, the scheduler places the pod on the node by creating a binding object, which sets the nodeName property to the name of that node.
If there is no scheduler to schedule the pods, they remain in the Pending state.
In such cases, you can manually assign pods to nodes yourself.
Without a scheduler, the easiest way to schedule a pod is to simply set the nodeName field to the name of the node in your pod specification file while creating the pod.
Kubernetes does not allow you to modify the nodeName property of an existing pod. So, to assign a node to an existing pod, create a binding object and send a POST request to the pod's binding API.
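Both approaches can be sketched as follows. This is a minimal illustration, not a definitive setup: the pod name nginx, the image nginx and the node name node02 are assumptions chosen for the example.

```yaml
# Option 1: set nodeName in the pod specification while creating the pod.
apiVersion: v1
kind: Pod
metadata:
  name: nginx            # hypothetical pod name
spec:
  nodeName: node02       # hypothetical node name; the scheduler is bypassed
  containers:
  - name: nginx
    image: nginx
---
# Option 2: a Binding object for a pod that already exists.
# metadata.name must match the name of the pod being bound.
apiVersion: v1
kind: Binding
metadata:
  name: nginx
target:
  apiVersion: v1
  kind: Node
  name: node02           # hypothetical node name
```

The binding API expects the Binding object as JSON in the body of the POST request, so the second document has to be converted from YAML to JSON before it is sent to the pod's binding endpoint (for a pod in the default namespace, /api/v1/namespaces/default/pods/nginx/binding).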

Labels and Selectors

Labels

Labels are key-value properties attached to Kubernetes objects such as pods.
In a pod definition file, under metadata, create a section called labels.
Under that, add the labels in a key-value format.
You can add as many labels as you like.
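As a small sketch of what this looks like, the pod definition below carries two labels. The names simple-webapp, App1 and Front-end are made up for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp      # hypothetical pod name
  labels:                  # any number of key-value pairs
    app: App1
    function: Front-end
spec:
  containers:
  - name: simple-webapp
    image: nginx           # assumed image, any image works here
```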

Selectors

Once the pod is created, to select the pod with the labels, use the kubectl get pods command along with the --selector option and specify the label.

Use case of Labels and Selectors

Kubernetes objects use labels and selectors internally to connect different objects.
Ex. for ReplicaSet, to connect the ReplicaSet to the pod, we configure the selector field under the ReplicaSet specification to match the labels defined on the pod.
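A minimal sketch of that connection, assuming a hypothetical label app: App1: the selector under spec must match the labels in the pod template, otherwise the API server rejects the ReplicaSet.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: simple-webapp        # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: App1              # must match the pod labels below
  template:
    metadata:
      labels:
        app: App1            # labels defined on the pod
        function: Front-end
    spec:
      containers:
      - name: simple-webapp
        image: nginx         # assumed image
```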

Annotations

While labels and selectors are used to group and select objects, annotations are used to record other details for informational purposes.
Ex. tool details such as name, version, build information, contact details, etc.
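For instance, annotations sit under metadata next to labels. The keys and values in this fragment are invented for illustration; annotation values must be strings, hence the quotes.

```yaml
# Fragment of a pod definition file (metadata section only).
metadata:
  name: simple-webapp              # hypothetical pod name
  annotations:
    buildVersion: "1.34"           # hypothetical build information
    contact: "team@example.com"    # hypothetical contact detail
```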

Taints and Tolerations

Taints and Tolerations have nothing to do with security or intrusion on the cluster.
Taints and Tolerations are used to set restrictions on which pods can be scheduled on a node.
Example:
- Suppose we have 1 worker node and 3 pods (A, B, C) that need to be scheduled on the given node.
- First, we prevent all pods from being placed on the node by placing a taint on the node.
- By default, pods have no tolerations, which means unless specified otherwise, none of the pods can tolerate any taint. So in this case, none of the pods can be placed on the node, as none of them tolerates the taint.
- Next, we want to place pod C on the given node, so we add a toleration to pod C.
- Now when the scheduler tries to place pod C on the node, it goes through.
- Taints are set on nodes and Tolerations are set on pods.
- Taints and Tolerations do not tell the pod to go to a particular node. Instead, they tell the node to only accept pods with certain tolerations.
- When the Kubernetes cluster is first set up, a taint is automatically set on the master node that prevents pods from being scheduled on it.

Taint Effects

The taint effect defines what would happen to the pods if they do not tolerate the taint.
There are 3 taint effects.
1. NoSchedule: The pods will not be scheduled on the node.
2. PreferNoSchedule: The system will try to avoid placing a pod on the node, but that is not guaranteed.
3. NoExecute: New pods will not be scheduled on the node and existing pods on the node, if any, will be evicted if they do not tolerate the taint. These pods may have been scheduled on the node before the taint was applied on the node.

Taint Commands

kubectl taint nodes <node_name> <key=value:taint-effect>
- To taint a node.
- kubectl taint nodes node01 app=blue:NoSchedule
kubectl taint nodes <node_name> <key=value:taint-effect->
- To remove taint from a node.
- kubectl taint nodes node01 app=blue:NoSchedule-

Add Tolerations

In the spec section of the pod definition file, add a section called tolerations and use the same values that were used while creating the taint.
All of these values should be enclosed in double quotes.
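As a sketch, a pod that tolerates the taint app=blue:NoSchedule from the taint command example could look like this. The pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod          # hypothetical pod name
spec:
  containers:
  - name: nginx-container
    image: nginx           # assumed image
  tolerations:
  - key: "app"             # taint key
    operator: "Equal"      # key and value must both match
    value: "blue"          # taint value
    effect: "NoSchedule"   # taint effect
```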

Node Selectors

This is a simple Pod scheduling feature that allows scheduling a Pod onto a node whose labels match the nodeSelector labels specified in the Pod definition file.
To use labels in a nodeSelector, you must have first labelled your nodes before creating the pod.
Node Selectors have limitations: you cannot provide advanced expressions such as OR or NOT with them.

Label Nodes

kubectl label nodes <node_name> <label_key>=<label_value>
Ex. kubectl label nodes node01 size=large
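With the node labelled size=large as above, a pod can ask for such a node through nodeSelector. This is a minimal sketch; the pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-processor     # hypothetical pod name
spec:
  containers:
  - name: data-processor
    image: nginx           # assumed image
  nodeSelector:
    size: large            # must match a label set on the node
```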

Node Affinity

This is the enhanced version of the nodeSelector which offers a more expressive syntax for fine-grained control of how Pods are scheduled to specific nodes.

Node Affinity Types

Available
1. requiredDuringSchedulingIgnoredDuringExecution
2. preferredDuringSchedulingIgnoredDuringExecution
Planned (may come in future)
1. requiredDuringSchedulingRequiredDuringExecution
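A sketch of the first available type, reusing the size node label from the Label Nodes example. The value medium and the pod details are assumptions. The In operator gives the OR behaviour that nodeSelector cannot express.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: data-processor       # hypothetical pod name
spec:
  containers:
  - name: data-processor
    image: nginx             # assumed image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In     # node label value must be one of the listed values
            values:
            - large
            - medium         # hypothetical second value
```

Other operators such as NotIn and Exists cover the NOT case and a plain "label is present" check.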

Resource Requirements and Limits

Resource Requests

Kubernetes defines requests as the guaranteed minimum amount of a resource reserved for a container.
The scheduler uses the requests to find a node that has enough free resources for the pod.

Resource Limits

Kubernetes defines limits as the maximum amount of a resource that a container may use.
A container can never consume more CPU than its limit; the system throttles it instead.
It can, however, try to use more memory than its limit. A container that keeps doing so is terminated with an OOM (Out Of Memory) error. This case is also known as exceeding limits.
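Requests and limits are set per container under resources. The numbers below are arbitrary example values, and the pod name and image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp      # hypothetical pod name
spec:
  containers:
  - name: simple-webapp
    image: nginx           # assumed image
    resources:
      requests:            # guaranteed minimum, used by the scheduler
        cpu: "1"
        memory: 1Gi
      limits:              # maximum the container may use
        cpu: "2"
        memory: 2Gi
```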

Default Behavior

By default, Kubernetes does not set a CPU or memory request or limit on a container.
This means any pod can consume as many resources as it needs on a node and starve the other pods or processes running on that node.
The ideal setup is usually to set requests but no limits for all the containers in a cluster, so that containers with extra resource requirements can use the resources reserved for other containers while those are not using them.

Limit Range

Limit Ranges help you define default values for containers in pods that are created without a request or limit specified in the pod definition files.
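A sketch of a LimitRange for CPU, with made-up values. It applies to the namespace it is created in and only affects pods created after it exists.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint   # hypothetical name
spec:
  limits:
  - type: Container
    default:            # default limit for containers that set none
      cpu: 500m
    defaultRequest:     # default request for containers that set none
      cpu: 500m
    max:                # highest limit a container may set
      cpu: "1"
    min:                # lowest request a container may set
      cpu: 100m
```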

Resource Quotas

Resource Quota is a namespace-level object that sets hard limits on the total requests and limits of all the pods in that namespace.
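As an illustration, the quota below caps the combined requests and limits in a namespace. The namespace dev and all numbers are assumptions.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-resource-quota   # hypothetical name
  namespace: dev            # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"       # sum of CPU requests across all pods
    requests.memory: 4Gi    # sum of memory requests across all pods
    limits.cpu: "10"        # sum of CPU limits across all pods
    limits.memory: 10Gi     # sum of memory limits across all pods
```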

Daemon Sets

Daemon Sets are like ReplicaSets in that they help you deploy multiple instances of a pod. The difference is that a Daemon Set runs one copy of your pod on each node in your cluster.
Whenever a new node is added to the cluster, a replica of the pod is automatically added to that node. When a node is removed, the pod is automatically removed.
A Daemon Set ensures that one copy of the pod is always present on every node in the cluster.
Ex. Say you need to deploy a monitoring agent or logger on each node in the cluster. A DaemonSet is perfect for that.
A DaemonSet definition file has almost the same structure as a ReplicaSet definition file, except that the kind is DaemonSet.
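A minimal sketch of such a definition file, using the monitoring agent use case. The names and the image are hypothetical. Note that, unlike a ReplicaSet, there is no replicas field.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-daemon          # hypothetical name
spec:
  selector:
    matchLabels:
      app: monitoring-agent        # must match the pod labels below
  template:
    metadata:
      labels:
        app: monitoring-agent
    spec:
      containers:
      - name: monitoring-agent
        image: monitoring-agent    # hypothetical image name
```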

Daemon Set Commands

kubectl create -f daemonset-definition.yml
- To create a daemon set.
kubectl get daemonsets | daemonset | ds
- To get the list of created daemon sets.
kubectl delete daemonset <daemonset_name>
- To delete the defined daemon set with all the underlying pods.
kubectl describe daemonset <daemonset_name>
- To describe the given daemon set.

Static Pods

The pods that the kubelet creates on its own, without intervention from the API server or the rest of the Kubernetes cluster components, are known as Static Pods.
For this, we have to place the pod definition files in a designated directory. The kubelet periodically checks the directory, creates the pods, and manages them as well.
The kubelet is responsible for watching each static pod and restarting it if it crashes.
You can only create pods this way. You cannot create ReplicaSets, Deployments or Services.
The kubelet works at the pod level and can only understand pods, which is why it can create static pods this way.
Designated folder: It can be any directory on the host. The location of that directory is passed to the kubelet either as an option (--pod-manifest-path) when running the service, or through the --config option, which points to a config file that sets the location with the key staticPodPath.
On a node that uses Docker as the container runtime, we can view the created static pods with the docker ps command (on a containerd node, use crictl ps).
/var/lib/kubelet/config.yaml: inside this config file, we can see the static pod folder path, staticPodPath: /etc/kubernetes/manifests.
Also, the names of static pods are suffixed with the name of the node they run on, for example the controlplane node.
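For orientation, the relevant part of the kubelet config file looks roughly like this. It is a trimmed sketch: a real /var/lib/kubelet/config.yaml contains many more settings.

```yaml
# Fragment of the file passed to the kubelet through --config.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Directory the kubelet watches for static pod definition files.
staticPodPath: /etc/kubernetes/manifests
```

Any pod definition file placed in that directory is picked up by the kubelet, and removing the file removes the pod.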

Static Pods VS Daemon Sets

Static Pods
- Created by the kubelet.
- Used to deploy control plane components.
- Ignored by the Kube-Scheduler.
DaemonSets
- Created by Kube-API server (DaemonSet-Controller).
- Used to deploy monitoring agents and logging agents on nodes.
- Ignored by the Kube-Scheduler in older versions; since Kubernetes v1.12, DaemonSet pods are placed by the default scheduler using node affinity.

Admission Controllers

When we run a kubectl command, the request goes to the KubeAPI Server, the pod is created, and the information is finally persisted in the ETCD database.
When a request reaches the KubeAPI Server, it first goes through an Authentication process, which is usually done through certificates.
Then the request goes through an Authorization process, where we check whether the user has permission to perform that operation. This is achieved using RBAC (role-based access control).
With RBAC, we can:
- allow users to list, create or delete pods, deployments, services and so on.
- restrict access to a specific namespace or resource name.
Most of the rules that we can create with RBAC are at the Kubernetes API level (which user is allowed access to which kind of API operation), and they do not go beyond that.
For ex: When a pod creation request comes in, we’d like to review the configuration file and decide:
- Only permit images from a specific internal registry.
- Never allow the latest tag for any image.
- Do not permit running as the root user.
- Only permit certain capabilities.
- Make sure the pod always has labels.
These are some of the many things that we can’t achieve using RBAC, and that is where Admission Controllers come in.

Admission Controller

Admission Controller helps to implement better security measures to enforce how a cluster is used.
Apart from simply validating configuration, admission controllers can do a lot more, such as change the request itself, perform additional operations before the pod gets created, etc.
There are a number of admission controllers that come pre-built with Kubernetes, such as:
- AlwaysPullImages: It ensures that every time a pod is created, the images are always pulled.
- DefaultStorageClass: It observes the creation of PVCs and automatically adds a default storage class to them, if not specified.
- EventRateLimit: It helps to set a limit on the requests that the API server can handle at a time, to prevent the API server from being flooded with requests.
- NamespaceExists: It rejects requests to namespaces that do not exist (enabled by default).
  - Ex: If we try to create a pod in a namespace that doesn’t exist, the NamespaceExists admission controller throws an error saying Namespace not found.
- NamespaceAutoProvision: It automatically creates the namespace if it does not exist.
  - Ex: If we try to create a pod in a namespace that doesn’t exist, the NamespaceAutoProvision admission controller first creates that namespace and then creates the pod inside it without throwing any error.
To see the list of admission controllers enabled by default, run:

kubectl exec -it kube-apiserver-controlplane -n kube-system -- kube-apiserver -h | grep 'enable-admission-plugins'

To enable an admission controller, add it to the --enable-admission-plugins flag on the KubeAPI server service.
To disable an admission controller, add it to the --disable-admission-plugins flag on the KubeAPI server service.
Admission controllers can not only validate and reject requests from users, but also perform operations in the back end or change the request itself.
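On a cluster where the API server runs as a static pod (an assumption; a kubeadm-style setup keeps the file at /etc/kubernetes/manifests/kube-apiserver.yaml), the two flags would be edited in the container command. The fragment below is only a sketch: the plugin choices are examples, and the image and all other flags are left out.

```yaml
# Fragment of a kube-apiserver static pod manifest.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Other flags omitted for brevity.
    - --enable-admission-plugins=NodeRestriction,NamespaceAutoProvision
    - --disable-admission-plugins=DefaultStorageClass
```

Both flags take a comma-separated list of plugin names.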

Note

The NamespaceExists and NamespaceAutoProvision admission controllers are deprecated and have been replaced by the NamespaceLifecycle admission controller.
The NamespaceLifecycle admission controller makes sure that requests to a non-existent namespace are rejected and that the default namespaces such as default, kube-system and kube-public cannot be deleted.

Types of Admission Controllers

There are 2 types of admission controllers:
1. Validating admission controllers:
  - These controllers validate the request and allow or deny it.
  - Ex: The NamespaceExists or NamespaceLifecycle admission controller validates whether a namespace already exists and rejects the request if it doesn’t.
2. Mutating admission controllers:
  - These controllers change the request.
  - Ex: DefaultStorageClass (enabled by default): It observes the creation of PVCs and automatically adds a default storage class to them, if not specified. We can see the result by running describe on the PVC.
Some admission controllers can do both: they can mutate a request as well as validate it.
Generally, mutating admission controllers are invoked first, followed by validating admission controllers. This is so that any change made by a mutating controller can be considered during the validation process.

External Admission Controllers

We can also have our own admission controllers with our own mutation and validation logic.
To support external admission controllers, there are two special admission controllers available: MutatingAdmissionWebhook and ValidatingAdmissionWebhook.
We can configure these webhooks to point to a server that is hosted either within the Kubernetes cluster or outside it.
Our server will have our own admission webhook service running its own code and logic.
After a request goes through all the built-in admission controllers, it hits the webhook that’s configured.
Once it hits the webhook, it makes a call to the admission webhook server by passing in an admission review object in a JSON format.
This object has all the details about the request, such as the user that made the request and the type of operation the user is trying to perform, and on what objects and details about the object itself.
On receiving the request, the admission webhook server responds with an admission review object with a result of whether the request is allowed or not.
If the allowed field in the response object is set to true, then the request is allowed. And if it’s set to false, it is rejected.
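The response object can be sketched as follows. It is shown in YAML for readability, while on the wire it is sent as JSON. The message text is made up, and the uid placeholder stands for the value copied from the incoming request.

```yaml
apiVersion: admission.k8s.io/v1
kind: AdmissionReview
response:
  uid: "<uid copied from the request>"   # must echo request.uid
  allowed: false                         # true allows the request, false rejects it
  status:
    message: "pod is missing the required labels"   # hypothetical rejection reason
```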
How do we set all this up?
1. We must deploy our admission webhook server, which will have our own logic.
2. Then we configure the webhook on Kubernetes by creating a WebhookConfiguration object.

Steps to set up External Admission Controllers:

Develop Admission Webhook Server
- The first step is to develop our own webhook server.
- This could be an API server built on any platform.
- We can develop our server in any programming language. The only requirement is that it must accept the mutate and validate API calls and respond with a JSON object that the Kubernetes API server expects.
- Example of a validate request:
  - The validate call receives the validating webhook request, compares the name of the object with the name of the user who sent the request, and rejects the request if they are the same.
- Example of a mutate request:
  - The mutate call receives the mutating webhook request, gets the username, and responds with a JSON patch operation that adds the username as a label to the object, whoever raised the request.

Host/Deploy Admission Webhook Server
- Once we have developed our own admission webhook server, the next step is to host it.
- So we can either run it as a server somewhere or containerize it and deploy it within the Kubernetes cluster itself as a deployment (webhook-deployment).
- If deployed as a deployment in a cluster, then it needs a service for it to be accessed (webhook-service).
Configure Admission Webhook
- The next step is to configure our cluster to reach out to the service and validate or mutate the requests.
- For this, we create a ValidatingWebhookConfiguration object in Kubernetes. If we are configuring a mutating webhook, then this would be a MutatingWebhookConfiguration object.
- And that should be it. Once this object is created, every time we create a pod, a call would be made to a webhook service and depending on the response, it would be allowed or rejected.
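A sketch of such a configuration object for the in-cluster case, assuming the service is called webhook-service as above. The object name, the namespace and the caBundle placeholder are assumptions for illustration.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-policy.example.com          # hypothetical name
webhooks:
- name: pod-policy.example.com          # hypothetical webhook name
  clientConfig:
    service:                            # webhook server hosted inside the cluster
      namespace: webhook-namespace      # hypothetical namespace
      name: webhook-service
    caBundle: "<base64-encoded CA certificate>"   # placeholder, not a real value
  rules:                                # when to call the webhook
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"
  admissionReviewVersions: ["v1"]
  sideEffects: None
```

For a server hosted outside the cluster, clientConfig would carry a url field instead of the service block. The API server talks to the webhook over TLS, which is why a CA bundle is part of the configuration.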

02 - Scheduling in Kubernetes

Rohit Pagote
