Topic 3: Kubernetes Affinity (Part 1: Taints and Tolerations)
The Kubernetes (K8s) scheduler normally uses simple rules based on resource availability to place pods on nodes. But what if you want to specify your own rules for where pods go? That’s where Kubernetes affinity and anti-affinity come in. They are advanced K8s scheduling techniques that help you create flexible scheduling policies.
In general, affinity enables the Kubernetes scheduler to place a pod either on a group of nodes or relative to the placement of other pods. To control pod placement on a group of nodes, you use node affinity rules. In contrast, pod affinity and pod anti-affinity rules control pod placement relative to other pods.
Let’s look at the different affinity techniques in Kubernetes:
| Technique | Summary |
| --- | --- |
| Taints and Tolerations | Allows a node to control which pods can run on it and which pods are repelled. |
| NodeSelector | Assigns a pod to a specific node using labels. |
| Node Affinity | Similar to NodeSelector, but more flexible, with "required" and "preferred" rules. |
| Pod Affinity and Anti-Affinity | Co-locates pods, or places pods away from each other, based on affinity and anti-affinity rules. |
Taints and Tolerations
With taints, nodes have control over pod placement. Taints allow a node to define which pods can be placed on it and which pods are repelled away from it.
For example, suppose you have a node with special hardware and want the scheduler to deploy only pods that require that hardware. You can use taints and tolerations to meet this requirement: taint the node, and have the pods that need the special hardware define a toleration for that taint. When you taint a node, it repels all pods except those that have a matching toleration. A node can have one or many taints associated with it.
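As a minimal sketch, a pod that needs the special hardware could declare a toleration for a hypothetical special=true:NoSchedule taint (the taint itself is created with the commands further below); the pod name and image here are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: special-hw-app        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx              # illustrative image
  tolerations:
  - key: "special"            # matches the taint key
    operator: "Equal"
    value: "true"             # matches the taint value
    effect: "NoSchedule"      # matches the taint effect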
A taint can produce three possible outcomes:
NoSchedule: The Kubernetes scheduler will only allow scheduling of pods that have a toleration for the taint.
PreferNoSchedule: The Kubernetes scheduler will try to avoid scheduling pods that don’t have a toleration for the taint.
NoExecute: Kubernetes will evict running pods from the node if they don’t have a toleration for the taint.
For example, if you need to dedicate a group of worker nodes to a set of users, you can add a taint to those nodes with a command like this:
kubectl taint nodes nodename dedicated=groupName:NoSchedule
For specialized hardware:
kubectl taint nodes nodename special=true:NoSchedule
or
kubectl taint nodes nodename special=true:PreferNoSchedule
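Note that the taint only repels other pods; it does not attract the dedicated pods to those nodes. A common pattern is to also label the nodes and combine the toleration with a nodeSelector. A minimal sketch, where the label dedicated=groupName is an assumption mirroring the taint above:

kubectl label nodes nodename dedicated=groupName

# Sketch: pod for the dedicated node group (name and image are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: team-app
spec:
  containers:
  - name: app
    image: nginx
  nodeSelector:
    dedicated: groupName      # only nodes labeled dedicated=groupName qualify
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "groupName"
    effect: "NoSchedule"      # tolerates the taint added above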
How to Use Taints and Tolerations
Let’s assume that we need to deploy the front-end application pods so that they are placed only on front-end nodes. We must also ensure that new pods are not scheduled onto the master nodes, because those nodes run control plane components such as etcd.
First, list the nodes and their taints:
kubectl get nodes -o=custom-columns=NodeName:.metadata.name,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value,TaintEffect:.spec.taints[*].effect
NodeName TaintKey TaintValue TaintEffect
cluster01-master-1 node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd true,true NoSchedule,NoExecute
cluster01-master-2 node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd true,true NoSchedule,NoExecute
cluster01-master-3 node-role.kubernetes.io/controlplane,node-role.kubernetes.io/etcd true,true NoSchedule,NoExecute
cluster01-worker-1 <none> <none> <none>
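Alternatively, you can check a single node’s taints with kubectl describe (the grep filter is just one convenient way to trim the output):

kubectl describe node cluster01-master-1 | grep -A1 Taints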
Let’s taint the worker-1 node:
kubectl taint nodes cluster01-worker-1 app=frontend:NoSchedule
node/cluster01-worker-1 tainted
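To see the taint in action, you could create a pod without any toleration; assuming cluster01-worker-1 is the only schedulable worker, it should stay Pending, and its events should show a FailedScheduling message mentioning the untolerated taint:

kubectl run test-pod --image=nginx -n frontend
kubectl get pods -n frontend
kubectl describe pod test-pod -n frontend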
So let’s say you want to deploy a pod to cluster01-worker-1. Notice the tolerations section of the pod spec below: we have added a toleration for the taint so that the pod can be scheduled on the worker node.
kubectl edit deployment nginx -n frontend
deployment.apps/nginx edited
kubectl get deployment nginx -n frontend -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
  creationTimestamp: "2024-11-14T09:39:37Z"
  generation: 3
  labels:
    run: nginx
  name: nginx
  namespace: frontend
  resourceVersion: "13368509"
  selfLink: /apis/apps/v1/namespaces/frontend/deployments/nginx
  uid: f56f026f-3a92-4bbc-c185-3110426bba335
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 6
  selector:
    matchLabels:
      run: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        run: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: app
        operator: Equal
        value: frontend
By checking the pod’s status and events, we can see that the pod has been deployed on the node:
kubectl get events -n frontend
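The -o wide output also shows which node each pod was assigned to:

kubectl get pods -n frontend -o wide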
How do we untaint a node?
We can run kubectl taint again, adding a hyphen at the end of the taint to remove it (untaint the node):
kubectl taint nodes cluster01-worker-1 app=frontend:NoSchedule-
So what happens to a pod that is already running on a node when a taint is added? With NoSchedule it keeps running, but with NoExecute it will be evicted.
NoExecute effect
If a pod has no toleration for the taint, it will be evicted immediately. If a pod has a toleration for the taint, but it doesn't specify tolerationSeconds, it will stay bound to the node forever. If a pod has a toleration for the taint and it does specify tolerationSeconds, it will stay bound for that amount of time.
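As a sketch, reusing the app=frontend key from our example, a toleration with tolerationSeconds might look like this; the pod would be evicted 3600 seconds after a matching NoExecute taint is applied:

tolerations:
- key: "app"
  operator: "Equal"
  value: "frontend"
  effect: "NoExecute"
  tolerationSeconds: 3600   # stay bound for up to an hour after the taint appears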
Other taints
Keep in mind that a node can carry more than one taint. A pod can be scheduled on the node only if it has tolerations matching all of the node’s NoSchedule taints; tolerating just one of several taints is not enough.
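Tolerations can also match taints more broadly with the Exists operator. For example, this sketch tolerates any NoSchedule taint whose key is app, regardless of its value:

tolerations:
- key: "app"
  operator: "Exists"    # no value needed; matches any value of the "app" key
  effect: "NoSchedule"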