Taints and tolerations

When we run a single cluster that hosts multiple environments (for example dev, testing, staging, and production), we usually want the production nodes to be the most powerful ones, with the most resources, while the dev nodes get only a few. At deployment time we then find that we do not want dev pods to be scheduled onto the production worker nodes, and vice versa.

Normally the scheduler picks the worker node for each pod automatically, but we can also control that choice. This is called advanced scheduling, and in this article we will talk about techniques for assigning pods to exactly the worker nodes we want. This first part looks at two properties that restrict which pods can be deployed to a node.

Taints and tolerations

The first two advanced scheduling features we'll look at are taints and tolerations.

Taints are set on a worker node to stop pods from being deployed to it. We apply taints to a node with a command. For example, if we have worker nodes that run the production environment, we taint those nodes, and pods can no longer be deployed to them. But if that is the case and no pods can be deployed to these nodes, wouldn't the tainted worker nodes become redundant? The answer is no.

Tolerations are assigned to pods. A pod whose tolerations match a node's taints is allowed to be deployed to that node. For example, after tainting the production worker nodes above, we add matching tolerations to the production pods. Now only the production pods, which have the appropriate tolerations, can be deployed to the production worker nodes; the dev pods cannot.

To make this easier to understand, let's look at some of the default taints and tolerations in Kubernetes.

Node's Taints

In a Kubernetes cluster, if you pay attention, you will see that by default our pods are not deployed to the master node; only the system pods created along with the cluster are. The reason is that the master node is tainted, and the system pods have tolerations that match the master node's taint, so only system pods can be deployed to the master node.

For example, I have an environment deployed using the kubeadm tool.

$ kubectl get node
dev-node                      Ready    <none>                 246d   v1.22.3
production-node               Ready    <none>                 302d   v1.20.2
kube-master                   Ready    control-plane,master   304d   v1.20.1

We can check the taints of the master node as follows:

$ kubectl describe node kube-master
Name:               kube-master
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kube-master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node-role.kubernetes.io/master=
Annotations:        flannel.alpha.coreos.com/backend-data:
...
Taints:             node-role.kubernetes.io/master:NoSchedule

In the Taints field you will see the value node-role.kubernetes.io/master:NoSchedule; this is the default taint applied to the master node. A taint has the format <key>=<value>:<effect>. For the value above, the key is node-role.kubernetes.io/master, the value is empty (null), and the effect is NoSchedule.

This taint prevents normal pods from being deployed to the node; only system pods with the toleration node-role.kubernetes.io/master=:NoSchedule can be deployed to it.
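The same taint can also be read in structured form from the node object, for example with kubectl get node kube-master -o yaml. The fragment below is a trimmed sketch of the relevant part, assuming the node from the output above; every other field of the real object is omitted.

```yaml
# Trimmed fragment of the Node object; only the taint is shown.
apiVersion: v1
kind: Node
metadata:
  name: kube-master
spec:
  taints:
    - key: node-role.kubernetes.io/master   # <key>
      effect: NoSchedule                    # <effect>
      # There is no "value" field: this taint has an empty value.
```

Each entry under spec.taints carries the same three parts as the <key>=<value>:<effect> string, just split into separate fields.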

Pod's tolerations

Let's list the pods in the kube-system namespace and check whether their tolerations are what we just described.

$ kubectl get pod -n kube-system
configmapwatchers-controller-7d577784d-wl7wz   1/1     Running   5246       212d
coredns-5f4c5f68f9-mt2gz                       1/1     Running   2          246d
coredns-5f4c5f68f9-n4tc6                       1/1     Running   2          246d
etcd-kube-master                               1/1     Running   0          304d
kube-apiserver-kube-master                     1/1     Running   0          304d
...

These are the system pods created when I use the kubeadm tool. Let's describe the pod kube-apiserver-kube-master to see its tolerations:

$ kubectl describe pod kube-apiserver-kube-master -n kube-system
...
Tolerations:  node-role.kubernetes.io/master=:NoSchedule
              node.alpha.kubernetes.io/notReady=:Exists:NoExecute
              node.alpha.kubernetes.io/unreachable=:Exists:NoExecute
...

In the Tolerations field you will see the value node-role.kubernetes.io/master=:NoSchedule; this is the toleration that allows the pod to be deployed to the master node. Notice that the toleration has an additional = sign: taints and tolerations are displayed differently when the value is empty (null).
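In a pod manifest, a toleration like this is written under spec.tolerations. The fragment below is a sketch of how such a toleration could be declared, not a copy of the actual kube-apiserver manifest. The empty value is written out explicitly here to mirror the trailing = sign in the describe output.

```yaml
# Fragment of a pod spec (containers and other fields omitted).
spec:
  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Equal     # the default when operator is omitted
      value: ""           # empty value, equal to the taint's empty value
      effect: NoSchedule
```

With operator Equal, the toleration matches a taint only when key, value, and effect are all the same, which is the case for the master node's taint.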

Taint effect

In taints and tolerations, the key and value are easy to understand: they are just strings. The effect is different; it takes one of the following values:

  • NoSchedule: pods that do not tolerate the taint will not be scheduled onto the node. Pods already running on the node are not affected.

  • PreferNoSchedule: a softer version of NoSchedule. The scheduler tries to avoid placing pods that do not tolerate the taint on the node, but if a pod cannot be scheduled anywhere else and this node has enough resources to run it, the pod can still be deployed to it.

  • NoExecute: unlike NoSchedule and PreferNoSchedule, which only take effect when a pod is being scheduled, this effect also applies to pods already running on the node. For example, if a pod is running on a worker node and we then taint that node with the NoExecute effect, every pod on the node without a matching toleration is removed (evicted) from the node.
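To see the three effects side by side, here is a sketch of a node that carries one taint of each kind. The node name and the example.com/... keys are hypothetical and chosen only for illustration; they do not come from the cluster used in this article.

```yaml
# Hypothetical node with one taint per effect.
apiVersion: v1
kind: Node
metadata:
  name: example-node                  # hypothetical name
spec:
  taints:
    - key: example.com/dedicated      # hypothetical key
      value: batch
      effect: NoSchedule              # hard rule, checked at scheduling time
    - key: example.com/slow-disk      # hypothetical key
      effect: PreferNoSchedule        # soft rule, the scheduler only tries to avoid the node
    - key: example.com/maintenance    # hypothetical key
      effect: NoExecute               # also evicts running pods that lack a matching toleration
```

A pod has to tolerate every NoSchedule and NoExecute taint on a node to be scheduled there; PreferNoSchedule taints only lower the node's priority.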

Add Taints to node

As mentioned above, we can add taints to a node ourselves. Taking the dev and production example above, we taint the production worker node as follows:

$ kubectl taint node production-node node-type=production:NoSchedule
node/production-node tainted

Now when we create pods, we will see that all of them are deployed to the dev worker node.

$ kubectl create deployment test --image=busybox --replicas=5 -- sleep 99999
deployment.apps/test created

$ kubectl get pod -o wide
NAME               READY  STATUS   RESTARTS  AGE  IP         NODE
test-196686-46ngl  1/1    Running  0         12s  10.47.0.1  dev-node
test-196686-73p89  1/1    Running  0         12s  10.47.0.7  dev-node
test-196686-77280  1/1    Running  0         12s  10.47.0.6  dev-node
test-196686-h9m8f  1/1    Running  0         12s  10.47.0.5  dev-node
test-196686-p85ll  1/1    Running  0         12s  10.47.0.4  dev-node

To deploy pods to the production node, we add a toleration to them. Create a file called production-deployment.yaml with the following configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prod
spec:
  replicas: 5
  selector:
    matchLabels:
      app: prod
  template:
    metadata:
      labels:
        app: prod
    spec:
      containers:
        - image: luksa/kubia:v1
          name: nodejs
          resources:
            requests:
              cpu: 100m
      tolerations:
        - key: node-type
          operator: Equal
          value: production
          effect: NoSchedule

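We create this deployment and list its pods; we will see that its pods can now be deployed to the production node. Keep in mind that a toleration only allows a pod onto a tainted node, it does not force it there: unless the dev node is tainted as well, production pods may also be scheduled onto it.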

$ kubectl apply -f production-deployment.yaml
deployment.apps/prod created

$ kubectl get po -o wide
NAME               READY  STATUS   RESTARTS  AGE  IP          NODE
prod-350605-1ph5h  1/1    Running  0         12s  10.47.0.11  production-node
prod-350605-73p89  1/1    Running  0         12s  10.47.0.17  production-node
prod-350605-k7c8g  1/1    Running  0         12s  10.47.0.16  production-node
prod-350605-h9m8f  1/1    Running  0         12s  10.47.0.15  production-node
prod-350605-p85ll  1/1    Running  0         12s  10.47.0.14  production-node

Understand how to use Taints and tolerations

A node can have many taints, and a pod can declare many tolerations. For a taint, key and effect are mandatory, while value is optional. For a toleration, we set operator to Equal (the default) or Exists.
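The difference between the two operators is easiest to see next to each other. The fragment below is a sketch that reuses the node-type=production:NoSchedule taint from the example above; either entry on its own would be enough to tolerate that taint.

```yaml
# Fragment of a pod spec (containers and other fields omitted).
spec:
  tolerations:
    # Equal: key, value, and effect must all match the taint.
    - key: node-type
      operator: Equal
      value: production
      effect: NoSchedule
    # Exists: matches any taint with this key and effect, whatever its value.
    # No value may be given with this operator.
    - key: node-type
      operator: Exists
      effect: NoSchedule
```

Leaving out effect in a toleration makes it match taints with that key for every effect, so Exists without an effect is the broadest way to tolerate a given key.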

The first way we use taints and tolerations is the example above of a kubernetes cluster with multiple environments.

The second way to use taints and tolerations is to control, through the NoExecute effect, how long Kubernetes waits after a node dies before the pods on that node are rescheduled to another node. For example, when we describe a pod, we can see that it has some default tolerations such as the following:

...
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
...

This means that when the Kubernetes control plane detects that a node is dead, it adds the taint node.kubernetes.io/not-ready:NoExecute to that node. The pod's toleration for this taint is only valid for 300s; once that time has passed, the pod no longer tolerates the taint, so it is removed from the node and can be rescheduled elsewhere.
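The waiting time comes from the tolerationSeconds field of the toleration, and a pod can declare its own value instead of relying on the default. The fragment below is a sketch of that; the 30 second value is an arbitrary example, not a recommendation.

```yaml
# Fragment of a pod spec (containers and other fields omitted).
spec:
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30    # example value; the default toleration uses 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 30    # example value; the default toleration uses 300
```

A shorter value gets pods moved off a failed node sooner, at the cost of more rescheduling when a node is only briefly unavailable.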
