How to Use Kubernetes Taints and Tolerations to Avoid Undesirable Scheduling
Taints and tolerations are a Kubernetes mechanism for controlling how Pods schedule to the Nodes in your cluster. Taints are applied to Nodes and act as a repelling barrier against new Pods. Tainted Nodes will only accept Pods that have been marked with a corresponding toleration.
Taints are one of the more advanced Kubernetes scheduling mechanisms. They facilitate many different use cases where you want to prevent Pods from ending up on undesirable Nodes. In this article, you’ll learn what taints and tolerations are and how you can utilize them in your own cluster.
How Scheduling Works
Kubernetes is a distributed system where you can deploy containerized applications (Pods) across multiple physical hosts (Nodes). When you create a new Pod, Kubernetes needs to determine the set of Nodes it can be placed on. This is what scheduling refers to.
The scheduler considers many different factors to establish a suitable placement for each Pod. It’ll default to selecting a Node that can provide sufficient resources to satisfy the Pod’s CPU and memory requests.
The selected Node won’t necessarily be appropriate for your deployment though. It could lack required hardware or be reserved for development use. Node taints are a mechanism for enforcing these constraints by preventing arbitrary assignment of Pods to Nodes.
Taint Use Cases
Tainting a Node means it will start to repel Pods, forcing the scheduler to consider the next candidate Node instead. You can overcome the taint by setting a matching toleration on the Pod. This provides a mechanism for allowing specific Pods onto the Node.
Taints are often used to keep Pods away from Nodes that are reserved for specific purposes. Some Kubernetes clusters might host several environments, such as staging and production. In this situation you’ll want to prevent staging deployments from ending up on the dedicated production hardware.
You can achieve the desired behavior by tainting the production Node and setting a matching toleration on production Pods. Staging Pods will be confined to the other Nodes in your cluster, preventing them from consuming production resources.
Taints can also help distinguish between Nodes with particular hardware. Operators might deploy a subset of Nodes with dedicated GPUs for use with AI workloads. Tainting these Nodes ensures Pods that don’t need the GPU can’t schedule onto them.
Taint Effects
Each Node taint can have one of three different effects on Kubernetes scheduling decisions:
- `NoSchedule` – Pods that lack a toleration for the taint won’t be scheduled onto the Node. Pods already scheduled to the Node aren’t affected, even if they don’t tolerate the taint.
- `PreferNoSchedule` – Kubernetes will avoid scheduling Pods without the taint’s toleration. The Pod could still be scheduled to the Node as a last resort option. This does not affect existing Pods.
- `NoExecute` – This functions similarly to `NoSchedule` except that existing Pods are impacted too. Pods without the toleration will be immediately evicted from the Node, causing them to be rescheduled onto other Nodes in your cluster.

The `NoExecute` effect is useful when you’re changing the role of a Node that’s already running some workloads. `NoSchedule` is more appropriate if you want to guard the Node against receiving new Pods, without disrupting existing deployments.
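As a sketch, all three effects use the same `kubectl taint` syntax, differing only in the effect name appended after the colon (the Node name `demo-node` is hypothetical):

```shell
# NoSchedule: repel new Pods without a toleration; existing Pods stay
$ kubectl taint nodes demo-node env=production:NoSchedule

# PreferNoSchedule: soft preference the scheduler may override as a last resort
$ kubectl taint nodes demo-node env=production:PreferNoSchedule

# NoExecute: repel new Pods and evict existing Pods that lack a toleration
$ kubectl taint nodes demo-node env=production:NoExecute
```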
Tainting a Node
Taints are applied to Nodes using the `kubectl taint` command. It takes the name of the target Node, a key and value for the taint, and an effect.

Here’s an example of tainting a Node to allocate it to a specific environment:

```shell
$ kubectl taint nodes demo-node env=production:NoSchedule
node/demo-node tainted
```
You can apply multiple taints to a Node by repeating the command. The value is optional – you can create binary taints by supplying only a key:

```shell
$ kubectl taint nodes demo-node has-gpu:NoSchedule
```
To remove a previously applied taint, repeat the command but append a hyphen (`-`) to the effect name:

```shell
$ kubectl taint nodes demo-node has-gpu:NoSchedule-
node/demo-node untainted
```

This will delete the matching taint if it exists.
You can retrieve a list of all the taints applied to a Node using the `describe` command. The taints will be shown near the top of the output, after the Node’s labels and annotations:

```shell
$ kubectl describe node demo-node
Name:    demo-node
...
Taints:  env=production:NoSchedule
...
```
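When your cluster has many Nodes, describing them one at a time is tedious. As a sketch, a JSONPath template can list every Node’s taints in a single command (output will vary with your cluster):

```shell
# Print each Node's name followed by its taints array (empty if untainted)
$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.taints}{"\n"}{end}'
```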
Adding Tolerations to Pods
The example above tainted `demo-node` with the intention of reserving it for production workloads. The next step is to add an equivalent toleration to your production Pods so that they’re permitted to schedule onto the Node.

Pod tolerations are declared in the `spec.tolerations` manifest field:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoSchedule
```

This toleration allows the `api` Pod to schedule to Nodes that have an `env` taint with a value of `production` and `NoSchedule` as the effect. The example Pod can now be scheduled to `demo-node`.
To tolerate taints without a value, use the `Exists` operator instead:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: has-gpu
      operator: Exists
      effect: NoSchedule
```

The Pod now tolerates the `has-gpu` taint, whether or not a value has been set.
Tolerations do not require that the Pod is scheduled to a tainted Node. This is a common misconception around taints and tolerations. The mechanism only expresses that a Node can’t host certain Pods; it does not say that a Pod must be placed on a particular Node. Taints are commonly combined with node affinities to achieve this bi-directional behavior.
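As an illustrative sketch of that combination, the manifest below pairs the earlier toleration with a `nodeAffinity` rule. It assumes the production Node also carries a hypothetical `env=production` label (labels and taints are separate, so you must apply both to the Node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  # The toleration lets the Pod onto the tainted production Node
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoSchedule
  # The affinity requires the Pod to land on a Node labeled env=production
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: env
                operator: In
                values: [production]
```

Together these rules repel everything else from the Node while pinning the production Pod to it.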
Taint and Toleration Matching Rules
Tainted Nodes only receive Pods that tolerate all of their taints. Kubernetes first discovers the taints on the Node, then filters out taints that are tolerated by the Pod. The effects requested by the remaining set of taints will be applied to the Pod.
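As an illustrative sketch (not the actual scheduler code), this matching logic can be modeled in a few lines of Python: a toleration matches a taint when the keys and effects line up and the operator’s value check passes, and the effects of the remaining untolerated taints are what apply to the Pod:

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches any taint effect
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An Exists toleration with no key matches every taint
    if toleration.get("operator") == "Exists" and not toleration.get("key"):
        return True
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("operator") == "Exists":
        return True  # key matches; value is ignored
    # Equal (the default operator) also requires matching values
    return toleration.get("value") == taint.get("value")

def untolerated_effects(taints, tolerations):
    """Effects of the Node's taints that the Pod does not tolerate."""
    return [t["effect"] for t in taints
            if not any(tolerates(tol, t) for tol in tolerations)]

# The Pod tolerates env=production:NoSchedule but not has-gpu:NoSchedule
taints = [
    {"key": "env", "value": "production", "effect": "NoSchedule"},
    {"key": "has-gpu", "effect": "NoSchedule"},
]
tolerations = [{"key": "env", "operator": "Equal",
                "value": "production", "effect": "NoSchedule"}]
print(untolerated_effects(taints, tolerations))  # ['NoSchedule']
```

Because one `NoSchedule` taint remains untolerated, this hypothetical Pod would be repelled from the Node.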
There’s a special case for the `NoExecute` effect. Pods that tolerate this kind of taint will usually get to stay on the Node after the taint is applied. You can modify this behavior so that Pods are evicted after a given time, despite tolerating the taint:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
    - name: api
      image: example.com/api:latest
  tolerations:
    - key: env
      operator: Equal
      value: production
      effect: NoExecute
      tolerationSeconds: 900
```

A Node that’s hosting this Pod but is subsequently tainted with `env=production:NoExecute` will allow the Pod to remain present for up to 15 minutes (900 seconds) after the taint’s applied. The Pod will then be evicted despite having the `NoExecute` toleration.
Automatic Taints
Nodes are automatically tainted by the Kubernetes control plane to evict Pods and prevent scheduling when resource contention occurs. Taints such as `node.kubernetes.io/memory-pressure` and `node.kubernetes.io/disk-pressure` mean Kubernetes is blocking the Node from taking new Pods because it lacks sufficient resources.

Other commonly applied taints include `node.kubernetes.io/not-ready`, applied when a new Node isn’t yet accepting Pods, and `node.kubernetes.io/unschedulable`. The latter is applied to cordoned Nodes to halt all Pod scheduling activity.
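Cordoning is one way to see an automatic taint appear. As a sketch against the hypothetical `demo-node` from earlier (exact output may vary by kubectl version):

```shell
$ kubectl cordon demo-node
node/demo-node cordoned
$ kubectl describe node demo-node | grep Taints
Taints:  node.kubernetes.io/unschedulable:NoSchedule
```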
These taints implement the Kubernetes eviction and Node management systems. You don’t normally need to think about them and you shouldn’t manage these taints manually. If you see them on a Node, it’s because Kubernetes has applied them in response to changing conditions or another command you’ve issued. It is possible to create Pod tolerations for these taints but doing so could lead to resource exhaustion and unexpected behavior.
Summary
Taints and tolerations are a mechanism for repelling Pods away from individual Kubernetes Nodes. They help you avoid undesirable scheduling outcomes by preventing Pods from being automatically assigned to arbitrary Nodes.
Tainting isn’t the only mechanism that provides control over scheduling behavior. Pod affinities and anti-affinities are a related technique for constraining the Nodes that can receive a Pod. Affinity can also be defined at an inter-Pod level, allowing you to make scheduling decisions based on the Pods already running on a Node. You can combine affinity with taints and tolerations to set up advanced scheduling rules.