You can constrain a Pod so that it is restricted to run on particular node(s), or to prefer to run on particular nodes. There are several ways to do this and the recommended approaches all use label selectors to facilitate the selection. Often, you do not need to set any such constraints; the scheduler will automatically do a reasonable placement (for example, spreading your Pods across nodes so as not to place Pods on a node with insufficient free resources). However, there are some circumstances where you may want to control which node the Pod deploys to, for example, to ensure that a Pod ends up on a node with an SSD attached to it, or to co-locate Pods from two different services that communicate a lot into the same availability zone.

Adding labels to nodes allows you to target Pods for scheduling on specific nodes or groups of nodes. You can use this functionality to ensure that specific Pods only run on nodes with certain isolation, security, or regulatory properties.


If you use labels for node isolation, choose label keys that the kubelet cannot modify. This prevents a compromised node from setting those labels on itself so that the scheduler schedules workloads onto the compromised node.


nodeSelector is the simplest recommended form of node selection constraint. You can add the nodeSelector field to your Pod specification and specify the node labels you want the target node to have. Kubernetes only schedules the Pod onto nodes that have each of the labels you specify.
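
For example, here is a minimal sketch of a Pod that should only land on nodes carrying a disktype=ssd label (the label key and value are illustrative; you would first label a node accordingly, for example with kubectl label nodes <node-name> disktype=ssd):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd    # the Pod is only schedulable onto nodes that carry this label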


nodeSelector is the simplest way to constrain Pods to nodes with specific labels. Affinity and anti-affinity expand the types of constraints you can define. Some of the benefits of affinity and anti-affinity include: a more expressive constraint language than exact label matches, the ability to mark a rule as soft or preferred so the scheduler still places the Pod even if no node satisfies it, and the ability to constrain a Pod using labels on other Pods running on the node (or other topology domain) instead of just node labels.


If you specify multiple expressions in a single matchExpressions field associated with a term in nodeSelectorTerms, then the Pod can be scheduled onto a node only if all the expressions are satisfied (expressions are ANDed).
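
As a sketch (the zone values and the disktype label are illustrative), the following node affinity term only matches nodes that satisfy both expressions:

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:            # the two expressions below are ANDed
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - antarctica-east1
            - antarctica-west1
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: with-node-affinity
    image: registry.k8s.io/pause:2.0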


You can specify a weight between 1 and 100 for each instance of the preferredDuringSchedulingIgnoredDuringExecution affinity type. When the scheduler finds nodes that meet all the other scheduling requirements of the Pod, the scheduler iterates through every preferred rule that the node satisfies and adds the value of the weight for that expression to a sum.


If there are two possible nodes that match the preferredDuringSchedulingIgnoredDuringExecution rule, one with the label-1:key-1 label and another with the label-2:key-2 label, the scheduler considers the weight of each node and adds the weight to the other scores for that node, and schedules the Pod onto the node with the highest final score.
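
A sketch of two such preferred rules (the label keys label-1/label-2 and values key-1/key-2 follow the wording above and are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: with-affinity-preferences
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                  # small bonus for nodes labeled label-1=key-1
        preference:
          matchExpressions:
          - key: label-1
            operator: In
            values:
            - key-1
      - weight: 50                 # much larger bonus for nodes labeled label-2=key-2
        preference:
          matchExpressions:
          - key: label-2
            operator: In
            values:
            - key-2
  containers:
  - name: with-affinity-preferences
    image: registry.k8s.io/pause:2.0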


When configuring multiple scheduling profiles, you can associate a profile with a node affinity, which is useful if a profile only applies to a specific set of nodes. To do so, add an addedAffinity to the args field of the NodeAffinity plugin in the scheduler configuration. For example:
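
A sketch of such a scheduler configuration (the profile name foo-scheduler matches the text below; the scheduler-profile node label is illustrative, and the configuration API version depends on your cluster version):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: foo-scheduler
    pluginConfig:
      - name: NodeAffinity
        args:
          addedAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: scheduler-profile   # illustrative node label
                  operator: In
                  values:
                  - foo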


The addedAffinity is applied to all Pods that set .spec.schedulerName to foo-scheduler, in addition to the NodeAffinity specified in the PodSpec. That is, in order to match the Pod, nodes need to satisfy addedAffinity and the Pod's .spec.NodeAffinity.


Inter-pod affinity and anti-affinity allow you to constrain which nodes your Pods can be scheduled on based on the labels of Pods already running on that node, instead of the node labels.


Inter-pod affinity and anti-affinity rules take the form "this Pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more Pods that meet rule Y", where X is a topology domain like node, rack, cloud provider zone or region, or similar and Y is the rule Kubernetes tries to satisfy.


You express these rules (Y) as label selectors with an optional associated list of namespaces. Pods are namespaced objects in Kubernetes, so Pod labels also implicitly have namespaces. Any label selectors for Pod labels should specify the namespaces in which Kubernetes should look for those labels.


For example, you could use requiredDuringSchedulingIgnoredDuringExecution affinity to tell the scheduler to co-locate Pods of two services in the same cloud provider zone because they communicate with each other a lot. Similarly, you could use preferredDuringSchedulingIgnoredDuringExecution anti-affinity to spread Pods from a service across multiple cloud provider zones.


If the current Pod being scheduled is the first in a series that have affinity to themselves, it is allowed to be scheduled if it passes all other affinity checks. This is determined by verifying that no other pod in the cluster matches the namespace and selector of this pod, that the pod matches its own terms, and the chosen node matches all requested topologies. This ensures that there will not be a deadlock even if all the pods have inter-pod affinity specified.


The following example defines one Pod affinity rule and one Pod anti-affinity rule. The Pod affinity rule uses the "hard" requiredDuringSchedulingIgnoredDuringExecution, while the anti-affinity rule uses the "soft" preferredDuringSchedulingIgnoredDuringExecution.
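
A Pod manifest along these lines might look like the following sketch (the security=S1/S2 labels and the zone topology key match the discussion below; the container image is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: registry.k8s.io/pause:2.0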


The affinity rule specifies that the scheduler is allowed to place the example Pod on a node only if that node belongs to a specific zone where other Pods have been labeled with security=S1. For instance, if we have a cluster with a designated zone, let's call it "Zone V," consisting of nodes labeled with topology.kubernetes.io/zone=V, the scheduler can assign the Pod to any node within Zone V, as long as there is at least one Pod within Zone V already labeled with security=S1. Conversely, if there are no Pods with security=S1 labels in Zone V, the scheduler will not assign the example Pod to any node in that zone.


The anti-affinity rule specifies that the scheduler should try to avoid scheduling the Pod on a node if that node belongs to a specific zone where other Pods have been labeled with security=S2. For instance, if we have a cluster with a designated zone, let's call it "Zone R," consisting of nodes labeled with topology.kubernetes.io/zone=R, the scheduler should avoid assigning the Pod to any node within Zone R, as long as there is at least one Pod within Zone R already labeled with security=S2. Conversely, the anti-affinity rule does not impact scheduling into Zone R if there are no Pods with security=S2 labels.


In addition to labelSelector and topologyKey, you can optionally specify a list of namespaces which the labelSelector should match against using the namespaces field at the same level as labelSelector and topologyKey. If omitted or empty, namespaces defaults to the namespace of the Pod where the affinity/anti-affinity definition appears.


You can also select matching namespaces using namespaceSelector, which is a label query over the set of namespaces. The affinity term is applied to namespaces selected by both namespaceSelector and the namespaces field. Note that an empty namespaceSelector ({}) matches all namespaces, while a null or empty namespaces list and null namespaceSelector matches the namespace of the Pod where the rule is defined.
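
As a sketch, a fragment of a Pod spec (.spec.affinity) whose affinity term only considers Pods in namespaces labeled team=payments (the team and app labels are illustrative):

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: backend            # match Pods labeled app=backend ...
        namespaceSelector:
          matchLabels:
            team: payments          # ... but only in namespaces labeled team=payments
        topologyKey: topology.kubernetes.io/zone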


Kubernetes includes an optional matchLabelKeys field for Pod affinity or anti-affinity. The field specifies keys for the labels that should match with the incoming Pod's labels, when satisfying the Pod (anti)affinity.


The keys are used to look up values from the pod labels; those key-value labels are combined (using AND) with the match restrictions defined using the labelSelector field. The combined filtering selects the set of existing pods that will be taken into Pod (anti)affinity calculation.


A common use case is to use matchLabelKeys with pod-template-hash (set on Pods managed as part of a Deployment, where the value is unique for each revision). Using pod-template-hash in matchLabelKeys allows you to target the Pods that belong to the same revision as the incoming Pod, so that a rolling upgrade won't break affinity.
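
A sketch of this pattern, assuming a Deployment whose replicas should be co-located per zone with replicas from the same rollout only (the names and labels are illustrative, and matchLabelKeys is only available on clusters where this newer field is enabled):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: application-server
  template:
    metadata:
      labels:
        app: application-server
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - application-server
            topologyKey: topology.kubernetes.io/zone
            # only peer Pods from the same rollout (same pod-template-hash) are
            # considered when evaluating this affinity term, so a rolling upgrade
            # does not get blocked by Pods from the previous revision
            matchLabelKeys:
            - pod-template-hash
      containers:
      - name: application-server
        image: registry.k8s.io/pause:2.0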


Kubernetes includes an optional mismatchLabelKeys field for Pod affinity or anti-affinity. The field specifies keys for the labels that should not match with the incoming Pod's labels, when satisfying the Pod (anti)affinity.


One example use case is to ensure Pods go to a topology domain (node, zone, etc.) where only Pods from the same tenant or team are scheduled. In other words, you want to avoid running Pods from two different tenants on the same topology domain at the same time.
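
A sketch of that per-tenant isolation pattern, assuming each tenant's Pods carry a tenant label and nodes are grouped by a node-pool label (both labels and the topology key are illustrative; mismatchLabelKeys is only available on clusters where this newer field is enabled):

apiVersion: v1
kind: Pod
metadata:
  name: tenant-a-workload
  labels:
    tenant: tenant-a
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # co-locate with Pods that have the SAME value of the "tenant" label
      - matchLabelKeys:
        - tenant
        topologyKey: node-pool
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # keep away from node pools where any Pod carries a DIFFERENT "tenant" value;
      # the labelSelector limits this rule to Pods that actually have a tenant label
      - mismatchLabelKeys:
        - tenant
        labelSelector:
          matchExpressions:
          - key: tenant
            operator: Exists
        topologyKey: node-pool
  containers:
  - name: workload
    image: registry.k8s.io/pause:2.0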


Inter-pod affinity and anti-affinity can be even more useful when they are used with higher level collections such as ReplicaSets, StatefulSets, Deployments, etc. These rules allow you to configure that a set of workloads should be co-located in the same defined topology; for example, preferring to place two related Pods onto the same node.


For example: imagine a three-node cluster. You use the cluster to run a web application and also an in-memory cache (such as Redis). For this example, also assume that latency between the web application and the memory cache should be as low as is practical. You could use inter-pod affinity and anti-affinity to co-locate the web servers with the cache as much as possible.


In the following example Deployment for the Redis cache, the replicas get the label app=store. The podAntiAffinity rule tells the scheduler to avoid placing multiple replicas with the app=store label on a single node. This creates each cache in a separate node.
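
A sketch of that cache Deployment (the Deployment name, container name, and image version are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: kubernetes.io/hostname   # at most one app=store replica per node
      containers:
      - name: redis-server
        image: redis:3.2-alpine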


The following example Deployment for the web servers creates replicas with the label app=web-store. The Pod affinity rule tells the scheduler to place each replica on a node that has a Pod with the label app=store. The Pod anti-affinity rule tells the scheduler never to place multiple app=web-store servers on a single node.
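
A sketch of that web server Deployment (again, names and image version are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: kubernetes.io/hostname   # never two web-store Pods on one node
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: kubernetes.io/hostname   # co-locate with a cache (app=store) Pod
      containers:
      - name: web-app
        image: nginx:1.16-alpine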


You might have other reasons to use Pod anti-affinity. See the ZooKeeper tutorial for an example of a StatefulSet configured with anti-affinity for high availability, using the same technique as this example.


nodeName is a more direct form of node selection than affinity or nodeSelector. nodeName is a field in the Pod spec. If the nodeName field is not empty, the scheduler ignores the Pod and the kubelet on the named node tries to place the Pod on that node. Using nodeName overrules using nodeSelector or affinity and anti-affinity rules.
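
For instance, a sketch of a Pod pinned directly to a node named kube-01 (the node name is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeName: kube-01    # place this Pod on the node named kube-01, bypassing the scheduler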
