architecture-aware placement of pods


Andras Szerdahelyi

Sep 8, 2021, 4:59:40 PM
to kubernetes-sig-scheduling
Hi kubernetes-sig-scheduling,

i've been having a not-so-great time trying to herd my x86 and arm32l workloads onto nodes with the appropriate architecture in my cluster using taints, tolerations and node selectors. It's a pain, and if i'm reading https://github.com/opencontainers/image-spec/blob/main/image-index.md correctly, a compliant registry may support pulling content built for a specific architecture. Interestingly, although i could only find articles promoting the taint/toleration/nodeSelector dance, this docs paragraph reads like it may already be implemented in scheduling: https://kubernetes.io/docs/concepts/containers/images/#multi-architecture-images-with-image-indexes . If it is, it's light on some advanced scheduling detail, e.g. is it possible to specify an architecture preference list for the container image(s) in the pods being scheduled? ( "for this pod prefer arm, fall back to x86" )
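
For reference, the per-architecture information mentioned above lives in the `manifests[].platform` entries of an OCI image index. A minimal sketch of reading it, assuming the index JSON has already been fetched from the registry; the sample document, its digests and the helper name `supported_architectures` are made up for illustration:

```python
import json

# Hypothetical, trimmed-down OCI image index; digests and sizes are placeholders.
INDEX_JSON = """
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:aaaa", "size": 1234,
     "platform": {"architecture": "amd64", "os": "linux"}},
    {"mediaType": "application/vnd.oci.image.manifest.v1+json",
     "digest": "sha256:bbbb", "size": 1234,
     "platform": {"architecture": "arm", "os": "linux", "variant": "v7"}}
  ]
}
"""


def supported_architectures(index, os_name="linux"):
    """Return the architectures an image index lists for one OS, in index order."""
    archs = []
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform") or {}
        arch = platform.get("architecture")
        # Skip entries for other operating systems and avoid duplicates.
        if platform.get("os") == os_name and arch and arch not in archs:
            archs.append(arch)
    return archs


print(supported_architectures(json.loads(INDEX_JSON)))
```

The architecture names in the index follow the Go `GOARCH` values (`amd64`, `arm`, `arm64`), which are also the values used by the `kubernetes.io/arch` node label.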

i have a few ideas about what the current state of this feature may be, and would appreciate some guidance on next steps:
a ) PEBKAC - it's implemented, i just can't search/read documentation
b ) it's implemented but may be slightly lacking documentation
c ) it's not implemented but planned
d ) it's not implemented because there hasn't been a need
e ) it's not implemented because it's out of scope for scheduling or other built-in controllers ( in this case, would it be a "mutating" admission controller, or maybe a custom scheduler, configured via labels? )

thanks!
Andras


Dave Chen

Sep 8, 2021, 10:58:20 PM
to Andras Szerdahelyi, kubernetes-sig-scheduling, kevinw...@gmail.com
Thanks Andras for bringing this up. I have been thinking about this for a while, but without a solid reason to propose it for scheduling.

Currently, the scheduler doesn't have a plugin specifically for hybrid clusters with different hardware architectures, nor is any such enhancement planned anywhere, IIUC. I was thinking that a plugin like nodeAffinity could easily address this problem with some additional pod configuration (as you said, nodeSelectorTerms in the pod spec). So first, I'd like to understand your pain point with an approach like this. Are there any other special requirements?

If we are going to support architecture-aware scheduling, this would most likely be a new scheduling plugin, and functionally it might overlap with other default-enabled plugins implemented in the scheduler. Or maybe it is a good candidate for scheduler-plugin [1]?



Cheers
Dave



Andras Szerdahelyi

Sep 9, 2021, 1:18:37 PM
to kubernetes-sig-scheduling
Hi Dave, thanks for the response!

one pain point with nodeAffinity is that it still leaves the user responsible for working out which architectures the container images in the pod spec support, and for maintaining a preference/expression list on manifests that are possibly not under their control.

Now i'm thinking this may be a mutating admission webhook or similar component that holds a "default affinity" in a ConfigMap or namespaced CR, inspects the image manifests, and filters the matchExpressions / preferences down to the architectures that are actually available ( in the manifest, not the cluster ) before attaching them to the pod spec. This way one could declare for the workloads in a namespace that resources on x86 nodes are valued more highly than those on arm nodes ( let's say arm32 -> arm64 -> x86 in priority order ), so that pods launch on arm architectures first when their containers' image manifests allow it. Any architectures that are not available in the manifests would be removed ( say, we don't have any arm64s ).
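
The filtering step described above could look roughly like the sketch below. It only builds the `affinity` fragment a webhook would patch into the pod; the function name `build_arch_affinity`, the weight scheme and the example inputs are assumptions, and fetching the image manifests is left out.

```python
ARCH_LABEL = "kubernetes.io/arch"  # well-known node label, values follow GOARCH


def build_arch_affinity(preference, image_archs_per_container):
    """Build a nodeAffinity fragment from an ordered architecture preference.

    preference: architectures, most preferred first (the "default affinity").
    image_archs_per_container: one collection of supported architectures per
    container image in the pod.
    Returns None when no preferred architecture is supported by every image.
    """
    if not image_archs_per_container:
        return None
    # Only architectures that every container image supports are usable.
    common = set.intersection(*(set(a) for a in image_archs_per_container))
    usable = [arch for arch in preference if arch in common]
    if not usable:
        return None

    def expr(values):
        return {"key": ARCH_LABEL, "operator": "In", "values": values}

    return {
        "nodeAffinity": {
            # Hard requirement: never land on an unsupported architecture.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [expr(usable)]}]
            },
            # Soft preference: earlier entries get a higher weight (range 1-100).
            # The step of 10 is an arbitrary choice for this sketch.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": max(1, 100 - 10 * rank),
                    "preference": {"matchExpressions": [expr([arch])]},
                }
                for rank, arch in enumerate(usable)
            ],
        }
    }


# arm32 -> arm64 -> x86, with images that ship no arm64 variant.
affinity = build_arch_affinity(
    ["arm", "arm64", "amd64"],
    [{"arm", "amd64"}, {"arm", "arm64", "amd64"}],
)
```

With these inputs the required term allows `arm` and `amd64`, and `arm` carries the higher preference weight, so `arm64` drops out because one of the images does not provide it.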

> If we are going to support architecture-aware scheduling, this is mostly like a new scheduling plugin, and functionally, might overlap with other default enabled plugins implemented in scheduler, or maybe a good option for scheduler-plugin [1] ?

can you clarify this? are you suggesting this could be a scheduler plugin? which default plugins would it overlap with?

thanks,
Andras

Andras Szerdahelyi

Sep 9, 2021, 1:25:55 PM
to kubernetes-sig-scheduling
i also realized that the affinity path here would require requiredDuringSchedulingRequiredDuringExecution, which may not be available yet, at least as of the last time i checked?

Dave Chen

Sep 10, 2021, 5:22:12 AM
to Andras Szerdahelyi, kubernetes-sig-scheduling
Hi Andras,

>  can you clarify this, are you suggesting this could be a scheduler plugin? what default plugins would this overlap with?

There are two cases here:
- the image is already a multi-arch image, but the user has some preference.
- the image itself is not a multi-arch image, so the Pod can only be launched on a specific kind of node (amd64, arm64, etc.)

I think the second one is a real issue, as the Pod might be scheduled to a node that is incompatible with the image, which will lead to a failed status at runtime. I'm not sure whether there is any API exposed for us to inspect the supported architectures directly; if there is, we could implement a scheduler plugin with soft/hard requirements during scheduling, which would help lighten the burden on operators. Your webhook idea seems workable too.
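
To make the soft/hard split concrete, here is a rough sketch of the decision logic only. It is not the scheduler framework API (real plugins are written in Go against that framework), and it assumes the supported architectures of each image are already known; the function names, node names and scoring step are hypothetical.

```python
ARCH_LABEL = "kubernetes.io/arch"  # well-known node label, values follow GOARCH


def filter_node(node_labels, image_archs_per_container):
    """Hard requirement: the node's architecture must be supported by every image."""
    arch = node_labels.get(ARCH_LABEL)
    return arch is not None and all(
        arch in archs for archs in image_archs_per_container
    )


def score_node(node_labels, preference):
    """Soft requirement: higher score for architectures earlier in the preference list."""
    arch = node_labels.get(ARCH_LABEL)
    if arch not in preference:
        return 0
    return max(1, 100 - 10 * preference.index(arch))


nodes = {
    "node-a": {ARCH_LABEL: "amd64"},
    "node-b": {ARCH_LABEL: "arm64"},
    "node-c": {ARCH_LABEL: "arm"},
}
image_archs = [{"amd64", "arm"}]        # case 1: multi-arch image, no arm64 build
preference = ["arm", "arm64", "amd64"]  # user preference, most preferred first

scores = {
    name: score_node(labels, preference)
    for name, labels in nodes.items()
    if filter_node(labels, image_archs)
}
print(scores)  # {'node-a': 80, 'node-c': 100}
```

For the second case, a single-arch image simply has one entry in its set, so the filter leaves only nodes of that architecture and the score no longer matters.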

> i also realized that the affinity path here would require requiredDuringSchedulingRequiredDuringExecution

Yes, this is still not implemented. I have filed an issue here: https://github.com/kubernetes/kubernetes/issues/104895, let's see if we can get some feedback from the maintainers.

Cheers,
Dave
