I'm wondering if it's time to actually recast the CPUManager as a device plugin so that people can customize it to whatever policy they want. I haven't thought much about the details of how this would work exactly, but the CPUManager was written before the device plugin interface existed, and it performs a lot of the same functionality that device plugins do. If we decide to go this route, then the existing static CPUManager could remain available as a built-in policy, and a new "external" policy could be added to direct the CPUManager to defer policy decisions to the plugin.
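Just to illustrate the idea (none of this exists today; the "external" policy value and the plugin-selection field below are made up), the kubelet configuration might end up looking something like:

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # hypothetical: defer CPU assignment decisions to an out-of-tree plugin
    cpuManagerPolicy: external
    # hypothetical field naming the plugin that implements the policy
    cpuManagerExternalPolicyPlugin: example.com/my-cpu-policy

The existing "none" and "static" values would keep their current behavior.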
Kevin
Hey Kevin
I would very much like this approach, and I think it would also play nicely with the longer-term plans we have been discussing in sig-node over the last few months, like https://github.com/container-orchestrated-devices/resource-management-improvements-wg/issues/1
I'm aware of this project https://github.com/nokia/CPU-Pooler
which seems to be very close to the goal and could be a very good
basis for this work. I'm not sure what the path forward would look
like, however. For example, where should this device plugin sit?
Should it be part of the Kubernetes core?
I'll mention this option in my session next week (April 13, I
cannot attend today) so we can keep the discussion open on this
subject.
Hi,
Let me add Levente, the author of CPU Pooler, to the discussion.
Br,
Gerg0
Hi,
Thanks for the add!
I will try to attend next Monday's call too; hopefully I can provide some insight related to the discussion, considering we have been using both the DP approach and the two mentioned policies in the field for some time already.
If you don't mind, a couple of points from me related to what has been discussed so far:
1. External API vs internal API
I don’t think anyone cares about who implements the policy; the important thing is the API via which users can ask for the resources, i.e. the resources API. The biggest pain point of my existing “customers” is that they need to use a different syntax depending on whether they -want to- run on a CPU Manager or a CPU Pooler node (spec.resources.cpu vs spec.resources.POOL_NAME), as in the example below.
If the user-facing API could be the same for both built-in and outsourced policies (either just .cpu, or just the pool-like nomenclature with .cpu becoming a reserved pool type), that would be fantastic.
Note: outsourcing policies is not a new idea, it was just previously rejected by the community.
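For example, the same exclusive-CPU request has to be written in two different ways today (POOL_NAME below is just a placeholder):

    # on a CPU Manager (static policy) node
    resources:
      requests:
        cpu: "2"
      limits:
        cpu: "2"

    # on a CPU Pooler node (pool exposed as an extended resource)
    resources:
      requests:
        POOL_NAME: "2"
      limits:
        POOL_NAME: "2"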
2. NUMA alignment
I can confirm this is indeed not an issue, Pooler does it 😊 (it doesn’t implement the preferred allocation API atm, but it reports the socket info during device discovery, and since users usually ask for these resources together with SR-IOV VFs, the default alignment done by the topology manager and the DM is good enough).
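For reference, the DPAPI already allows attaching NUMA info when advertising devices, roughly along these lines (a simplified sketch, not Pooler's actual code):

    package cpuplugin

    import pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"

    // Advertise each exclusive CPU as a device tagged with its NUMA node,
    // so the Topology Manager can align it with e.g. SR-IOV VFs.
    func cpuDevices(cpuToNUMA map[string]int64) []*pluginapi.Device {
            devs := make([]*pluginapi.Device, 0, len(cpuToNUMA))
            for cpuID, numaNode := range cpuToNUMA {
                    devs = append(devs, &pluginapi.Device{
                            ID:     cpuID,
                            Health: pluginapi.Healthy,
                            Topology: &pluginapi.TopologyInfo{
                                    Nodes: []*pluginapi.NUMANode{{ID: numaNode}},
                            },
                    })
            }
            return devs
    }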
3. Shortcomings
While in theory outsourcing CPU management policies in such a way is indeed not a big issue, there are some minor pitfalls which need to be addressed.
A: automatic reconciliation currently forces us to entirely disable the CPU Manager on these nodes, so that’s something which would probably need to be looked at if the community goes this way.
B: cpuset cgroup creation: I'm not sure who would be responsible for creating the cpuset (and cpu,cpuacct) cgroup(s) in this design.
Regardless of whether it is the Kubelet, or whether this would also be outsourced to the plugin, container/Pod information now needs to reach the DP in the Allocate() call, which is currently not a thing in the DPAPI.
In the current design I’m trying to retroactively rewrite the cgroups, which causes all kinds of timing issues I need to deal with.
In an official design it would be good if either the plugin could create the cgroup for the containers, or it could tell the Kubelet what to create it with (maybe the mount path in the Allocate response could be “abused” for this without needing a DPAPI change?); a rough sketch of what I mean follows below.
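Purely to illustrate what I mean by “abusing” the Allocate response (this is only a sketch of the idea; the env and annotation keys are made up, and this is not what Pooler does today):

    package cpuplugin

    import pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"

    // Sketch: hand the allocated cpuset back to the Kubelet in the Allocate
    // response so it (or a hook) could create/update the cgroup, instead of
    // the plugin rewriting cgroups after the container has already started.
    func containerResponse(cpus string) *pluginapi.ContainerAllocateResponse {
            return &pluginapi.ContainerAllocateResponse{
                    Envs:        map[string]string{"EXCLUSIVE_CPUS": cpus},
                    Annotations: map[string]string{"example.com/cpuset": cpus},
            }
    }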
That’s all; I don’t want to hijack the thread, just thought I’d share some field experience with you! See you on Monday 😊
Br,
Levente
Hey! Thanks for chiming in, lots of great points I'll need to think about. Just a quick note: the sig-node meeting is every TUESDAY (not Monday): https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.d9zp2j5jvkke
I'll make sure to book a time slot in the coming days so we can
discuss