short/mid term resource management in 2024/2025 (was: WG-Creation-Request: WG Accelerator Management)


Francesco Romani

Apr 3, 2024, 3:53:17 AM
to kubernetes-sig-node, kubernetes-si...@googlegroups.com, wg-b...@kubernetes.io, Dawn Chen, jbela...@google.com, catb...@gmail.com, dav...@gmail.com, Derek Carr
(Spin-off from https://groups.google.com/g/kubernetes-sig-architecture/c/KI7IdPT1EKs/m/MEIuGa1qCAAJ?utm_medium=email&utm_source=footer
- intentionally narrowing the recipient list because the subject
changed; feel free to re-extend)

Thanks, John, for driving this initiative and the work on
re-architecting the resource model
(https://docs.google.com/document/d/1Xy8HpGATxgA2S5tuFWNtaarw5KT8D2mj1F4AP1wg6dM/edit#heading=h.47s0iugucw4z).
It's awesome to see all the activity and energy on this topic
lately!

And thanks, Marlow, for raising the point about general resource
management (CPU, memory, ...).

I agree that it seems wiser to leave general resource management and
intra-node topology out of scope for the new ongoing effort
(WG Accelerator Management and related work); there are so many
moving parts at the moment.

I'd like to start a separate-but-related thread here about this very
topic. It seems to me there is a general trend and consensus toward
letting a new, more flexible and more generic resource model emerge
in the long run. That will need its due time to be figured out, and
the community is already striding in that direction. And this is
awesome!

But what about the interim?
People are using the current infrastructure and model, e.g. classic
device plugins and the current incarnations of the cpumanager, memory
manager, and topology manager, and there is a growing set of use
cases and needs which are not addressed, or only insufficiently
addressed, by what k8s provides today.
For example, KEP-3675 does a very nice job of collating some of those
use cases:
https://github.com/obiTrinobiIntel/enhancements/blob/bc75367ad1572ec5daa4e66a8cd4507b773f8e15/keps/sig-node/3675-resource-plugin-manager/README.md#user-stories
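
For those less familiar with these managers: they are all wired
together through the kubelet configuration. Here is a minimal sketch
of a KubeletConfiguration that enables them; the policy values below
are illustrative choices, not recommendations:

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  # Pin exclusive CPUs for Guaranteed pods with integer CPU requests.
  cpuManagerPolicy: static
  # Track and align memory per NUMA node; requires reservedMemory.
  memoryManagerPolicy: Static
  reservedMemory:
    - numaNode: 0
      limits:
        memory: 1Gi
  # Align CPU, memory and device assignments to a single NUMA node,
  # evaluated per pod rather than per container.
  topologyManagerPolicy: single-numa-node
  topologyManagerScope: pod

Note that the static memory manager also requires the reserved memory
above to line up with the kubelet's kube-reserved/system-reserved and
eviction settings.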

We have multiple ongoing KEPs addressing specific needs.
Off the top of my head:
https://github.com/kubernetes/enhancements/issues/4176
https://github.com/kubernetes/enhancements/issues/3008
https://github.com/kubernetes/enhancements/issues/2902
https://github.com/kubernetes/enhancements/pull/4541 (and more...)
Likewise for issues:
https://github.com/kubernetes/kubernetes/issues/124144
https://github.com/kubernetes/kubernetes/issues/122295 (and more...)

Some of the links I shared are "just" plain old bugs, but the point
is: there are currently gaps and pain points. I'm sure all of them
will be addressed and improved by the long-term redesign, but what
about the interim? :)

The question is: how, as a community, do we position ourselves with
respect to these needs?

IMHO there's room to improve the current model and codebase to meet
some of these requirements and further bridge the gap while the new
resource management iteration takes shape.
OTOH, the community's resources are limited. This is an unfortunate
reality.

Or do we want (or are we forced?) to move towards a deeper
maintenance mode, in which bugs are fixed and only very limited
features are accepted?

Thoughts and comments appreciated!

Thanks and best regards,

On Wed, Apr 3, 2024 at 1:24 AM 'John Belamaric' via
kubernetes-sig-architecture
<kubernetes-si...@googlegroups.com> wrote:
>
> And maybe to answer your initial question: I don't think we want this to be general resource management, at least not for 2024. I think we would lose focus too much. So, things like power saving CPUs, etc. are probably at least initially out of scope.
>
> We would like to address intra-node topology at some point, so that we can avoid scheduling failures due to topology misalignment, and so that we can express things like "these two devices need to be 'close' to one another (whatever 'close' means)". We also want to try to address things like inter-node topology (think: 'closeness'/connectivity between specialized interfaces), and multi-network attachments and topologies.
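
To make the "closeness" idea above concrete, here is a purely
hypothetical sketch of how a claim might one day express it. Nothing
below exists in any current API; the "constraints" field and its
values are invented for illustration only:

  # HYPOTHETICAL sketch, not an existing Kubernetes API
  apiVersion: resource.k8s.io/vFuture
  kind: ResourceClaim
  metadata:
    name: gpu-and-nic
  spec:
    devices:
      - name: gpu   # one accelerator ...
      - name: nic   # ... and one high-speed NIC
    constraints:
      # hypothetical: both devices must share a topology segment,
      # e.g. the same NUMA node or PCIe root complex
      - type: SameTopologySegment
        devices: [gpu, nic]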
>
> I am worried that trying to address topology will conflict with the goal of reaching some beta in 1.32. But there are others in the discussion who believe it's critical. We'll have to sort that out in the WG!
>
> John
>
>
> On Tue, Apr 2, 2024 at 4:18 PM John Belamaric <jbela...@google.com> wrote:
>>
>> It certainly is useful to clearly define what is in and out of scope. We can address that in the charter, which we will need to write once the WG is approved.
>>
>> Given the (so far) enthusiastic support, I will put out a PR in the next day or two to get the WG added. Once that's merged we'll put out a charter and try to address your comment (help appreciated :) ).
>>
>> John
>>
>>
>> On Tue, Apr 2, 2024 at 2:51 PM Marlow Weston <catb...@gmail.com> wrote:
>>>
>>> Will discussions on other resources, say attributes around CPU & memory, be in scope? With a simple thought experiment, much of this can just be pushed into "compute in various regions" and generalized further. I'm assuming that NUMA nodes, power use, et cetera (particularly power given the large power demands some of the acceleration requires) may also be topics to come up.
>>>
>>> What I'm really asking is: should there be a goals/non-goals section for this WG?
>>>
>>> On Tue, Apr 2, 2024, 3:41 PM 'John Belamaric' via kubernetes-sig-scheduling <kubernetes-s...@googlegroups.com> wrote:
>>>>
>>>> Yes, those are intended to be in scope. I am ok with your name suggestion. Patrick and Kevin? Others?
>>>>
>>>> On Tue, Apr 2, 2024 at 12:48 PM Bryant Biggs <bryan...@gmail.com> wrote:
>>>>>
>>>>> This is fantastic! Just curious whether other devices that are commonly used with accelerated workloads, such as networking devices like EFA and InfiniBand, will be considered in the design work of this WG. Any compelling reason not to name the working group WG Device Management?
>>>>>
>>>>> On Tuesday, April 2, 2024 at 3:39:38 PM UTC-4 Tushar Katarki wrote:
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Tue, Apr 2, 2024 at 3:03 PM Derek Carr <dec...@redhat.com> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Tue, Apr 2, 2024 at 2:39 PM Davanum Srinivas <dav...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> On Tue, Apr 2, 2024 at 2:11 PM Mrunal Patel <mpa...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> On Tue, Apr 2, 2024 at 11:10 AM Antonio Ojea <antonio.o...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> On Tue, 2 Apr 2024 at 18:52, 'John Belamaric' via kubernetes-sig-network <kubernetes-...@googlegroups.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello Kube community,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We would like to propose a new working group, WG Accelerator Management, to address the urgent need for improved support for accelerators in Kubernetes. Satisfying the intense industry demand to make efficient use of these scarce and expensive resources will require revisiting the existing APIs, models, scheduling algorithms, and autoscaling functionality within Kubernetes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Our primary effort for supporting this so far has been in the DRA KEPs, which are currently managed out of WG Batch. However, it has become clear that there are many non-batch workloads - such as AI inference workloads - that also have requirements for these efforts. Thus, we are proposing this WG to directly address these needs, with WG Batch and the proposed WG Serving providing guidance, use cases, requirements and other input to this working group.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Much of this was discussed at the recent KubeCon EU, where many folks with non-batch use cases approached us and asked where they could join to help contribute to the efforts. This proposed working group would provide that forum.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This differs from the WG Serving proposal, in that we will not focus specifically on inference workloads, but more on the lower level APIs, abstractions, and feature designs needed to configure, target, and share the necessary hardware for both batch and inference workloads. WG Batch and WG Serving focus more on upper-level workload controller APIs; this WG is focused on the lower-level APIs. The APIs and functionality coordinated from this WG will be consumed by those coordinated from the other WGs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For additional background, see:
>>>>>>>>>>>
>>>>>>>>>>> KubeCon EU Unconference
>>>>>>>>>>>
>>>>>>>>>>> [PUBLIC] Revisiting Kubernetes Hardware Resource Model
>>>>>>>>>>>
>>>>>>>>>>> 1.30 DRA Semantic Model
>>>>>>>>>>>
>>>>>>>>>>> “Classic” DRA
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For some existing use cases, see:
>>>>>>>>>>>
>>>>>>>>>>> Dynamic Resource Allocation (DRA)
>>>>>>>>>>>
>>>>>>>>>>> NVIDIA GPU Use-Cases for Dynamic Resource Allocation (DRA)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Answers to the workgroup governance questions (see [PUBLIC] Revisiting Kubernetes Hardware Resource Model for more details):
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > What is the exact problem this group is trying to solve?
>>>>>>>>>>>
>>>>>>>>>>> Enable efficient utilization of specialized hardware. This includes sharing a pool of resources effectively (many workloads sharing a pool of devices), as well as sharing individual devices effectively (several workloads dividing up a single device).
>>>>>>>>>>>
>>>>>>>>>>> Enable workload authors to specify “just enough” details about their workload requirements to ensure it runs optimally, without having to understand exactly how the infrastructure team has provisioned the cluster.
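
For a flavor of what "just enough" looks like with the DRA API as it
stands today (resource.k8s.io/v1alpha2): the user names a class and
references a claim, while the class and driver details stay with the
infrastructure team. The class name and image below are made up for
illustration:

  apiVersion: resource.k8s.io/v1alpha2
  kind: ResourceClaim
  metadata:
    name: gpu-claim
  spec:
    resourceClassName: example-gpu-class   # hypothetical class
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: inference-pod
  spec:
    resourceClaims:
      - name: gpu
        source:
          resourceClaimName: gpu-claim
    containers:
      - name: app
        image: registry.example/app:latest   # placeholder image
        resources:
          claims:
            - name: gpu   # container consumes the claimed device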
>>>>>>>>>>>
>>>>>>>>>>> Enable the scheduler to choose the correct place to run a workload the vast majority of the time (rejections should be extremely rare).
>>>>>>>>>>>
>>>>>>>>>>> Enable cluster autoscalers and other node auto-provisioning components to predict whether creating additional resources will satisfy workload needs, before provisioning those resources.
>>>>>>>>>>>
>>>>>>>>>>> Enable the shift from “pods run on nodes” to “workloads consume capacity”. This allows Kubernetes to provision sets of pods on top of sets of nodes and specialized hardware, while taking into account the relationships between those infrastructure components.
>>>>>>>>>>>
>>>>>>>>>>> Minimize workload disruption due to hardware failures.
>>>>>>>>>>>
>>>>>>>>>>> Address fragmentation of accelerators due to fractional use.
>>>>>>>>>>>
>>>>>>>>>>> Additional problems that may be identified and deemed in scope as we gather use cases and requirements from WG Serving, WG Batch, and other stakeholders.
>>>>>>>>>>>
>>>>>>>>>>> Address all of the above with a simple API that is a natural extension of the existing Kubernetes APIs and avoids or minimizes any transition effort.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > What is the artifact that this group will deliver, and to whom?
>>>>>>>>>>>
>>>>>>>>>>> Ultimately, the WG will coordinate the delivery of KEPs and their implementations by the participating SIGs. Interim artifacts will include documents capturing use cases, requirements, and designs; however, all of those will eventually result in KEPs and code owned by SIGs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > How does the group know when the problem-solving process is complete, and it is time for the Working Group to dissolve?
>>>>>>>>>>>
>>>>>>>>>>> When the KEPs resulting from these discussions have reached a terminal state.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > Who are all of the stakeholder SIGs involved in the problem this group is trying to solve?
>>>>>>>>>>>
>>>>>>>>>>> SIG Architecture
>>>>>>>>>>>
>>>>>>>>>>> SIG Node
>>>>>>>>>>>
>>>>>>>>>>> SIG Scheduling
>>>>>>>>>>>
>>>>>>>>>>> SIG Autoscaling
>>>>>>>>>>>
>>>>>>>>>>> SIG Network
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > What are the meeting mechanics (frequency, duration, roles)?
>>>>>>>>>>>
>>>>>>>>>>> One hour meetings every other week, with a moderator.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > Does the goal of the Working Group represent the needs of the project as a whole, or is it focused on the interests of a narrow set of contributors or companies?
>>>>>>>>>>>
>>>>>>>>>>> A broad set of end users, device vendors, cloud providers, Kubernetes distribution providers, and ecosystem projects (particularly autoscaling-related projects) have expressed interest in this effort.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > Who will chair the group, and ensure it continues to meet these requirements?
>>>>>>>>>>>
>>>>>>>>>>> John Belamaric
>>>>>>>>>>>
>>>>>>>>>>> Kevin Klues
>>>>>>>>>>>
>>>>>>>>>>> Patrick Ohly
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > Is diversity well-represented in the Working Group?
>>>>>>>>>>>
>>>>>>>>>>> We welcome and encourage contributors of all backgrounds and geographies to participate.
>>>>>>>>>>>
>>>>>>>>>>> For diversity of stakeholder interests, we see five primary constituencies. We would like to recruit multiple representatives to participate from each of these constituencies:
>>>>>>>>>>>
>>>>>>>>>>> Device vendors that manufacture accelerators and other specialized hardware which they would like to make available to Kubernetes users.
>>>>>>>>>>>
>>>>>>>>>>> Kubernetes distribution and managed offering providers that would like to make specialized hardware available to their users.
>>>>>>>>>>>
>>>>>>>>>>> Kubernetes ecosystem projects that help manage workloads utilizing these accelerators (e.g., Karpenter, Kueue, Volcano)
>>>>>>>>>>>
>>>>>>>>>>> End user workload authors that will create workloads that take advantage of the specialized hardware.
>>>>>>>>>>>
>>>>>>>>>>> Cluster administrators that operate and govern clusters containing the specialized hardware.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>>
>>>>>>>>>>> John Belamaric
>>>>>>>>>>>
>>>>>>>>>>> Patrick Ohly
>>>>>>>>>>>
>>>>>>>>>>> Kevin Klues
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Davanum Srinivas :: https://twitter.com/dims
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Tushar Katarki
>>>>>> Director, OpenShift Product Management
>>>>>> Red Hat
>>>>>> +1-978-618-6690 (M)
>>>>>> US Eastern Time
>>>>
>
