Re: Making Kubernetes great for accelerated workloads: a serving working group


Davanum Srinivas

Apr 2, 2024, 8:34:08 AM
to Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1 from me Sergey, folks!

On Mon, Apr 1, 2024 at 9:30 PM Sergey Kanzhelev <s.kan...@gmail.com> wrote:

This is a formal proposal to create a new working group that discusses and enhances support of inference serving for accelerated workloads. This document sets the context and outlines goals of the Working Group. Please share your feedback and suggestions, especially your own goals and use cases, so we can gauge the level of interest.


This email is also a google doc. If you want to comment on specific wording or make suggestions via comments in google doc, here is the link: Making Kubernetes great for accelerated workloads: a serving working group 


Context and proposed goals:


Kubernetes is in high demand for training and serving large language model (Generative AI) workloads using accelerators. While WG-Batch has been working for several years to enable large scale batch jobs for training and the DRA effort under SIG-Node has generated a number of important enabling scenarios, the inference side of the equation has been more diffusely represented or a secondary consideration among discussions in multiple SIGs. We believe there is a need for a working group to concentrate discussions on simplicity, reliability, and efficiency of accelerated inference workloads.


LLM workloads tend to be 1:1 with the size of the node or span multiple nodes in a single replica, put heavy demands on autoscaling of pods and clusters, can fail more often / take longer to start than regular web applications, and need/leverage a variety of workload primitives (StatefulSets are heavily used by large GenAI inference workloads) that are not always designed to support them.
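To make the workload shape concrete, here is a minimal, illustrative StatefulSet sketch of a single-node-sized inference replica; the image name, GPU count, and probe settings are hypothetical, not a recommended configuration:

```yaml
# Illustrative only: image, GPU count, and probe timings are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: llm-server
spec:
  serviceName: llm-server
  replicas: 2
  selector:
    matchLabels:
      app: llm-server
  template:
    metadata:
      labels:
        app: llm-server
    spec:
      containers:
      - name: server
        image: example.com/llm-server:latest   # hypothetical image
        resources:
          limits:
            nvidia.com/gpu: 8        # one replica consumes a full 8-GPU node
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 300   # model download/load can take minutes
          periodSeconds: 10
```

Note how the long startup (model load) and whole-node resource footprint differ from the assumptions baked into typical web-application defaults.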


The suggestion to start WG-Serving was discussed at KubeCon EU in various forums, and there was clear interest in addressing the proposed problems. After speaking with the SIG Apps leads, and given the connection to existing workload controllers, we believe SIG Apps is the SIG best suited to host this working group, but the work will depend on multiple SIGs, just like WG-Batch.


Proposed goals:


  • Provide concrete input to other SIGs and WGs about the needs of inference workloads.

  • Gather requirements for serving workloads (inference primarily, but benefiting other non-batch use cases where possible) that have broad community alignment from practitioners, distros, and vendors.

  • Directly improve key Kubernetes workload controllers when used with accelerators and the most common inference serving frameworks and model servers.

  • Partner with existing ecosystem projects like KServe, Seldon, Kaito, and others to identify, extract, or implement solutions to common shared problems (much as Kueue abstracted deferred scheduling for multiple batch frameworks).

  • Explore new projects that improve orchestration, scaling, and load balancing of inference workloads and compose well with other workloads on Kubernetes.


Many use cases are collected in this document: Use cases proposed for WG-Serving. Summarizing a few here:


  • Better workload controllers for inference workloads, especially those that span more than one host (e.g. LeaderWorkerSet)

  • Autoscaling and load balancing of accelerated workloads are very important for cost, but are weakly supported and slow today

  • Running smaller pre-production serving workloads is hard relative to batch

    • Because dev/test/prototype serving workloads aren’t quite interruptible, but don’t run forever and need to scale to zero when unused

    • Because it’s hard to configure accelerators for sharing / it doesn’t work well 

  • Accelerators are hard to use consistently across multiple clouds for workloads (which are mostly serving workloads and pre-prod workloads)

  • Large accelerated workloads are more vulnerable to disruption, slower to start, and need better primitives for mitigating disruption (with limited capacity)
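For the multi-host use case above, LeaderWorkerSet groups one leader pod with a fixed set of worker pods so that a single model replica can span several nodes and be scaled as a unit. A minimal sketch follows; the image name and sizes are hypothetical:

```yaml
# Illustrative only: image and sizes are hypothetical.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: multi-host-llm
spec:
  replicas: 2                # two model replicas, each spanning 4 nodes
  leaderWorkerTemplate:
    size: 4                  # pods per replica (1 leader + 3 workers)
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: example.com/llm-worker:latest   # hypothetical image
          resources:
            limits:
              nvidia.com/gpu: 8
```

The key property is that the group of pods forming one replica is created, scheduled, and restarted together, which no single built-in workload controller provides today.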


We would like to gather feedback from the involved SIGs and propose a charter that would ensure Kubernetes is an excellent foundation to run inference workloads.  The working group would run until inference workloads are as well supported as microservices, stateful workloads, or batch - which we believe based on experience in WG-Batch will take 1-2 years.


Answers to the working group governance questions:


> What is the exact problem this group is trying to solve?


The context above sets out the problem and goals of the Working Group. Making Kubernetes the natural choice of platform for running serving workloads is the long-term goal of the Working Group.

> What is the artifact that this group will deliver, and to whom?


We envision contributions to various SIGs to address near term pain points and allow proper extensibility for Serving workloads. The Working Group may also own a new repository under kubernetes-sigs or contribute to open projects implementing primitives that support simple, reliable, and efficient accelerated inference. Specifics of this will be a point of discussion for the charter of the working group.


> How does the group know when the problem solving process is completed, and it is time for the Working Group to dissolve?


When existing serving frameworks all converge on a set of common components, or when a new serving framework chooses Kubernetes as its first platform to run inference.


> Who are all of the stakeholder SIGs involved in this problem this group is trying to solve?


  • SIG Apps as the primary SIG

  • SIG Architecture

  • SIG Node

  • SIG Scheduling

  • SIG Autoscaling

  • SIG Network

  • SIG Storage


> What are the meeting mechanics (frequency, duration, roles)?


The plan is to meet at least bi-weekly.


> Does the goal of the Working Group represent the needs of the project as a whole, or is it focused on the interests of a narrow set of contributors or companies?


The goal of the WG is to make Kubernetes the natural choice for serving accelerated workloads and to reduce the operational toil involved. Given the broad experimentation and exploration with open-weight large language models, as well as the extensive use of Kubernetes to host foundation models, we believe this will benefit most large platform teams. Individual companies will be able to innovate as an extension to the baseline support.


> Who will chair the group, and ensure it continues to meet these requirements?


The question is still open, but Sergey Kanzhelev volunteered to chair the working group.

 

> Is diversity well-represented in the Working Group?

We welcome and encourage contributors of all backgrounds and geographies to participate. As for corporate diversity, a few companies have already expressed interest in participating and contributing, and we are very interested in others who wish to advise the work.


/Sergey, Clayton, with contributions of many


--
To unsubscribe from this group and stop receiving emails from it, send an email to wg-batch+u...@kubernetes.io.


--
Davanum Srinivas :: https://twitter.com/dims

Derek Carr

Apr 2, 2024, 9:31:05 AM
to Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

On Tue, Apr 2, 2024 at 8:50 AM Xing Yang <xingy...@gmail.com> wrote:
+1. Great initiative!

Thanks,
Xing




Tim Hockin

Apr 2, 2024, 11:42:52 AM
to Wei Huang, Derek Carr, Mrunal, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

On Tue, Apr 2, 2024 at 8:38 AM Wei Huang <hwe...@gmail.com> wrote:
>
> +1
>
> Regards,
> --------------
> Wei Huang
> hwe...@gmail.com
> On Apr 2, 2024 at 07:52 -0700, Mrunal <mru...@gmail.com>, wrote:
>
> +1

John Belamaric

Apr 2, 2024, 1:51:33 PM
to Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1 from me, with sig-arch hat

On Tue, Apr 2, 2024 at 7:50 AM Niteesh Rao <nitees...@gmail.com> wrote:
+1


Sergey Kanzhelev

Apr 16, 2024, 8:17:26 PM
to kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
Hi,

With a lot of support for the idea, the next steps are to formalize the WG creation and pick a time slot for a meeting. We hope to get the first meeting on the calendar next week.

As discussed today at the WG Device Management kick-off meeting, there is a short timeframe to give requirements for 1.30, so we expect an intense and packed meeting agenda from the beginning. And we want to start meetings before the WG is formally created.

PR with charter is out: https://github.com/kubernetes/community/pull/7823
Time slot response form: https://forms.gle/27mqbTC1xBP5QPSV9

/Sergey

Sergey Kanzhelev

Apr 22, 2024, 3:35:18 PM
to kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
Hi,

As anybody can imagine, finding time for a meeting is hard. The results of a poll showed that Wed 9-10 (Pacific) works for the majority of people. Thank you for being flexible!

The kick-off meeting for the WG will happen this Wednesday, Apr 24, 9:00-10:00 AM (Pacific).

Calendar invite is here

I also hope to get the Slack channel going soon: https://github.com/kubernetes/community/pull/7830 so there is a place to coordinate things.

/Sergey

Clayton Coleman

Apr 23, 2024, 6:15:07 AM
to Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
I look forward to speaking with everyone who can make it then.  I added two agenda items, but since this is the first meeting I also welcome additional suggestions on clarifications / goals before we dive in.


Sergey Kanzhelev

Apr 29, 2024, 6:48:04 PM
to Clayton Coleman, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
Hi,


Please join our WG Google Group mailing list and Slack channel. And see you at the meeting this Wednesday!

