Re: Making Kubernetes great for accelerated workloads: a serving working group


Davanum Srinivas

unread,
2 Apr 2024, 08:34:07
to Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1 from me Sergey, folks!

On Mon, Apr 1, 2024 at 9:30 PM Sergey Kanzhelev <s.kan...@gmail.com> wrote:

This is a formal proposal to create a new working group that discusses and enhances support of inference serving for accelerated workloads. This document sets the context and outlines goals of the Working Group. Please share your feedback and suggestions, especially your own goals and use cases, so we can gauge the level of interest.


This email is also a google doc. If you want to comment on specific wording or make suggestions via comments in google doc, here is the link: Making Kubernetes great for accelerated workloads: a serving working group 


Context and proposed goals:


Kubernetes is in high demand for training and serving large language model (Generative AI) workloads using accelerators. While WG-Batch has been working for several years to enable large scale batch jobs for training and the DRA effort under SIG-Node has generated a number of important enabling scenarios, the inference side of the equation has been more diffusely represented or a secondary consideration among discussions in multiple SIGs. We believe there is a need for a working group to concentrate discussions on simplicity, reliability, and efficiency of accelerated inference workloads.


LLM workloads tend to be 1:1 with the size of the node or span multiple nodes in a single replica, put heavy demands on autoscaling of pods and clusters, can fail more often / take longer to start than regular web applications, and need/leverage a variety of workload primitives (StatefulSets are heavily used by large GenAI inference workloads) that are not always designed to support them.


The suggestion to start WG-Serving was discussed in various forums at KubeCon EU, and there was clear interest in addressing the proposed problems. After speaking with the SIG Apps leads, and given the connection to existing workload controllers, we believe SIG Apps is the best suited SIG to host this working group, though the effort will depend on work in multiple SIGs, just like WG-Batch.


Proposed goals:


  • Provide concrete input to other SIGs and WGs around the needs of inference workloads.

  • Gather requirements for serving workloads (inference primarily, but benefiting other non-batch use cases where possible) that have broad community alignment from practitioners, distros, and vendors.

  • Directly improve key Kubernetes workload controllers when used with accelerators and the most common inference serving frameworks and model servers.

  • Partner with existing ecosystem projects like KServe, Seldon, Kaito, and others to identify, extract, or implement solutions to common shared problems (the way Kueue abstracted deferred scheduling for multiple batch frameworks).

  • Explore new projects that improve orchestration, scaling, and load balancing of inference workloads and compose well with other workloads on Kubernetes.
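The "deferred scheduling" pattern that Kueue extracted for batch frameworks (workloads are created suspended, and admitted against quota by flipping `.spec.suspend`) can be sketched roughly as follows. This is a plain-Python illustration of the idea, not Kueue's implementation; the GPU quota numbers and field layout are made up for the example:

```python
# Sketch of Kueue-style deferred scheduling: jobs start suspended, and a
# quota manager unsuspends (admits) only those that fit the free quota.

def admit_pending(pending_jobs, total_gpus):
    """Flip spec.suspend to False for jobs that fit under total_gpus."""
    used = sum(j["spec"]["gpus"] for j in pending_jobs
               if not j["spec"]["suspend"])
    for job in pending_jobs:
        if job["spec"]["suspend"] and used + job["spec"]["gpus"] <= total_gpus:
            job["spec"]["suspend"] = False  # admitted: controller may now create pods
            used += job["spec"]["gpus"]
    return pending_jobs

jobs = [
    {"name": "train-a", "spec": {"suspend": True, "gpus": 8}},
    {"name": "train-b", "spec": {"suspend": True, "gpus": 8}},
    {"name": "train-c", "spec": {"suspend": True, "gpus": 4}},
]
admit_pending(jobs, total_gpus=12)
# train-a (8) and train-c (4) fit within 12 GPUs; train-b stays suspended.
```

The value of extracting this as a shared primitive is that any workload controller honoring a suspend field can defer pod creation, so quota logic lives in one place rather than in each framework.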


Many use cases are collected in this document: Use cases proposed for WG-Serving. Summarizing a few here:


  • Better workload controllers for inference workloads, especially those that span more than one host (e.g. LeaderWorkerSet)

  • Autoscaling and load balancing of accelerated workloads are very important for cost, but they are weakly supported and slow

  • Running smaller pre-production serving workloads is hard relative to batch

    • Because dev/test/prototype serving workloads aren’t quite interruptible, but don’t run forever and need to scale to zero when unused

    • Because it’s hard to configure accelerators for sharing / it doesn’t work well 

  • Accelerators are hard to use consistently across multiple clouds for workloads (which are mostly serving workloads and pre-prod workloads)

  • Large accelerated workloads are more vulnerable to disruption, slower to start, and need better primitives for mitigating disruption (with limited capacity)


We would like to gather feedback from the involved SIGs and propose a charter that would ensure Kubernetes is an excellent foundation to run inference workloads.  The working group would run until inference workloads are as well supported as microservices, stateful workloads, or batch - which we believe based on experience in WG-Batch will take 1-2 years.


Answers to the working group governance questions:


> What is the exact problem this group is trying to solve?


The context above sets out the background and goals of the Working Group. Making Kubernetes the natural choice of platform for running serving workloads is the long-term goal of the Working Group.

> What is the artifact that this group will deliver, and to whom?


We envision contributions to various SIGs to address near term pain points and allow proper extensibility for Serving workloads. The Working Group may also own a new repository under kubernetes-sigs or contribute to open projects implementing primitives that support simple, reliable, and efficient accelerated inference. Specifics of this will be a point of discussion for the charter of the working group.


> How does the group know when the problem solving process is completed, and it is time for the Working Group to dissolve?


When existing serving frameworks all converge on a set of common components, or when a new serving framework chooses Kubernetes as its first platform for running inference.


> Who are all of the stakeholder SIGs involved in this problem this group is trying to solve?


  • SIG Apps as the primary SIG

  • SIG Architecture

  • SIG Node

  • SIG Scheduling

  • SIG Autoscaling

  • SIG Network

  • SIG Storage


> What are the meeting mechanics (frequency, duration, roles)?


The plan is to meet at least bi-weekly.


> Does the goal of the Working Group represent the needs of the project as a whole, or is it focused on the interests of a narrow set of contributors or companies?


The goal of the WG is to make Kubernetes the natural choice for serving accelerated workloads and to reduce the operational toil involved. Given the broad experimentation and exploration with open-weight large language models, as well as the extensive use of Kubernetes to host foundation models, we believe this will benefit most large platform teams. Individual companies will be able to innovate as extensions to the baseline support.


> Who will chair the group, and ensure it continues to meet these requirements?


The question is still open, but Sergey Kanzhelev volunteered to chair the working group.

 

> Is diversity well-represented in the Working Group?

We welcome and encourage contributors of all backgrounds and geographies to participate. As for corporate diversity, a few companies have already expressed interest in participating and contributing, and we are very interested in others who wish to advise the work.


/Sergey, Clayton, with contributions of many


--
To unsubscribe from this group and stop receiving emails from it, send an email to wg-batch+u...@kubernetes.io.


--
Davanum Srinivas :: https://twitter.com/dims

Xing Yang

unread,
2 Apr 2024, 08:49:55
to Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1. Great initiative!

Thanks,
Xing



Tim Hockin

unread,
2 Apr 2024, 11:42:51
to Wei Huang, Derek Carr, Mrunal, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

On Tue, Apr 2, 2024 at 8:38 AM Wei Huang <hwe...@gmail.com> wrote:
>
> +1
>
> Regards,
> --------------
> Wei Huang
> hwe...@gmail.com
> On Apr 2, 2024 at 07:52 -0700, Mrunal <mru...@gmail.com>, wrote:
>
> +1
>
> On Tue, Apr 2, 2024 at 6:31 AM Derek Carr <dec...@redhat.com> wrote:
>>
>> +1

Aldo Culquicondor

unread,
2 Apr 2024, 13:43:08
to Daniel Vega-Myhre, Antonio Ojea, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1
Aldo


On Tue, Apr 2, 2024 at 1:11 PM 'Daniel Vega-Myhre' via wg-batch <wg-b...@kubernetes.io> wrote:
+1

On Tue, Apr 2, 2024 at 10:04 AM 'Antonio Ojea' via wg-batch <wg-b...@kubernetes.io> wrote:
+1

On Tue, Apr 2, 2024 at 4:50 PM Niteesh Rao <nitees...@gmail.com> wrote:
+1

On Tue, Apr 2, 2024 at 9:31 PM Derek Carr <dec...@redhat.com> wrote:
+1


Aldo Culquicondor

unread,
5 Apr 2024, 16:37:03
to Clayton Coleman, Madhav Jivrajani, haosdent, John Belamaric, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
> In my mind, bullets 1 and 3 would be more in the WG-Batch domain (but are very relevant).

Indeed. We welcome you to open issues in the Kueue repo and to come to the wg-batch meetings if you have specific proposals.

Aldo


On Fri, Apr 5, 2024 at 3:30 PM 'Clayton Coleman' via kubernetes-sig-scheduling <kubernetes-s...@googlegroups.com> wrote:
Thank you Madhav for the connections, that will be very helpful.  A few comments inline

On Fri, Apr 5, 2024 at 1:31 PM Madhav Jivrajani <madha...@gmail.com> wrote:
Hi folks,

Thank you so much for starting this, needless to say +1!

What I did want to drop in and say was that I had the opportunity to hold an unconference session at AI Hub in KubeCon specifically for what we need out of Kubernetes to help better support these workloads. It was great having this session attended by folks running inference workloads in eclectic ways and the outcomes of this session are summarised below:
  • Folks wanted to use schedulers like Volcano for some aspects of the model lifecycle and other projects like Kueue for other aspects, and integrating these was not easy.
  • There was quite a lot of feedback around projects like KubeRay being used but not necessarily interfacing well with the default Kubernetes scheduler.
  • Slurm on Kubernetes was also brought up quite a few times.
In my mind, bullets 1 and 3 would be more in the WG-Batch domain (but are very relevant).  For 2, KubeRay and specifically RayService should be mentioned in the context of the first use case - standard primitives that multiple components would benefit from.  I.e. the need for RayService to have a nested deployment template as an escape hatch is a challenge that other projects like Kaito share - no way for workload CRDs to simultaneously abstract a pod template and allow users to provide arbitrary parameters, and that was something discussed at this week's SIG-API-Machinery call as an area of collaboration.

As far as scheduling, a gap I expect WG-Serving to take on is that preemption of serving workloads is currently very one dimensional, and there is no way to defend a workload's SLO while simultaneously leveraging the slack headroom all workloads are built around (every time you specify an HPA utilization of 60% for autoscaling, you are implicitly leaving a 33% headroom).  Better description of workload objectives on core workload primitives and stronger automation to defend those objectives (such as backpressure on successive disruption events) will allow better density on clusters, better sharing with batch, and reduce the need for higher level scheduler frameworks to directly place workloads.
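The headroom point above follows directly from the HPA scaling rule. A small sketch with illustrative numbers (not a real HPA; the exact slack fraction depends on whether you account it against provisioned capacity or against current usage):

```python
import math

# HPA's core rule: desiredReplicas = ceil(currentReplicas * currentUtil / targetUtil)
def desired_replicas(current_replicas, current_util, target_util):
    return math.ceil(current_replicas * current_util / target_util)

# With a 60% utilization target, steady state keeps pods near 60% busy,
# so roughly 40% of provisioned capacity sits idle as slack headroom.
target = 0.60
idle_fraction = 1 - target  # 0.4 of provisioned capacity

# Example: 10 replicas running hot at 90% utilization scale out to 15.
assert desired_replicas(10, 0.90, target) == 15
```

That idle fraction is the slack Clayton describes: capacity every autoscaled serving workload already pays for, which better disruption and preemption primitives could let batch workloads borrow without violating the serving workload's SLO.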
 
And a bunch of other items that are probably more suited to WG Device Management.
Thank you Kante Yin for attending the session and providing your insight from a SIG Scheduling perspective!

I have brought this feedback up with Working Group AI and TAG Runtime in the CNCF as well and the point I'd like to make is that considering that one of the goals of WG Serving is to talk to other projects in the ecosystem, I would urge folks getting involved here to also talk to WG AI in the CNCF. One of the reasons for this is also because WG AI and TAG Runtime have invited projects like KubeRay and SkyPilot to present at their forums establishing means for feedback, and I think WG Serving is an excellent opportunity for our project to solicit feedback from these groups.

Agree, soliciting feedback from user groups is an explicit goal of the WG, and I would expect our charter to turn feedback into action much like WG-Batch has succeeded at doing in the last few years.
 

Furthermore, I was in the WG AI meeting of 4th April 2024 and folks are planning to work on a whitepaper doing a survey of using Kubernetes as a scheduler for AI workloads and identifying gaps that these users face, and I think efforts like that can be invaluable feedback to WG Serving.

Looking forward to seeing that.
 

PS - I've dropped a similar note to them as well.

Thank you again, and I hope I can get involved and do my part,
Madhav

On Wed, Apr 3, 2024 at 2:44 PM haosdent <haos...@gmail.com> wrote:
+1

On Wed, Apr 3, 2024 at 1:51 AM 'John Belamaric' via kubernetes-sig-scheduling <kubernetes-s...@googlegroups.com> wrote:
+1 from me, with sig-arch hat

On Tue, Apr 2, 2024 at 7:50 AM Niteesh Rao <nitees...@gmail.com> wrote:
+1

On Tue, Apr 2, 2024 at 9:31 PM Derek Carr <dec...@redhat.com> wrote:
+1



--
Best Regards,
Haosdent Huang
