Making Kubernetes great for accelerated workloads: a serving working group


Sergey Kanzhelev

Apr 2, 2024, 4:49:18 AM
to kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com

This is a formal proposal to create a new working group to discuss and enhance support for inference serving with accelerated workloads. This document sets the context and outlines the goals of the Working Group. Please share your feedback and suggestions, especially your own goals and use cases, so we can gauge the level of interest.


This email is also a Google Doc. If you want to comment on specific wording or make suggestions via comments in the Google Doc, here is the link: Making Kubernetes great for accelerated workloads: a serving working group


Context and proposed goals:


Kubernetes is in high demand for training and serving large language model (Generative AI) workloads using accelerators. While WG-Batch has been working for several years to enable large-scale batch jobs for training, and the DRA effort under SIG-Node has generated a number of important enabling scenarios, the inference side of the equation has been more diffusely represented, or treated as a secondary consideration, in discussions across multiple SIGs. We believe there is a need for a working group to concentrate discussions on the simplicity, reliability, and efficiency of accelerated inference workloads.


LLM workloads tend to be sized 1:1 with a node or to span multiple nodes in a single replica; they put heavy demands on pod and cluster autoscaling, can fail more often and take longer to start than regular web applications, and need and leverage a variety of workload primitives (StatefulSets are heavily used by large GenAI inference workloads) that were not always designed to support them.
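
To make these characteristics concrete, here is a minimal sketch (illustrative only, written in Go against the upstream API types; the image name, labels, and the 8-GPU node shape are assumptions, not recommendations) of how such a workload is commonly declared today: a StatefulSet whose pod template requests a full node's worth of accelerators, so each replica is pinned 1:1 to a node.

package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

// modelServerStatefulSet sketches a single-replica LLM server that consumes a
// whole 8-GPU node, declared with the StatefulSet primitive that large GenAI
// inference workloads lean on today.
func modelServerStatefulSet() *appsv1.StatefulSet {
	labels := map[string]string{"app": "llm-server"} // hypothetical labels
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "llm-server"},
		Spec: appsv1.StatefulSetSpec{
			ServiceName: "llm-server",
			Replicas:    int32Ptr(1), // each replica maps 1:1 to a node
			Selector:    &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "server",
						Image: "example.com/model-server:latest", // hypothetical image
						Resources: corev1.ResourceRequirements{
							Limits: corev1.ResourceList{
								// Ask for the node's full accelerator capacity.
								corev1.ResourceName("nvidia.com/gpu"): resource.MustParse("8"),
							},
						},
					}},
				},
			},
		},
	}
}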


The suggestion to start WG-Serving was discussed in various forums at KubeCon EU, and there was clear interest in addressing the proposed problems. After speaking with the SIG Apps leads, and given the connection to existing workload controllers, we believe SIG Apps is the best suited SIG to host this working group, though the work will depend on multiple SIGs, just like WG-Batch.


Proposed goals:


  • Provide concrete input to other SIGs and WGs about the needs of inference workloads.

  • Gather requirements for serving workloads (inference primarily, but benefiting other non-batch use cases where possible) that have broad community alignment from practitioners, distros, and vendors.

  • Directly improve key Kubernetes workload controllers when used with accelerators and the most common inference serving frameworks and model servers.

  • Partner with existing ecosystem projects like kServe, Seldon, Kaito, and others to identify, extract, or implement solutions to common shared problems (much as Kueue abstracted deferred scheduling for multiple batch frameworks).

  • Explore new projects that improve orchestration, scaling, and load balancing of inference workloads and compose well with other workloads on Kubernetes.


Many use cases are collected in this document: Use cases proposed for WG-Serving. Summarizing a few here:


  • Better workload controllers for inference workloads, especially those that span more than one host (e.g. LeaderWorkerSet)

  • Autoscaling and load balancing of accelerated workloads are very important for cost, but are weakly supported and slow today

  • Running smaller pre-production serving workloads is hard relative to batch

    • Because dev/test/prototype serving workloads aren’t quite interruptible, but don’t run forever and need to scale to zero when unused

    • Because configuring accelerators for sharing is hard and often does not work well

  • Accelerators are hard to use consistently across multiple clouds for workloads (which are mostly serving workloads and pre-prod workloads)

  • Large accelerated workloads are more vulnerable to disruption, slower to start, and need better primitives for mitigating disruption (with limited capacity)


We would like to gather feedback from the involved SIGs and propose a charter that would ensure Kubernetes is an excellent foundation to run inference workloads. The working group would run until inference workloads are as well supported as microservices, stateful workloads, or batch - which we believe, based on experience in WG-Batch, will take 1-2 years.


Answers to the working group governance questions:


> What is the exact problem this group is trying to solve?


The section above sets the context and goals of the Working Group. Making Kubernetes the natural choice of platform for running serving workloads is the long-term goal of the Working Group.

> What is the artifact that this group will deliver, and to whom?


We envision contributions to various SIGs to address near-term pain points and allow proper extensibility for serving workloads. The Working Group may also own a new repository under kubernetes-sigs or contribute to open projects implementing primitives that support simple, reliable, and efficient accelerated inference. The specifics will be a point of discussion for the charter of the working group.


> How does the group know when the problem solving process is completed, and it is time for the Working Group to dissolve?


When existing serving frameworks all converge on a set of common components, or when a new serving framework chooses Kubernetes as its first platform to run inference.


> Who are all of the stakeholder SIGs involved in this problem this group is trying to solve?


  • SIG Apps as the primary SIG

  • SIG Architecture

  • SIG Node

  • SIG Scheduling

  • SIG Autoscaling

  • SIG Network

  • SIG Storage


> What are the meeting mechanics (frequency, duration, roles)?


The plan is to meet at least bi-weekly.


> Does the goal of the Working Group represent the needs of the project as a whole, or is it focused on the interests of a narrow set of contributors or companies?


The goal of the WG is to make Kubernetes the natural choice for serving accelerated workloads and to reduce the operational toil involved. Given the broad experimentation and exploration with open-weight large language models, as well as the extensive use of Kubernetes to host foundation models, we believe this will benefit most large platform teams. Individual companies will be able to innovate as an extension to the baseline support.


> Who will chair the group, and ensure it continues to meet these requirements?


The question is still open, but Sergey Kanzhelev volunteered to chair the working group.

 

> Is diversity well-represented in the Working Group?

We welcome and encourage contributors of all backgrounds and geographies to participate. As for corporate diversity, a few companies have already expressed interest in participating and contributing, and we are very interested in others who wish to advise the work.


/Sergey, Clayton, with contributions from many


Davanum Srinivas

Apr 2, 2024, 8:34:07 AM
to Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1 from me Sergey, folks!



--
Davanum Srinivas :: https://twitter.com/dims

Xing Yang

Apr 2, 2024, 8:49:55 AM
to Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1. Great initiative!

Thanks,
Xing



Derek Carr

Apr 2, 2024, 9:31:03 AM
to Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1


Cloud Melon

Apr 2, 2024, 9:48:50 AM
to Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1 ;)

Best wishes from Paris 
Mélony

On Apr 2, 2024, at 15:31, Derek Carr <dec...@redhat.com> wrote:



Niteesh Rao

Apr 2, 2024, 10:58:17 AM
to Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com

Mrunal

Apr 2, 2024, 10:58:20 AM
to Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com

Tim Hockin

Apr 2, 2024, 11:43:54 AM
to Wei Huang, Derek Carr, Mrunal, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

On Tue, Apr 2, 2024 at 8:38 AM Wei Huang <hwe...@gmail.com> wrote:
>
> +1
>
> Regards,
> --------------
> Wei Huang
> hwe...@gmail.com

Wei Huang

Apr 2, 2024, 11:43:54 AM
to Derek Carr, Mrunal, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

Regards,
--------------
Wei Huang
hwe...@gmail.com

Marlow Weston

Apr 2, 2024, 11:57:44 AM
to Tim Hockin, Wei Huang, Derek Carr, Mrunal, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-batch, clayton...@google.com

Ricardo Rocha

Apr 2, 2024, 11:57:48 AM
to Tim Hockin, Wei Huang, Derek Carr, Mrunal, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

Abdullah Gharaibeh

Apr 2, 2024, 12:46:38 PM
to Ricardo Rocha, Tim Hockin, Wei Huang, Derek Carr, Mrunal, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-sig-architecture, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-sig-scheduling, kubernetes-si...@googlegroups.com, kubernetes-sig-apps, kubernetes-...@googlegroups.com, wg-batch, Clayton Coleman

Antonio Ojea

Apr 2, 2024, 1:04:02 PM
to Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com

Daniel Vega-Myhre

Apr 2, 2024, 1:41:22 PM
to Antonio Ojea, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

Aldo Culquicondor

Apr 2, 2024, 1:43:07 PM
to Daniel Vega-Myhre, Antonio Ojea, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1
Aldo

Rupeng Liu

Apr 2, 2024, 1:55:38 PM
to Aldo Culquicondor, Daniel Vega-Myhre, Antonio Ojea, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1


John Belamaric

Apr 2, 2024, 1:55:42 PM
to Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1 from me, with sig-arch hat


Fei Guo

Apr 2, 2024, 3:41:37 PM
to John Belamaric, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1

'John Belamaric' via kubernetes-sig-node <kubernete...@googlegroups.com> wrote on Tue, Apr 2, 2024 at 10:51:

haosdent

Apr 3, 2024, 5:15:00 AM
to John Belamaric, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
+1



--
Best Regards,
Haosdent Huang

Aldo Culquicondor

Apr 5, 2024, 4:37:02 PM
to Clayton Coleman, Madhav Jivrajani, haosdent, John Belamaric, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
> In my mind, bullets 1 and 3 would be more in the WG-Batch domain (but are very relevant).

Indeed. We welcome you to open issues in the Kueue repo and to come to the wg-batch meetings if you have specific proposals.

Aldo



Madhav Jivrajani

Apr 6, 2024, 8:15:47 PM
to haosdent, John Belamaric, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
Hi folks,

Thank you so much for starting this, needless to say +1!

What I did want to drop in and say was that I had the opportunity to hold an unconference session at AI Hub in KubeCon specifically for what we need out of Kubernetes to help better support these workloads. It was great having this session attended by folks running inference workloads in eclectic ways and the outcomes of this session are summarised below:
  • Folks wanted to use schedulers like Volcano for some aspects of the model lifecycle and other projects like kueue for other aspects and integrating these was not really easy.
  • There was quite a lot of feedback around projects like KubeRay being used but not necessarily interfacing well with the default Kubernetes scheduler.
  • Slurm on Kubernetes was also brought up quite a few times.
And a bunch of other items that are probably more suited to WG Device Management.
Thank you Kante Yin for attending the session and providing your insight from a SIG Scheduling perspective!

I have brought this feedback up with Working Group AI and TAG Runtime in the CNCF as well and the point I'd like to make is that considering that one of the goals of WG Serving is to talk to other projects in the ecosystem, I would urge folks getting involved here to also talk to WG AI in the CNCF. One of the reasons for this is also because WG AI and TAG Runtime have invited projects like KubeRay and SkyPilot to present at their forums establishing means for feedback, and I think WG Serving is an excellent opportunity for our project to solicit feedback from these groups.

Furthermore, I was in the WG AI meeting of 4th April 2024 and folks are planning to work on a whitepaper doing a survey of using Kubernetes as a scheduler for AI workloads and identifying gaps that these users face, and I think efforts like that can be invaluable feedback to WG Serving.

PS - I've dropped a similar note to them as well.

Thank you again, and I hope I can get involved and do my part,
Madhav
On Wed, Apr 3, 2024 at 2:44 PM haosdent <haos...@gmail.com> wrote:

Clayton Coleman

Apr 6, 2024, 8:15:57 PM
to Madhav Jivrajani, haosdent, John Belamaric, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
Thank you, Madhav, for the connections; that will be very helpful. A few comments inline

On Fri, Apr 5, 2024 at 1:31 PM Madhav Jivrajani <madha...@gmail.com> wrote:
Hi folks,

Thank you so much for starting this, needless to say +1!

What I did want to drop in and say was that I had the opportunity to hold an unconference session at AI Hub in KubeCon specifically for what we need out of Kubernetes to help better support these workloads. It was great having this session attended by folks running inference workloads in eclectic ways and the outcomes of this session are summarised below:
  • Folks wanted to use schedulers like Volcano for some aspects of the model lifecycle and other projects like kueue for other aspects and integrating these was not really easy.
  • There was quite a lot of feedback around projects like KubeRay being used but not necessarily interfacing well with the default Kubernetes scheduler.
  • Slurm on Kubernetes was also brought up quite a few times.
In my mind, bullets 1 and 3 would be more in the WG-Batch domain (but are very relevant).  For 2, KubeRay and specifically RayService should be mentioned in the context of the first use case - standard primitives that multiple components would benefit from.  I.e. the need for RayService to have a nested deployment template as an escape hatch is a challenge that other projects like Kaito share - no way for workload CRDs to simultaneously abstract a pod template and allow users to provide arbitrary parameters, and that was something discussed at this week's SIG-API-Machinery call as an area of collaboration.
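
As a purely hypothetical sketch of that gap (the type and field names below are invented for illustration; none of the projects mentioned define exactly this API), a serving CRD today ends up exposing both high-level parameters and a full pod template, with no shared convention for how the two layers compose:

package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InferenceServiceSpec is a hypothetical serving CRD spec: the controller
// builds a pod from the high-level fields, yet still has to accept a full
// pod template as an escape hatch for arbitrary customization.
type InferenceServiceSpec struct {
	ModelURI string `json:"modelURI"`
	Replicas int32  `json:"replicas"`

	// Escape hatch. Which layer wins when both describe the same thing
	// (resources, scheduling constraints, volumes) is the open question.
	Template *corev1.PodTemplateSpec `json:"template,omitempty"`
}

// InferenceService is the corresponding hypothetical top-level object.
type InferenceService struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              InferenceServiceSpec `json:"spec"`
}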

As far as scheduling, a gap I expect WG-Serving to take on is that preemption of serving workloads is currently very one dimensional, and there is no way to defend a workload's SLO while simultaneously leveraging the slack headroom all workloads are built around (every time you specify an HPA utilization of 60% for autoscaling, you are implicitly leaving a 33% headroom).  Better description of workload objectives on core workload primitives and stronger automation to defend those objectives (such as backpressure on successive disruption events) will allow better density on clusters, better sharing with batch, and reduce the need for higher level scheduler frameworks to directly place workloads.
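
For reference, the slack in that example comes straight from the standard HPA scaling rule; the sketch below illustrates the existing behavior only, not anything new proposed here:

package sketch

import "math"

// desiredReplicas applies the standard HPA rule:
//   desired = ceil(current * currentUtilization / targetUtilization)
// With a 60% target the autoscaler keeps adding replicas until observed
// utilization settles around the target, so the capacity between the target
// and 100% is paid for but deliberately held as slack against spikes and
// disruptions - the headroom referred to above.
func desiredReplicas(current int, currentUtilization, targetUtilization float64) int {
	return int(math.Ceil(float64(current) * currentUtilization / targetUtilization))
}

// Example: desiredReplicas(10, 90, 60) == 15.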
 
And a bunch of other items that are probably more suited to WG Device Management.
Thank you Kante Yin for attending the session and providing your insight from a SIG Scheduling perspective!

I have brought this feedback up with Working Group AI and TAG Runtime in the CNCF as well and the point I'd like to make is that considering that one of the goals of WG Serving is to talk to other projects in the ecosystem, I would urge folks getting involved here to also talk to WG AI in the CNCF. One of the reasons for this is also because WG AI and TAG Runtime have invited projects like KubeRay and SkyPilot to present at their forums establishing means for feedback, and I think WG Serving is an excellent opportunity for our project to solicit feedback from these groups.
Agree, soliciting feedback from user groups is an explicit goal of the WG, and I would expect our charter to turn feedback into action much like WG-Batch has succeeded at doing in the last few years.
 
Furthermore, I was in the WG AI meeting of 4th April 2024 and folks are planning to work on a whitepaper doing a survey of using Kubernetes as a scheduler for AI workloads and identifying gaps that these users face, and I think efforts like that can be invaluable feedback to WG Serving.
Looking forward to seeing that.
 

abhishek malvankar

Apr 6, 2024, 8:16:03 PM
to Aldo Culquicondor, Clayton Coleman, Madhav Jivrajani, haosdent, John Belamaric, Niteesh Rao, Derek Carr, Xing Yang, Davanum Srinivas, Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
Great discussion. While preemption is important, I think that supporting gangs/podgroups natively in scheduling is an important use case that could benefit both serving and batch.

Abhishek
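
For anyone new to the term, the invariant being asked for is roughly the following (an illustrative sketch only, with placeholder types; projects such as the scheduler-plugins coscheduling work and Volcano implement richer versions of it): a group of pods is admitted only if the whole group can be placed, so a multi-node replica never starts half-scheduled.

package sketch

// pod is a placeholder for whatever the scheduler tracks per pod.
type pod struct{ name string }

// canAdmitGang captures the gang/podgroup invariant: bind the group only if
// at least minMember of its pods can be placed right now; otherwise admit
// none of them, so capacity is not wasted on a partially scheduled replica.
func canAdmitGang(group []pod, minMember int, fits func(pod) bool) bool {
	placeable := 0
	for _, p := range group {
		if fits(p) {
			placeable++
		}
	}
	return placeable >= minMember
}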

Gaurav Singh

Apr 6, 2024, 8:16:08 PM
to abhishek malvankar, Aldo Culquicondor, Clayton Coleman, Davanum Srinivas, Derek Carr, John Belamaric, Madhav Jivrajani, Niteesh Rao, Sergey Kanzhelev, Xing Yang, haosdent, kubernete...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernetes-...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
+1 on what @abhishek is talking about. We need gang scheduling capability in the scheduler.

axe zhan

Apr 7, 2024, 8:30:41 AM
to Gaurav Singh, abhishek malvankar, Aldo Culquicondor, Clayton Coleman, Davanum Srinivas, Derek Carr, John Belamaric, Madhav Jivrajani, Niteesh Rao, Sergey Kanzhelev, Xing Yang, haosdent, kubernete...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernetes-...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
+1 for having gang scheduling capability in the scheduler.

Gaurav Singh <gaus...@redhat.com> wrote on Sat, Apr 6, 2024 at 18:29:

Sergey Kanzhelev

Apr 17, 2024, 5:12:13 AM
to kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io, clayton...@google.com
Hi,

With a lot of support for the idea, the next steps are to formalize the WG creation and pick a time slot for a meeting. We hope to get the first meeting on the calendar next week.

As discussed today at the WG Device Management kick-off meeting, there is a short timeframe to give requirements for 1.30, so we expect an intense and packed meeting agenda from the beginning. And we want to start meetings before the WG is formally created.

PR with charter is out: https://github.com/kubernetes/community/pull/7823
Time slot response form: https://forms.gle/27mqbTC1xBP5QPSV9

/Sergey

Sergey Kanzhelev

Apr 22, 2024, 3:39:30 PM
to kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
Hi,

As anybody can imagine, finding time for a meeting is hard. The results of a poll showed that Wed 9-10 (Pacific) works for the majority of people. Thank you for being flexible!

The kick-off meeting for the WG will happen this Wed, Apr 24, 9:00 AM - 10:00 AM (Pacific).

Calendar invite is here

I also hope to get the Slack channel going soon: https://github.com/kubernetes/community/pull/7830 so there is a place to coordinate things.

/Sergey

Clayton Coleman

Apr 22, 2024, 4:31:53 PM
to Sergey Kanzhelev, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
I look forward to speaking with everyone who can make it then.  I added two agenda items, but since this is the first meeting I also welcome additional suggestions on clarifications / goals before we dive in.


Sergey Kanzhelev

May 3, 2024, 3:31:13 PM
to Clayton Coleman, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, kubernetes-s...@googlegroups.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com, kubernetes-...@googlegroups.com, wg-b...@kubernetes.io
Hi,


Please join our WG Google Group mailing list and Slack channel. And see you at the meeting this Wednesday!

