This is a formal proposal to create a new working group to discuss and enhance support for inference serving on accelerated workloads. This document sets the context and outlines the goals of the Working Group. Please share your feedback and suggestions, especially your own goals and use cases, so we can gauge the level of interest.
This email is also a Google Doc. If you want to comment on specific wording or make suggestions via comments, here is the link: Making Kubernetes great for accelerated workloads: a serving working group
Context and proposed goals:
Kubernetes is in high demand for training and serving large language model (Generative AI) workloads using accelerators. While WG-Batch has been working for several years to enable large scale batch jobs for training and the DRA effort under SIG-Node has generated a number of important enabling scenarios, the inference side of the equation has been more diffusely represented or a secondary consideration among discussions in multiple SIGs. We believe there is a need for a working group to concentrate discussions on simplicity, reliability, and efficiency of accelerated inference workloads.
LLM workloads tend to consume an entire node, or span multiple nodes in a single replica; they put heavy demands on autoscaling of pods and clusters, fail more often and take longer to start than regular web applications, and rely on a variety of workload primitives (StatefulSets are heavily used by large GenAI inference workloads) that were not always designed to support them.
The suggestion to start WG-Serving was discussed at KubeCon EU in various forums, and there was clear interest in addressing the proposed problems. After speaking with the SIG Apps leads, and given the connection to existing workload controllers, we believe SIG Apps is the best-suited SIG to host this working group, but the effort will depend on work in multiple SIGs, just like WG-Batch.
Proposed goals:
Provide concrete input to other SIGs and WGs around the needs of inference workloads.
Gather requirements for serving workloads (inference primarily, but benefiting other non-batch use cases where possible) that have broad community alignment from practitioners, distros, and vendors.
Directly improve key Kubernetes workload controllers when used with accelerators and the most common inference serving frameworks and model servers.
Partner with existing ecosystem projects like KServe, Seldon, Kaito, and others to identify, extract, or implement solutions to shared problems (as Kueue abstracted deferred scheduling for multiple batch frameworks).
Explore new projects that improve orchestration, scaling, and load balancing of inference workloads and compose well with other workloads on Kubernetes.
Many use cases are collected in this document: Use cases proposed for WG-Serving. Summarizing a few here:
Better workload controllers for inference workloads, especially those that span more than one host (e.g. LeaderWorkerSet)
Autoscaling and load balancing of accelerated workloads are critical for cost efficiency, but today's support is weak and slow
Running smaller pre-production serving workloads is hard relative to batch
Because dev/test/prototype serving workloads are not fully interruptible, yet don’t run forever and need to scale to zero when unused
Because configuring accelerators for sharing is hard and often doesn’t work well
Accelerators are hard to use consistently across multiple clouds (and the workloads affected are mostly serving and pre-production workloads)
Large accelerated workloads are more vulnerable to disruption, slower to start, and need better primitives for mitigating disruption (with limited capacity)
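To make the multi-host use case above concrete, here is a hypothetical sketch of a LeaderWorkerSet manifest in which one inference replica spans a leader pod plus three workers, scheduled and restarted as a unit. The image name, GPU counts, and sizes are illustrative assumptions, not details from this proposal:

```yaml
# Hypothetical sketch: one serving replica = 1 leader + 3 workers (size: 4),
# managed as a group. Image and resource values are illustrative only.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: sharded-model-server
spec:
  replicas: 2                 # two independent serving replicas
  leaderWorkerTemplate:
    size: 4                   # pods per replica group (leader + workers)
    workerTemplate:
      spec:
        containers:
        - name: model-server
          image: example.com/model-server:latest   # illustrative image
          resources:
            limits:
              nvidia.com/gpu: 8   # whole-node accelerator footprint
```

Each group of pods is created, scaled, and recovered together, which is the behavior large sharded model servers need and which single-pod-oriented controllers do not provide today.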
We would like to gather feedback from the involved SIGs and propose a charter that would ensure Kubernetes is an excellent foundation to run inference workloads. The working group would run until inference workloads are as well supported as microservices, stateful workloads, or batch, which we believe, based on experience in WG-Batch, will take 1-2 years.
Answers to the working group governance questions:
> What is the exact problem this group is trying to solve?
The context above outlines the problems and goals of the Working Group. Its long-term goal is to make Kubernetes the natural choice of platform for running serving workloads.
> What is the artifact that this group will deliver, and to whom?
We envision contributions to various SIGs to address near term pain points and allow proper extensibility for Serving workloads. The Working Group may also own a new repository under kubernetes-sigs or contribute to open projects implementing primitives that support simple, reliable, and efficient accelerated inference. Specifics of this will be a point of discussion for the charter of the working group.
> How does the group know when the problem solving process is completed, and it is time for the Working Group to dissolve?
When existing serving frameworks converge on a set of common components, or when new serving frameworks choose Kubernetes as their first platform to run inference.
> Who are all of the stakeholder SIGs involved in this problem this group is trying to solve?
SIG Apps as the primary SIG
SIG Architecture
SIG Node
SIG Scheduling
SIG Autoscaling
SIG Network
SIG Storage
> What are the meeting mechanics (frequency, duration, roles)?
The plan is to meet at least bi-weekly.
> Does the goal of the Working Group represent the needs of the project as a whole, or is it focused on the interests of a narrow set of contributors or companies?
The goal of the WG is to make Kubernetes the natural choice when thinking of serving accelerated workloads and to reduce the operational toil involved. Given the broad experimentation and exploration with open-weight large language models, as well as the extensive use of Kubernetes to host foundation models, we believe this will benefit most large platform teams. Individual companies will be able to innovate as an extension to the baseline support.
> Who will chair the group, and ensure it continues to meet these requirements?
The question is still open, but Sergey Kanzhelev volunteered to chair the working group.
> Is diversity well-represented in the Working Group?
We welcome and encourage contributors of all backgrounds and geographies to participate. As for corporate diversity, several companies have already expressed interest in participating and contributing, and we are very interested in others who wish to advise the work.
/Sergey, Clayton, with contributions of many
--
To unsubscribe from this group and stop receiving emails from it, send an email to wg-batch+u...@kubernetes.io.
Hi folks,

Thank you so much for starting this, needless to say +1!

What I did want to drop in and say was that I had the opportunity to hold an unconference session at AI Hub in KubeCon specifically for what we need out of Kubernetes to help better support these workloads. It was great having this session attended by folks running inference workloads in eclectic ways, and the outcomes of this session are summarised below:
- Folks wanted to use schedulers like Volcano for some aspects of the model lifecycle and other projects like Kueue for other aspects, and integrating these was not easy.
- There was quite a lot of feedback around projects like KubeRay being used but not necessarily interfacing well with the default Kubernetes scheduler.
- Slurm on Kubernetes was also brought up quite a few times.
And a bunch of other items that are probably more suited to WG Device Management.

Thank you Kante Yin for attending the session and providing your insight from a SIG Scheduling perspective!

I have brought this feedback up with Working Group AI and TAG Runtime in the CNCF as well, and the point I'd like to make is that, considering one of the goals of WG Serving is to talk to other projects in the ecosystem, I would urge folks getting involved here to also talk to WG AI in the CNCF. One reason is that WG AI and TAG Runtime have invited projects like KubeRay and SkyPilot to present at their forums, establishing means for feedback, and I think WG Serving is an excellent opportunity for our project to solicit feedback from these groups.
Furthermore, I was in the WG AI meeting of 4th April 2024, and folks are planning to work on a whitepaper surveying the use of Kubernetes as a scheduler for AI workloads and identifying the gaps these users face; I think efforts like that can be invaluable feedback to WG Serving.
Meeting time: weekly Wed, 9AM PT
Mailing list: https://groups.google.com/a/kubernetes.io/g/wg-serving
Slack: #wg-serving
Notes and agenda: https://docs.google.com/document/d/1aExJFtaLnO-TM6_2uILgI8NI0IjOm7FcwLABBKEMEo0/edit