Hi all,
The Kubernetes Working Group Serving was created to support development of the AI inference stack on Kubernetes. The goal of this working group was to ensure that Kubernetes is an orchestration platform of choice for inference workloads. This goal has been accomplished and we are disbanding the working group.
The WG Serving formed workstreams to collect requirements from various model servers, hardware providers, and inference vendors. This work resulted in a common understanding of inference workload specifics and trends and laid the foundation for improvements across many SIGs in Kubernetes.
The working group oversaw several key evolutions to the role of load balancing and workloads - the inference gateway was adopted as a request scheduler, multiple groups have worked to standardize AI gateway functionality, and early inference gateway participants went on to seed agent networking in SIG Network. The use cases and problem statements informed the design of AIBrix [1], now a CNCF hosted project. And many of the unresolved problems in distributed inference - especially benchmarking and recommended best practices - have been picked up by the llm-d [2] project which hybridizes the infrastructure and ML ecosystems and is better able to steer model server co-evolution.
In particular, we believe llm-d and AIBrix represent more appropriate forums for driving requirements to Kubernetes SIGs than this working group. llm-d aims to provide well-lit paths for achieving state-of-the-art inference, along with recommendations that can compose into existing inference user platforms. AIBrix provides a complete platform solution for cost-efficient LLM inference.
WG Serving helped with the Kubernetes AI Conformance [3] requirements, and llm-d leverages multiple components from the profile and makes recommendations to end users that are consistent with Kubernetes direction (Kueue, inference gateway, LWS, DRA, etc.). Widely adopted patterns and solutions are expected to go into the conformance program.
All the efforts currently running inside WG Serving can be migrated to other WGs or to SIGs directly; requirements for them will be discussed in the SIGs and the llm-d community. Specifically:
Autoscaling-related questions, mostly related to fast bootstrap, will be handled by either SIG Node or SIG Scheduling.
Multi-host, multi-node work can continue as part of SIG Apps (e.g. for the LWS project), with DRA requirements discussed in WG Device Management.
Orchestration will be covered by SIG Scheduling and SIG Node.
Requirements for DRA will be discussed in WG Device Management.
The Gateway API Inference Extension [4] project is already sponsored by SIG Network and it will stay this way.
The Serving Catalog [5] work can be moved to the Inference Perf [6] project. It was originally designed for a larger scope, but has since been used mostly for Inference Perf.
The Inference Perf project is sponsored by SIG Scalability and no change of ownership is needed.
Cheers,
Yuan Tang
On behalf of Kubernetes WG Serving Co-Chairs
[1] https://github.com/vllm-project/aibrix
[2] https://github.com/llm-d/llm-d
[3] https://github.com/cncf/k8s-ai-conformance
[4] https://github.com/kubernetes-sigs/gateway-api-inference-extension
[5] https://github.com/kubernetes-sigs/wg-serving/tree/main/serving-catalog
[6] https://github.com/kubernetes-sigs/inference-perf
--
You received this message because you are subscribed to the Google Groups "Autoscaling Kubernetes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-auto...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/kubernetes-sig-autoscaling/CAELAyfb4-zQKmOOkKXr3tAOxyrymZP_235GjCS2ggZsBMZBEPw%40mail.gmail.com.
Fantastic! Great cap to all your hard work
Gerry Seidman President, AuriStor 917-501-8287 ge...@auriStor.com