KubeCon follow-up -- DRA Working Group

Kevin Klues

Nov 13, 2023, 11:11:59 AM
to dev, kubernetes-sig-node

Dynamic Resource Allocation (DRA) was a hot topic at KubeCon last week, primarily because it promises to unlock a whole host of use cases that require fine-grained sharing and custom configuration of accelerated hardware in Kubernetes. If I had to pick one overall theme for this KubeCon, it would be "How do we make Kubernetes THE best platform for running LLMs and GenAI workloads?", and the flexible resource management that DRA provides is a key component of that.

However, this flexibility comes at a cost (particularly for scheduling and cluster autoscaling), and it has raised questions that need to be addressed before DRA can move to beta (and eventually GA).

To help resolve these issues (and move DRA forward in general), @pohly and I are going to create a formal WG for DRA, with the two of us serving as its "Organizers":
https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md#creation-process-description

More info (including a poll on when and how often people think we should meet) is coming soon.

Looking forward to working with all of you more closely on this!

Kevin & Patrick

Aldo Culquicondor

Nov 13, 2023, 11:19:56 AM
to klu...@gmail.com, dev, kubernetes-sig-node
I wonder whether it would make sense to latch onto the existing WG Batch instead.
WG Batch is already attended by the relevant leads from SIG Scheduling and SIG Autoscaling and SIG Node (more specifically, the folks involved in topology manager), as well as users generally interested in running AI and HPC workloads on Kubernetes.
Furthermore, hardware support is already part of the charter: https://github.com/kubernetes/community/blob/master/wg-batch/charter.md#deliverables

We could consider adding an additional meeting at a different time so that we can better cover the members around the globe.

Aldo


--
You received this message because you are subscribed to the Google Groups "dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dev+uns...@kubernetes.io.
To view this discussion on the web visit https://groups.google.com/a/kubernetes.io/d/msgid/dev/CAJR1fVqfG_A_5YkCHj5nugJUh1_sL%2BBTYRx03mdXngz9ADcaAg%40mail.gmail.com.

Aldo Culquicondor

Nov 13, 2023, 11:21:19 AM
to klu...@gmail.com, wg-batch, dev, kubernetes-sig-node
I should have added +wg-batch :)
Aldo

v

Nov 13, 2023, 11:54:37 AM
to Aldo Culquicondor, klu...@gmail.com, wg-batch, dev, kubernetes-sig-node
+1 to support an additional meeting time!

Vasubabu Kandimalla

Nov 13, 2023, 12:18:57 PM
to vso...@gmail.com, Aldo Culquicondor, klu...@gmail.com, wg-batch, dev, kubernetes-sig-node
I'm really looking forward to hearing more, and I would like to participate in the development.

Please let me know how to become a contributor.

-Vasubabu K.

--
Thanks,
vasubabu.kandimalla

Kevin Klues

Nov 14, 2023, 6:25:50 AM
to Vasubabu Kandimalla, vso...@gmail.com, Aldo Culquicondor, wg-batch
I glanced through the wg-batch charter, and DRA does seem to fit nicely within its scope.

Specifically this:
* Runtime and scheduling support for specialized hardware (GPUs, NUMA, RDMA, etc.)

I also chatted with Patrick, and he agreed that it makes sense to bundle these discussions under the umbrella of wg-batch, if for no other reason than to avoid the overhead of maintaining yet another working group for something that will hopefully be short-lived.

I have therefore moved dev@ and kubernetes-sig-node@ to BCC and kept wg-batch@ as the "primary" email for this discussion going forward.

If you are interested in taking part in the DRA discussions but are not yet part of wg-batch, please take a look at the following README to learn how to join:
https://github.com/kubernetes/community/blob/master/wg-batch/README.md#meetings

Regarding a separate meeting time focused specifically on DRA, let's start with a discussion at the regular meeting time this Thursday and take it from there. I've added DRA to the agenda.

Thanks

Kevin
--
~Kevin