Re: KubeCon follow-up -- DRA Working Group


Aldo Culquicondor

Nov 13, 2023, 11:21:13 AM
to klu...@gmail.com, wg-batch, dev, kubernetes-sig-node
I should have added +wg-batch :)
Aldo


On Mon, Nov 13, 2023 at 11:19 AM Aldo Culquicondor <aco...@google.com> wrote:
I wonder if it makes sense to latch onto the existing WG Batch instead.
WG Batch is already attended by the relevant leads from SIG Scheduling, SIG Autoscaling, and SIG Node (more specifically, the folks involved in topology manager), as well as users generally interested in running AI and HPC workloads on Kubernetes.
Furthermore, hardware support is already part of the charter: https://github.com/kubernetes/community/blob/master/wg-batch/charter.md#deliverables

We could consider adding another meeting at a different time so that we can better cover members around the globe.

Aldo


On Mon, Nov 13, 2023 at 11:11 AM Kevin Klues <klu...@gmail.com> wrote:


Dynamic Resource Allocation (DRA) was a hot topic at KubeCon last week. The primary reason is that DRA promises to unlock a whole host of use-cases that require fine-grained sharing and custom configuration of accelerated hardware in Kubernetes. If I had to pick a general, overall theme for this KubeCon, it would be "How do we make Kubernetes THE best platform for running LLMs and GenAI workloads?" -- and having access to the flexible resource management that DRA provides is a key component of this.

However, such flexibility comes at a cost (specifically as it pertains to scheduling and cluster auto-scaling), and that has raised questions that need to be addressed before moving DRA to beta (and eventually GA).

To help resolve these issues (and help move DRA forward in general), @pohly and I are going to create a formal WG for DRA with ourselves appointed as the "Organizers":
https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md#creation-process-description

More info (including a poll of when / how often people think we should meet) will be coming soon.

Looking forward to working with all of you more closely on this!

Kevin & Patrick

--
You received this message because you are subscribed to the Google Groups "dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dev+uns...@kubernetes.io.
To view this discussion on the web visit https://groups.google.com/a/kubernetes.io/d/msgid/dev/CAJR1fVqfG_A_5YkCHj5nugJUh1_sL%2BBTYRx03mdXngz9ADcaAg%40mail.gmail.com.

v

Nov 13, 2023, 11:53:00 AM
to Aldo Culquicondor, klu...@gmail.com, wg-batch, dev, kubernetes-sig-node
+1 to support an additional meeting time!


Vasubabu Kandimalla

Nov 13, 2023, 2:39:31 PM
to vso...@gmail.com, Aldo Culquicondor, klu...@gmail.com, wg-batch, dev, kubernetes-sig-node


I'm really looking forward to hearing more. I wish to participate in the development.

Let me know how to become a contributor.


-Vasubabu. K

--
Thanks,
vasubabu.kandimalla

Kevin Klues

Nov 14, 2023, 6:25:41 AM
to Vasubabu Kandimalla, vso...@gmail.com, Aldo Culquicondor, wg-batch
I glanced through the wg-batch charter, and DRA does seem to fit nicely within its scope.

Specifically this:
* Runtime and scheduling support for specialized hardware (GPUs, NUMA, RDMA, etc.)

I also chatted with Patrick, and he agreed that it makes sense to bundle these discussions under the umbrella of wg-batch (if for no other reason than to avoid the overhead of maintaining yet another working group for something that will (hopefully) be short-lived).

I have therefore moved dev@ and kubernetes-sig-node@ to BCC and kept wg-batch@ as the "primary" email for this discussion going forward.

For those who are interested in taking part in the DRA discussions but are not yet part of wg-batch, please take a look at the following README to learn how to join:
https://github.com/kubernetes/community/blob/master/wg-batch/README.md#meetings

Regarding a separate meeting time, focused specifically on DRA -- let's start with a discussion at the normal meeting time this Thursday and take it from there. I've added DRA to the agenda.

Thanks

Kevin
--
~Kevin

abhishek malvankar

Nov 14, 2023, 10:51:50 AM
to Kevin Klues, Vasubabu Kandimalla, vso...@gmail.com, Aldo Culquicondor, wg-batch
This is great! Maybe in the meeting, we can discuss what features are needed from the user perspective and if possible prioritize them.

Thanks,

Abhishek

Kevin Klues

Nov 14, 2023, 11:03:30 AM
to abhishek malvankar, Vasubabu Kandimalla, vso...@gmail.com, Aldo Culquicondor, wg-batch
It looks like this is one of the off-weeks for the wg-batch meeting, and next week is Thanksgiving (so I'm assuming there is no meeting then either).

I've therefore proposed it on the agenda for the next possible meeting (December 7th).

However, given that it was canceled last week and looks to be canceled next week as well, does it actually make sense to meet this week in a one-off meeting to discuss?

Kevin
--
~Kevin

v

Nov 14, 2023, 11:32:47 AM
to Kevin Klues, abhishek malvankar, Vasubabu Kandimalla, Aldo Culquicondor, wg-batch
So that the idea doesn't get lost: it would be really great to consider a second meeting time for batch, period. 7am Pacific is quite early, and I suspect there are others (like myself) who want to attend but haven't been able to because of the time.

Marlow Weston

Nov 14, 2023, 11:45:34 AM
to v, Kevin Klues, Kevin Klues, abhishek malvankar, Vasubabu Kandimalla, Aldo Culquicondor, wg-batch
Can we simply record the session, and then have someone host another? Do we want to step up the cadence, maybe just for this week, and hold both times this week? @Kevin Klues, if you record a brief discussion of it to begin, maybe we could kick off the second session with that recording?


"Inspiration exists but it has to find you working."
--Pablo Picasso


Aldo Culquicondor

Nov 14, 2023, 12:05:24 PM
to Marlow Weston, v, Kevin Klues, Kevin Klues, abhishek malvankar, Vasubabu Kandimalla, wg-batch
I don't think it's appropriate to host a meeting this week on such short notice, and without having discussed with the other organizers whether to change the frequency.

I would be OK with us hosting a meeting on the 23rd. I'm not in the US, and neither are Maciej, Swati, Marcin, or Abhishek (please correct me if I'm wrong). Otherwise, the 7th should be fine.

Let's add cadence as the second topic on the agenda, once we have agreed on the time.

Aldo

abhishek malvankar

Nov 14, 2023, 2:34:37 PM
to Aldo Culquicondor, Kevin Klues, Kevin Klues, Marlow Weston, Vasubabu Kandimalla, v, wg-batch

I am in the US EST time zone.

Abhishek 

Kevin Klues

Nov 15, 2023, 7:57:36 AM
to abhishek malvankar, Aldo Culquicondor, Kevin Klues, Marlow Weston, Vasubabu Kandimalla, v, wg-batch
I can't make it next week, so let's just keep it scheduled for December 7th.

I will make sure to have some materials prepared to help summarize the state of things (including the outstanding issues) so that the conversation will be streamlined / focused.

In the meantime, we can chat about things in the #wg-batch channel on the Kubernetes Slack. Please ping @dra-dev in any DRA-related discussions to ensure the right people get pulled in.

Thanks!

Kevin
--
~Kevin

Aldo Culquicondor

Nov 15, 2023, 9:02:09 AM
to Kevin Klues, abhishek malvankar, Kevin Klues, Marlow Weston, Vasubabu Kandimalla, v, wg-batch
> I will make sure to have some materials prepared to help summarize the state of things (including the outstanding issues) so that the conversation will be streamlined / focused.

Could you please share it when you have a draft? It might be useful to collaborate through a Google Doc.

Aldo
