Specify custom start index for an indexed Job


v

Jun 13, 2023, 2:46:56 PM
to wg-b...@kubernetes.io
Hi Batch Working Group!

We are working on bursting: an Indexed Job running a type of HPC cluster (Flux Framework) locally creates a new Kubernetes cluster that also runs Flux, and then bursts jobs to it. The issue we are running into is that we would ideally have control over the indices chosen. E.g., from the standpoint of Flux it might look like:

First cluster: "Hello! I'm the root of the tree. I know about the hostnames flux-sample[0-999], and flux-sample[0-9] are directly right here. Hi kids! 👋"

Then a user comes along: "Oh wow, I gotta use those new machines in Google Cloud! Let's burst!" They send a request to Flux asking to burst.

The ideal case:
Bursted Flux Cluster: "Hello, I'm the new bursted cluster! Here are hostnames flux-sample[10-19] to add to your family." (this would work)

The actual case:
Bursted Flux Cluster: "Hello, I'm the new bursted cluster! But I have to start at 0, so here are hostnames flux-sample[0-9] to add to your configuration."

And in that second case, the two sets of kids would kill each other (in practice, when I tested this last night, they just kind of freaked out and crashed; I guess that's the same thing?). We are currently trying hacks that would allow for differently named jobs, but nothing looks promising yet. If others are planning to hook up different Indexed Jobs where the names matter, this feature might make sense more broadly.
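To make the collision concrete, here is a minimal Python sketch. It assumes hostnames are simply a prefix followed by the index, as in the flux-sample[0-9] notation above; the helper names (hostnames, collisions) are made up for illustration and are not part of Flux or Kubernetes.

```python
def hostnames(prefix: str, start: int, count: int) -> list[str]:
    """Return `count` hostnames, numbered from `start` (hypothetical naming scheme)."""
    return [f"{prefix}{i}" for i in range(start, start + count)]


def collisions(a: list[str], b: list[str]) -> list[str]:
    """Return the hostnames that appear in both clusters."""
    return sorted(set(a) & set(b))


# The local cluster owns indices 0-9.
local = hostnames("flux-sample", 0, 10)

# Actual case: the bursted Indexed Job must also start at 0.
burst_actual = hostnames("flux-sample", 0, 10)

# Ideal case: the bursted Indexed Job starts at 10.
burst_ideal = hostnames("flux-sample", 10, 10)

print(len(collisions(local, burst_actual)))  # every hostname clashes
print(collisions(local, burst_ideal))        # no overlap
```

With a start index of 10, the bursted hostnames run from flux-sample10 to flux-sample19 and never overlap with the local ones.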

So TLDR: could it be possible to ask for an indexed job and control the start index? Here is an explicit example.

Best,

Vanessa

Aldo Culquicondor

Jun 13, 2023, 4:10:42 PM
to v, wg-b...@kubernetes.io
We have discussed this before: https://github.com/kubernetes/kubernetes/issues/109131

We wanted to make it more generic so that you can specify particular indexes. The use case is that you can have an Indexed Job where some indexes fail (KEP-3850). Then, you can create a new Indexed Job and only retry the indexes that failed previously.

But we are focusing on KEP-3850 in this release. Hopefully we can get to what you request in the next release. But currently, it's up for grabs :)
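As a rough sketch of the bookkeeping behind "only retry the indexes that failed", the Python below expands a compressed index string such as "1,3-5,7" into individual indexes. That string format is the one the Job status already uses for completed indexes; the assumption here is that failed indexes would be reported the same way. The function name parse_indexes is hypothetical, and nothing here is an existing Kubernetes API for selecting indexes.

```python
def parse_indexes(spec: str) -> list[int]:
    """Expand a compressed index string like "1,3-5,7" into a list of ints."""
    indexes: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty strings and stray commas
        if "-" in part:
            # An interval "lo-hi" is inclusive on both ends.
            lo, hi = part.split("-", 1)
            indexes.extend(range(int(lo), int(hi) + 1))
        else:
            indexes.append(int(part))
    return indexes


# Assumed input: the failed indexes reported by a previous Indexed Job.
failed = parse_indexes("1,3-5,7")
print(failed)
```

A follow-up Job would then need to run exactly these indexes, which is the part that is not possible today.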

Aldo

