Hi Batch Working Group!
We are working on bursting - where an indexed job running a type of HPC cluster (Flux Framework) locally can create a new Kubernetes cluster, also running Flux, and then burst jobs to it. The issue we are running into is that we would ideally have control over the indices chosen for the new indexed job. E.g., from the standpoint of Flux it might look like:
First cluster: "Hello! I'm the root of the tree. I know about the hostnames flux-sample[0-999], and flux-sample[0-9] are directly right here. Hi kids! 👋"
Then a user comes along - "Oh wow, I gotta use those new machines in Google Cloud! Let's burst!" - and sends a request to Flux asking to burst.
The ideal case:
Bursted Flux Cluster: "Hello, I'm the new bursted cluster! Here are hostnames flux-sample[10-19] to add to your family." (this would work)
The actual case:
Bursted Flux Cluster: "Hello, I'm the new bursted cluster! But I have to start at 0, so here are hostnames flux-sample[0-9] to add to your configuration."
And in that second case, the two sets of kids would kill each other (in practice, when I tested this last night, they just kind of freaked out and crashed; I guess that's the same thing?). We are currently trying hacks to allow for differently named jobs, but nothing promising yet. We think that if others are planning to hook up different indexed jobs, and the names matter, having control over the starting index might make sense.
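To make the collision concrete, here is a minimal Python sketch of the two cases. It assumes pod hostnames follow the `<job-name>-<index>` pattern that Indexed Jobs use. The `index_offset` parameter is purely hypothetical: it stands in for the control we are asking about and is not an existing Kubernetes field.

```python
def hostnames(job_name: str, completions: int, index_offset: int = 0) -> set[str]:
    """Return the hostnames an indexed job would produce, as <job_name>-<index>.

    index_offset is hypothetical: today the indices always start at 0.
    """
    return {
        f"{job_name}-{i}"
        for i in range(index_offset, index_offset + completions)
    }


# The local cluster owns indices 0-9.
local = hostnames("flux-sample", 10)

# Actual case: the bursted job also has to start at 0.
actual_burst = hostnames("flux-sample", 10)

# Ideal case: the bursted job continues where the local one ends.
ideal_burst = hostnames("flux-sample", 10, index_offset=10)

# Every hostname in the actual case is claimed twice.
print(len(local & actual_burst), "colliding hostnames in the actual case")

# The ideal case has no overlap, so the root could adopt the new hosts.
print(len(local & ideal_burst), "colliding hostnames in the ideal case")
```

With an offset, the bursted hostnames continue the range the root already knows about; without one, every bursted hostname duplicates a local one.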
Best,
Vanessa