Hi Batch Working Group!
I saw Aldo's talk at KubeCon about Job API features and recently joined the Slack, and I wanted to start some discussion about features of interest! I figure we can start here and then possibly open issue(s) on GitHub. For some background, I'm working on the Flux Operator, and we are trying to deploy a Flux Framework "mini cluster" using an Indexed Job. Some quick notes about the design:
- A "Mini Cluster" CRD originally set up the nodes and launched the job; now it only sets up the nodes and starts a RESTful API to submit jobs to (WIP).
- One Mini Cluster runs one Flux Framework instance and is owned by one user.
- Each pod ("node") is networked through a hack that populates /etc/hosts in every pod, and it gets other shared configs and assets for the nodes through volumes.
- There is one "main" broker that starts the RESTful API via flux; the others start Flux and need to be discovered by the main broker. If there are N nodes total, they all need to see N IP addresses populated in /etc/hosts.
- As stated above, we use an indexed job.
- In case it matters, I use minikube to develop (the emojis give me life!!) 😆🎉❤🦄
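For anyone curious what the /etc/hosts step amounts to: an Indexed Job gives each pod the deterministic hostname `<job-name>-<index>`, so the hosts file can be rendered from an ordered list of pod IPs. Below is a simplified Python sketch under a few assumptions: the IPs are already collected in completion-index order (collecting them is the part that still needs the hack), and the job name `flux-sample` and subdomain `flux-service` used with it are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Sequence


def indexed_hostnames(job_name: str, size: int) -> list[str]:
    # An Indexed Job sets each pod's hostname to <job-name>-<index>.
    return [f"{job_name}-{i}" for i in range(size)]


def render_hosts(job_name: str, ips: Sequence[str], subdomain: Optional[str] = None) -> str:
    """Render /etc/hosts lines mapping pod IPs to their indexed hostnames.

    Assumes ips[i] is the IP of the pod with completion index i. Gathering
    those IPs (the "hack") is not shown here.
    """
    lines = []
    for host, ip in zip(indexed_hostnames(job_name, len(ips)), ips):
        # When a subdomain is given, list the longer name first, then the short one.
        names = f"{host}.{subdomain} {host}" if subdomain else host
        lines.append(f"{ip}\t{names}")
    return "\n".join(lines) + "\n"
```

For example, `render_hosts("flux-sample", ["10.0.0.1", "10.0.0.2"])` yields one line per pod, `flux-sample-0` through `flux-sample-1`, which every pod would append to its own /etc/hosts.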
Some features I think we'd like:
- An ability to define multiple pod templates with a worker/driver pattern within the same Indexed Job, to allow for a startup sequence. Right now I have one wait.sh script that basically says "If I'm index 0, start the server; otherwise just start Flux and expect to be discovered." I suspect others have scripts like this that are essentially if/else noodles. It's a bit messy, and I expect to see issues when we try to scale it.
- This would be hugely helped by some ability to look at the states of the different pods in the Job. E.g., I would have the workers boot up first, ensure they are all ready, and then start the server that expects them to be there. If a worker fails to start, I could re-create it without worry.
- An alternative to the above (if we cannot have a worker/driver pattern) would be to allow creating some N pods first and then adding pods incrementally (with an ability to check when the first set is ready).
- And finally, a more plug-and-play way to have a networked set of pods in the Mini Cluster. I've looked at other operators (e.g., the mpi-operator), and they all have "tricks" to do this. It would be super nice to just have an option that automatically allows the pods to see one another (a basic ping should work).
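Without a worker/driver pattern in the Job API, the startup ordering described above has to be emulated inside the pods, which is roughly what wait.sh does today. Here is a hedged Python sketch of that workaround. `JOB_COMPLETION_INDEX` is the environment variable Kubernetes injects into Indexed Job pods; `role_for` and `wait_for_workers` are hypothetical helpers, and a successful DNS lookup of each worker's hostname is assumed as a stand-in for "ready" (this presumes a headless Service is set as the pods' subdomain, and it does not prove Flux itself is up).

```python
from __future__ import annotations

import os
import socket
import time
from typing import Callable, Sequence


def completion_index() -> int:
    # Kubernetes injects JOB_COMPLETION_INDEX into every pod of an Indexed Job.
    return int(os.environ["JOB_COMPLETION_INDEX"])


def role_for(index: int) -> str:
    # Same rule as wait.sh: index 0 is the main broker, everyone else is a worker.
    return "broker" if index == 0 else "worker"


def wait_for_workers(
    hosts: Sequence[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
    timeout: float = 300.0,
    interval: float = 2.0,
) -> list[str]:
    """Poll until every worker hostname resolves, then return their IPs in order.

    Raises TimeoutError if some hosts are still unresolved after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    resolved: dict[str, str] = {}
    pending = list(hosts)
    while pending:
        still_pending = []
        for host in pending:
            try:
                resolved[host] = resolve(host)
            except OSError:  # socket.gaierror is a subclass of OSError
                still_pending.append(host)
        pending = still_pending
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"workers not ready: {pending}")
            time.sleep(interval)
    return [resolved[h] for h in hosts]
```

In the broker pod (index 0) this would be called with the worker names, e.g. `flux-sample-1.flux-service` onward (hypothetical names), before starting the server. A native Job API for pod states or ordered startup would make this polling loop unnecessary.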
Those are the main features that would be really nice to have! I have a few more ideas, but I want to keep this email short. Thanks, y'all, for the great discussion and for working on these APIs! I'm new to developing operators and to Kubernetes, and I'm loving it.
Best,
Vanessa