Hi Arnuld,
It is most important to keep the Slurm version the same across the board.
Since you mention the "deb" package, I am assuming all of your nodes run Debian-based distributions, which should be reasonably close to each other. However, Debian-based distros are not as "binary compatible" with one another as the RHEL-family distros (say, RHEL, Alma, Rocky, CentOS, Oracle, etc.), so even though they all use "deb" packages, it would be better to avoid sharing one deb across different distros.
If all of your distros ship similar versions of the dependencies
(at least at the glibc level) and differ only in how a package is
named (e.g. apache2 vs. httpd), that would potentially allow you to
run the same Slurm build on the other distros. In this case, you can
work around the naming differences by listing all of the possible
names for each dependency as alternatives (separated by "|") in the
Depends field of DEBIAN/control.
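As a rough sketch of the Depends workaround described above, the small Python helper below joins the alternative names for each dependency with "|" and separates the dependencies with commas, which is the syntax dpkg expects. The function name and the version constraint on libc6 are only illustrative assumptions, not something taken from a real Slurm package.

```python
def depends_field(dependencies):
    """Build a Debian control 'Depends:' line.

    `dependencies` is a list of lists. Each inner list holds the
    alternative package names that satisfy one dependency.
    """
    # Alternatives for one dependency are joined with " | ".
    groups = [" | ".join(alternatives) for alternatives in dependencies]
    # Separate dependencies are joined with ", ".
    return "Depends: " + ", ".join(groups)


# Illustrative input only: the apache2/httpd pair is the naming example
# from this thread, the libc6 constraint is an assumed placeholder.
print(depends_field([
    ["libc6 (>= 2.31)"],
    ["apache2", "httpd"],
]))
```

The printed line can be pasted into DEBIAN/control; apt then accepts whichever of the alternative names is available on the target distro.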
Statically linking the packages or using a conda-like environment may help you more if the distros differ enough to require a rebuild per distro. Otherwise, it would probably make more sense to just build Slurm on each node according to the features that node needs (say, ROCm or NVML support makes no sense on a node without such devices).
A more complex structure does indeed require more maintenance work. I
got quite tired of it and decided to just ship a RHEL-family OS on
all compute nodes and let those who are more familiar with some other
distro start one up with Singularity or Docker by themselves.
Sincerely,
S. Zhang
--
Hi Arnuld,
What I would probably do is build Slurm once for each distro and
install it either directly into /usr/local or via a deb package.
DEBIAN/control is used by apt to manage several things: the description that apt search shows for the package, which packages it can replace, which packages are recommended to be installed with it, and which packages need to be installed before it can work.
For machines with a certain brand of GPU, you would need a Slurm
build that is configured and compiled with the matching option
turned on, and the matching device driver listed in DEBIAN/control
so that apt can check that the driver on the machine meets the
requirements of your deb package.
You can forget about the second part if you are not using deb
packages and just compile and run Slurm on the client machine.
The last thing he mentioned is about the Slurm versions. A Slurm client of a lower version (say 23.02) should be able to talk to a slurmctld of a higher version (say 23.11) just fine, though the reverse does not apply.
As for dependency management, it is complex enough that maintaining a Linux distribution is quite some work - I know this because I maintain a Linux distro that uses dpkg packages but has no Debian roots and uses a different CLI tool, etc.
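To make the version rule above concrete, here is a minimal Python sketch that only encodes what is stated in this mail: a client may be older than, or the same release as, the slurmctld it talks to, but not newer. The function names are made up for illustration, and the check deliberately ignores how many releases back Slurm actually supports, so please look that up in the Slurm upgrade documentation before relying on it.

```python
def parse_release(version):
    """Turn '23.11' or '23.11.4' into a comparable (year, month) tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def client_can_talk_to(client_version, controller_version):
    """True when the client release is not newer than the controller's.

    Assumption: only the release (e.g. 23.02 vs 23.11) matters here,
    the maintenance number is ignored.
    """
    return parse_release(client_version) <= parse_release(controller_version)


print(client_can_talk_to("23.02", "23.11"))  # older client, newer slurmctld
print(client_can_talk_to("23.11", "23.02"))  # newer client, older slurmctld
```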
In fact, I am more worried about how the users would benefit from such a mixture of execution environments. A misstep in configuration, or a user submitting a job without specifying enough about what they are asking for, would probably make the job succeed or fail purely by chance, depending on which node it ran on and which environment its executables were built against. You probably need several "similar" nodes so that users can actually benefit from the job queue sending their job to wherever resources are available.
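One way users could specify "enough info" is to pin a job to a matching environment with sbatch's --constraint and --gres options. The sketch below just assembles such a command line in Python. It assumes the admin has tagged the nodes with features in slurm.conf and configured GPUs as a gres; the feature name "ubuntu2204" and the script name are hypothetical.

```python
import shlex


def sbatch_command(script, feature=None, gpus=0):
    """Assemble an sbatch argument list for a job script.

    `feature` is a node feature name defined by the admin (assumed),
    `gpus` is the number of GPUs requested through the generic 'gpu' gres.
    """
    argv = ["sbatch"]
    if feature:
        # Only run on nodes that advertise this feature.
        argv.append(f"--constraint={feature}")
    if gpus > 0:
        # Request GPUs so the job lands on a node that has them.
        argv.append(f"--gres=gpu:{gpus}")
    argv.append(script)
    return argv


# Hypothetical feature and script names, for illustration only.
print(shlex.join(sbatch_command("job.sh", feature="ubuntu2204", gpus=1)))
```

A job submitted this way should no longer depend on luck for which distro or which GPU type it ends up on, provided the node features are kept accurate.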
Good luck with your setup.
Sincerely,
S. Zhang