Hi Paul,
Thanks for the thought, but no: we'd restarted all of the slurmctld,
slurmdbd and slurmd daemons after every change to the slurm config
files.
I have a very cut-down slurm.conf on the non-slurmctld nodes, which seems
to be consulted when running srun (regardless of whether slurmd is running
or not).
Removing the simplified NodeName lines from the cut-down slurm.conf causes
srun to immediately return to the "can't find address for host" behaviour
I outlined at the start. I've seen this both on clients running slurmd and
on those that don't.
The cut-down slurm.conf is slowly growing: I've found that I also need to
add GresTypes, otherwise srun/sbatch don't know what users can put in
their "--gres" flag and so reject it. I guess at least that makes sense -
the tools need to get that information from somewhere.
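
As a rough sketch only, a cut-down slurm.conf of this kind might look
something like the fragment below. Every value in it is a placeholder
rather than a copy of the real file: the cluster name, the controller
hostname, the node names, the addresses and the "gpu" type are all made
up, and ClusterName/SlurmctldHost are included on the assumption that
the client tools need them to reach the controller. Using NodeAddr in
the simplified NodeName lines is likewise a guess at what resolves the
address lookup.

```
# Illustrative cut-down slurm.conf for a client node.
# All names and addresses here are hypothetical placeholders.

# Assumed: should match the values in the controller's full slurm.conf.
ClusterName=examplecluster
SlurmctldHost=ctld01

# Simplified NodeName lines: name and address only, no hardware details.
NodeName=node01 NodeAddr=192.0.2.11
NodeName=node02 NodeAddr=192.0.2.12

# Lets srun/sbatch recognise the types users pass to --gres.
GresTypes=gpu
```

With something like this in place, a request such as "--gres=gpu:1"
would be checked against GresTypes on the client before the job is
submitted.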
Interesting!
Best,
Mark
On Fri, 12 Nov 2021, Paul Brunk wrote: