nid[10-20]:4
will prevent 4 usable nodes (i.e. IDLE and not DOWN, DRAINING or already powered down) in the set nid[10-20]
from being powered down.

As I understand it, that setting means "Always have at least X nodes up", and nodes running jobs count toward X. So it avoids any wait time for the first X jobs being submitted, but any jobs after that will need to wait for the power_up sequence.
Brian Andrus
Sorry for the late reply.
For my site, I used the optional ":" separator to ensure at least
4 nodes were up. E.g.: nid[10-20]:4
This means at least 4 nodes. Those nodes do not have to be the
same 4 at any time, so if a node that used to be idle is now down
but 4 others are up, that one will not be brought back up. I don't see this
setting having much of anything to do with bringing nodes up at
all with the exception of when you first start slurmctld and the
settings are not met. Once there are jobs running on any of the
listed nodes, they count toward the number. That is my experience
with the small numbers I used. YMMV.
I have also listed nodes explicitly without the separator, which does work. I do that when I want to look at a node that is idle with no job on it. That stops Slurm from shutting it down while I am looking at it.
Although I do agree that being able to say "keep at least X nodes up and idle" would be nice, that is not how I see this documented or working.
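
For reference, the two forms mentioned above might look like this in slurm.conf. This is only a sketch: it assumes the option being discussed is SuspendExcNodes (the thread does not name it here), and the node names are placeholders rather than anyone's real configuration.

```
# slurm.conf (fragment); node names are placeholders

# With the optional ":" count: keep 4 nodes of the set nid[10-20]
# from being powered down. Which 4 can change over time.
SuspendExcNodes=nid[10-20]:4

# Without a count: the named node is never powered down, e.g. while
# inspecting an idle node. Shown commented out as an alternative.
#SuspendExcNodes=nid15
```

Whether busy nodes count toward the 4 is exactly the point under discussion, so it is worth checking the behaviour on your own Slurm version.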
Brian Andrus