Hello,
We've got a few nodes defined in our slurm.conf in 'FUTURE' state, as
it's a new hardware type we're working on bringing into service.
The nodes are currently all allocated to a dedicated partition. The
partition is configured as 'state=UP'. As we've built the new nodes and
started slurmd+munge, they've appeared in an idle state in the new
partition as expected. All good so far.
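For reference, the relevant slurm.conf entries look roughly like this
(node/partition names and sizes here are placeholders, not our real
config):

    NodeName=newhw[01-04] CPUs=64 RealMemory=256000 State=FUTURE
    PartitionName=newhw Nodes=newhw[01-04] State=UP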
However, if slurmctld is restarted the nodes go back to being in
'FUTURE' state, and do not transition back to idle, accept jobs, etc.
The slurmd on the new nodes can clearly still talk to the slurmctld,
and s* commands run on the new nodes work as expected, but the nodes
remain in FUTURE state until slurmd on each node is restarted.
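To illustrate, this is roughly what we see and do (placeholder names
again):

    # On the controller, after restarting slurmctld:
    sinfo -N -p newhw -o "%N %t"   # node state shows FUTURE again

    # On each new node, this brings it back to idle:
    systemctl restart slurmd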
I could have misunderstood something about the FUTURE state, but I was
expecting them to go back to idle. I understand that slurmctld doesn't
communicate out to nodes in FUTURE state, but I at least expected them
to be picked up when they communicate _in_ to the slurmctld.
Is this expected behaviour, or perhaps a bug? The reason I've defined
the new nodes this way is so I don't have to update slurm.conf and
restart slurmctld as each node is built, but can instead do that as a
single step once everything is finished. However, it seems less useful
if the nodes can 'disappear' from the cluster as far as users are
concerned.
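For what it's worth, the plan for that final step was just something
like the following (a sketch, using the placeholder names above):

    # In slurm.conf, drop State=FUTURE from the node definitions, e.g.:
    #   NodeName=newhw[01-04] CPUs=64 RealMemory=256000
    # then restart the controller once:
    systemctl restart slurmctld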
Cheers,
Steve