Just off the top of my head here.
I would expect you need to have no jobs currently running on the
node, so you could submit a job to the node that sets the node to
drain, does any local steps needed, then exits. As part of the
EpilogSlurmctld script, you could check for drained nodes with a
particular reason (like 'MIG reconfig'), do the head-node steps
there, and finish by bringing the node back online.
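A minimal sketch of that epilog check, assuming the marker reason is
'MIG reconfig' and a hypothetical helper script does the head-node
steps:

#!/bin/bash
# EpilogSlurmctld sketch: runs on the slurmctld host after each job ends.
# SLURM_JOB_NODELIST is set by slurmctld for the job that just finished.
for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    state=$(sinfo -h -n "$node" -o '%t')
    reason=$(sinfo -h -n "$node" -o '%E')
    # Only act once the node is fully drained (no jobs left) and it was
    # drained for our reason; %E may append who set it, so prefix-match.
    if [[ "$state" == "drain" && "$reason" == "MIG reconfig"* ]]; then
        /usr/local/sbin/mig-headnode-steps.sh "$node"  # hypothetical helper
        scontrol update nodename="$node" state=resume  # back online
    fi
done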
Or just do all those steps from a script outside Slurm itself, on
the head node. You can use ssh/pdsh to connect to a node and
execute things there while it is out of the mix.
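If several nodes are going through this at once, pdsh can fan the
same commands out (the node list here is hypothetical):

pdsh -w node[01-04] 'systemctl stop slurmd'
# ...run the MIG reconfiguration on each node...
pdsh -w node[01-04] 'systemctl start slurmd'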
Brian Andrus
You shouldn't have to change any parameters if you have it configured in the defaults. Just systemctl stop/start slurmd as needed.
Something like:
scontrol update state=drain nodename=<node_to_change> reason="MIG reconfig"
# wait until the node is fully drained ('drng' becomes 'drain')
while [ "$(sinfo -h -n <node_to_change> -o '%t')" != "drain" ]; do sleep 30; done
ssh <node_to_change> "systemctl stop slurmd"
# <run reconfig stuff>
ssh <node_to_change> "systemctl start slurmd"
Not sure what would make you feel slurmd cannot run as a service
on a dynamic node. As long as you add its options to the systemd
defaults file, you should be fine (usually /etc/default/slurmd,
or /etc/sysconfig/slurmd on RHEL-family systems).
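For reference, a dynamic-node setup along those lines might carry
its registration flags in that file; the Feature/Gres values below
are just placeholders for your hardware:

# /etc/default/slurmd (or /etc/sysconfig/slurmd on RHEL-family)
# -Z registers this slurmd as a dynamic node; --conf supplies its definition.
SLURMD_OPTIONS="-Z --conf \"Feature=mig Gres=gpu:7\""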
Brian