When we reboot the entire cluster, we:
1) scontrol update NodeName=<all nodes> State=Down
2) Halt all the nodes
3) scontrol update NodeName=<all nodes> State=Resume
4) Boot the nodes
5) Wait for sinfo to report everyone is idle
... but this doesn't work. The scontrol update State=Resume
seems to remember that it had talked to some nodes recently and
puts them right into Idle, rather than Idle* as I would expect.
My intuition is that I want a way for scontrol to tell slurmctld that
it should put nodes in down* until it hears from them (which will
be when /etc/init.d/slurmd runs on each node)
... or a way to tell slurmctld to query all the nodes <right now>
instead of waiting for the polling interval.
--
-Larry / Sector IX
I'm assuming that the process below does not include a restart of
the slurmctld daemon. Is that correct?
If so, and you have ReturnToService=1 (or yes), then you can probably
just boot the nodes and skip steps 1-3.
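A minimal C sketch of the decision ReturnToService makes when a rebooted node's slurmd registers, assuming the semantics described in the slurm.conf documentation: with ReturnToService=1, a DOWN node returns to service on registration only if it was set DOWN for being non-responsive. The types and the on_registration function are hypothetical illustrations, not slurmctld code.

```c
#include <stdbool.h>

/* Hypothetical, simplified node record; not SLURM's struct node_record. */
typedef enum { NODE_IDLE, NODE_DOWN } base_state_t;

typedef struct {
	base_state_t state;
	bool down_for_no_response; /* set DOWN because it stopped answering */
} node_t;

/* State a node ends up in when its slurmd registers with a valid
 * configuration.  return_to_service stands in for the slurm.conf
 * ReturnToService value (0 or 1) in this model. */
static base_state_t on_registration(const node_t *node, int return_to_service)
{
	if (node->state != NODE_DOWN)
		return node->state;
	/* ReturnToService=1: only nodes marked DOWN for being
	 * non-responsive come back on their own. */
	if (return_to_service == 1 && node->down_for_no_response)
		return NODE_IDLE;
	/* Otherwise (ReturnToService=0, or DOWN set by an administrator)
	 * the node stays DOWN until someone resumes it with scontrol. */
	return NODE_DOWN;
}
```

Under this model, a node that merely timed out while rebooting comes back by itself, whereas one set DOWN by hand (step 1) stays DOWN. That is one way to read the advice to skip steps 1-3.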
Index: src/slurmctld/slurmctld.h
===================================================================
--- src/slurmctld/slurmctld.h	(revision 14732)
+++ src/slurmctld/slurmctld.h	(working copy)
@@ -237,6 +237,7 @@
 extern bitstr_t *idle_node_bitmap;	/* bitmap of idle nodes */
 extern bitstr_t *share_node_bitmap;	/* bitmap of sharable nodes */
 extern bitstr_t *up_node_bitmap;	/* bitmap of up nodes, not DOWN */
+extern bool ping_nodes_now;		/* if set, ping nodes immediately */
 
 /*****************************************************************************\
  *  PARTITION parameters and data structures
Index: src/slurmctld/controller.c
===================================================================
--- src/slurmctld/controller.c	(revision 14732)
+++ src/slurmctld/controller.c	(working copy)
@@ -143,6 +143,7 @@
 char *slurmctld_cluster_name = NULL; /* name of cluster */
 void *acct_db_conn = NULL;
 int accounting_enforce = 0;
+bool ping_nodes_now = false;
 
 /* Local variables */
 static int daemonize = DEFAULT_DAEMONIZE;
@@ -1097,12 +1098,13 @@
 				unlock_slurmctld(node_write_lock);
 			}
 		}
-
-		if (difftime(now, last_ping_node_time) >= ping_interval) {
+		if ((difftime(now, last_ping_node_time) >= ping_interval) ||
+		    ping_nodes_now) {
 			static bool msg_sent = false;
 			if (is_ping_done()) {
 				msg_sent = false;
 				last_ping_node_time = now;
+				ping_nodes_now = false;
 				lock_slurmctld(node_write_lock);
 				ping_nodes();
 				unlock_slurmctld(node_write_lock);
Index: src/slurmctld/ping_nodes.c
===================================================================
--- src/slurmctld/ping_nodes.c	(revision 14732)
+++ src/slurmctld/ping_nodes.c	(working copy)
@@ -217,7 +217,8 @@
 			continue;
 		}
 
-		if (node_ptr->last_response >= still_live_time)
+		if ((!no_resp_flag) &&
+		    (node_ptr->last_response >= still_live_time))
 			continue;
 
 		/* Do not keep pinging down nodes since this can induce
Index: src/slurmctld/node_mgr.c
===================================================================
--- src/slurmctld/node_mgr.c	(revision 14732)
+++ src/slurmctld/node_mgr.c	(working copy)
@@ -1067,9 +1067,13 @@
 				node_ptr->node_state &= (~NODE_STATE_DRAIN);
 				node_ptr->node_state &= (~NODE_STATE_FAIL);
 				base_state &= NODE_STATE_BASE;
-				if (base_state == NODE_STATE_DOWN)
+				if (base_state == NODE_STATE_DOWN) {
 					state_val = NODE_STATE_IDLE;
-				else
+					node_ptr->node_state |=
+							NODE_STATE_NO_RESPOND;
+					node_ptr->last_response = now;
+					ping_nodes_now = true;
+				} else
 					state_val = base_state;
 			}
 			if (state_val == NODE_STATE_DOWN) {
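To see how the hunks of the patch fit together, here is a simplified, self-contained C model of the same flow: Resume on a DOWN node gives IDLE plus NO_RESPOND ("idle*") and requests an immediate ping; the controller loop honours that request without waiting for the ping interval; and a flagged node is pinged even if its last_response looks recent. The constants, node_t, and the functions maybe_start_ping and on_ping_response are hypothetical stand-ins, not SLURM's real definitions. The reason the patch stamps last_response = now is not stated in the thread; keeping the node from being timed out as DOWN again right away is only a guess.

```c
#include <stdbool.h>
#include <time.h>

/* Illustrative values only, not SLURM's real state constants. */
#define STATE_DOWN      0x0001u
#define STATE_IDLE      0x0002u
#define STATE_BASE      0x000Fu   /* mask for the base state */
#define FLAG_NO_RESPOND 0x0100u   /* sinfo shows this as a trailing '*' */

typedef struct {
	unsigned state;
	time_t last_response;
} node_t;

/* Counterpart of the patch's new global ping_nodes_now. */
static bool ping_nodes_now = false;

/* node_mgr.c hunk: resuming a DOWN node yields IDLE plus NO_RESPOND,
 * stamps last_response, and asks for an immediate ping. */
static void resume_node(node_t *node, time_t now)
{
	if ((node->state & STATE_BASE) == STATE_DOWN) {
		node->state = STATE_IDLE | FLAG_NO_RESPOND;
		node->last_response = now;
		ping_nodes_now = true;
	}
}

/* controller.c hunk: start a ping round when the interval has elapsed OR
 * an immediate ping was requested, provided the previous round is done. */
static bool maybe_start_ping(time_t now, time_t *last_ping,
			     double ping_interval, bool ping_done)
{
	if ((difftime(now, *last_ping) >= ping_interval) || ping_nodes_now) {
		if (ping_done) {
			*last_ping = now;
			ping_nodes_now = false;
			return true;
		}
	}
	return false;
}

/* ping_nodes.c hunk: a node flagged NO_RESPOND is pinged even if its
 * last_response is newer than still_live_time. */
static bool needs_ping(const node_t *node, time_t still_live_time)
{
	bool no_resp = (node->state & FLAG_NO_RESPOND) != 0;
	return no_resp || (node->last_response < still_live_time);
}

/* Hypothetical: when slurmd answers, the flag clears ("idle*" -> "idle"). */
static void on_ping_response(node_t *node, time_t now)
{
	node->state &= ~FLAG_NO_RESPOND;
	node->last_response = now;
}
```

Setting a flag that the main loop consumes, rather than pinging directly from the RPC handler, keeps all pinging in one place under the existing lock and "is ping done" guard.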
jet...@llnl.gov wrote:
Thanks. In the meantime, I have evolved the following strategy:
scontrol update NodeName=<all> State=Down
halt nodes
/etc/init.d/slurmctld stop
rm /tmp/node_state
/etc/init.d/slurmctld start
boot nodes
which seems to set them all to "unknown" until they check in
Opinion? I'd rather not have unnecessary source divergence.
--
-Larry / Sector IX
Your strategy will result in all node state being lost (e.g. drained nodes, reasons, etc.), which is not ideal. I'll get my patch into slurm v1.3.7 and would recommend that you upgrade when that is available (probably this week).
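To make that trade-off concrete, here is a hypothetical C model of slurmctld starting with versus without its saved node state. The names (node_rec_t, recover_node, on_register, schedulable) and flag values are illustrative assumptions, not SLURM's. The sketch encodes only what the thread describes: without the node_state file every node starts as "unknown" and becomes usable once it checks in, so a drain and its reason are forgotten. Keeping a restored DRAIN flag across registration is an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative state bits, not SLURM's real values. */
#define ST_UNKNOWN 0x0000u
#define ST_IDLE    0x0002u
#define ST_BASE    0x000Fu
#define FL_DRAIN   0x0200u

typedef struct {
	unsigned state;
	char reason[64];   /* why an administrator drained the node */
} node_rec_t;

/* slurmctld startup: restore the saved record if one exists.
 * saved == NULL models starting after "rm /tmp/node_state". */
static void recover_node(node_rec_t *node, const node_rec_t *saved)
{
	if (saved != NULL) {
		*node = *saved;            /* state flags and reason survive */
	} else {
		node->state = ST_UNKNOWN;  /* "unknown" until slurmd checks in */
		node->reason[0] = '\0';
	}
}

/* slurmd registers: base state becomes IDLE; an existing DRAIN flag is
 * kept (an assumption of this model). */
static void on_register(node_rec_t *node)
{
	node->state = ST_IDLE | (node->state & FL_DRAIN);
}

/* A node is available for new jobs only if it is IDLE and not draining. */
static bool schedulable(const node_rec_t *node)
{
	return (node->state & ST_BASE) == ST_IDLE &&
	       (node->state & FL_DRAIN) == 0;
}
```

In this model a node that was drained before the reboot stays out of service when state is recovered, but comes back as schedulable, with no reason recorded, when the state file has been deleted.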