[slurm-users] Slurm strigger configuration


Jodie H. Sprouse

Sep 19, 2018, 11:47:26 AM
to slurm...@schedmd.com
Good morning. 
I’m struggling with getting strigger working correctly. 
My end goal sounds fairly simple: to get a mail notification if a node gets set into ‘drain’ mode. 

The strigger man page states that it must be run by the configured SlurmUser, which here is slurm:
#  scontrol show config | grep SlurmUser
SlurmUser               = slurm(990)

# grep slurm /etc/passwd
slurm:x:990:984:SLURM resource manager:/etc/slurm:/sbin/nologin

I created the following file based on the man page example. (I’m first trying to get it working for a node going down, because trying drain gave me “option —drain does not exist”.)
# cat /usr/sbin/slurm_admin_notify
#!/bin/bash
# Submit trigger for next event
strigger --set --node --down \
         --program=/usr/sbin/slurm_admin_notify
# Notify administrator by e-mail
/bin/mail -s "NodesDown:$*" OurSit...@OurEmailServer.edu
---
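Two details in the script above are worth checking. strigger spells the drain event `--drained` (short form `-D`, which Chris uses below), and the error text shows a typographic dash where a shell option needs two plain ASCII hyphens, which a mail client or word processor can introduce. The sketch below adapts the man-page script to drained nodes, assuming the same install path. The address is a placeholder, and `MAIL_BIN`, `rearm` and `notify` are hypothetical names added for readability and testing. Like the man-page example, it re-arms a one-shot trigger on every run.

```shell
#!/bin/bash
# Sketch: man-page notify script adapted for drained nodes (re-arming form).
# slurmctld runs it as the trigger's owner (SlurmUser); the man-page example
# treats the arguments ($*) as the affected node list.

SELF="/usr/sbin/slurm_admin_notify"   # where this script is installed
ADMIN_EMAIL="admin@example.edu"       # placeholder address (assumption)
MAIL_BIN="/bin/mail"                  # hypothetical knob; path from the thread

# Submit a one-shot trigger for the next drain event (-D/--drained).
rearm() {
  strigger --set --node --drained --program="$SELF"
}

# Notify the administrator; quoting "$*" keeps the subject a single argument.
notify() {
  echo "Node(s) drained: $*" | "$MAIL_BIN" -s "NodesDrained:$*" "$ADMIN_EMAIL"
}

# Only act when slurmctld passed a node list.
if (( $# > 0 )); then
  rearm
  notify "$@"
fi
```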
If I run it manually, I receive:
slurm_set_trigger: Access/permission denied

If I prefix the strigger command with “runuser -l slurm -c”, I receive:
This account is currently not available.
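Both errors fit the setup shown above. As Jodie quotes from the man page, strigger sets triggers only when run as SlurmUser, so running it as root (as the # prompt suggests) is refused. Meanwhile `runuser -l` starts slurm's login shell, and /etc/passwd gives slurm the shell /sbin/nologin, which prints "This account is currently not available." Here is a minimal sketch, assuming root on the controller and util-linux `runuser`, that runs strigger as slurm without going through that shell. The `as_slurm` helper name is hypothetical. `sudo -u slurm` (as Chris does below) and `su -s /bin/bash ... slurm` (as Kilian does) avoid nologin in the same way.

```shell
#!/bin/bash
# Sketch: run a command as SlurmUser even though its shell is /sbin/nologin.
# Assumes root on the slurmctld host and util-linux runuser.

# With -u, runuser executes the command directly instead of starting
# slurm's login shell, so /sbin/nologin never runs.
as_slurm() {
  runuser -u slurm -- "$@"
}

# Example (not run here): list the triggers registered with slurmctld.
#   as_slurm strigger --get
```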

The man page also states: “Trigger events are not processed instantly, but a check is performed for trigger events on a periodic basis (currently every 15 seconds).”
This makes me wonder whether I am missing something in my install: where is that 15-second interval set?

Any suggestions would be greatly appreciated! How are folks accomplishing this?
Thank you!
Jodie

Christopher Benjamin Coffey

Sep 19, 2018, 12:20:02 PM
to Slurm User Community List, slurm...@schedmd.com
Hi Jodie,

The only thing that I've gotten working so far is this:

sudo -u slurm bash -c "strigger --set -D -n cn15 -p /common/adm/slurm/triggers/nodestatus"

That runs the nodestatus script, which sends an email when node cn15 is set to the drain state. What I'd like to do, and haven't yet had time to figure out, is set up a persistent trigger that runs when ANY node goes into the drain state. Let me know if you figure that out. As you can see above, the trigger has to be set up by the slurm user.

Best,
Chris


Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

Kilian Cavalotti

Sep 19, 2018, 1:00:22 PM
to Slurm User Community List, slurm...@schedmd.com
strigger takes a "--flags=perm" option, which makes the trigger
permanent so it isn't purged after the event happens. That way it doesn't
need to be re-armed in the trigger script for the next event.

Also, specifying neither a job ID nor a node name in the strigger command
makes the trigger apply to all nodes. That's how we set our
default "down" trigger for all nodes on our cluster:
# su -s /bin/bash -c "strigger --set --down \
    --program=/share/admin/scripts/slurm/triggers/down.sh --flags=perm" slurm

Also note that the script given to "--program" needs to be executable
by the slurm user on the controller node(s).
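Kilian's last point, and whether a permanent trigger is actually registered, can both be checked on the controller before any node drains. Here is a minimal sketch, assuming root on the slurmctld host, util-linux `runuser`, and SlurmUser `slurm`. The function names are hypothetical. `strigger --get` lists registered triggers, so a `--flags=perm` trigger should still appear after it has fired. A program registered with `--flags=perm` should also not re-arm itself the way the man-page example does, or every event registers an extra trigger and the notifications start to repeat.

```shell
#!/bin/bash
# Sketch: pre-flight checks for a permanent trigger (run as root on the
# controller). Assumes util-linux runuser and SlurmUser "slurm".

# Succeeds only if SlurmUser can execute the trigger program at path $1.
program_ok() {
  runuser -u slurm -- test -x "$1"
}

# Show registered triggers; a --flags=perm trigger stays listed after firing.
show_triggers() {
  runuser -u slurm -- strigger --get
}

# Example (not run here):
#   program_ok /share/admin/scripts/slurm/triggers/down.sh && show_triggers
```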

Cheers,
--
Kilian

Christopher Benjamin Coffey

unread,
Sep 19, 2018, 5:14:44 PM9/19/18
to Slurm User Community List, slurm...@schedmd.com
Kilian, thank you very much! Never noticed the perm flag!

Best,
Chris

Jodie H. Sprouse

Sep 20, 2018, 4:00:54 PM
to Slurm User Community List, slurm...@schedmd.com
Thank you to both Kilian and Chris,
I have this running on the Slurm server to report once when any of the nodes goes into the “Drain” state:

sudo -u slurm bash -c "strigger --set -D -p /etc/slurm/triggers/slurm_admin_notify --flags=perm"

where /etc/slurm/triggers/slurm_admin_notify sends the mail:

/bin/mail -s "ClusterName DrainedNode:$*" our_admin_email_address

Exactly what we needed. Success.
Thank you!
Jodie
Center for Advanced Computing
Cornell University