[slurm-dev] Admin reservation on busy nodes


Jacqueline Scoggins

Nov 12, 2013, 12:12:03 PM
to slurm-dev
I am running Slurm 2.5.7 and tried to reserve the nodes of the cluster because of hardware issues that need to be repaired. Some of the nodes were allocated to jobs and others were not. I tried the following, but got an error that the nodes were busy, and the reservation was not set.

scontrol create reservation flags=ignore_jobs,maint starttime=now endtime=yyyy-mm-ddThh:mm partition=blah

It would not work.

Is there a way of setting system reservations on a partition even if there are running jobs allocated to its nodes?

Thanks

Jackie

Paul Edmon

Nov 12, 2013, 12:16:54 PM
to slurm-dev
Include the ignore_jobs flag.  That will force the reservation.

-Paul Edmon-

Jacqueline Scoggins

Nov 12, 2013, 12:27:57 PM
to slurm-dev
I tried that and it stated that the nodes were busy.

Jackie

Eckert, Phil

Nov 12, 2013, 12:35:07 PM
to slurm-dev
I see the nodes busy message only if I am trying to create a reservation on top of another reservation that includes the same nodes. You might try adding the overlap flag if this is the case.

Phil Eckert
LLNL

Jacqueline Scoggins

Nov 12, 2013, 1:59:12 PM
to slurm-dev
I believe I tried that one as well as the other two, and each time I got the "Nodes busy" message. If the nodes are in the alloc state, will any of these flags work? From what I saw, they do not work in this case.

Jackie

Eckert, Phil

Nov 12, 2013, 2:50:11 PM
to slurm-dev
Jackie,

I was trying this with an earlier version of SLURM. I just built a 2.5.7 test system and tried it again, and I am seeing the same failures that you do when any of the nodes in the partition are allocated. A workaround is to use the "nodes=" option, i.e.:

scontrol create reservation flags=ignore_jobs nodes="tnodes[32-591]" starttime=now endtime=tomorrow partition=pbatch user=eckert

Jacqueline Scoggins

Nov 12, 2013, 4:30:39 PM
to slurm-dev
Here is my problem Phil,

My node names are not like n00[00-91]; instead, we have a suffix added to our hostnames, like n0000.jackie0. Since we have multiple n0000 nodes, we had to add the cluster they are associated with to the FQDN. When I tried nodes='n00[00-91].jackie0', I got a message that the node names were not valid. So I tried only the partition, and it still did not work.

Thanks

Jackie
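
For hostnames with a suffix like the ones described above, one thing that could be tried is spelling out every node name as a plain comma-separated list instead of a bracket expression. The sketch below only builds such a list and prints a candidate command. The helper name node_list and the user value are hypothetical, the range n0000 to n0091 and the .jackie0 suffix are taken from the message above, and nothing in this thread confirms that Slurm 2.5.7 accepts the resulting list.

```shell
# Build an explicit comma-separated node list for hostnames that carry
# a cluster suffix, e.g. n0000.jackie0,n0001.jackie0,...
# node_list is a hypothetical helper, not part of Slurm.
# Usage: node_list PREFIX FIRST LAST SUFFIX
node_list() {
    # seq zero-pads each number to four digits; paste joins lines with commas
    seq -f "${1}%04g${4}" "$2" "$3" | paste -sd, -
}

nodes=$(node_list n 0 91 .jackie0)

# Assumption: the reservation is created for this user; adjust as needed.
res_user=${RES_USER:-root}

# Print the command for review instead of running it.
echo "scontrol create reservation flags=ignore_jobs nodes=$nodes starttime=now endtime=tomorrow partition=blah user=$res_user"
```

Printing the command first makes it easy to check the generated names against the Nodes= field of "scontrol show partition" before submitting anything.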

Eckert, Phil

Nov 12, 2013, 5:20:30 PM
to slurm-dev
Jackie,

It looks like in 2.5.7, according to the scontrol man page, the correct syntax would be:

scontrol create reservation flags=PART_NODES,IGNORE_JOBS nodes=ALL starttime=now endtime=tomorrow partitionname=pbatch user=eckert

but unfortunately, that doesn't work either.

Phil

Jacqueline Scoggins

Nov 12, 2013, 5:25:22 PM
to slurm-dev
Oh, you made me feel like a solution was here until I read "but unfortunately, that doesn't work either".

Ok.

Thanks

Jackie
 