[slurm-users] Granular or dynamic control of partitions?


Pacey, Mike

Aug 4, 2023, 10:41:23 AM
to Slurm User Community List

Hi folks,

 

We’re currently moving our cluster from Grid Engine to SLURM, and I’m having trouble finding the best way to perform a specific bit of partition maintenance. I’m not sure if I’m simply missing something in the manual or if I need to be thinking in a more SLURM-centric way. My basic question: is it possible to ‘disable’ specific partition/node combinations rather than whole nodes or whole partitions? Here’s an example of the sort of thing I’m looking to do:

 

I have node ‘node1’ with two partitions ‘x’ and ‘y’. I’d like to remove partition ‘y’, but there are currently user jobs in that partition on that node. With Grid Engine, I could disable specific queue instances (i.e., I could just run “qmod -d y@node1” to disable queue/partition y on node1), wait for the jobs to complete and then remove the partition. That would be the least disruptive option because:

  • Queue/partition ‘y’ on other nodes would be unaffected
  • User jobs for queue/partition ‘x’ would still be able to launch on node1 the whole time

 

I can’t seem to find a functional equivalent of this in SLURM:

  • I can set the whole node to Drain
  • I can set the whole partition to Inactive

 

Is there some way to ‘disable’ partition y just on node1?

 

Regards,

Mike

ben.p...@science.ru.nl

Aug 4, 2023, 12:35:15 PM
to Slurm User Community List
Hi Mike,

If it is to be permanent, why not remove the node from the partition definition in slurm.conf?
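
For example, something along these lines (just a sketch - the node/partition names come from your mail, and I'm assuming partition 'y' currently spans node1 and node2; the actual options in your slurm.conf will differ):

# before: PartitionName=y Nodes=node1,node2 State=UP
# after, with node1 dropped from 'y' while partition 'x' stays untouched:
PartitionName=x Nodes=node1,node2 State=UP
PartitionName=y Nodes=node2 State=UP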

Regards,
Ben

Josef Dvoracek

Aug 4, 2023, 12:52:39 PM
to slurm...@lists.schedmd.com

Just remove the given node from the partition.

Already running jobs will continue without interruption.

HTH

josef

On 04. 08. 23 16:40, Pacey, Mike wrote:
..

Feng Zhang

Aug 4, 2023, 2:36:41 PM
to Slurm User Community List
You can try command as:

scontrol update partition mypart Nodes=node[1-90],ab,ac  # exclude the one you want to remove

"Changing the Nodes in a partition has no effect upon jobs that have
already begun execution."
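
You can check the live definition afterwards with something like:

scontrol show partition mypart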


Best,

Feng

Pacey, Mike

Aug 7, 2023, 5:30:31 AM
to Slurm User Community List

Hi Feng,

Thanks - that's what I was looking for, though for my version of SLURM (23.02.0) it looks like the syntax is "scontrol update partition=mypart". Good to know that SLURM can cope with on-the-fly changes without affecting jobs.

With the "live" config now being different from the static I guess best practice is to ensure slurm.conf's partition definitions also need to be edited?

Regards,
Mike

-----Original Message-----
From: slurm-users <slurm-use...@lists.schedmd.com> On Behalf Of Feng Zhang
Sent: Friday, August 4, 2023 7:36 PM
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: [External] Re: [slurm-users] Granular or dynamic control of partitions?


Tina Friedrich

Aug 7, 2023, 9:57:31 AM
to slurm...@lists.schedmd.com
Hi Mike,

I moved from Grid Engine to SLURM a couple of years ago & it took me a
while to get my head around this :)

Yes - and you could also just edit slurm.conf and restart the
controller. That will not affect running jobs. It's - both in my
experience and from all I've read - absolutely safe to restart any of the
daemons (slurmd on the nodes, slurmctld, ...) while jobs are running;
it shouldn't affect them.

(These days I think of a quick change to slurm.conf & a restart of the
controller daemon as the equivalent of a quick qconf command.)
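
Roughly, as a sketch (assuming a typical systemd install - adjust unit names for your site):

# edit the partition line in slurm.conf, e.g.
#   PartitionName=y Nodes=node2 State=UP
# then either tell the daemons to re-read the config:
scontrol reconfigure
# or restart the controller:
systemctl restart slurmctld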

Tina