[slurm-users] Distribute a single node's resources across multiple partitions


Purvesh Parmar

Jun 26, 2023, 2:46:00 AM
to Slurm User Community List
Hi,

I have Slurm 20.11 on a cluster of 4 nodes, each node having 16 CPUs. I want to create two partitions (ppart and cpart) such that 8 cores from each of the 4 nodes belong to ppart and the remaining 8 cores belong to cpart; in other words, I want to distribute each node's resources across multiple partitions exclusively. How do I go about this?


--
Purvesh

Purvesh Parmar

Jul 6, 2023, 8:22:07 AM
to Slurm User Community List
Hi,

Do I need to run separate slurmctld and slurmd daemons for this? I am struggling with this. Any pointers?

--
Purvesh

Loris Bennett

Jul 6, 2023, 8:56:21 AM
to Slurm User Community List
Hi Purvesh,
I am not aware that you can do this. My understanding is as follows:

A single node N can be a member of two partitions, say A and B, but as
soon as a job starts on N in partition A, then while the job is running,
any remaining resources on the node are only available via partition A.
A second job can only start on N in partition B if no jobs on N are running
in partition A.

Regards

Loris Bennett

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin

Jason Simms

Jul 6, 2023, 8:59:20 AM
to Slurm User Community List
Hello Purvesh,

I'm not an expert in this, but I expect a common question would be: why do you want to do this? More information would be helpful. On the surface, it seems like you could just allocate two full nodes to each partition. You must have a reason why that is unacceptable, however.

My first inclination, without more information, is to say, "don't do that." If you must, one way I can think to (sort of) accomplish what you want is to configure the partitions with the MaxCPUsPerNode option:

PartitionName=ppart Nodes=node[01-04] MaxCPUsPerNode=8
PartitionName=cpart Nodes=node[01-04] MaxCPUsPerNode=8

I don't think this guarantees which specific CPUs are assigned to each partition, though I do believe there may be a way to do that. In any case, this might work for your needs.
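For what it's worth, I haven't tried this on 20.11, but assuming the config above (the node names node[01-04] are just from my example), something like the following should show the effect of the cap: a job asking for more than 8 CPUs in ppart has to spread across nodes.

# A 16-task job in ppart can only get 8 CPUs per node, so it should be allocated two nodes
sbatch -p ppart -n 16 --wrap "srun hostname"
# Check how many nodes the job was given
squeue -o "%.10i %.9P %.6D %R"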

Warmest regards,
Jason

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
Schedule a meeting: https://calendly.com/jlsimms

Sam Gallop (NBI)

Jul 7, 2023, 7:35:35 AM
to Slurm User Community List

Hi Purvesh,


Something might be possible, but it's a bit of a kludge. To do this, cgroups and ConstrainCores need to be configured.
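
If cgroups aren't set up yet, roughly the following is what I mean (just a sketch; exact settings depend on your installation):

In slurm.conf:
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup

In cgroup.conf:
CgroupAutomount=yes
ConstrainCores=yes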


Say you have a node called tux that has 16 cores and 512GB, and you want to split it into two logical nodes of 8 cores and 256GB.


In slurm.conf, add the NodeNames as you want them (in this case tux01 and tux02) but point NodeAddr at the hostname or IP of the actual host. Divide up the resources as you wish. Note that CPUSpecList is meant to reserve cores for system use, but here we use it to mask off the cores that belong to the other logical node. Also note that the documentation says the use of the Port option "is not generally recommended except for development or testing purposes".

NodeName=tux01 NodeAddr=tux Port=6001 CPUs=16 SocketsPerBoard=2 CoresPerSocket=8  ThreadsPerCore=1 RealMemory=262144 CPUSpecList=0-7

NodeName=tux02 NodeAddr=tux Port=6002 CPUs=16 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=262144 CPUSpecList=8-15


Then add the nodes to the partitions.

PartitionName=ppart Nodes=tux01 ...

PartitionName=cpart Nodes=tux02 ...


You'll then need to run two slurmd services per node and use the '-N' option to run the daemon with the given hostname, for example 'slurmd -N tux01'.
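
If you're on systemd, one way to run both daemons (an untested sketch; paths vary by distro) is a template unit saved as /etc/systemd/system/slurmd@.service:

[Unit]
Description=Slurm node daemon (%i)
After=network.target munge.service

[Service]
Type=simple
# -D keeps slurmd in the foreground; -N sets the logical node name
ExecStart=/usr/sbin/slurmd -D -N %i
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then start both instances with 'systemctl start slurmd@tux01 slurmd@tux02'.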


Like I say, it's a bit of a kludge.


thanks,
Sam
