[slurm-users] Exposing only requested CPUs to a job on a given node.


Luis R. Torres

May 14, 2021, 4:35:46 PM
to slurm...@schedmd.com
Hi Folks,

We are currently running Slurm 20.11.6 with cgroup constraints for memory and CPU/core. Can the scheduler expose only the requested number of CPUs/cores to a job? We have some users who run Python scripts with the multiprocessing module, and the scripts apparently use all of the CPUs/cores on a node, despite using options to constrain a task to a given number of CPUs. We would like several multiprocessing jobs to run simultaneously on the nodes without stepping on each other.

The sample script I use for testing is below. I'm looking for something similar to what can be done with the GPU GRES configuration, where only the requested number of GPUs is exposed to the job that requested them.


#!/usr/bin/env python3

import multiprocessing


def worker():
    print("Worker on CPU #%s" % multiprocessing.current_process().name)
    result = 0
    for j in range(20):
        result += j ** 2
    print("Result on CPU {} is {}".format(
        multiprocessing.current_process().name, result))


if __name__ == '__main__':
    # cpu_count() reports every CPU on the node, not just the allocation.
    print("This host exposed {} CPUs".format(multiprocessing.cpu_count()))
    # Start one worker process per reported CPU.
    for i in range(multiprocessing.cpu_count()):
        multiprocessing.Process(target=worker, name=str(i)).start()

Thanks,
--
----------------------------------------
Luis R. Torres

Rodrigo Santibáñez

May 14, 2021, 6:17:15 PM
to Slurm User Community List
Hi you all,

I'm replying so I get notified of answers to this question. I have a user whose Python script used almost all of the CPUs, even though the job was configured to use only 6 CPUs per task. I reviewed the code, and it has no explicit call to multiprocessing or anything similar, so the user is unaware of why this happens (and so am I).

Running Slurm 20.02.6

Best!

Renfro, Michael

May 14, 2021, 6:41:18 PM
to Slurm User Community List

Untested, but prior experience with cgroups indicates that if things are working correctly, even if your code tries to run as many processes as you have cores, those processes will be confined to the cores you reserve.


Try a more compute-intensive worker function that takes some seconds or minutes to complete, and watch the reserved node with 'top' or a similar program. If, for example, the job reserved only one core and tried to run 20 processes, you'd see 20 processes in 'top', each at about 5% CPU time.
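
For instance, a worker along these lines (an illustrative sketch, not the original test code) keeps a core busy for about a minute so the processes stay visible in 'top':

import time


def heavy_worker():
    # Spin on the CPU for roughly 60 seconds; no sleeping, so each process
    # shows up at (an even share of) one core in 'top'.
    deadline = time.time() + 60
    total = 0
    while time.time() < deadline:
        total += 1
    return total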


To make the code a bit more polite, you can import the os module and create a new variable from the SLURM_CPUS_ON_NODE environment variable to guide Python into starting the correct number of processes:


    cpus_reserved = int(os.environ['SLURM_CPUS_ON_NODE'])
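
A minimal sketch of that idea (illustrative only; it assumes the script runs inside a Slurm allocation, where SLURM_CPUS_ON_NODE is set):

import multiprocessing
import os


def worker(j):
    return j ** 2


if __name__ == '__main__':
    # Fall back to 1 CPU when run outside of a Slurm allocation.
    cpus_reserved = int(os.environ.get('SLURM_CPUS_ON_NODE', '1'))
    with multiprocessing.Pool(processes=cpus_reserved) as pool:
        print(pool.map(worker, range(20)))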


From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Rodrigo Santibáñez <rsantiban...@gmail.com>
Date: Friday, May 14, 2021 at 5:17 PM
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: Re: [slurm-users] Exposing only requested CPUs to a job on a given node.



Ryan Cox

May 14, 2021, 6:52:18 PM
to slurm...@lists.schedmd.com
You can check with something like this inside of a job: cat /sys/fs/cgroup/cpuset/slurm/uid_$UID/job_$SLURM_JOB_ID/cpuset.cpus. That lists which CPUs you have access to.
-- 
Ryan Cox
Director
Office of Research Computing
Brigham Young University

Luis R. Torres

Jul 1, 2021, 6:12:52 PM
to slurm...@schedmd.com
Hi Folks,

Thank you for your responses. I wrote the following configuration in cgroup.conf, along with the appropriate slurm.conf changes, and I wrote a program to verify affinity when jobs are queued or running in the cluster. Results are below. Thanks so much.

###
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
CgroupAutomount=yes
CgroupMountpoint=/sys/fs/cgroup
#ConstrainCores=no
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=no
ConstrainKmemSpace=no #Avoid a known kernel issue
ConstrainSwapSpace=yes
TaskAffinity=no #Use task/affinity plugin instead
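
(The slurm.conf side of the change is not shown in this message; a typical counterpart to the cgroup settings above, with details varying by site, would look roughly like this excerpt:)

ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory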

-----
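
The show-affinity.py script itself is not included in the message; a minimal sketch that would produce output in the shape shown below (assuming Linux's os.sched_getaffinity and the taskset command, which are my guesses, not the original code) is:

#!/usr/bin/env python3
# Hypothetical reconstruction of show-affinity.py; the original script was
# not included in the message.
import os
import subprocess

pid = os.getpid()
# The "current affinity mask" line below looks like `taskset -p` output.
subprocess.run(["taskset", "-p", str(pid)])

affinity = os.sched_getaffinity(0)  # CPUs this process is allowed to run on
print()
print("=====================================")
print("CPUs in system: ", os.cpu_count())
print("PID: ", pid)
print("Allocated CPUs/Cores: ", len(affinity))
print("Affinity List: ", affinity)
print("=====================================")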

srun --tasks=1 --cpus-per-task=1 --partition=long show-affinity.py
pid 1122411's current affinity mask: 401

=====================================
CPUs in system:  20
PID:  1122411
Allocated CPUs/Cores:  2
Affinity List:  {0, 10}
=====================================

srun --tasks=1 --cpus-per-task=4 --partition=long show-affinity.py
pid 1122446's current affinity mask: c03

=====================================
CPUs in system:  20
PID:  1122446
Allocated CPUs/Cores:  4
Affinity List:  {0, 1, 10, 11}
=====================================

srun --tasks=1 --cpus-per-task=6 --partition=long show-affinity.py
pid 1122476's current affinity mask: 1c07

=====================================
CPUs in system:  20
PID:  1122476
Allocated CPUs/Cores:  6
Affinity List:  {0, 1, 2, 10, 11, 12}
=====================================

Sid Young

Jul 1, 2021, 6:22:32 PM
to Slurm User Community List, slurm...@schedmd.com
Hi Luis,

I have exactly the same issue with a user who needs the reported cores to reflect the requested cores. If you find a solution that works, please share. :)

Thanks

Sid Young
Translational Research Institute

Christopher Samuel

Jul 1, 2021, 7:41:11 PM
to slurm...@lists.schedmd.com
On 7/1/21 3:26 pm, Sid Young wrote:

> I have exactly the same issue with a user who needs the reported cores
> to reflect the requested cores. If you find a solution that works please
> share. :)

The number of CPUs in the system and the number of CPUs you can access
are two very different things. You can use the "nproc" command to find
the number of CPUs you can access.
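
For example (illustrative only; the exact number reported depends on how cores and hardware threads are allocated), requesting four CPUs and running nproc under srun should show the allocation rather than the whole node:

srun --cpus-per-task=4 nproc
4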

From the software side of things, this is why libraries like "hwloc"
exist: so you can determine what is accessible in a portable way.

https://www.open-mpi.org/projects/hwloc/

It lives on the Open-MPI website, but it doesn't use Open-MPI (Open-MPI
uses it).

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
