[slurm-users] ntasks and cpus-per-task

Miguel Gutiérrez Páez

unread,
Feb 22, 2018, 2:50:48 AM2/22/18
to Slurm User Community List
Hi all,

I'm quite new to Slurm and I'm still learning how it works.
Some concepts still confuse me, for example the difference between ntasks and cpus-per-task in sbatch and/or srun. I've noticed that cpus-per-task (with ntasks=1) allocates CPUs (cores) within the same compute node: a cpus-per-task value higher than the maximum number of cores on any node fails, since Slurm apparently tries to allocate all of those cores on a single node. With ntasks it's different: if I set ntasks higher than the number of cores in a node, Slurm allocates cores across several nodes, filling them up one after another.
So, should I understand cpus-per-task as the number of CPUs/cores/threads I want to use? What's the real meaning of ntasks? Do cpus-per-task and ntasks have the same meaning in sbatch and srun?

Best regards.

Christopher Samuel

unread,
Feb 22, 2018, 3:16:27 AM2/22/18
to slurm...@lists.schedmd.com
On 22/02/18 18:49, Miguel Gutiérrez Páez wrote:

> What's the real meaning of ntasks? Do cpus-per-task and ntasks have
> the same meaning in sbatch and srun?

--ntasks is for parallel distributed jobs, where you can run lots of
independent processes that collaborate using some form of communication
between the processes (usually MPI for HPC).

So inside your batch script you would use "srun" to start up the tasks.

However, unless your code is written to make use of that interface,
it's not really going to help you, so for any multithreaded
application you need to use --cpus-per-task instead.

Now, you can get fancy with hybrid applications, which split their
work across nodes as separate processes (tasks), while each of those
tasks also uses multiple cores at the same time via multithreading.
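
To make that concrete, here is a minimal sketch of a hybrid batch script (./hybrid_app is a made-up placeholder for an MPI binary that also spawns OpenMP threads):

#!/bin/bash
#SBATCH --ntasks=4              # 4 MPI ranks, possibly spread over nodes
#SBATCH --cpus-per-task=8       # 8 cores reserved for each rank's threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./hybrid_app               # srun starts one copy per task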

Of course if your application isn't multithreaded in the first place
then you only need to ask for 1 task and 1 core for it anyway! :-)

Best of luck,
Chris

Loris Bennett

unread,
Feb 22, 2018, 3:51:18 AM2/22/18
to Christopher Samuel, slurm...@lists.schedmd.com
Hi Chris,

Christopher Samuel <ch...@csamuel.org> writes:

> On 22/02/18 18:49, Miguel Gutiérrez Páez wrote:
>
>> What's the real meaning of ntasks? Do cpus-per-task and ntasks have
>> the same meaning in sbatch and srun?
>
> --ntasks is for parallel distributed jobs, where you can run lots of
> independent processes that collaborate using some form of communication
> between the processes (usually MPI for HPC).
>
> So inside your batch script you would use "srun" to start up the tasks.
>
> However, unless your code is written to make use of that interface,
> it's not really going to help you, so for any multithreaded
> application you need to use --cpus-per-task instead.

[snip (11 lines)]

But does it make any difference for a multithreaded program if I have

#SBATCH --ntasks=4
#SBATCH --nodes=1-1

rather than

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

Up to now I have only thought of --cpus-per-task in connection with
hybrid MPI/OpenMP jobs, which we don't actually have. Thus I tend to
tell users to think always in terms of tasks, regardless of whether
these are MPI processes or just threads.

One downside of my approach is that if the user forgets to specify
--nodes and --ntasks is greater than 1, non-MPI jobs can be assigned to
multiple nodes.

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de

Miguel Gutiérrez Páez

unread,
Feb 22, 2018, 4:07:24 AM2/22/18
to Slurm User Community List
Hi,

That's just what I thought: ntasks for MPI and cpus-per-task for multithreading.
So, for example, if every node has 24 cores, is there any difference between these commands?

sbatch --ntasks 24 [...]
sbatch --ntasks 1 --cpus-per-task 24 [...]

regards.

Christopher Benjamin Coffey

unread,
Feb 22, 2018, 8:50:58 AM2/22/18
to Slurm User Community List
Loris,

It’s simple: tell folks to use -n only for MPI jobs, and -c otherwise (the default).

It’s a big deal if folks use -n when it’s not an MPI program, because the non-MPI program is then launched n times (instead of once with internal threads) and will stomp over its logs and output files (uncoordinated), leading to poor performance and incorrect results.
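
For illustration (myprog is a made-up non-MPI, multithreaded binary), the two patterns look like this:

# wrong for non-MPI code: srun starts 4 independent copies of myprog
#SBATCH --ntasks=4
srun ./myprog

# right: one copy of myprog, with 4 CPUs available for its threads
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
srun ./myprog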

Best,
Chris

Paul Edmon

unread,
Feb 22, 2018, 10:13:28 AM2/22/18
to slurm...@lists.schedmd.com
At least in my experience, wonky things can happen with Slurm
(especially if you have thread affinity on) if you don't divide
correctly between -n and -c.  In general I've been telling our users
that -c is for threaded applications and -n is for rank-based
parallelism.  This way the thread affinity works out properly.
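
Roughly, with task affinity enabled, the two header styles come out like this (a sketch):

# threaded application: one task whose CPU mask spans 8 cores
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# rank-based (e.g. MPI) parallelism: 8 tasks, each bound to its own core
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1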

-Paul Edmon-

Loris Bennett

unread,
Feb 22, 2018, 10:24:43 AM2/22/18
to Slurm User Community List
Hi, Other Chris,

Christopher Benjamin Coffey <Chris....@nau.edu> writes:

> Loris,
>
> It’s simple: tell folks to use -n only for MPI jobs, and -c otherwise (the default).
>
> It’s a big deal if folks use -n when it’s not an MPI program, because
> the non-MPI program is then launched n times (instead of once with
> internal threads) and will stomp over its logs and output files
> (uncoordinated), leading to poor performance and incorrect results.

But that's only the case if the program is started with srun or some
form of mpirun. Otherwise the program just gets started once on one
core and the other cores just idle. However, I could argue that this is
worse than starting multiple instances, because the user might think
everything is OK and go on wasting resources.
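
For instance, a script like this (./serial_app is a made-up stand-in) starts the binary exactly once and leaves the other 99 cores idle:

#SBATCH --ntasks=100
./serial_app        # launched directly, not via srun: runs once, on one core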

So maybe it is a good idea to tell users that if they don't know what
MPI is, they should forget about multiple tasks and just set
--cpus-per-task, keeping the default of 1 task. That way I wouldn't get
users running a single-threaded application on 100 cores (the 1000-core
job got stuck in the queue :-/ ).

I think I'm convinced.

Cheers,

Loris

[snip (53 lines)]

Loris Bennett

unread,
Feb 22, 2018, 10:40:06 AM2/22/18
to Paul Edmon, slurm...@lists.schedmd.com
Hi Paul,

Paul Edmon <ped...@cfa.harvard.edu> writes:

> At least from my experience wonky things can happen with slurm
> (especially if you have thread affinity on) if you don't rightly
> divide between -n and -c.  In general I've been telling our users that
> -c is for threaded applications and -n is for rank based parallelism. 
> This way the thread affinity works out properly.

Actually, we do have an issue with some applications not respecting the
CPU mask. I always assumed it was something to do with the way the
multithreading was programmed in certain applications, but maybe we
should indeed be getting the users to request multiple CPUs with a
single task.

Thanks for the info.

Paul Edmon

unread,
Feb 22, 2018, 11:13:50 AM2/22/18
to Loris Bennett, slurm...@lists.schedmd.com
Yeah, in those situations I've found it helps to have people wrap their
threaded programs in srun inside of sbatch.  That way the scheduler
knows specifically which process gets the threads.
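
A sketch of that pattern (./threaded_app is a placeholder):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
# launching via srun creates a proper job step, so Slurm knows this
# one process owns all 16 CPUs and can set its affinity mask accordingly
srun ./threaded_app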

-Paul Edmon-

Mjelde, Matthew J

unread,
Feb 22, 2018, 12:31:20 PM2/22/18
to Slurm User Community List

Howdy,

 

Yes, there is a difference between those two submissions.  You are correct that usually ntasks is for MPI and cpus-per-task is for multithreading, but let's look at your commands.

 

In your first example, "sbatch --ntasks 24 [...]", Slurm allocates a job with 24 tasks.  Each task in this case gets just 1 CPU, but the tasks may be split across multiple nodes, so you get a total of 24 CPUs spread over potentially several nodes.

In your second example, "sbatch --ntasks 1 --cpus-per-task 24 [...]", Slurm allocates a job with 1 task and 24 CPUs for that task, so you get a total of 24 CPUs on a single node.

 

In other words, a task cannot be split across multiple nodes, so using --cpus-per-task ensures all the CPUs are allocated on the same node, while using --ntasks may spread them across multiple nodes.
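
One quick way to see the difference for yourself (a sketch; srun hostname prints one line per task, and the exact node names will vary):

sbatch --ntasks=24 --wrap="srun hostname"                    # 24 lines, possibly several node names
sbatch --ntasks=1 --cpus-per-task=24 --wrap="srun hostname"  # 1 line, a single node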

 

Hope this helps.

Matthew

Christopher Benjamin Coffey

unread,
Feb 22, 2018, 3:58:36 PM2/22/18
to Slurm User Community List
Hi Loris,

"But that's only the case if the program is started with srun or some form of mpirun. Otherwise the program just gets started once on one core and the other cores just idle."

Yes, maybe that’s true when srun isn't used. I'm not sure, as we tell everyone to use srun to launch every type of task.

Best,
Chris

Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

Patrick Goetz

unread,
Feb 22, 2018, 4:29:59 PM2/22/18
to slurm...@lists.schedmd.com
On 02/22/2018 07:50 AM, Christopher Benjamin Coffey wrote:
> It’s a big deal if folks use -n when it’s not an MPI program, because the non-MPI program is then launched n times (instead of once with internal threads) and will stomp over its logs and output files (uncoordinated), leading to poor performance and incorrect results.
>

I have a LAMMPS user who has been bugging me for weeks about a problem
where, if she runs the code by hand, she gets usable output, but when
she submits it with sbatch she gets garbage, and she couldn't figure
out why. So I finally had her come to my office to show me what she was
doing, and sure enough, I found this in her submit file:

#SBATCH --ntasks=4

and I wouldn't have thought anything of it if I hadn't read your email this morning.

So, thanks! <:)


Chris Samuel

unread,
Feb 22, 2018, 4:59:25 PM2/22/18
to slurm...@lists.schedmd.com
On Friday, 23 February 2018 7:57:54 AM AEDT Christopher Benjamin Coffey wrote:

> Yes, maybe that’s true when srun isn't used. I'm not sure, as we tell
> everyone to use srun to launch every type of task.

I've not done that out of habit with users, but it has one really
useful side-effect: you can see what the job is doing whilst it is
running, using sstat (as long as you're using slurmdbd to store
accounting data, I suspect).
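
For example (12345 is a placeholder job ID):

sstat --format=JobID,AveCPU,AveRSS,MaxRSS -j 12345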

cheers!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC


Loris Bennett

unread,
Feb 23, 2018, 5:51:02 AM2/23/18
to Slurm User Community List
Hi Chris,

Christopher Benjamin Coffey <Chris....@nau.edu> writes:

> Hi Loris,
>
>> But that's only the case if the program is started with srun or some
>> form of mpirun. Otherwise the program just gets started once on one
>> core and the other cores just idle.
>
> Yes, maybe that’s true when srun isn't used. I'm not sure, as we tell
> everyone to use srun to launch every type of task.

OK, I'm confused now. Our main culprit for producing processes with
incorrect affinity is ORCA [1]. It uses OpenMPI but also likes to start
processes asynchronously via SSH within the node set. Our users run
their jobs via batch files containing, say

#SBATCH --ntasks=8
...
$ORCA_PATH/orca ...

However, if I run an ORCA job with 'srun', i.e.

#SBATCH --ntasks=8
...
srun $ORCA_PATH/orca ...

this results in the program being run 8 times with all of them writing
to the same log and output files.

Is ORCA just a pathological exception to the idea that it's always good
to use 'srun'? (As it causes well over 95% of our affinity problems, it
is already pathological in that sense.)

Cheers,

Loris

Footnotes:
[1] https://orcaforum.cec.mpg.de/

Christopher Samuel

unread,
Feb 23, 2018, 7:54:11 AM2/23/18
to slurm...@lists.schedmd.com
On 23/02/18 21:50, Loris Bennett wrote:

> OK, I'm confused now. Our main culprit for producing processes with
> incorrect affinity is ORCA [1]. It uses OpenMPI but also likes to
> start processes asynchronously via SSH within the node set.

In that case (and for the general case where there are wrappers
involved) I'd leave it as is. As long as the OpenMPI it is using
is compiled against Slurm you're probably OK for those components.

You can also use pam_slurm_adopt to ensure SSH sessions are captured
into a user's job cgroup too, which might help to limit the damage.
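
For reference, enabling it is typically a one-line addition to the sshd PAM stack on the compute nodes (check the pam_slurm_adopt documentation for your distribution before copying this):

# /etc/pam.d/sshd on the compute nodes
account    required    pam_slurm_adopt.so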

Best of luck!
Chris

Christopher Benjamin Coffey

unread,
Feb 26, 2018, 1:02:09 PM2/26/18
to Slurm User Community List
Yes, this is why we've suggested that folks use it as the default: for the sstat feature! (And to make it simple for everyone.)


Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167

Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC



