[slurm-users] cpus-per-task behaviour of srun after 22.05

Michael Müller

Oct 20, 2023, 4:57:50 AM
to slurm...@lists.schedmd.com
Hello,

I haven't really seen this discussed anywhere, but maybe I didn't look
in the right places.

After our upgrade from 21.08 to 23.02 we had users complaining about
srun not using the specified --cpus-per-task given in sbatch-directives.
The changelog of 22.05 mentions this change and explains the need to set
the environment variable SRUN_CPUS_PER_TASK. The environment variable
SLURM_CPUS_PER_TASK will be set by the sbatch-directive, but is ignored
by srun.

Does anyone know why this behaviour was changed? Imo the expectation
that an sbatch-directive is the default for the whole job-script is
reasonable.

Is there a config option to reenable the old behaviour, or do we have to
find a workaround with a job_submit script or a profile.d script? If so,
have any of you already implemented such a workaround?
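
A minimal sketch of the profile.d-style workaround asked about above, not a tested site configuration. The file name and the function name `set_srun_cpus_default` are hypothetical. It also assumes that batch scripts at the site actually source /etc/profile.d, which normally requires a login shell (e.g. `#!/bin/bash -l`). The snippet copies the sbatch value into SRUN_CPUS_PER_TASK only when the user has not set it:

```shell
# Hypothetical file: /etc/profile.d/zz-srun-cpus.sh
# Give srun the sbatch value as a default, without overriding
# anything the user has already set.
set_srun_cpus_default() {
    if [ -n "${SLURM_CPUS_PER_TASK:-}" ] && [ -z "${SRUN_CPUS_PER_TASK:-}" ]; then
        export SRUN_CPUS_PER_TASK="$SLURM_CPUS_PER_TASK"
    fi
}

set_srun_cpus_default
```

Outside a job neither variable is set, so the snippet does nothing in ordinary login sessions.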


With kind regards
Michael

--
Michael Müller
Application Developer

Dresden University of Technology
Center of Information Services and High Performance Computing (ZIH)
Department of Interdisciplinary Application Development and Coordination (IAK)
01062 Dresden

phone: (0351)463-35261
www: www.tu-dresden.de/zih

Jason Simms

Oct 22, 2023, 12:49:25 PM
to Slurm User Community List
Hello Michael,

I don't have an elegant solution, but I'm writing mostly to +1 this. I didn't catch this in the release notes but am concerned if it is indeed the new behavior. Researchers use scripts that rely on --cpus-per-task (or -c) as part of, e.g., SBATCH directives. I suppose you could simply include something like this, unless someone knows why it wouldn't work, but even if so it seems inelegant:

export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

A related question I have, which has come up a couple of times in various other contexts, is truly understanding the difference, in a submit script, between including srun and not, for example:

srun myscript
myscript

People have asked whether srun is required, or what the difference is if it is not included, and honestly it seems like the common reply is that "it doesn't matter that much." But, nobody that I've seen (and I've not done an exhaustive search) has articulated whether it actually matters to use srun within a batch script. Because if this is now the behavior, it appears that simply not using srun will still permit the task to use --cpus-per-task.
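
To make the difference concrete, here is a rough sketch of a batch script that runs the same command both ways. The resource numbers are arbitrary, and the `command -v` guard is only there so the script can be dry-run on a machine without Slurm. The plain command runs once inside the batch step on the first allocated node. The srun line launches a separate job step with one copy per task, and since 22.05 it needs --cpus-per-task passed explicitly (or SRUN_CPUS_PER_TASK exported) to pick up the sbatch value:

```shell
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2

# Plain command: runs once, as part of the batch step,
# on the first node of the allocation.
direct_out="$(uname -n)"
echo "direct: $direct_out"

# With srun: launched as its own job step, one copy per task (two here).
# The :-1 fallback only matters when the script runs outside a job.
if command -v srun >/dev/null 2>&1; then
    srun --cpus-per-task="${SLURM_CPUS_PER_TASK:-1}" uname -n
else
    echo "srun not found; skipping the job step"
fi
```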

Warmest regards,
Jason
--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
Schedule a meeting: https://calendly.com/jlsimms

Ryan Novosielski

Oct 22, 2023, 1:20:04 PM
to Slurm User Community List
What we say at our site is that you should use srun. If you don’t, you will see limited, if any, output on resource usage in the various places you can see it (sacct, etc.), and I learned recently that sattach won’t work either. I find it’s also easier to make mistakes with resource use if you don’t.

We also recommend using it to launch MPI jobs instead of mpirun/mpiexec/etc. That is our supported means of operation and the way all of the centrally built MPI stacks work.


William Brown

Oct 22, 2023, 6:50:58 PM
to Slurm User Community List
In the examples we provide to researchers we suggest that the point of using srun within a script submitted with sbatch is that you would append an ampersand so that you can run multiple job steps in parallel. 

The allocations in the #SBATCH directives provide at least the sum of resources for parallel steps.
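
The pattern described above might look roughly like this. It is a sketch only: the helper `run_step` and the `echo` payloads are hypothetical stand-ins for real programs, and the fallback branch exists only so the script can be tried outside Slurm. Each step asks for one task, so the two steps together fit inside the allocation requested by the #SBATCH lines:

```shell
#!/bin/bash
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2

# The :-1 fallback only matters when the script runs outside a job.
cpus="${SLURM_CPUS_PER_TASK:-1}"

# Launch a command as its own single-task job step. Falls back to
# running the command directly when srun is not available.
run_step() {
    if command -v srun >/dev/null 2>&1; then
        srun --ntasks=1 --cpus-per-task="$cpus" "$@"
    else
        "$@"
    fi
}

# The trailing ampersand puts each step in the background,
# so both steps run concurrently.
run_step sh -c 'echo step_a > step_a.out' &
run_step sh -c 'echo step_b > step_b.out' &

# Without wait the batch script would exit immediately and the job
# would end while the steps are still running.
wait
```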

It does assume many things such as parallel job steps having similar duration.

If you do not use srun to create job steps then one may as well just use separate sbatch jobs with job dependencies.
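
For comparison, the separate-jobs alternative can be sketched with sbatch's --parsable and --dependency options. The script names step_a.sh and step_b.sh and the helper `job_id_of` are hypothetical. With --parsable, sbatch prints either "jobid" or "jobid;cluster", so the helper strips the optional cluster suffix:

```shell
# Extract the job ID from `sbatch --parsable` output,
# which is either "jobid" or "jobid;cluster".
job_id_of() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    # Submit the first job and remember its ID.
    raw="$(sbatch --parsable step_a.sh)" || exit 1
    first="$(job_id_of "$raw")"
    # The second job becomes eligible only after the first
    # has finished successfully.
    sbatch --dependency=afterok:"$first" step_b.sh
else
    echo "sbatch not found; nothing submitted"
fi
```

Unlike job steps inside one allocation, the second job here has to wait in the queue again after the first one finishes.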

I can imagine in some busy clusters users might prefer job steps so that if a job starts it finishes promptly without waiting for each step to be scheduled. 

Having said all that, we have a tiny cluster and relatively few users, and none have ever used srun so far as I know. But that probably reflects more on the lack of training offered than any reluctance on their part.

There is probably no 'right way' as it depends so much on the program being run.

William Brown