[slurm-users] Conflicting --nodes and --nodelist

Diego Zuccato

Jun 1, 2021, 7:16:05 AM
to Slurm User Community List
Hello all.

I just found that if a user specifies a nodelist (say, containing
2 nodes) together with --nodes=1, the job gets rejected with
sbatch: error: invalid number of nodes (-N 2-1)
The behaviour I expected is that Slurm schedules the job on the first
available node from the list.
I've found conflicting information about this. Is it version-dependent?
If so, we're currently using 18.08.5-2 (from Debian stable). Should we
expect changes when Debian ships a newer version? Is it possible to
get the behaviour I expected?

Tks.

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

Marcus Boden

Jun 1, 2021, 7:44:42 AM
to slurm...@lists.schedmd.com
Hi,

as per
https://slurm.schedmd.com/archive/slurm-18.08.5/sbatch.html#OPT_nodelist

> Request a specific list of hosts. The job will contain *all* of these hosts and possibly additional hosts as needed to satisfy resource requirements.

So the sbatch manpage, at least, explicitly states that all listed nodes
are part of the allocation. The latest version says the same, so I
wouldn't expect this to change.
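The error message is consistent with the manpage: because every host in --nodelist must be allocated, the host count acts as a lower bound on the node count, and --nodes=1 caps it at 1. The function below is a simplified model of that check (an illustration only, not Slurm's source; the name check_node_range and the node names are hypothetical):

```python
def check_node_range(nodelist, min_nodes, max_nodes):
    """Simplified model, not Slurm code: every host named in --nodelist
    must be in the allocation, so the host count raises the minimum node
    count. If that minimum then exceeds the maximum, reject the request."""
    required = len(set(nodelist))            # distinct hosts that must be allocated
    effective_min = max(min_nodes, required)
    if effective_min > max_nodes:
        raise ValueError(f"invalid number of nodes (-N {effective_min}-{max_nodes})")
    return effective_min, max_nodes


# --nodes=1 means min = max = 1; two listed hosts push the minimum to 2.
try:
    check_node_range(["node01", "node02"], 1, 1)
except ValueError as err:
    print(err)  # invalid number of nodes (-N 2-1)
```

In this model the request only becomes valid once the maximum is at least the number of listed hosts, which is why no --nodes value can turn --nodelist into a "pick one of these" choice.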

The only way I currently see to do that from the user side is to exclude all
the other nodes with -x/--exclude. If this is for testing and you have admin
access, you could also create a reservation or a temporary partition.
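A minimal sketch of the --exclude workaround, assuming the job is confined to one partition whose node names are known in advance (here hardcoded; the helper exclude_options, the node names and job.sh are hypothetical). It builds the exclusion list as the complement of the candidate nodes:

```python
def exclude_options(partition_nodes, candidates):
    """Hypothetical helper: return sbatch options that leave only
    `candidates` eligible within the given partition, so that
    --nodes=1 lands on whichever candidate becomes available."""
    wanted = set(candidates)
    missing = wanted - set(partition_nodes)
    if missing:
        raise ValueError(f"not in partition: {sorted(missing)}")
    excluded = [n for n in partition_nodes if n not in wanted]
    opts = ["--nodes=1"]
    if excluded:
        opts.append("--exclude=" + ",".join(excluded))
    return opts


partition_nodes = ["node01", "node02", "node03", "node04"]  # hypothetical partition
cmd = ["sbatch", *exclude_options(partition_nodes, ["node01", "node02"]), "job.sh"]
print(" ".join(cmd))  # sbatch --nodes=1 --exclude=node03,node04 job.sh
```

The drawback is that the exclusion list must be regenerated whenever nodes are added to the partition, which is one reason the admin-side options are more robust.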

Best,
Marcus

--
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience, HPC-Team
Tel.: +49 (0)551 201-2191, E-Mail: mbo...@gwdg.de
-------------------------------------------------------------------------
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: https://www.gwdg.de

Support: Tel.: +49 551 201-1523, URL: https://www.gwdg.de/support
Sekretariat: Tel.: +49 551 201-1510, Fax: -2150, E-Mail: gw...@gwdg.de

Geschäftsführer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Prof. Dr. Norbert Lossau
Sitz der Gesellschaft: Göttingen
Registergericht: Göttingen, Handelsregister-Nr. B 598

Zertifiziert nach ISO 9001
-------------------------------------------------------------------------

Brian Andrus

Jun 1, 2021, 9:37:50 AM
to slurm...@lists.schedmd.com
That is the expected behavior, as Marcus pointed out.

I suspect you may be doing something like targeting one of two systems
that each have a node-locked license for some software, or that have
different specs from the rest.

In that case, you may want to set the Features option when defining
those nodes in slurm.conf and then request that feature (with
--constraint) when submitting your job.
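As a sketch of that suggestion, the snippet below generates a slurm.conf node line carrying a feature and the matching sbatch options. It assumes a feature name "bench" and nodes node01/node02 (all hypothetical); a real NodeName entry also needs the node's hardware parameters (CPUs, RealMemory, ...), and in practice you would add Features= to the nodes' existing entry rather than write a new one.

```python
def feature_setup(nodes, feature):
    """Sketch: build a slurm.conf NodeName line tagging `nodes` with
    `feature`, plus sbatch options requesting one node that has it.
    Hardware parameters of the NodeName line are deliberately omitted."""
    conf_line = f"NodeName={','.join(nodes)} Features={feature}"
    sbatch_opts = ["--nodes=1", f"--constraint={feature}"]
    return conf_line, sbatch_opts


conf_line, opts = feature_setup(["node01", "node02"], "bench")
print(conf_line)                              # NodeName=node01,node02 Features=bench
print(" ".join(["sbatch", *opts, "job.sh"]))  # sbatch --nodes=1 --constraint=bench job.sh
```

Unlike --nodelist, a constraint is satisfied by any single node carrying the feature, so --nodes=1 no longer conflicts with it.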

Brian Andrus

Diego Zuccato

Jun 3, 2021, 2:29:38 AM
to Slurm User Community List, Brian Andrus
On 01/06/2021 15:37, Brian Andrus wrote:

Tks Brian and Marcus.
It seems I misinterpreted the docs.
We need to target one of two identical nodes for performance comparisons
between runs (students have to learn how to scale their jobs).
I'll have to use a feature, since it seems a partition request is
overridden by the environment variable $SBATCH_PARTITION (which I'm using
to specify a default *set* of partitions, each containing only
homogeneous nodes, so that by default jobs only get homogeneous nodes from
a single partition).
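The sbatch manpage documents a fixed precedence: command-line options override input environment variables such as SBATCH_PARTITION, which in turn override #SBATCH directives in the batch script. The function below models only that rule (it is not Slurm code; resolve_option and the partition names are hypothetical). If the partition was requested with an #SBATCH line, this would account for the override described above, and passing -p/--partition on the sbatch command line should take precedence over the variable.

```python
def resolve_option(cli=None, env=None, script=None):
    """Model of sbatch option precedence: command line first, then the
    input environment variable, then the #SBATCH directive in the script.
    Returns None when the option is set nowhere."""
    for value in (cli, env, script):
        if value is not None:
            return value
    return None


# SBATCH_PARTITION=par1,par2 exported as a site default (hypothetical names)
print(resolve_option(env="par1,par2", script="bench"))               # par1,par2
print(resolve_option(cli="bench", env="par1,par2", script="bench"))  # bench
```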

BYtE,
Diego