[slurm-users] Job array start time and SchedNodes

Thekla Loizou

Dec 7, 2021, 3:03:28 AM
to slurm...@lists.schedmd.com
Dear all,

I have noticed that SLURM schedules several jobs from a job array on the
same node with the same start time and end time.

Each of these jobs requires the full node. You can see the squeue output
below:

       JOBID  PARTITION  ST  START_TIME           NODES  SCHEDNODES  NODELIST(REASON)
    124841_1  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_2  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_3  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_4  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_5  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_6  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_7  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_8  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)
    124841_9  cpu        PD  2021-12-11T03:58:00      1  cn06        (Priority)

Is this a bug, or am I missing something? Is it because the jobs share
the same JOBID and are still in the pending state? I am aware that the
jobs will not actually all run on the same node at the same time, and
that the scheduler somehow takes into account that this job array has 9
jobs that will need 9 nodes. I am building a timeline from the start
times of all jobs, and no other jobs are set to run on the remaining
nodes during the time the array jobs are expected to run (so the
scheduler "saves" the other nodes for the array jobs, even though
squeue and scontrol show them all scheduled on the same node).
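
For reference, the expected start times come from squeue. A rough
sketch of the kind of command used for the timeline (the --Format
field names may vary slightly between Slurm versions):

     squeue --start -O jobid,partition,state,starttime,numnodes,schednodes,reasonlist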

Regards,
Thekla Loizou
HPC Systems Engineer
The Cyprus Institute

Loris Bennett

Dec 7, 2021, 5:17:23 AM
to Thekla Loizou, slurm...@lists.schedmd.com
Hi Thekla,
In general, jobs from an array will be scheduled on whatever nodes
fulfil their requirements. The fact that all the jobs have

cn06

as NODELIST, however, suggests that you have either specified cn06 as
the node the jobs should run on, or that cn06 is the only node which
fulfils the job requirements.

I'm not sure what you mean by '"saving" the other nodes'.

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris....@fu-berlin.de

Thekla Loizou

Dec 7, 2021, 9:21:38 AM
to Loris Bennett, slurm...@lists.schedmd.com
Dear Loris,

There is no specific node required for this array. I can verify that
from "scontrol show job 124841" since the requested node list is empty:
ReqNodeList=(null)
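
A quick way to pull out the relevant fields, as a sketch (whether a
SchedNodeList field appears in the output depends on the Slurm version):

     scontrol show job 124841 | grep -E 'ReqNodeList|SchedNodeList|StartTime'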

Also, all 17 nodes of the cluster are identical so all nodes fulfill the
job requirements, not only node cn06.

By "saving" the other nodes I mean that the scheduler estimates that the
array jobs will start on 2021-12-11T03:58:00. No other jobs are
scheduled to run during that time on the other nodes. So it seems that
somehow the scheduler schedules the array jobs on more than one nodes
but this is not showing in the squeue or scontrol output.

Regards,

Thekla

Loris Bennett

Dec 7, 2021, 10:17:39 AM
to Slurm Users Mailing List
Dear Thekla,

Thekla Loizou <t.lo...@cyi.ac.cy> writes:

> Dear Loris,
>
> There is no specific node required for this array. I can verify that from
> "scontrol show job 124841" since the requested node list is empty:
> ReqNodeList=(null)
>
> Also, all 17 nodes of the cluster are identical so all nodes fulfill the job
> requirements, not only node cn06.
>
> By "saving" the other nodes I mean that the scheduler estimates that the array
> jobs will start on 2021-12-11T03:58:00. No other jobs are scheduled to run
> during that time on the other nodes. So it seems that somehow the scheduler
> schedules the array jobs on more than one nodes but this is not showing in the
> squeue or scontrol output.

My guess is that there is something wrong with either the job
configuration or the node configuration, if Slurm thinks 9 jobs which
each require a whole node can all be started simultaneously on the same
node.
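
It might be worth a quick sanity check that the node advertises what
you expect, e.g. something along these lines:

     scontrol show node cn06 | grep -E 'CPUAlloc|CPUTot|RealMemory|State'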

Cheers,

Loris

Thekla Loizou

Dec 9, 2021, 4:18:06 AM
to slurm...@lists.schedmd.com
Dear Loris,

Thank you for your reply. To be honest, I don't believe there is
anything wrong with either the job configuration or the node
configuration.

I have just submitted a simple sleep script:

#!/bin/bash
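# each array task just sleeps for 10 seconds (trivial test payload)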

sleep 10

submitted as follows:

sbatch --array=1-10 --ntasks-per-node=40 --time=09:00:00 test.sh

and squeue shows:

    131799_1  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_2  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_3  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_4  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_5  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_6  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_7  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_8  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
    131799_9  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)
   131799_10  cpu  test.sh  thekla  PD  N/A  1  cn04  (Priority)

All of the jobs seem to be scheduled on node cn04.

When they start running, they run on separate nodes:

          131799_1       cpu  test.sh   thekla  R       0:02 1 cn01
          131799_2       cpu  test.sh   thekla  R       0:02 1 cn02
          131799_3       cpu  test.sh   thekla  R       0:02 1 cn03
          131799_4       cpu  test.sh   thekla  R       0:02 1 cn04

Regards,

Thekla

Loris Bennett

Dec 9, 2021, 6:04:47 AM
to Slurm User Community List
Dear Thekla,

Yes, I think you are right. I have found a similar job on my system and
this does seem to be the normal, slightly confusing behaviour. It looks
as if the pending elements of the array get assigned a single node,
but then start on other nodes:

$ squeue -j 8536946 -O jobid,jobarrayid,reason,schednodes,nodelist,state | head
JOBID    JOBID              REASON     SCHEDNODES  NODELIST  STATE
8536946  8536946_[401-899]  Resources  g002                  PENDING
8658719  8536946_400        None       (null)      g006      RUNNING
8658685  8536946_399        None       (null)      g012      RUNNING
8658625  8536946_398        None       (null)      g001      RUNNING
8658491  8536946_397        None       (null)      g006      RUNNING
8658428  8536946_396        None       (null)      g003      RUNNING
8658427  8536946_395        None       (null)      g003      RUNNING
8658426  8536946_394        None       (null)      g007      RUNNING
8658425  8536946_393        None       (null)      g002      RUNNING

This strikes me as a bit odd.
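
In case it is useful, squeue's -r/--array option prints one line per
array element instead of the collapsed [401-899] record; a sketch with
the same fields as above:

     squeue -r -j 8536946 -O jobid,jobarrayid,reason,schednodes,nodelist,state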

Cheers,

Loris

Thekla Loizou

Dec 9, 2021, 6:45:12 AM
to slurm...@lists.schedmd.com
Dear Loris,

Yes, it is indeed a bit odd. At least now I know that this is how SLURM
behaves and not something related to our configuration.

Regards,

Thekla