[slurm-users] Jobs cancelled due to job requeue

Nicolas Sonoda
Sep 2, 2022, 2:53:32 PM
to slurm...@lists.schedmd.com
Hi!

I'm submitting a job, but after a few seconds it gets cancelled and the Slurm output file shows this message:

slurmstepd: error: *** JOB 23883 ON gn01 CANCELLED AT 2022-09-02T14:28:19 DUE TO JOB REQUEUE ***

After this, the job goes back to the pending (PD) state in the queue, with the reason BeginTime:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
23884       gpu Memb.LS1     vhpc PD       0:00      1 (BeginTime)

After a while, the job stays in the RH (requeue hold) state with the reason JobHoldMaxRequeue.

I'm attaching my script and input files.

Can you help me with that?

Thank you.
Nícolas
MD_200_GPU.slurm
README

Ole Holm Nielsen
Sep 3, 2022, 4:00:10 AM
to slurm...@lists.schedmd.com
You could look in the slurmctld.log file and the node's slurmd.log file
to see what they say about the job.

Check your slurm.conf requeue configuration:

$ scontrol show config | grep Requeue
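Both checks can be scripted. What follows is a minimal sketch, assuming a Bash shell and read access to the log files. The helper names `conf_value` and `job_log_lines` are hypothetical and are not part of Slurm. `SlurmctldLogFile`, `SlurmdLogFile` and `JobRequeue` are keys printed by `scontrol show config` in its "Key = Value" format. If a log-file key is unset, it prints as `(null)` and the daemon logs to syslog instead. If `SlurmdLogFile` contains a `%h` or `%n` pattern, you would need to substitute the host or node name by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Slurm) for the checks suggested above.

# conf_value KEY: read "Key = Value" lines, as printed by
# `scontrol show config`, on stdin and print the value for KEY.
conf_value() {
  awk -v key="$1" '$1 == key { sub(/^[^=]*=[ \t]*/, ""); print; exit }'
}

# job_log_lines KEY JOBID: look up the log file named by KEY
# (SlurmctldLogFile on the controller, SlurmdLogFile on the node)
# and print every line in it that mentions JOBID as a whole word.
job_log_lines() {
  local key="$1" jobid="$2" logfile
  logfile="$(scontrol show config | conf_value "$key")"
  if [ -z "$logfile" ] || [ "$logfile" = "(null)" ]; then
    echo "$key is not set; check syslog instead" >&2
    return 1
  fi
  grep -w -- "$jobid" "$logfile"
}
```

For example, `scontrol show config | conf_value JobRequeue` prints the requeue setting. `job_log_lines SlurmctldLogFile 23883` on the controller and `job_log_lines SlurmdLogFile 23883` on gn01 show what each daemon logged about the job. `grep -w` is used so that a job ID does not also match longer IDs that contain it.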

/Ole