[slurm-users] Restarting jobs


Nicolas Sonoda

Aug 18, 2022, 1:54:05 PM
to slurm...@schedmd.com
Hi!

This week my machines rebooted, and the jobs that were running restarted from the beginning, so I lost the progress they had made. Can I prevent jobs from restarting like that? For example, if my machines reboot, I would like the jobs to be cancelled instead.

Thank you.
Nícolas

Paul Brunk

Aug 19, 2022, 8:24:18 AM
to Slurm User Community List

Hi Nicolas!

 

In Slurm lingo this is "job requeueing". The JobRequeue slurm.conf parameter controls whether Slurm tries to start those jobs again (requeue vs. job exit).

The slurm.conf doc puts it nicely:

This option controls the default ability for batch jobs to be requeued. Jobs may be requeued explicitly by a system administrator, after node failure, or upon preemption by a higher priority job. If JobRequeue is set to a value of 1, then batch jobs may be requeued unless explicitly disabled by the user. If JobRequeue is set to a value of 0, then batch jobs will not be requeued unless explicitly enabled by the user. Use the sbatch --no-requeue or --requeue option to change the default behavior for individual jobs. The default value is 1.
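
As a rough sketch of the per-job side of this (an illustration, not something from the docs quoted above): the option can be carried in the batch script itself as an #SBATCH directive. The job name and the echo line are made-up placeholders. SLURM_RESTART_COUNT is, as far as I know, only set by Slurm once a job has been restarted or requeued, so the script falls back to 0 when it is absent.

```shell
#!/bin/bash
# Hypothetical job name, used only for illustration.
#SBATCH --job-name=no-requeue-demo
# Ask Slurm not to requeue this job (e.g. after a node failure),
# regardless of the cluster-wide JobRequeue default.
#SBATCH --no-requeue

# Slurm sets SLURM_RESTART_COUNT for jobs that have been restarted or
# requeued; default to 0 when the variable is not set.
restarts="${SLURM_RESTART_COUNT:-0}"
echo "restart count: ${restarts}"
```

Passing the option on the command line (sbatch --no-requeue with your script) should have the same effect for a single submission. To make this the default for every job, an administrator would set JobRequeue=0 in slurm.conf, as described in the quoted text.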

 

--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia

Nicolas Sonoda

Aug 19, 2022, 1:38:00 PM
to Slurm User Community List
Hi Paul!

Thank you very much for the explanation!

From: slurm-users <slurm-use...@lists.schedmd.com> on behalf of Paul Brunk <pbr...@uga.edu>
Sent: Friday, August 19, 2022 09:23
To: Slurm User Community List <slurm...@lists.schedmd.com>
Subject: Re: [slurm-users] Restarting jobs
 