Hi Nicolas!
In Slurm lingo this is called "job requeueing". The JobRequeue
parameter in slurm.conf sets the default behavior: whether Slurm puts
those jobs back in the queue to start again, or lets them exit.
The slurm.conf doc puts it nicely:
    This option controls the default ability for batch jobs to be
    requeued. Jobs may be requeued explicitly by a system
    administrator, after node failure, or upon preemption by a
    higher priority job. If JobRequeue is set to a value of 1, then
    batch jobs may be requeued unless explicitly disabled by the
    user. If JobRequeue is set to a value of 0, then batch jobs will
    not be requeued unless explicitly enabled by the user. Use the
    sbatch --no-requeue or --requeue option to change the default
    behavior for individual jobs. The default value is 1.
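
For a per-job override, here is a minimal sketch of a batch script that
opts in to requeueing regardless of the cluster default. The job name
and the echo messages are made up for illustration. It relies on
SLURM_RESTART_COUNT, which Slurm sets in the job environment once a job
has been restarted or requeued, and it assumes the variable is unset on
the first run.

```shell
#!/bin/bash
#SBATCH --job-name=requeue-demo   # hypothetical job name
#SBATCH --requeue                 # allow requeue even if JobRequeue=0

# SLURM_RESTART_COUNT is assumed to be unset on the first run,
# so fall back to 0 in that case.
restart_count="${SLURM_RESTART_COUNT:-0}"

if [ "$restart_count" -gt 0 ]; then
    echo "requeued run, restart count: $restart_count"
else
    echo "first run"
fi
```

To see what your cluster's default is, "scontrol show config | grep
JobRequeue" should show the current value. Swap --requeue for
--no-requeue if you would rather the job not be started again.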
--
Paul Brunk, system administrator
Advanced Computing Resource Center
Enterprise IT Svcs, the University of Georgia