We have an HPC cluster where the average job length is measured in days, not hours.
Users are careful to add checkpoints to their jobs, but even so, preempting a job that is close to its walltime limit (max: 14 days) can be very disruptive.
I looked through the preemption options, but none of them seem to protect jobs that are near the finish line.
PreemptExemptTime only guarantees a minimum runtime before a job can be preempted, and GraceTime only gives the job a grace period after it has already been selected for preemption.
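For context, this is roughly the kind of setup I have in mind. It is only a sketch: the partition names, node list, and time values are made up for illustration and are not our real settings.

```
# slurm.conf (illustrative values only, not our production config)
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
# Jobs cannot be preempted during their first 12 hours of runtime
PreemptExemptTime=12:00:00

# Preemptable partition with the 14-day walltime limit;
# GraceTime is in seconds (600 s = 10 min after selection for preemption)
PartitionName=long   Nodes=node[001-100] MaxTime=14-00:00:00 PriorityTier=1  GraceTime=600
# Higher-priority partition whose jobs may preempt jobs in "long"
PartitionName=urgent Nodes=node[001-100] MaxTime=1-00:00:00  PriorityTier=10
```

Both settings act at the start of a job or at the moment of preemption. Neither one takes into account how much of its walltime a job has already used.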
Is there anything I am missing to achieve what I want?
Thank you!