[slurm-users] Decreasing time limit of running jobs (notification)
Amjad Syed
Jul 6, 2023, 11:54:06 AM
to Slurm User Community List
Hello
We were trying to increase the time limit of a running Slurm job:
scontrol update job=<jobid> TimeLimit=16-00:00:00
But we accidentally set it to 16 hours:
scontrol update job=<jobid> TimeLimit=16:00:00
The job then timed out and was killed without any notification.
Is this a bug? Shouldn't the user be warned that the job will be killed?
Amjad
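Slurm accepts several time formats, and the dash is what separates days from hours, so typing a colon where the dash belongs turns 16 days into 16 hours. A minimal bash sketch of the conversion, assuming the formats listed in the sbatch/scontrol documentation (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds); `slurm_time_to_seconds` is a hypothetical helper, not a Slurm command, and it does not handle values such as UNLIMITED:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Slurm command): convert a Slurm time string to
# seconds. Formats assumed: minutes, minutes:seconds, hours:minutes:seconds,
# days-hours, days-hours:minutes, days-hours:minutes:seconds.
# Returns 1 for anything else (e.g. UNLIMITED).
slurm_time_to_seconds() {
  local t=$1 d=0 h=0 m=0 s=0
  local re='^[0-9]+(-[0-9]+)?(:[0-9]+){0,2}$'
  local -a f
  [[ $t =~ $re ]] || return 1
  if [[ $t == *-* ]]; then
    # With a dash: days-hours[:minutes[:seconds]]
    d=${t%%-*}
    IFS=: read -r -a f <<< "${t#*-}"
    h=${f[0]}; m=${f[1]:-0}; s=${f[2]:-0}
  else
    # Without a dash, the meaning depends on how many fields there are
    IFS=: read -r -a f <<< "$t"
    case ${#f[@]} in
      1) m=${f[0]} ;;
      2) m=${f[0]}; s=${f[1]} ;;
      3) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;
    esac
  fi
  # 10# forces base 10, so fields such as "08" are not read as octal
  echo $(( 10#$d * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# The intended value, the typo, and the job's original limit
for t in 16-00:00:00 16:00:00 7-00:00:00; do
  printf '%-12s -> %8d s\n' "$t" "$(slurm_time_to_seconds "$t")"
done
# Expected output:
# 16-00:00:00  ->  1382400 s
# 16:00:00     ->    57600 s
# 7-00:00:00   ->   604800 s
```

The typo shrank the limit from 1,382,400 s to 57,600 s, well below the 604,800 s the job had originally been given.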
Jason Simms
Jul 6, 2023, 12:05:10 PM
to Slurm User Community List
No, not a bug, I would say. When the time limit is reached, that's it: the job dies. I'm not aware of a way to manage that. It wouldn't be a hard limit if, once it was reached, you then had to notify the user and then... what? How long would you give them to extend the time? It wouldn't be much of a limit if a job could be extended at that point, plus that would throw off the scheduler/estimator. I'd chalk it up to an unfortunate typo.
Jason
--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College Information Technology Services
to Slurm User Community List
Yes, the initial time limit was 7-00:00:00, but Slurm accepted the typo (16:00:00), which caused the jobs to be killed without warning.
Amjad
Jason Simms
Jul 6, 2023, 1:14:52 PM
to Slurm User Community List
An unfortunate example of the “with great power comes great responsibility” maxim. Linux will gleefully let you rm -fr your entire system, drop production databases, etc., provided you have the right privileges. Ask me how I know…
Still, I get the point. Would it be possible to somehow ask for confirmation if you set a time limit that is less than the walltime the job has already used? Perhaps. Could you script that yourself? Yes, I'm certain of it. Those kinds of built-in safeguards aren't super common, however.
Jason
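Along the lines suggested above, a wrapper can compare the requested limit with the time the job has already used and ask before going ahead. A minimal bash sketch, assuming `squeue -h -j JOBID -o %M` reports the elapsed time in one of the same formats Slurm accepts for time limits; `safe_set_timelimit` and `slurm_time_to_seconds` are hypothetical names, not Slurm commands:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of Slurm. Usage (after sourcing this file):
#   safe_set_timelimit JOBID NEW_LIMIT
# Asks for confirmation before setting a time limit at or below the time the
# job has already used, since Slurm would then end the job for reaching its limit.

# Convert a Slurm time string (minutes, minutes:seconds, hours:minutes:seconds,
# days-hours, days-hours:minutes, days-hours:minutes:seconds) to seconds.
slurm_time_to_seconds() {
  local t=$1 d=0 h=0 m=0 s=0
  local re='^[0-9]+(-[0-9]+)?(:[0-9]+){0,2}$'
  local -a f
  [[ $t =~ $re ]] || return 1
  if [[ $t == *-* ]]; then
    d=${t%%-*}
    IFS=: read -r -a f <<< "${t#*-}"
    h=${f[0]}; m=${f[1]:-0}; s=${f[2]:-0}
  else
    IFS=: read -r -a f <<< "$t"
    case ${#f[@]} in
      1) m=${f[0]} ;;
      2) m=${f[0]}; s=${f[1]} ;;
      3) h=${f[0]}; m=${f[1]}; s=${f[2]} ;;
    esac
  fi
  echo $(( 10#$d * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

safe_set_timelimit() {
  local jobid=${1-} new_limit=${2-} used new_s used_s answer
  if [[ -z $jobid || -z $new_limit ]]; then
    echo "usage: safe_set_timelimit JOBID NEW_LIMIT" >&2
    return 2
  fi
  # Refuse formats the parser does not understand (e.g. UNLIMITED) rather than guess
  if ! new_s=$(slurm_time_to_seconds "$new_limit"); then
    echo "unrecognized time format: $new_limit" >&2
    return 2
  fi
  # %M = time used so far; -h suppresses the header line
  used=$(squeue -h -j "$jobid" -o %M) || used=""
  used=${used//[[:space:]]/}
  if [[ -z $used ]] || ! used_s=$(slurm_time_to_seconds "$used"); then
    echo "could not read elapsed time for job $jobid" >&2
    return 1
  fi
  if (( new_s <= used_s )); then
    echo "WARNING: job $jobid has already run for $used;" \
         "TimeLimit=$new_limit would make Slurm end it." >&2
    read -r -p "Continue anyway? [y/N] " answer || answer=""
    if [[ $answer != [yY]* ]]; then
      echo "Aborted; time limit unchanged." >&2
      return 1
    fi
  fi
  scontrol update JobId="$jobid" TimeLimit="$new_limit"
}
```

With this sourced, `safe_set_timelimit <jobid> 16-00:00:00` goes straight through, while `safe_set_timelimit <jobid> 16:00:00` on a job that has already run longer than 16 hours stops and asks first. The check is against elapsed time rather than the current limit, so shortening a limit that still leaves the job room to run is allowed without a prompt.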
Amjad Syed
Jul 6, 2023, 1:38:39 PM
to Slurm User Community List
Agreed on the point about great responsibility, but even rm -r (without -f) gives a warning. In this case, should Slurm have that option (forced), especially since this can immediately kill a running job?
Jason Simms
Jul 6, 2023, 1:43:53 PM
to Slurm User Community List
My opinion is no, at least not forced.
Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Jul 6, 2023, 1:48:32 PM
to Slurm User Community List
Given that the usual way to kill a job that's running is to use scancel, I would tend to agree that killing by shortening the walltime to below the already used time is likely to be an error, and deserves a warning.
Davide DelVento
Jul 10, 2023, 12:07:51 PM
to Slurm User Community List
Actually, rm -r does not give ANY warning, so in plain Linux "rm -r /" run as root would destroy your system without notice. Your particular Linux distro may have implemented safeguards with a shell alias such as `alias rm='rm -i'`; that's a common thing, but it's not guaranteed to be there.