Schedule nodes for downtime


Gabriel Bretschner

Jun 17, 2021, 3:34:48 AM
to google-cloud-...@googlegroups.com
Hi,
Whenever I update the image used on the nodes, some nodes may still be running jobs on the old image.
Is there a way to block these nodes from accepting new jobs, so that each node goes down once all of its currently running jobs have finished?

I have only found ways to cancel the jobs running on a node and force it down, but I don't want to disturb those jobs.

Thanks,
Gabriel

Alex Chekholko

Jun 17, 2021, 1:53:02 PM
to Gabriel Bretschner, google-cloud-slurm-discuss
Hi Gabriel, 

Sure, set each of the "old" nodes to state "drain". I am not sure whether they will be power saved after they drain, though; I have not tried it myself.

See node states in 'man sinfo' for more information.
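
A minimal sketch of the drain step, not tested on a live cluster. The node list "compute-old-[0-3]" and the reason text are hypothetical placeholders, and the DRY_RUN switch is only an illustration so the command can be reviewed before it touches the cluster:

```shell
#!/bin/sh
# Sketch: drain nodes that still run the old image. A draining node keeps
# its running jobs but is not given new ones.
# DRY_RUN defaults to 1, so the command is only printed; set DRY_RUN=0 to run it.

drain_nodes() {
    # $1 = node name or hostlist expression, $2 = reason recorded for the drain
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo scontrol update "NodeName=$1" State=DRAIN "Reason=$2"
    else
        scontrol update "NodeName=$1" State=DRAIN "Reason=$2"
    fi
}

# Hypothetical node list; replace with the nodes still on the old image.
drain_nodes "compute-old-[0-3]" "old image"
```

Afterwards, `sinfo -R` should list the affected nodes together with the reason, and `scontrol update NodeName=... State=RESUME` returns a node to service if it was drained by mistake.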

Regards,
Alex


Nick Ihli

Jun 17, 2021, 2:25:08 PM
to Alex Chekholko, Gabriel Bretschner, google-cloud-slurm-discuss
Alex's recommendation is a good one. Once drained, the node will be power saved.

--Nick



Nick Ihli
Director, Cloud and Sales Engineering
ni...@schedmd.com


Gabriel Bretschner

Jun 18, 2021, 2:57:02 AM
to Nick Ihli, Alex Chekholko, google-cloud-slurm-discuss
Hi,
Thanks, Alex and Nick. That was the solution. I'm not sure why I did not see it before; the documentation is quite clear. Thanks for pointing me in the right direction!

Best,
Gabriel