Hi,
I've just finished setting up a single-node "cluster" with Slurm on Ubuntu 20.04. Infrastructure limitations prevent me from running it 24/7, so it is only powered on during business hours.
Currently, I have a cron job running that hibernates that sole node before closing time.
Hibernation is handled by standard systemd and writes to the swap partition.
I have not run any lengthy Slurm jobs on it yet. Before I do, can I get some thoughts on a couple of things?
If the node hibernates while Slurm still has jobs running or queued, will they resume properly when the machine powers back on?
Note that my swap space is bigger than my RAM.
Would I need to set up a pre-hibernate script for systemd that loops over the jobs with scontrol to suspend them all before hibernating, and then resumes them after the machine wakes up?
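To make the idea concrete, here is a rough sketch of the kind of hook I have in mind. It is untested on a real cluster. It assumes the script is installed as an executable in /usr/lib/systemd/system-sleep/ (where systemd-sleep calls it with "pre" or "post" plus the sleep type), that squeue and scontrol are on root's PATH, and that nothing else on the node suspends jobs. The function names are my own.

```python
#!/usr/bin/env python3
"""Hypothetical systemd-sleep hook: suspend Slurm jobs before hibernating,
resume them afterwards. A sketch only, not a tested solution."""
import subprocess
import sys


def list_job_ids(state, run=subprocess.run):
    """Return the IDs of all jobs in the given state (e.g. RUNNING)."""
    # -h: no header, -t: filter by state, -o %A: print only the job ID
    result = run(
        ["squeue", "-h", "-t", state, "-o", "%A"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def plan_commands(phase, job_ids):
    """Build one scontrol command per job for the given hook phase."""
    action = {"pre": "suspend", "post": "resume"}.get(phase)
    if action is None:
        return []
    return [["scontrol", action, job_id] for job_id in job_ids]


def main(argv):
    # systemd-sleep passes argv[1] = "pre" or "post", argv[2] = sleep type.
    if len(argv) < 3 or argv[2] != "hibernate":
        return 0
    phase = argv[1]
    # Caveat: "post" resumes every SUSPENDED job, including any that were
    # suspended by hand for unrelated reasons.
    state = "RUNNING" if phase == "pre" else "SUSPENDED"
    for cmd in plan_commands(phase, list_job_ids(state)):
        subprocess.run(cmd, check=False)  # keep going if one job fails
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Queued (pending) jobs are left alone here, since I assume they simply stay in the queue.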
What about the wall times? I'm guessing that Slurm will count the downtime as elapsed time for each job. Is there a way to configure this, or is the only alternative a post-hibernate script that iteratively updates the wall times of the running jobs, again using scontrol?
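For the wall-time fallback, this is roughly what I mean: after waking up, push each job's time limit out by the length of the downtime. It is a sketch under assumptions. It relies on scontrol accepting a relative limit such as TimeLimit=+30 (minutes), on the script running as root or the Slurm user, and on the downtime having been measured somewhere else (for example from a timestamp saved just before hibernating). The function name and the job IDs are made up for illustration.

```python
import math


def extension_commands(job_ids, downtime_seconds):
    """Build scontrol commands that extend each job's time limit
    by the downtime, rounded up to whole minutes."""
    minutes = math.ceil(downtime_seconds / 60)
    if minutes <= 0:
        return []
    return [
        ["scontrol", "update", f"JobId={job_id}", f"TimeLimit=+{minutes}"]
        for job_id in job_ids
    ]


# Example: two hypothetical jobs and a 14-hour overnight shutdown.
for cmd in extension_commands(["101", "102"], 14 * 3600):
    print(" ".join(cmd))
```

Whether this is needed at all depends on whether Slurm counts time spent suspended or hibernated against the limit, which is the part I am unsure about.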
Thanks for your attention.
Regards
AR