[slurm-dev] node in down state from "unexpected reboot"

Trevor Gale

Jul 29, 2015, 3:57:51 PM
to slurm-dev

Hello all,

I recently rebooted one of my nodes. When it came back up, slurm was running fine, but when I run “sinfo” I see that its state is set to down. When I run “scontrol show node compute0”, it says that the reason is “unexpectedly rebooted”.

How do I go about bringing the node back up?

Thanks,
Trevor

Alejandro Sanchez

Jul 29, 2015, 4:02:00 PM
to slurm-dev
Hi Trevor,

maybe this link is useful: http://slurm.schedmd.com/faq.html#reboot
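
In case the link moves: as far as I recall, that FAQ entry comes down to resuming the node by hand with scontrol. Below is a minimal sketch, assuming the node name compute0 from the original post. The resume_node helper and the DRY_RUN variable are hypothetical names used only for illustration; the scontrol update command itself is the part that matters.

```shell
# Hypothetical helper (not part of Slurm): return a DOWN node to service.
# With DRY_RUN=1 it only prints the command instead of running it.
resume_node() {
    node="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "scontrol update NodeName=${node} State=RESUME"
    else
        # Needs to run as root or the SlurmUser on a host that can reach slurmctld.
        scontrol update NodeName="${node}" State=RESUME
    fi
}
```

Running "DRY_RUN=1 resume_node compute0" shows the command first; running "resume_node compute0" applies it, after which sinfo should no longer list the node as down.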

Regards,
Alejandro

Trevor Gale

Jul 29, 2015, 4:13:53 PM
to slurm-dev
Ah, thank you very much!

Thanks,
Trevor

Robbert Eggermont

Jul 31, 2015, 10:09:54 AM
to slurm-dev

On 07/29/2015 09:57 PM, Trevor Gale wrote:
> I recently rebooted one of my nodes. When it came back up, slurm was running fine, but when I run “sinfo” I see that its state is set to down. When I run “scontrol show node compute0”, it says that the reason is “unexpectedly rebooted”.

I have the same problem. According to the slurm.conf man page (Slurm
14.11.8), when I reboot a node using 'scontrol reboot_nodes <node>', it
should be returned to normal use, but instead it stays down (Reason=Node
unexpectedly rebooted).

I should not have to set ReturnToService=2 for this, right?

Thanks,

Robbert

--
Robbert Eggermont Intelligent Systems
R.Egg...@tudelft.nl Electr.Eng., Mathematics & Comp.Science
+31 15 27 83234 Delft University of Technology