Cloudlab Node stuck in recovery mode


Gregory Cusack

May 16, 2021, 7:23:06 PM
to cloudlab-users
Hi!
I am running a 4-node Cloudlab setup ('DC-Ubuntu18-4node/dc-c220g5'), and one of the nodes (node-1) has become unresponsive. The Cloudlab web interface says the node is "Ready", but I can only boot into recovery mode, and I can't seem to turn recovery mode off: the Cloudlab interface just keeps asking "do you want to boot into recovery mode". I am running my own kernel version, so that may be an issue, but I've booted into that kernel version many times. If I power cycle or reboot the node, it just boots into recovery mode again.

Any idea how I can get this node to reboot normally (not into recovery mode), or how to reset it? Any help would be appreciated!
Thank you so much for your time!
- Greg Cusack

Mike Hibler

May 16, 2021, 9:16:26 PM
to cloudla...@googlegroups.com
The filesystem on that node does not appear to be in great shape. You can
log in while it is in recovery mode, run "e2fsck", and see.
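A minimal Python sketch of that check, assuming a Linux recovery environment with sysfs mounted at /sys. It lists the partitions of a whole disk (on Linux, each partition appears as a subdirectory of /sys/block/<disk>/ containing a "partition" attribute file) and builds read-only e2fsck invocations ("-n" opens the filesystem read-only and answers "no" to every repair prompt). The helper names are hypothetical; e2fsck only understands ext2/3/4, so a partition holding swap or another filesystem type will just be reported as unrecognized, and the real check should only be run on unmounted filesystems.

```python
from pathlib import Path


def list_partitions(disk: str, sysfs_block: str = "/sys/block") -> list[str]:
    """Return partition names (e.g. ['sda1', 'sda4']) of a whole disk such as 'sda'.

    Each partition shows up as a subdirectory of /sys/block/<disk>/ that
    contains a 'partition' attribute file; other subdirectories (queue,
    power, ...) do not, so they are skipped.
    """
    disk_dir = Path(sysfs_block) / disk
    parts = [
        entry.name
        for entry in disk_dir.iterdir()
        if entry.is_dir() and (entry / "partition").is_file()
    ]
    return sorted(parts)


def readonly_fsck_commands(partitions: list[str]) -> list[list[str]]:
    """Build e2fsck argument lists that only report problems (-n: read-only, answer 'no')."""
    return [["e2fsck", "-n", f"/dev/{name}"] for name in partitions]
```

Each command could then be run as root with subprocess.run(cmd); only once the read-only report looks sane would you rerun e2fsck without "-n" on the damaged partition (here /dev/sda1), never on the whole disk.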

As for why it doesn't leave recovery mode, I am not sure. I thought that
the recovery checkbox was a toggle, so if you check it when it is in recovery
mode, it would turn it off. But I could be wrong on that!

But, you need to fix up your filesystem first.
> --
> You received this message because you are subscribed to the Google Groups
> "cloudlab-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to cloudlab-user...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/
> cloudlab-users/d1a2317d-c990-400f-a6e7-033c5427ea42n%40googlegroups.com.

Gregory Cusack

May 16, 2021, 9:57:42 PM
to cloudlab-users
Hi Mike,
Thanks for the quick reply, I appreciate it!
Ahh ok, not sure why that's the case. I went in and repaired the /dev/sda1 filesystem via fsck. It was broken as you pointed out! /dev/sda4 and /dev/sdb seemed to be good to go. 

However, the /dev/sda partition looks like it may be screwed up. When I run "sudo /sbin/e2fsck -b 32768 /dev/sda", I get the following error:
/sbin/e2fsck: Bad magic number in super-block while trying to open /dev/sda

I got the backup superblock number from "sudo /sbin/mke2fs -n /dev/sda". I tried different backup superblocks, and all produced the same error. Any idea why e2fsck is failing? I'm not super familiar with filesystems, so I'm almost certainly doing something wrong or missing something.
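For context on where "-b 32768" comes from: an ext2/3/4 filesystem keeps backup copies of its superblock at the start of certain block groups, and "mke2fs -n" only prints where mke2fs would place them if it created a new filesystem on the given device. Those numbers match an existing filesystem only when the same block size and layout parameters are used, and run against /dev/sda they describe a hypothetical filesystem spanning the whole disk. The sketch below uses a hypothetical helper and assumes the sparse_super layout (normally enabled by default), in which backups sit only in group 1 and in groups that are powers of 3, 5 or 7; it computes the backup block numbers from the block-group geometry:

```python
def backup_superblock_blocks(total_blocks: int, blocks_per_group: int,
                             first_data_block: int = 0,
                             sparse_super: bool = True) -> list[int]:
    """Block numbers holding backup superblocks of an ext2/3/4 filesystem.

    With sparse_super, backups live only in group 1 and in groups that are
    powers of 3, 5 or 7. Without it, every group after group 0 has one.
    first_data_block is 1 for 1 KiB-block filesystems and 0 otherwise.
    """
    # Number of block groups, rounding up for a partial last group.
    group_count = (total_blocks - first_data_block + blocks_per_group - 1) // blocks_per_group

    def has_backup(group: int) -> bool:
        if group == 0:
            return False  # group 0 holds the primary superblock, not a backup
        if not sparse_super or group == 1:
            return True
        for base in (3, 5, 7):
            n = base
            while n < group:
                n *= base
            if n == group:
                return True
        return False

    # Each backup sits in the first block of its group.
    return [g * blocks_per_group + first_data_block
            for g in range(group_count) if has_backup(g)]
```

With 4 KiB blocks and 32768 blocks per group, the first backup is block 32768, the value used with e2fsck above; a 1 KiB-block filesystem (8192 blocks per group, first data block 1) would instead start at block 8193.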

And yeah, I remember recovery mode being a toggle as well, but for some reason this time it's not giving me the option to toggle it back to non-recovery. My guess is it's due to the filesystem I screwed up.

Thanks for your help Mike!
- Greg Cusack

Mike Hibler

May 17, 2021, 9:53:03 AM
to cloudla...@googlegroups.com
/dev/sda refers to the whole disk on which sda1 and sda4 are partitions,
so you should not be checking sda at all (there is no filesystem on the
whole disk itself). If you cleaned up sda1, you should be in good shape.
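Why the whole-disk check fails: e2fsck reports "Bad magic number in super-block" when the 16-bit s_magic field, 56 bytes into the superblock, does not hold 0xEF53. On a partition such as /dev/sda1 the primary superblock starts 1024 bytes in; on the whole disk /dev/sda those bytes hold partitioning metadata or unused space before the first partition, and a backup block number like 32768 is measured from the start of the disk rather than the start of sda1, so no superblock is found either way. A minimal Python sketch of the same test, with a hypothetical function name (reading a raw device needs root):

```python
import struct

EXT_SUPER_MAGIC = 0xEF53          # magic value shared by ext2, ext3 and ext4
PRIMARY_SUPERBLOCK_OFFSET = 1024  # bytes from the start of the filesystem
MAGIC_OFFSET_IN_SUPERBLOCK = 56   # position of the s_magic field (little-endian u16)


def has_ext_superblock(path: str, superblock_offset: int = PRIMARY_SUPERBLOCK_OFFSET) -> bool:
    """Return True if an ext2/3/4 superblock magic number sits at the given byte offset."""
    with open(path, "rb") as dev:
        dev.seek(superblock_offset + MAGIC_OFFSET_IN_SUPERBLOCK)
        raw = dev.read(2)
    if len(raw) < 2:
        return False  # device or image too short to hold a superblock there
    (magic,) = struct.unpack("<H", raw)
    return magic == EXT_SUPER_MAGIC
```

For example, has_ext_superblock("/dev/sda1") would be expected to return True for an intact ext4 partition, while has_ext_superblock("/dev/sda1", 32768 * 4096) probes the first backup, assuming 4 KiB blocks.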

Gregory Cusack

May 17, 2021, 10:41:35 AM
to cloudla...@googlegroups.com
Hi Mike,
Ahh ok sounds good. That makes sense. Thank you! Unfortunately, I still can't get the node to boot out of Recovery. I keep getting the following message every time I click the Actions->Recovery Mode button. If I remember correctly, after I boot into recovery mode, if I then go to Actions->Recovery, it should boot out of recovery mode, right? But, it keeps saying I'm about to boot INTO Recovery mode even though I'm already in recovery mode. Do I need to take another step?
[Attached screenshots: Screen Shot 2021-05-17 at 8.37.19 AM.png, Screen Shot 2021-05-17 at 8.37.50 AM.png]

Thanks so much for your time and help, Mike!
- Greg


Leigh Stoller

May 17, 2021, 10:50:40 AM
to cloudla...@googlegroups.com
at 7:41 AM, Gregory Cusack <Gregory...@colorado.edu> wrote:

> Ahh ok sounds good. That makes sense. Thank you! Unfortunately, I still can't get the node to boot out of Recovery. I keep getting the following message every time I click the Actions->Recovery Mode button. If I remember correctly, after I boot into recovery mode, if I then go to Actions->Recovery, it should boot out of recovery mode, right? But, it keeps saying I'm about to boot INTO Recovery mode even though I'm already in recovery mode. Do I need to take another step?

Hi. Reload the status page and try again, please. Down in the lower right
of the Topology diagram is a “Refresh Status” button. Sometimes the Portal
gets a bit out of sync with reality, and that button brings it back in sync.
I clicked on that button for you …

Leigh

Gregory Cusack

May 17, 2021, 11:07:58 AM
to cloudla...@googlegroups.com
Hi Leigh and Mike,
It worked! I am good to go! Thank you so much for helping me through this and doing it so quickly! Had a brief panic attack last night lol. 
Cloudlab and the people that run it are amazing! Thank you thank you!
Have a great week!
- Greg
