Kernel upgrade/swap fails for UBUNTU 20

Rajveer B

Mar 26, 2023, 7:14:55 PM
to cloudlab-users
Hello,

Machine:

c4130 - Wisconsin cluster.
OS release - standard ubuntu 20.04 image from Emulab
kernel version - 5.4.0-100-generic

I am trying to swap the default kernel version stated above for kernel version 5.4.229.

After swapping the kernel successfully and booting, the machine works fine for a while before it drops the SSH connection and refuses new SSH connections. The machine status remains ready, but I still cannot connect over SSH, even after rebooting again.

Please let me know how to resolve this.

Thanks.

- Rajveer

Mike Hibler

Mar 26, 2023, 11:53:12 PM
to cloudla...@googlegroups.com
See this message (and the whole thread).

https://groups.google.com/g/cloudlab-users/c/B6rNj7Vhltk/m/rwkHf_kwAgAJ

There is a good chance that the NetworkManager got installed and is
interfering with the Cloudlab control network setup.
> --
> You received this message because you are subscribed to the Google Groups
> "cloudlab-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email
> to cloudlab-user...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/
> cloudlab-users/7186e4c7-ab62-4c50-8c2e-e471877ad370n%40googlegroups.com.
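
One way to check the NetworkManager suggestion above, given console or other access to the node, is to ask systemd whether the service is running. The following is only a minimal sketch: the helper names `unit_state` and `_run_systemctl` are made up for illustration, and it assumes the unit is called `NetworkManager.service`, which is the usual name on Ubuntu.

```python
import subprocess


def _run_systemctl(cmd):
    """Run a systemctl command and return whatever it printed on stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def unit_state(unit, runner=_run_systemctl):
    """Return the state word printed by `systemctl is-active <unit>`.

    Typical values are "active", "inactive" or "failed". Returns "unknown"
    if systemctl cannot be run or prints nothing.
    """
    try:
        output = runner(["systemctl", "is-active", unit])
    except (OSError, subprocess.SubprocessError):
        return "unknown"
    return output.strip() or "unknown"


if __name__ == "__main__":
    # Assumed unit name; adjust if the image names it differently.
    print("NetworkManager.service:", unit_state("NetworkManager.service"))
```

If this reports "active" on a node that has lost its control network, that would be consistent with the interference described above, though it does not prove it.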

Rajveer B

Apr 15, 2023, 7:24:02 PM
to cloudlab-users
Hello Mike,


I tried the options suggested in the link you sent, but none of them worked; I am still unable to SSH into my node.

Mike Hibler

Apr 15, 2023, 8:31:39 PM
to cloudla...@googlegroups.com
It looks like your kernel is not even booting, so this is not a problem
with the network manager. You can see this in the console log: in the
Topology View left click on the node and choose Console Log. You can see
that it never makes it out of the BIOS.

Since you installed a new kernel, you may have clobbered the first-level
Grub install. Grub normally wants to install in the MBR at the beginning
of the disk. However, because of the way the Cloudlab PXE boot loader works,
it needs to be installed at the beginning of the OS partition.

For directions on how to fix this, see:

https://gitlab.flux.utah.edu/emulab/emulab-devel/-/wikis/faq/Using-the-Testbed/Using-the-Recovery-MFS

in particular, the "reinstalling grub" section (but don't skip the earlier
"mounting the root filesystem" and "using chroot to run on-disk programs"
sections).
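
To illustrate the MBR versus OS-partition distinction above, here is a rough sketch that inspects the first 512-byte sector of a disk or partition, for example from the recovery MFS. It is a heuristic, not part of the documented recovery procedure: it only checks for the 0x55AA boot signature and for the ASCII string "GRUB" that GRUB's first-stage boot code normally embeds. The function names and the device paths in the usage comment are assumptions.

```python
SECTOR_SIZE = 512


def describe_boot_sector(sector):
    """Summarise the first sector of a disk or partition.

    Heuristic only: looks at the last two bytes for the 0x55AA boot
    signature and searches the sector for the ASCII string "GRUB".
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    head = sector[:SECTOR_SIZE]
    return {
        "boot_signature": head[510:512] == b"\x55\xaa",
        "mentions_grub": b"GRUB" in head,
    }


def read_first_sector(path):
    """Read the first 512 bytes of a device or image file (usually needs root)."""
    with open(path, "rb") as dev:
        return dev.read(SECTOR_SIZE)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 bootsector.py /dev/sda /dev/sda1
    for path in sys.argv[1:]:
        print(path, describe_boot_sector(read_first_sector(path)))
```

Comparing the result for the whole disk with the result for the OS partition gives a hint about where the first-level Grub currently lives; the wiki page above remains the reference for actually reinstalling it.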

Rajveer B

Apr 16, 2023, 1:30:25 AM
to cloudlab-users
Sorry, I forgot to mention this earlier, but I tried to install CUDA and cuDNN before doing the swap, which caused this.

I actually followed the recovery steps in the link, including the "reinstalling grub" steps, after which I did the following:
1. Exit recovery mode.
2. Reboot into the normal state (successfully).
But it hangs suddenly after a few minutes.

Mike Hibler

Apr 16, 2023, 9:16:40 PM
to cloudla...@googlegroups.com
So the kernel was booting all along; it is just that the console was misconfigured.
It seems to be just this one machine, which I have fixed now. Your current
experiment should now have a working console.

I was about to say that the node was up and running again, but it seized up
right then. The newly responsive console is no longer responsive, so it could
be a HW problem. But I also notice that the Nvidia stuff fires up an X server.
I rebooted again and stopped and disabled the gdm.service which seems to have
stopped the X server and associated goo. I wonder if something is putting
the machine in sleep or power save mode. C-states are disabled in the BIOS,
but who knows?

It is up and running again right now, we will see if it lasts.
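
The "stopped and disabled the gdm.service" step above could be scripted roughly as follows. This is a sketch under assumptions, not a record of the exact commands that were run: the helper names are invented, and it defaults to a dry run that only prints the `systemctl` commands.

```python
import subprocess


def display_manager_commands(unit="gdm.service"):
    """Commands that stop a unit now and keep it from starting at boot."""
    return [
        ["systemctl", "stop", unit],
        ["systemctl", "disable", unit],
    ]


def apply(commands, dry_run=True):
    """Print each command; run it (as root) only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False on the node itself.
    apply(display_manager_commands())
```

Keeping the dry run as the default makes it easy to review what would change before touching a node that is already unstable.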

Rajveer B

Apr 17, 2023, 5:04:47 PM
to cloudlab-users
Thanks. It's still up :)