After "sudo reboot", the node stuck at "booted, changing and Pending"


Gaokai Zhang

Jun 11, 2024, 3:47:26 PM
to cloudlab-users
After "sudo reboot", the node is stuck at "booted, changing and Pending".
What should I do?

Leigh Stoller

Jun 11, 2024, 4:00:38 PM
to 'Nurlan Nazaraliyev' via cloudlab-users

> After "sudo reboot", the node stuck at "booted, changing and Pending"
> What should I do?

Hi. We cannot help you without a link to the experiment status page.
You should also tell us what software you installed, whether you
installed a different kernel, etc.

You can click on the node in the topology and look at the
console, that is often very helpful.

Leigh


Gaokai Zhang

Jun 11, 2024, 4:05:43 PM
to cloudlab-users
https://www.cloudlab.us/status.php?uuid=e0d53971-280b-11ef-9f39-e4434b2381fc
I have only run:

sudo apt update && \
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common && \
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/docker-archive-keyring.gpg > /dev/null && \
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null && \
sudo apt update && \
sudo apt install -y docker-ce && \
sudo apt install -y ubuntu-drivers-common && \
sudo ubuntu-drivers autoinstall && \
sudo reboot

Mike Hibler

Jun 11, 2024, 4:47:31 PM
to cloudla...@googlegroups.com
You apparently did update the kernel and the new kernel version does not have
the Broadcom "bnxt" driver which is needed for the Internet-facing control
network.
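
One way to check for this before rebooting is to look for the driver in each installed kernel's module tree. The sketch below is a hedged illustration, not something posted in this thread: it assumes the Broadcom driver ships as the `bnxt_en` module, and the function names `has_module` and `report_bnxt` are made up for this example.

```shell
#!/bin/sh
# Check whether a kernel's module tree provides a given driver.
# Assumption: the Broadcom "bnxt" driver is packaged as the bnxt_en module.
has_module() {
    # $1 = module name, $2 = kernel release (default: running kernel),
    # $3 = modules root (default: /lib/modules)
    mod=$1
    rel=${2:-$(uname -r)}
    root=${3:-/lib/modules}
    # Built-in drivers have no .ko file but are listed in modules.builtin.
    grep -qs "/$mod\.ko" "$root/$rel/modules.builtin" && return 0
    # Loadable modules may be compressed (.ko, .ko.xz, .ko.zst).
    [ -n "$(find "$root/$rel" -name "$mod.ko*" 2>/dev/null | head -n 1)" ]
}

# Report, for every installed kernel, whether bnxt_en is available.
report_bnxt() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        rel=$(basename "$dir")
        if has_module bnxt_en "$rel" "$root"; then
            echo "$rel: bnxt_en present"
        else
            echo "$rel: bnxt_en MISSING"
        fi
    done
}
```

Running `report_bnxt` with no arguments after installing packages, and before `sudo reboot`, would show whether any newly installed kernel lacks the driver.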


Gaokai Zhang

Jun 12, 2024, 8:29:11 AM
to cloudlab-users
Oh, I didn't realize that; I was just using a script that seemed to work before (I used it several times on small-lan with c240g5).
May I ask what the correct way to do this is?

Gaokai Zhang

Jun 12, 2024, 8:29:14 AM
to cloudlab-users
Yeah, you are correct, I just figured that out. I've been using the same commands on small-lan and other profiles on some c240g5 nodes and they worked fine, and I don't know why.
By the way, do you know how to do it correctly?


David M Johnson

Jun 12, 2024, 10:15:03 AM
to cloudla...@googlegroups.com
On 6/11/24 16:49, Gaokai Zhang wrote:
> Yeah, you are correct, I just figure that out, but I've been using the
> command on small-lan and other profiles on some c240g5 but they worked
> fine, and I don't know why;
> btw, do you know how to do it correctly?

Looks like you reloaded your node's disk and are back to the default
kernel. What kernel did you install? FWIW, I have never used the
ubuntu-drivers package to install the nvidia tools, so I can't really
say what it might have done, but a packaged kernel missing the `bnxt`
driver seems really unlikely.


Mike Hibler

Jun 12, 2024, 10:21:24 AM
to cloudla...@googlegroups.com
The different node types have different hardware. The experiment in question
is running on an instance of the "d8545" node and not a "c240g5". The former
has a control net interface with the newer Broadcom chipset while the latter
has an Intel-based control net NIC.
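
To see which driver a given node's control net interface depends on, one option is to read the driver link that Linux exposes in sysfs. This is only a sketch under assumptions: the function names `nic_driver` and `list_nic_drivers` are hypothetical, and the interface names on any particular node type are not stated in this thread.

```shell
#!/bin/sh
# Print the kernel driver bound to a network interface, via sysfs.
nic_driver() {
    # $1 = interface name, $2 = sysfs net root (default: /sys/class/net)
    link="${2:-/sys/class/net}/$1/device/driver"
    # Virtual interfaces (lo, bridges) have no backing device or driver.
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Print "interface driver" for every interface that has a hardware driver.
list_nic_drivers() {
    root=${1:-/sys/class/net}
    for path in "$root"/*; do
        iface=$(basename "$path")
        drv=$(nic_driver "$iface" "$root") && echo "$iface $drv"
    done
    return 0
}
```

Based on the description above, a d8545 node would be expected to report a Broadcom driver (assumed here to be `bnxt_en`) for its control interface, while a c240g5 would report an Intel driver, which is why the same script can behave differently across node types.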

David M Johnson

Jun 12, 2024, 10:30:01 AM
to cloudla...@googlegroups.com
On 6/12/24 08:14, David M Johnson wrote:
> On 6/11/24 16:49, Gaokai Zhang wrote:
>> Yeah, you are correct, I just figure that out, but I've been using the
>> command on small-lan and other profiles on some c240g5 but they worked
>> fine, and I don't know why;
>> btw, do you know how to do it correctly?
>
> Looks like you reloaded your node's disk and are back to the default
> kernel. What kernel did you install? FWIW, I have never used the
> ubuntu-drivers package to install the nvidia tools, so I can't really
> say what it might have done, but a packaged kernel missing the `bnxt`
> driver seems really unlikely.

Mike said it looked like you had built a kernel from source. It is
pretty hard to capture all the kernel config options required for a
custom kernel to run on every Cloudlab machine. My standard advice is
therefore to copy the stock Ubuntu kernel config to linux/.config and run
`make olddefconfig`, provided the two kernel versions are reasonably
close. Then you can customize your kernel options further. Make sure to
install both your kernel and its initramfs, and update
/boot/grub/grub.cfg: run `sudo grub-mkconfig -o /boot/grub/grub.cfg.new`,
verify that /boot/grub/grub.cfg.new looks correct, then run
`sudo mv /boot/grub/grub.cfg.new /boot/grub/grub.cfg`.
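
The advice above could be scripted roughly as follows. This is a hedged sketch, not a tested Cloudlab procedure: the helper names `seed_config` and `check_grub_cfg` are invented for illustration, and the checks assume Ubuntu's usual naming (`/boot/config-<release>`, `vmlinuz-<release>`, `initrd.img-<release>`).

```shell
#!/bin/sh
# Step 1: seed a kernel source tree with the stock distro config.
seed_config() {
    # $1 = kernel source tree, $2 = stock config (default: running kernel's)
    src=$1
    stock=${2:-/boot/config-$(uname -r)}
    [ -r "$stock" ] || { echo "no stock config at $stock" >&2; return 1; }
    cp "$stock" "$src/.config"
}

# Step 2: sanity-check a freshly generated grub.cfg before swapping it in.
check_grub_cfg() {
    # $1 = candidate grub.cfg, $2 = kernel release that must be bootable
    [ -s "$1" ] || { echo "$1 is empty or missing" >&2; return 1; }
    grep -qF "vmlinuz-$2" "$1" || { echo "no kernel entry for $2" >&2; return 1; }
    grep -qF "initrd.img-$2" "$1" || { echo "no initramfs entry for $2" >&2; return 1; }
}

# Example flow, run from the kernel source tree (shown as comments only):
#   seed_config . && make olddefconfig
#   ...build and install the kernel and its initramfs...
#   sudo grub-mkconfig -o /boot/grub/grub.cfg.new
#   check_grub_cfg /boot/grub/grub.cfg.new "$(make -s kernelrelease)" &&
#       sudo mv /boot/grub/grub.cfg.new /boot/grub/grub.cfg
```

The automated check only confirms that the new kernel and initramfs appear in the generated file; reading /boot/grub/grub.cfg.new by eye, as suggested above, is still worthwhile.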