Warewulf bootstrap: Network hardware was not recognized!

Daniel Mare

Feb 21, 2023, 4:04:45 AM
to Warewulf
I have a cluster with 100 nodes. When I recently tried to add a new node with slightly different hardware, it failed to boot with the following error messages:
Loading drivers: uhci-hcd ohci-hcd ehci-hcd whci-hcd isp1362-hcd ci-hc4 sl811-hcd sd_mod
Detecting hardware: mlx4_core ahci
Bringing up local loopback network:
ERROR: Network hardware was not recognized!

In a Debug shell, I only see the loopback interface:
ls -l /sys/class/net/
lo -> ../../devices/virtual/net/lo
cat /sys/devices/virtual/net/lo/address
00:00:00:00:00:00
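
One way to narrow this down from the same debug shell is to ask sysfs which PCI network controllers exist and whether any driver has bound to them. The following is a minimal sketch, assuming sysfs is mounted at /sys and that the shell provides readlink and basename (BusyBox builds usually do). The function name list_pci_nics and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# List PCI network controllers and the driver bound to each, if any.
# The optional first argument (hypothetical, for testing) overrides the
# sysfs directory; on a real node it defaults to /sys/bus/pci/devices.
list_pci_nics() {
    pci_root="${1:-/sys/bus/pci/devices}"
    for dev in "$pci_root"/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        # PCI base class 0x02 means "network controller".
        case "$class" in
            0x02*) ;;
            *) continue ;;
        esac
        vendor=$(cat "$dev/vendor")
        device=$(cat "$dev/device")
        # A bound device has a "driver" symlink pointing at its driver.
        if [ -e "$dev/driver" ]; then
            driver=$(basename "$(readlink "$dev/driver")")
        else
            driver="(none)"
        fi
        echo "$(basename "$dev") $vendor:$device driver=$driver"
    done
}

# Example: list_pci_nics
```

A line ending in driver=(none) would mean the kernel sees the card on the bus but no loaded module claims it, which fits the "Network hardware was not recognized" message.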

This new Gigabyte Z690 UD DDR4 V2 motherboard has a Realtek 2.5GbE NIC instead of the 1GbE NICs I have on the other nodes. (I use the onboard NIC just for the PXE network; all nodes have separate high-throughput InfiniBand cards.) I have tried putting in a different network card, but the motherboard can't PXE boot from a discrete NIC unless CSM Support is enabled, and I have had trouble enabling CSM Support. I think it needs a discrete graphics card for this, and even then I'm not sure this path will work. It also seems unnecessary for something I should be able to fix in software.

I tried to figure out which driver we need by booting from a Live Ubuntu USB and running ethtool:
ethtool -i eth0
driver: r8169
version: 5.15.0-43-generic
firmware-version: rtl8125b-2_0.0.2 07/13/20
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

I believe the r8169 driver is already installed, because on a different node using the same image I find this file present:
/lib/modules/3.10.0-957.1.3.el7.x86_64/kernel/drivers/net/ethernet/realtek/r8169.ko.xz
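
The file being present does not by itself show that this build of r8169 claims the new NIC. A module only binds to the PCI IDs listed in its alias table, which modinfo -F alias prints. The sketch below checks that table for a given vendor/device pair. The helper name aliases_cover_pci_id is made up, and the ID 10ec:8125 in the usage comment is an assumption (it is the ID usually reported for RTL8125 parts), so confirm it first with lspci -nn -s 03:00.0 on the live USB.

```shell
#!/bin/sh
# Succeed if any modalias pattern read from stdin names the given PCI
# vendor/device pair (four hex digits each, e.g. 10ec 8125).
# Only explicit IDs are matched; wildcard vendor/device entries are ignored.
aliases_cover_pci_id() {
    # modinfo prints the hex digits of PCI IDs in upper case.
    vendor=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    device=$(printf '%s' "$2" | tr 'a-f' 'A-F')
    grep -q "^pci:v0000${vendor}d0000${device}sv"
}

# Example against the module file from the post (ID is an assumption):
#   modinfo -F alias /lib/modules/3.10.0-957.1.3.el7.x86_64/kernel/drivers/net/ethernet/realtek/r8169.ko.xz \
#       | aliases_cover_pci_id 10ec 8125 && echo claimed || echo "not claimed"
```

If the check reports "not claimed", adding r8169 to the modprobe list cannot help, because that build of the driver does not know the device.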

I have also edited /etc/warewulf/bootstrap.conf and added a line like this:
modprobe += r8169

Then I rebuilt the bootstrap with wwsh bootstrap rebuild <mybootstrapname>

Still, the result is the same: the node fails to boot with the same error message.

Any advice about how to further troubleshoot or what else I could try?

Jason Stover

Feb 21, 2023, 11:54:06 AM
to ware...@lbl.gov
We had some 3Com NICs for which we needed to pull firmware from a newer
kernel and put it into place... Is there a firmware change in a newer
kernel that lets you see the NIC?

-J

Daniel Mare

Feb 21, 2023, 7:27:25 PM
to Warewulf, Jason Stover
Hi Jason, 

Quite possibly.  I haven't compared the firmware files, but I will give this a try.

Should I simply copy the r8169.ko.xz file from a newer Linux system that sees the NIC and overwrite the current r8169.ko.xz file on the master node? Or how did you go about pulling the firmware from a newer kernel?

Jason Stover

Feb 22, 2023, 10:06:40 AM
to Daniel Mare, Warewulf
Not the driver.... I think you'd need to see if there's an update in
(I *think*): /lib/firmware/rtl_nic/

Probably some rtl*.fw file
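
To see which firmware files would need copying, one option is to take the list that a module declares (modinfo -F firmware prints it) and report the entries missing from a firmware directory. This is a rough sketch under assumptions: the helper name missing_firmware and the NEWER_KO placeholder are made up, and the directory to check is whichever tree the bootstrap is actually built from.

```shell
#!/bin/sh
# Read firmware names on stdin (one per line, relative to the firmware
# directory) and print those not found under the given directory.
missing_firmware() {
    fw_dir="${1:-/lib/firmware}"
    while IFS= read -r fw; do
        # Skip blank lines in the input.
        [ -n "$fw" ] || continue
        if [ ! -e "$fw_dir/$fw" ]; then
            echo "$fw"
        fi
    done
}

# Example, with NEWER_KO standing in for the path to an r8169 module
# taken from a kernel that does recognize the NIC:
#   modinfo -F firmware "$NEWER_KO" | missing_firmware /lib/firmware
```

Any name it prints (for example something under rtl_nic/) is a candidate file to bring over from the newer system's /lib/firmware.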

-J