IPMI connection lost

Jose Bautista Barato

Mar 28, 2023, 8:35:16 AM
to Warewulf
Hi folks. I ran into some issues during stateless provisioning of Dell servers.
## Context
I provision servers using Warewulf 4.4.0-1 and a Rocky 8 container image.
The servers are configured for UEFI boot mode.
Once provisioning finishes, I lose access to the iDRAC console, and the IPMI address is no longer pingable.
However, I can still access the servers via the IP addresses configured in Warewulf during the node registration step.
I tried 2 servers and the same thing happened on both.


## Server configuration
Servers have 2 NICs. One is an integrated NIC for node provisioning; the other is InfiniBand, for the BMC. I disabled PXE on the BMC interface because it could request a DHCP lease during PXE boot, which could change the IPMI IP address.
So only the node provisioning NIC is enabled for PXE, and the PXE boot succeeds.
After that, I lose the IPMI connection.

If anyone knows a workaround for this, please share it.
BR.
Jose

Jonathon Anderson

Mar 28, 2023, 11:44:04 AM
to ware...@lbl.gov
Jose,

Do you have ipmi.write = true configured for these nodes in your nodes.conf? Warewulf has functionality to configure IPMI during boot, which may not be what you expect.
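A quick way to see whether any node or profile has IPMI write enabled is to scan nodes.conf. The shell sketch below is hypothetical (the function name is made up) and assumes the nested ipmi: / write: YAML layout with two-space indentation, at the default path /etc/warewulf/nodes.conf; a real YAML parser would be more robust.

```shell
# Hypothetical helper: list the IPMI "write" setting for each node (or profile)
# in a Warewulf nodes.conf. ASSUMES the nested YAML layout with 2-space
# indentation: entry names at 2 spaces, "ipmi:" at 4, "write:" at 6.
ipmi_write_settings() {
  awk '
    # A 2-space-indented "name:" line starts a new node or profile entry
    /^  [^ #][^:]*:[[:space:]]*$/ { node = $1; sub(/:$/, "", node); in_ipmi = 0; next }
    # Entering the ipmi block of the current entry
    /^    ipmi:[[:space:]]*$/ { in_ipmi = 1; next }
    # Any other 4-space-indented key ends the ipmi block
    /^    [^ ]/ { in_ipmi = 0 }
    # Report write: values found inside the ipmi block, with quotes stripped
    in_ipmi && /^      write:/ { val = $2; gsub(/"/, "", val); print node ": write=" val }
  ' "${1:-/etc/warewulf/nodes.conf}"
}
```

Usage: ipmi_write_settings /etc/warewulf/nodes.conf. If a node reports write=true and that is not what you want, Warewulf 4.x appears to expose this setting as wwctl node set --ipmiwrite; check wwctl node set --help on your version before relying on that flag.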

~jonathon



John Hearns

Apr 6, 2023, 10:53:25 AM
to ware...@lbl.gov
Dell servers can use a dedicated NIC port for the iDRAC or share one of the Ethernet ports for iDRAC traffic.
From what you say, you are not using the dedicated NIC port.
I would advise the following: look up iDRAC Direct and connect a USB cable to the server. You should then be able to access the iDRAC from a laptop using that feature.
It could be that the iDRAC has now been configured to use the dedicated interface.

You can also get the dedicated or shared NIC port setting using racadm or Redfish.
Do you have racadm available on your compute servers when they are booted up?
Try: racadm getniccfg
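To check the shared/dedicated setting from a booted node, a small wrapper around racadm can help. This is a sketch: the function name is hypothetical, and it assumes local racadm is installed and that racadm get iDRAC.NIC.Selection prints a Selection=<value> line (verify the output format on your iDRAC firmware).

```shell
# Hypothetical helper: print the iDRAC NIC selection (e.g. "Dedicated" or "LOM1").
# ASSUMES `racadm get iDRAC.NIC.Selection` prints a "Selection=<value>" line;
# tr strips any trailing carriage return from racadm's output.
idrac_nic_selection() {
  racadm get iDRAC.NIC.Selection | awk -F= '/^Selection=/ { print $2 }' | tr -d '\r'
}
```

A value of Dedicated means the iDRAC uses its own port; LOM1, LOM2, and so on mean it shares one of the onboard (LAN on motherboard) ports.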


Why not consider a cheap 1Gbps switch for dedicated iDRAC ports?



Jorge L. Florit

Jul 5, 2023, 1:44:19 PM
to Warewulf, John Hearns
Hi,

I have a very similar scenario: Warewulf 4.4.0-1, a Rocky 8.6 image imported from docker://ghcr.io/hpcng/warewulf-rockylinux:8.6, and Dell nodes with a shared IPMI NIC. I cannot change the shared NIC because I'm working remotely and cannot attach a direct serial/USB cable to reconfigure it if I lose connectivity.

The problem with this image is that unconfigured NICs are set administratively down even when they are connected. It happens not only with the shared NIC but also with other interfaces connected to a switch with link up.
It looks like this:
eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
And ethtool shows "Link detected: no", which is not true, since the port is up on the switch.

On the same node (and others with the same hardware) with a manual on-disk CentOS 7 installation, even if the ports are set with ONBOOT=no (NM: connection.autoconnect: no), they are unconfigured but UP, and there are no problems with unconfigured but connected NICs and no IPMI connectivity issues.
em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
Also, ethtool shows "Link detected: yes".
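To spot which interfaces the booted image left administratively down (the <BROADCAST,MULTICAST> case, as opposed to <...,UP,LOWER_UP>), the flag list from ip -o link show can be filtered. A minimal sketch; the function name is hypothetical.

```shell
# Hypothetical helper: read `ip -o link show` output on stdin and print the
# interfaces whose flag list lacks UP, i.e. interfaces that are administratively down.
# Flags are compared as whole tokens so that LOWER_UP is not mistaken for UP.
admin_down_ifaces() {
  awk '{
    name = $2; sub(/:$/, "", name); sub(/@.*/, "", name)   # "eno3:" -> "eno3", "vlan10@eno1:" -> "vlan10"
    flags = $3; gsub(/[<>]/, "", flags)
    n = split(flags, f, ",")
    up = 0
    for (i = 1; i <= n; i++) if (f[i] == "UP") up = 1
    if (!up) print name
  }'
}
```

Usage: ip -o link show | admin_down_ifaces. An interface it reports can be brought up by hand for testing with ip link set dev eno3 up.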

As a workaround I created an ifcfg-intname file with BOOTPROTO=none and ONBOOT=yes (the same as in CentOS 7), but I still lose the connection to the IPMI until the interface comes UP in the OS, which does not happen with CentOS 7.
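The ifcfg workaround above can be scripted. A sketch, assuming the RHEL-style network-scripts directory; the function name and the optional DIR argument are hypothetical (DIR lets you write the file into an overlay tree instead of the live system).

```shell
# Hypothetical helper: write ifcfg-<iface> so the interface is brought up at
# boot without an IP address (BOOTPROTO=none, ONBOOT=yes).
# DIR defaults to the RHEL-style network-scripts directory (an assumption).
write_ifcfg_link_only() {
  local iface="$1"
  local dir="${2:-/etc/sysconfig/network-scripts}"
  [ -n "$iface" ] || { echo "usage: write_ifcfg_link_only IFACE [DIR]" >&2; return 1; }
  cat > "$dir/ifcfg-$iface" <<EOF
DEVICE=$iface
NAME=$iface
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
EOF
}
```

For example, write_ifcfg_link_only eno3 creates /etc/sysconfig/network-scripts/ifcfg-eno3. As described in the post, this still leaves a window during boot in which the IPMI connection is lost.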
Also, on another type of node I used racadm to disable the shared NIC in the OS (set BIOS.IntegratedDevices.EmbNic1 DisabledOs), and it worked as expected without losing the IPMI connection. This node, however, only has BIOS.IntegratedDevices.IntegratedNetwork1 (there is no separate embedded NIC), and disabling it removes the PXE boot options for all the NICs, since they are all on the same card.
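The racadm change above can be sketched as follows. It assumes, as is usual on Dell servers, that BIOS attribute changes are only staged until a BIOS configuration job (BIOS.Setup.1-1) runs at the next reboot; the function name is hypothetical. Per the post, do not point it at BIOS.IntegratedDevices.IntegratedNetwork1 on nodes where that removes PXE for every NIC.

```shell
# Hypothetical helper: stage "DisabledOs" for an integrated NIC's BIOS attribute
# and queue the BIOS configuration job that applies it at the next reboot.
# Defaults to the EmbNic1 attribute mentioned in the post.
disable_nic_in_os() {
  local attr="${1:-BIOS.IntegratedDevices.EmbNic1}"
  racadm set "$attr" DisabledOs || return 1
  racadm jobqueue create BIOS.Setup.1-1
}
```

After running disable_nic_in_os, reboot the node so the job is applied; racadm get BIOS.IntegratedDevices.EmbNic1 should then report the new value.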


The question is: is there a configuration, option, or driver missing in the kernel, initrd, or elsewhere that keeps the interfaces from being brought up at boot when there is no explicit network configuration in the OS?


Regards

Jorge L. Florit

Jul 7, 2023, 10:58:34 AM
to Warewulf, Jorge L. Florit, John Hearns
In the Warewulf Slack channel I was advised to replace the EFI iPXE image with snponly.efi to avoid the network being reset by iPXE during the boot stages. I don't know much about this file or how it differs from the default image, but maybe someone else can explain it better.

To apply this workaround, follow these steps:
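git clone https://github.com/ipxe/ipxe.git
cd ipxe/src/
# Enable the serial and framebuffer consoles in config/console.h
sed -i.bak -e 's,//\(#define.*CONSOLE_SERIAL.*\),\1,' -e 's,//\(#define.*CONSOLE_FRAMEBUFFER.*\),\1,' config/console.h
# Enable zlib/gzip image support and the VLAN commands in config/general.h
sed -i.bak -e 's,//\(#define.*IMAGE_ZLIB.*\),\1,' -e 's,//\(#define.*IMAGE_GZIP.*\),\1,' -e 's,//\(#define.*VLAN_CMD.*\),\1,' config/general.h
# Build the SNP-only UEFI binary, which drives the NIC through the UEFI firmware's SNP interface instead of iPXE's own NIC drivers
make bin-x86_64-efi/snponly.efi
cp -p bin-x86_64-efi/snponly.efi /var/lib/tftpboot/warewulf/   # Or wherever you are keeping this stuff.
# Keep the stock binary as a backup and point x86_64.efi at snponly.efi
mv /var/lib/tftpboot/warewulf/x86_64.efi{,.bak}
ln -s /var/lib/tftpboot/warewulf/snponly.efi /var/lib/tftpboot/warewulf/x86_64.efi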
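Before running make, it is worth confirming that the sed edits took effect, since a sed pattern that does not match changes nothing without reporting an error. A sketch with a hypothetical function name, meant to be run from ipxe/src:

```shell
# Hypothetical helper: report whether the options uncommented by the sed
# commands are enabled in an iPXE source tree (default: current directory,
# i.e. ipxe/src). Returns non-zero if any option is still commented out.
check_ipxe_options() {
  local src="${1:-.}" missing=0 entry file opt
  for entry in console.h:CONSOLE_SERIAL console.h:CONSOLE_FRAMEBUFFER \
               general.h:IMAGE_ZLIB general.h:IMAGE_GZIP general.h:VLAN_CMD; do
    file="$src/config/${entry%%:*}"   # header name before the colon
    opt="${entry#*:}"                 # option name after the colon
    if grep -Eq "^#define[[:space:]]+${opt}([[:space:]]|\$)" "$file"; then
      echo "enabled: $opt"
    else
      echo "missing: $opt" >&2
      missing=1
    fi
  done
  return "$missing"
}
```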


With this change, the unconfigured network interfaces are still administratively down, but the shared IPMI NIC is no longer brought down at boot; connectivity to the IPMI appears uninterrupted and stays up after the image boots.
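The symlink swap can also be wrapped so it is safe to re-run and easy to undo. A sketch, assuming the /var/lib/tftpboot/warewulf directory used above (adjust as needed); the function names are hypothetical, and the link is relative rather than absolute, which points at the same file.

```shell
# Hypothetical helpers: point Warewulf's UEFI iPXE binary (x86_64.efi) at
# snponly.efi, keeping the original as x86_64.efi.bak, and undo that change.
use_snponly() {
  local dir="${1:-/var/lib/tftpboot/warewulf}"
  local target="$dir/x86_64.efi"
  [ -f "$dir/snponly.efi" ] || { echo "missing $dir/snponly.efi" >&2; return 1; }
  # Back up a real (non-symlink) x86_64.efi once; refuse to clobber an existing backup
  if [ -e "$target" ] && [ ! -L "$target" ]; then
    [ -e "$target.bak" ] && { echo "$target.bak already exists; not overwriting" >&2; return 1; }
    mv "$target" "$target.bak"
  fi
  ln -sfn snponly.efi "$target"   # relative link within the same directory
}

revert_snponly() {
  local dir="${1:-/var/lib/tftpboot/warewulf}"
  [ -f "$dir/x86_64.efi.bak" ] || { echo "no backup in $dir" >&2; return 1; }
  rm -f "$dir/x86_64.efi"
  mv "$dir/x86_64.efi.bak" "$dir/x86_64.efi"
}
```

If a later step such as wwctl configure tftp copies the stock iPXE binaries back into that directory, the link may need to be re-created with use_snponly.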