Problem booting a stateless node

Fabio I. Zyserman

May 11, 2023, 12:45:37 PM
to Warewulf
Hi all,

I am trying to set up a small (stateless) cluster from scratch.

I have downloaded the OpenHPC package (Warewulf 3.9.0 + openSUSE Leap 15.4) and, following the instructions in the Cluster Building Recipes, configured the head node.

I am having problems booting a first test node.

In one of the logs I can see the following:

2023-05-11T11:52:05.874038-03:00 localhost dhcpd: DHCPDISCOVER from f0:4d:a2:01:1b:89 via eth1
2023-05-11T11:52:05.874257-03:00 localhost dhcpd: DHCPOFFER on 192.168.0.11 to f0:4d:a2:01:1b:89 via eth1
2023-05-11T11:52:09.938730-03:00 localhost dhcpd: DHCPREQUEST for 192.168.0.11 (192.168.0.1) from f0:4d:a2:01:1b:89 via eth1
2023-05-11T11:52:09.938917-03:00 localhost dhcpd: DHCPACK on 192.168.0.11 to f0:4d:a2:01:1b:89 via eth1
2023-05-11T11:52:09.949002-03:00 localhost systemd[1]: Started Tftp Server.
2023-05-11T11:52:09.965015-03:00 localhost in.tftpd[2788]: tftp: client does not accept options

This says that the DHCP server is able to assign the IP 192.168.0.11 to the node's network
card, doesn't it? So the DHCP part seems to be working OK...
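
A quick way to confirm that reading is to pull the DHCP message types for one MAC out of the log and check that DISCOVER, OFFER, REQUEST and ACK appear in that order. What follows is only a sketch in Python: the names `dhcp_steps` and `handshake_complete` are made up for illustration, and the regular expression assumes the ISC dhcpd syslog format shown above.

```python
import re

# Matches the dhcpd message type and the client MAC in lines such as
# "... dhcpd: DHCPOFFER on 192.168.0.11 to f0:4d:a2:01:1b:89 via eth1"
DHCP_LINE = re.compile(
    r"dhcpd: (DHCP[A-Z]+)\b.*?\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)

def dhcp_steps(log_lines, mac):
    """Return the DHCP message types logged for one MAC, in log order."""
    steps = []
    for line in log_lines:
        match = DHCP_LINE.search(line)
        if match and match.group(2).lower() == mac.lower():
            steps.append(match.group(1).upper())
    return steps

def handshake_complete(steps):
    """True if DISCOVER, OFFER, REQUEST and ACK appear in that order."""
    wanted = ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]
    position = 0
    for step in steps:
        if position < len(wanted) and step == wanted[position]:
            position += 1
    return position == len(wanted)

# The log lines quoted above (the tftpd line is ignored by the parser).
LOG = [
    "2023-05-11T11:52:05.874038-03:00 localhost dhcpd: DHCPDISCOVER from f0:4d:a2:01:1b:89 via eth1",
    "2023-05-11T11:52:05.874257-03:00 localhost dhcpd: DHCPOFFER on 192.168.0.11 to f0:4d:a2:01:1b:89 via eth1",
    "2023-05-11T11:52:09.938730-03:00 localhost dhcpd: DHCPREQUEST for 192.168.0.11 (192.168.0.1) from f0:4d:a2:01:1b:89 via eth1",
    "2023-05-11T11:52:09.938917-03:00 localhost dhcpd: DHCPACK on 192.168.0.11 to f0:4d:a2:01:1b:89 via eth1",
    "2023-05-11T11:52:09.965015-03:00 localhost in.tftpd[2788]: tftp: client does not accept options",
]

STEPS = dhcp_steps(LOG, "f0:4d:a2:01:1b:89")
print(STEPS, handshake_complete(STEPS))
```

A complete handshake only shows that the first DHCP round (from the PXE ROM) succeeded; it says nothing about any later request made after the boot file has been loaded.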

The log file from apache shows no activity.

It seems that the TFTP server doesn't get the PXE ROM to bootstrap into iPXE...

I think I followed the instructions carefully and checked that the necessary files are where they should be... I do not know how to debug this problem!

As a partial description of the installation: the TFTP directory is /srv/tftpboot.

In /srv/tftpboot/warewulf the installed files are:

./ipxe
./ipxe/bin-i386-pcbios
./ipxe/bin-i386-pcbios/undionly.kpxe
./ipxe/bin-x86_64-efi
./ipxe/bin-x86_64-efi/snp.efi
./ipxe/bin-i386-efi
./ipxe/bin-i386-efi/snp.efi

and in /srv/warewulf

./initramfs
./initramfs/x86_64
./initramfs/x86_64/base
./initramfs/x86_64/capabilities
./initramfs/x86_64/capabilities/provision-adhoc
./initramfs/x86_64/capabilities/provision-files
./initramfs/x86_64/capabilities/provision-selinux
./initramfs/x86_64/capabilities/provision-vnfs
./initramfs/x86_64/capabilities/setup-filesystems
./initramfs/x86_64/capabilities/transport-http
./initramfs/x86_64/capabilities/setup-ipmi
./ipxe
./ipxe/cfg
./ipxe/cfg/f0:4d:a2:01:1b:89
./ipxe/cfg/00:21:9b:a4:80:7c
./ipxe/cfg/f0:4d:a2:01:10:d9
./ipxe/cfg/f0:4d:a2:01:12:92
./ipxe/cfg/f0:4d:a2:01:1d:39
./ipxe/cfg/f0:4d:a2:01:1b:b6
./ipxe/cfg/18:03:73:f5:96:06
./bootstrap
./bootstrap/x86_64
./bootstrap/x86_64/5
./bootstrap/x86_64/5/initfs.gz
./bootstrap/x86_64/5/kernel
./bootstrap/x86_64/5/cookie

The config file for the first node, that is, f0:4d:a2:01:1b:89, is:

#!ipxe
# Configuration for Warewulf node: geof1
# Warewulf data store ID: 10
echo Now booting geof1 with Warewulf bootstrap (5.14.21-150400.24.60-default)
set base http://192.168.0.1/WW/bootstrap
initrd ${base}/x86_64/5/initfs.gz
kernel ${base}/x86_64/5/kernel ro initrd=initfs.gz wwhostname=geof1 net.ifnames=0 biosdevname=0 console=tty0 console=ttyS1,115200 wwmaster=192.168.0.1 wwipaddr=192.168.0.11 wwnetmask=255.255.255.0 wwnetdev=eth1 wwhwaddr=f0:4d:a2:01:1b:89
boot

Finally, the relevant part of the dhcpd.conf is:

# DHCPD Configuration written by Warewulf. Do not edit this file, rather
# edit the template: /etc/warewulf/dhcpd-template.conf

allow booting;
allow bootp;
ddns-update-style interim;
authoritative;

# Declare the iPXE option space
option space ipxe;

# Tell iPXE to not wait for ProxyDHCP requests to speed up boot.
option ipxe.no-pxedhcp code 176 = unsigned integer 8;
option ipxe.no-pxedhcp 1;

option ipxe-encap-opts code 175 = encapsulate ipxe;

# iPXE feature flags, set in DHCP request packet
option ipxe.http      code 19 = unsigned integer 8;
option ipxe.bzimage   code 24 = unsigned integer 8;
option ipxe.efi       code 36 = unsigned integer 8;

option architecture-type   code 93  = unsigned integer 16;
if exists ipxe.http and ( exists ipxe.bzimage or exists ipxe.efi ) {
    filename "http://192.168.0.1/WW/ipxe/cfg/${mac}";
} else {
    if option architecture-type = 00:0B {
        filename "/warewulf/ipxe/bin-arm64-efi/snp.efi";
    } elsif option architecture-type = 00:0A {
        filename "/warewulf/ipxe/bin-arm32-efi/placeholder.efi";
    } elsif option architecture-type = 00:09 {
        filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
    } elsif option architecture-type = 00:07 {
        filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
    } elsif option architecture-type = 00:06 {
        filename "/warewulf/ipxe/bin-i386-efi/snp.efi";
    } elsif option architecture-type = 00:00 {
        filename "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe";
    }
}
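
To make the branch logic of this dhcpd.conf easier to follow, here is a small Python sketch that mirrors it. It is an illustration only: `boot_filename` is a hypothetical helper, the `is_ipxe` flag stands in for the `exists ipxe.http and ( exists ipxe.bzimage or exists ipxe.efi )` test, and the MAC is passed in explicitly on the assumption that `${mac}` ends up replaced by the client's own address.

```python
# Boot file per PXE client architecture (DHCP option 93), copied from the
# dhcpd.conf above. Keys are the integer values of the 16-bit option.
BOOT_FILES = {
    0x0B: "/warewulf/ipxe/bin-arm64-efi/snp.efi",
    0x0A: "/warewulf/ipxe/bin-arm32-efi/placeholder.efi",
    0x09: "/warewulf/ipxe/bin-x86_64-efi/snp.efi",
    0x07: "/warewulf/ipxe/bin-x86_64-efi/snp.efi",
    0x06: "/warewulf/ipxe/bin-i386-efi/snp.efi",
    0x00: "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe",
}

def boot_filename(arch, is_ipxe, mac, server="192.168.0.1"):
    """Mimic the if/else chain: a client that identifies itself as iPXE
    gets the per-node script URL, anything else gets an iPXE binary
    chosen by its architecture type."""
    if is_ipxe:
        return f"http://{server}/WW/ipxe/cfg/{mac}"
    return BOOT_FILES.get(arch)  # None when no branch matches

MAC = "f0:4d:a2:01:1b:89"
print(boot_filename(0x00, False, MAC))  # first request, from the PXE ROM
print(boot_filename(0x00, True, MAC))   # second request, from iPXE itself
```

Read this way, a legacy BIOS client is expected to ask DHCP twice: once from the PXE ROM, which is handed undionly.kpxe over TFTP, and once more from iPXE, which is handed the HTTP URL of its per-node script.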

Any help is immensely appreciated!

Cheers,

Fabio


IMG_20230505_153528067.jpg

Anderson, Richard O - (ric)

May 11, 2023, 2:46:40 PM
to ware...@lbl.gov

That looks like a good DHCP transaction, followed by a failed tftpboot.  For PXE boots, lots of things can go sideways. 

You may find something in /var/log/<someplace> and/or

systemctl status xinetd

 

Finding a way to pass a verbosity flag to in.tftpd (in RHEL, that's an edit to /etc/xinetd.d/tftp, plus restarting xinetd) may provide more info.

 

There's also tcpdump with high verbosity to watch the TFTP transaction, which may help you spot the option packet that's causing pain.

 

Cheers,

Ric

--

 

From: Fabio I. Zyserman <f.zys...@gmail.com>
Date: Thursday, May 11, 2023 at 11:53
To: Warewulf <ware...@lbl.gov>
Subject: [EXT][Warewulf] Problem booting a stateless node

--
You received this message because you are subscribed to the Google Groups "Warewulf" group.
To unsubscribe from this group and stop receiving emails from it, send an email to warewulf+u...@lbl.gov.
To view this discussion on the web visit https://groups.google.com/a/lbl.gov/d/msgid/warewulf/b034eb90-6ea8-4582-ae74-f057a8675de3n%40lbl.gov.

Jonathon Anderson

May 11, 2023, 2:53:53 PM
to ware...@lbl.gov
I don't have much Warewulf 3 experience, but it looks to me like DHCP is succeeding in the PXE boot rom, but then failing in iPXE. At least in Warewulf 4, DHCP is pulled twice, once in the boot rom, then again in iPXE. I wouldn't assume it works the same in Warewulf 3, but that certainly seems to be what's failing in your console log.

What kind of network interface are you booting over? We've seen similar problems, for example, when the boot interface is a native 10GbE interface that's plugged into a 1GbE network: iPXE sometimes doesn't auto-negotiate down to 1GbE correctly.

~jonathon


Fabio I. Zyserman

May 11, 2023, 4:33:34 PM
to Warewulf, Anderson, Richard O - (ric)
Hi again,

thanks for your suggestions.

I increased the verbosity of tftpd, and got:

May 11 16:35:03 quipu systemd[1]: Started Tftp Server.
May 11 16:35:03 quipu in.tftpd[15676]: RRQ from 192.168.0.11 filename /warewulf/ipxe/bin-i386-pcbios/undionly.kpxe
May 11 16:35:03 quipu in.tftpd[15676]: tftp: client does not accept options
May 11 16:35:03 quipu in.tftpd[15677]: RRQ from 192.168.0.11 filename /warewulf/ipxe/bin-i386-pcbios/undionly.kpxe
May 11 16:50:04 quipu systemd[1]: tftp.service: Deactivated successfully.

It seems that the node is pulling the file but is unable to boot from it, doesn't it?

What could be done here? I also do not understand why it is requesting the file twice...

Thanks for your help!

Fabio I. Zyserman

May 11, 2023, 4:46:05 PM
to Warewulf, Jonathon Anderson
Thanks for your answer!

Both the headnode and node/s have 1Gb interfaces; the system is rather old...

Concerning my answer to Richard, could it be that the undionly.kpxe file contains something
that can't be processed by a rather old network card?

Cheers,

Fabio

Jonathon Anderson

May 12, 2023, 4:58:29 PM
to Fabio I. Zyserman, Warewulf
Usually it works the other way: sometimes newer network cards aren't supported in iPXE yet.

Do you have the switch port configured as an access port? Sometimes STP delays Ethernet link enough that DHCP can't get through before iPXE times out.

Just how old is the server? Any chance you could switch to EFI mode? Just in case the card is better supported in EFI mode.

~jonathon

Fabio I. Zyserman

May 13, 2023, 12:18:04 PM
to Warewulf, Anderson, Richard O - (ric), ware...@lbl.gov

Hi again,

The output of tcpdump is rather obscure to me, but from it I believe that undionly.kpxe is being transmitted,
but some kind of problem arises, because lots of messages like "[bad udp cksum 0x8185 -> 0xacfe!]" appear.
I am unable to see where the problem arises, whether on the server (quipu) or on the node (geof1).
Again, any help is welcome!

I copy some lines of the tcpdump:

         Client-Ethernet-Address f0:4d:a2:01:1b:89 (oui Unknown)
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: Request
            Requested-IP (50), length 4: geof1.localdomain
            Parameter-Request (55), length 24:
              Subnet-Mask (1), Time-Zone (2), Default-Gateway (3), IEN-Name-Server (5)
              Domain-Name-Server (6), RL (11), Hostname (12), BS (13)
              Domain-Name (15), SS (16), RP (17), EP (18)
              Vendor-Option (43), Server-ID (54), Vendor-Class (60), BF (67)
              Unknown (128), Unknown (129), Unknown (130), Unknown (131)
              Unknown (132), Unknown (133), Unknown (134), Unknown (135)
            MSZ (57), length 2: 1260
            Server-ID (54), length 4: quipu
            GUID (97), length 17: 0.68.69.76.76.57.0.16.87.128.51.198.192.79.75.78.49
            ARCH (93), length 2: 0
            NDI (94), length 3: 1.2.1
            Vendor-Class (60), length 32: "PXEClient:Arch:00000:UNDI:002001"
            END (255), length 0
            PAD (0), length 0, occurs 200
23:26:38.389435 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    quipu.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0xa6011b89, secs 32, Flags [Broadcast] (0x8000)
          Your-IP geof1.localdomain
          Server-IP quipu
          Client-Ethernet-Address f0:4d:a2:01:1b:89 (oui Unknown)
          file "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe"
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message (53), length 1: ACK
            Server-ID (54), length 4: quipu
            Lease-Time (51), length 4: 43200
            Subnet-Mask (1), length 4: 255.255.255.0
            Hostname (12), length 5: "geof1"
            END       PAD (0), length 0, occurs 31
23:26:38.392737 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has quipu tell geof1.localdomain, length 46
23:26:38.392766 ARP, Ethernet (len 6), IPv4 (len 4), Reply quipu is-at 00:15:17:f9:06:8e (oui Unknown), length 28
23:26:38.392907 IP (tos 0x0, ttl 20, id 5, offset 0, flags [none], proto UDP (17), length 89)
    geof1.localdomain.ah-esp-encap > quipu.tftp: [udp sum ok] TFTP, length 61, RRQ "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe" octet tsize 0
23:26:38.420462 IP (tos 0x0, ttl 64, id 10214, offset 0, flags [none], proto UDP (17), length 42)
    quipu.48121 > geof1.localdomain.ah-esp-encap: [bad udp cksum 0x8184 -> 0xd7f4!] UDP, length 14
23:26:38.420550 IP (tos 0x0, ttl 20, id 6, offset 0, flags [none], proto UDP (17), length 45)
    geof1.localdomain.ah-esp-encap > quipu.48121: [udp sum ok] UDP, length 17
23:26:38.420718 IP (tos 0x0, ttl 20, id 7, offset 0, flags [none], proto UDP (17), length 94)
    geof1.localdomain.acp-port > quipu.tftp: [udp sum ok] TFTP, length 66, RRQ "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe" octet blksize 1456
23:26:38.421764 IP (tos 0x0, ttl 64, id 18163, offset 0, flags [none], proto UDP (17), length 43)
    quipu.50818 > geof1.localdomain.acp-port: [bad udp cksum 0x8185 -> 0xacfe!] UDP, length 15
23:26:38.421900 IP (tos 0x0, ttl 20, id 8, offset 0, flags [none], proto UDP (17), length 32)
    geof1.localdomain.acp-port > quipu.50818: [udp sum ok] UDP, length 4
23:26:38.422014 IP (tos 0x0, ttl 64, id 18164, offset 0, flags [none], proto UDP (17), length 1488)
    quipu.50818 > geof1.localdomain.acp-port: [bad udp cksum 0x872a -> 0xdfae!] UDP, length 1460
23:26:38.422152 IP (tos 0x0, ttl 20, id 9, offset 0, flags [none], proto UDP (17), length 32)

Cheers,

Fabio

John Hanks

May 14, 2023, 7:23:11 PM
to ware...@lbl.gov, Anderson, Richard O - (ric)
Hi Fabio,

Old systems are more likely to have buggy TFTP clients, which are hard to troubleshoot. You can use this option in your tftpd config to see if block size might matter:

--blocksize max-block-size, -B max-block-size
Specifies the maximum permitted block size. The permitted range for this parameter is from 512 to 65464. Some embedded clients request large block sizes and yet do not handle fragmented packets correctly; for these clients, it is recommended to set this value to the smallest MTU on your network minus 32 bytes (20 bytes for IP, 8 for UDP, and 4 for TFTP; less if you use IP options on your network.) For example, on a standard Ethernet (MTU 1500) a value of 1468 is reasonable.
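
The arithmetic in that man page excerpt can be written out as a short Python sketch. The function names `max_blocksize` and `ip_packet_length` are invented here for illustration, and the header sizes assume plain IPv4 without options unless `ip_options` is given.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes
TFTP_HEADER = 4   # bytes, opcode plus block number

def max_blocksize(mtu, ip_options=0):
    """Largest TFTP block that still fits in one unfragmented packet."""
    size = mtu - IP_HEADER - ip_options - UDP_HEADER - TFTP_HEADER
    # The quoted man page gives 512..65464 as the permitted range.
    if not 512 <= size <= 65464:
        raise ValueError(f"block size {size} is outside 512..65464")
    return size

def ip_packet_length(blocksize):
    """Total IP packet length of a full TFTP data packet."""
    return blocksize + TFTP_HEADER + UDP_HEADER + IP_HEADER

print(max_blocksize(1500))       # standard Ethernet MTU
print(ip_packet_length(1456))    # block size requested in the capture
```

In the tcpdump excerpt posted earlier in the thread, the node asks for `blksize 1456` and the first full data packet has an IP length of 1488, which is what the second function gives. If that capture is representative, the transfer already fits inside a 1500-byte MTU, so fragmentation may not be the culprit here.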

griznog

Fabio I. Zyserman

May 15, 2023, 11:14:07 AM
to ware...@lbl.gov, Anderson, Richard O - (ric)
Hi John, thanks for your suggestion. I implemented it, and unfortunately nothing changed; I mean, still no boot.

I'll continue trying options and let you know when (and if) I succeed.

Cheers,

Fabio

Fabio I. Zyserman

May 15, 2023, 3:56:00 PM
to ware...@lbl.gov, Anderson, Richard O - (ric)
Hi again.

I installed Wireshark to see whether it makes analysing the packets easier.

I attach a couple of relevant screenshots. It seems that the undionly.kpxe file is
indeed transmitted to the node.

However, it does not boot, so what I guess is that this file can't be processed in
the network card of the node.

At this point, I am starting to consider giving up on network booting and installing the
OS on the nodes one by one...

Do you more experienced people have any idea what else to try?

Cheers,

Fabio

ws2.png
ws1.png

Jonathon Anderson

May 17, 2023, 3:00:16 AM
to ware...@lbl.gov, Anderson, Richard O - (ric)
> However, it does not boot, so what I guess is that this file can't be processed in the network card of the node.

undionly.kpxe _does_ start; otherwise you wouldn't see the iPXE output at the console. But iPXE then fails to DHCP. The most likely cause of this that I can imagine is that you have STP enabled on your network switch access ports, and iPXE is timing out before it gets a DHCP reply. But you can look at the DHCP logs to see if those second-round DHCP requests are getting through in the first place.

Another possibility: are you using host-side vlan tagging? Maybe your network is tagged in your PXE boot rom, but those tags aren't retained in iPXE?

~jonathon


John Hanks

May 17, 2023, 8:35:49 AM
to ware...@lbl.gov, Anderson, Richard O - (ric)
On Wed, May 17, 2023 at 2:00 AM 'Jonathon Anderson' via Warewulf <ware...@lbl.gov> wrote:
> undionly.kpxe _does_ start; otherwise you wouldn't see the iPXE output at the console. But iPXE then fails to DHCP. The most likely cause of this that I can imagine is that you have STP enabled on your network switch access ports, and iPXE is timing out before it gets a DHCP reply. But you can look at the DHCP logs to see if those second-round DHCP requests are getting through in the first place.

It might be possible to check this from the node console: CTRL-B will interrupt iPXE and give you a shell. From there you can repeatedly have it try to configure an interface; if STP is the issue, the link should come up after 30 seconds or so of trying. Switch config or logs would be the ideal place to check, but brute-force retrying will eventually wait out the STP timeout.

iPXE command line use described here: https://ipxe.org/cmdline

griznog

Fabio I. Zyserman

May 17, 2023, 9:47:41 AM
to Warewulf, John Hanks, Anderson, Richard O - (ric), ware...@lbl.gob
Thanks again for your guidance. I'll check and tell you what comes out.

Cheers,

Fabio

John Hearns

May 17, 2023, 10:43:14 AM
to ware...@lbl.gov, John Hanks, Anderson, Richard O - (ric), ware...@lbl.gob
Assuming you have a CAT5(6) network port on your head node you could do a quick and dirty test by connecting a cable direct from head node to sample compute node.
Just saying.
I surprise myself by saying I have never tried this myself....
What network port types are you using?


Fabio I. Zyserman

May 23, 2023, 1:14:14 PM
to Warewulf, John Hearns, John Hanks, Anderson, Richard O - (ric), ware...@lbl.gob
Hi all,

I have made some (small) progress.

As John Hearns suggested, I connected the network ports of the headnode and the computing node directly, and there is
indeed a difference in behaviour. Bypassing the switch, things move a little further, but get stuck at the
next step, that is, the execution of the script:

#!ipxe
# Configuration for Warewulf node: geof1
# Warewulf data store ID: 12
echo Now booting geof1 with Warewulf bootstrap (5.14.21-150400.24.63-default)

set base http://192.168.0.1/WW/bootstrap
initrd ${base}/x86_64/5/initfs.gz
kernel ${base}/x86_64/5/kernel ro initrd=initfs.gz wwhostname=geof1 net.ifnames=0 biosdevname=0 wwmaster=192.168.0.1 wwipaddr=192.168.0.11 wwnetmask=255.255.255.0 wwnetdev=eth1 wwhwaddr=f0:4d:a2:01:1b:89
boot

-The directory /srv/warewulf/bootstrap/x86_64/5 contains the files initfs.gz, cookie and kernel
-apache2 is working ok (http://192.168.0.1 displays "It works!")

-I attach three images, two showing the difference during the booting process with or without the switch, and a third one,
a screenshot of wireshark (in the no-switch situation),  showing the traffic between the head and computing nodes,
which may be helpful to others more able than myself to debug the problem.
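
One way to double-check the first two points is to expand the script's `${base}` variable and request each resulting URL from the head node. The following is a minimal sketch in Python, assuming the script only uses plain `set NAME VALUE` lines; `script_urls` and `http_status` are hypothetical helper names, not part of Warewulf or iPXE.

```python
import re
import urllib.error
import urllib.request

# The generated script quoted above.
SCRIPT = """#!ipxe
# Configuration for Warewulf node: geof1
# Warewulf data store ID: 12
echo Now booting geof1 with Warewulf bootstrap (5.14.21-150400.24.63-default)

set base http://192.168.0.1/WW/bootstrap
initrd ${base}/x86_64/5/initfs.gz
kernel ${base}/x86_64/5/kernel ro initrd=initfs.gz wwhostname=geof1 net.ifnames=0 biosdevname=0 wwmaster=192.168.0.1 wwipaddr=192.168.0.11 wwnetmask=255.255.255.0 wwnetdev=eth1 wwhwaddr=f0:4d:a2:01:1b:89
boot
"""

def script_urls(script):
    """Expand `set NAME VALUE` variables and return the URLs named by
    the initrd and kernel lines of a simple iPXE script."""
    variables = {}
    urls = []
    for raw in script.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment (including the #!ipxe line)
        if words[0] == "set" and len(words) >= 3:
            variables[words[1]] = words[2]
        elif words[0] in ("initrd", "kernel") and len(words) >= 2:
            # Replace ${name} with its value; leave unknown names as-is.
            expanded = re.sub(
                r"\$\{(\w+)\}",
                lambda m: variables.get(m.group(1), m.group(0)),
                words[1],
            )
            urls.append(expanded)
    return urls

def http_status(url, timeout=5):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code      # e.g. 403 or 404 from the web server
    except urllib.error.URLError:
        return None            # no route, connection refused, timeout
```

Calling `http_status(url)` for each entry of `script_urls(SCRIPT)` from a machine on the cluster network should show whether the web server actually serves the `/WW/bootstrap/...` paths. A 404 there would point at the web server's path mapping rather than at the files themselves, since they do exist under /srv/warewulf.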

Any hint about this point? Of course, I've looked at ipxe.org/4c0a6035, but got nothing useful from there.

As a side task, I'll try to get the switch configured so that it works as it should...
wireshark.jpeg
without_switch.jpeg
with_switch.jpeg

John Hanks

May 24, 2023, 9:24:58 AM
to Fabio I. Zyserman, Warewulf, John Hearns, Anderson, Richard O - (ric), ware...@lbl.gob
Those consoles really look like it just can't reach warewulfd, which I'd normally assume is firewall blocking or warewulfd not running. "Destination unreachable" also smells like a firewall issue. 

The ipxe script line:

set base http://192.168.0.1/WW/bootstrap

also seems wrong; it should include the warewulfd port number. In my default.ipxe the base is

set uri_base http://{{.Ipaddr}}:{{.Port}}/provision/{{.Hwaddr}}?assetkey=${asset}&uuid=${uuid}

griznog

John Hearns

May 24, 2023, 10:21:59 AM
to John Hanks, Fabio I. Zyserman, Warewulf, Anderson, Richard O - (ric), ware...@lbl.gob
Damn stupid question from me...  you are booting these nodes in BIOS mode. Can you change to IPXE mode?
Are you willing to say what hardware the nodes are?

Fabio I. Zyserman

May 24, 2023, 12:44:28 PM
to John Hanks, Warewulf, John Hearns, Anderson, Richard O - (ric), ware...@lbl.gob
Well, the scripts are automatically generated by warewulf, I did nothing by hand.

I noticed that with different distributions/versions, the scripts are not exactly the same, but I preferred not
to modify anything. I am installing OpenHPC v2.6, which requires openSUSE Leap 15.3 (I downloaded Leap 15.4).
This downloads Warewulf 3.9.0 (exactly the same packages as in 15.3, I checked that) plus some version of Slurm, which I do not remember now.

The computing nodes are seven Dell PowerEdge R610; the headnode is a standard PC. The switch is a Dell PowerConnect 5424.
Although old, the hardware was working OK, with extremely outdated software, before installing OpenHPC.

I do not understand what booting PXE instead of BIOS would mean. Do you mean choosing PXE as the first boot option on the computing node, or just
forcing it by hand?

I'll double check the firewall and warewulfd. I am pretty sure that there is no firewall service running, and the installation guide does not
mention a daemon for Warewulf. At present none is running.

Many thanks for your help!!

Cheers,

Fabio

Fabio I. Zyserman

May 24, 2023, 1:43:48 PM
to Warewulf, John Hearns, Anderson, Richard O - (ric), John Hanks

John Hanks

May 24, 2023, 4:39:11 PM
to Fabio I. Zyserman, Warewulf, John Hearns, Anderson, Richard O - (ric)
I'm still a little worried about that ipxe script. If it's generated by warewulf, it absolutely should have a port number in it, as warewulfd will not be listening on port 80/443. In /etc/warewulf/ipxe the template should be there and have the port specified, to be pulled in when the template is rendered for the node.

griznog

John Hearns

May 25, 2023, 4:36:29 AM
to John Hanks, Fabio I. Zyserman, Warewulf, Anderson, Richard O - (ric)
Fabio, this document may help  https://dl.dell.com/manuals/common/dellemc-boot-mode-bios-uefi.pdf

How are you configuring the BIOS settings? By hand?
We can use racadm commands to alter the boot mode or boot device order.
Also, it is possible to template the settings and roll this out, depending on your iDRAC license.

Have you looked at using Dell Omnia? R610 will not be officially supported on Omnia, but I would guess they would work.





John Hearns

May 25, 2023, 4:57:39 AM
to John Hanks, Fabio I. Zyserman, Warewulf, Anderson, Richard O - (ric)
Please see section 4.1 in that manual. You can change the boot mode by using  the console or by using a racadm command.

Do you have your iDRACs configured so that you can remotely use the console?
Or are you using a keyboard/video/mouse in the server room?
R610 should support iDRAC Direct which is pretty funky 

Fabio I. Zyserman

May 29, 2023, 9:13:02 AM
to John Hearns, John Hanks, Warewulf, Anderson, Richard O - (ric)
Hi all,
Well, it seems that now I am making real progress. However, I'm not completely through.

The node is loading the kernel that the headnode is offering, but it is not working ok. See the attached screenshot.
I guess this message means that the appropriate driver for the node network card is not included in the kernel warewulf builds...
I'll find out.

Concerning the previous problems, well, there were two:

1- The switch is not adequately configured; bypassing it allows correct communication between headnode and computing node.
2- The issue of the node starting iPXE but not being able to find the files to boot. I guess it comes from a not completely consistent OpenHPC configuration
 of where the files should be. I recursively copied the /srv/warewulf directory into /srv/www/htdocs, then changed the dhcpd.conf file and the
second line in the script

#!ipxe
# Configuration for Warewulf node: geof1
# Warewulf data store ID: 12
echo Now booting geof1 with Warewulf bootstrap (5.14.21-150400.24.63-default)
set base http://192.168.0.1/WW/bootstrap
initrd ${base}/x86_64/5/initfs.gz
kernel ${base}/x86_64/5/kernel ro initrd=initfs.gz wwhostname=geof1 net.ifnames=0 biosdevname=0 wwmaster=192.168.0.1 wwipaddr=192.168.0.11 wwnetmask=255.255.255.0 wwnetdev=eth1 wwhwaddr=f0:4d:a2:01:1b:89
boot

to point to the new directory, and voilà, the node finds the files and starts loading the kernel, although unfortunately it does not yet boot completely.

Cheers,

Fabio
WhatsApp Image 2023-05-29 at 8.35.23 AM.jpeg