DHCP address allocation issue using almalinux controller, vanilla client

Tony Brack

Apr 4, 2023, 12:00:42 AM

to ClusterHAT

Hi Chris,

Sorry to be a PITA again, but I am still working through the RedHat based controller issues and have encountered an intermittent behavior on the client (Raspbian Bullseye), probably based on timing. I have manually set up the bridged connection on the controller, which I will automate and document later.

The client WAS not picking up the IP address via DHCP from the bridge, so I looked up how to configure this in the Debian documentation. I found this, and put it in place. On boot it is picking up the "default" address and I can't see any errors logged.
[ A reboot of the controller was what brought the failed DHCP configuration problem back. ]

Anyhow:

bracka@p99:~ $ cat /etc/systemd/network/dhcp.network
[Match]
Name=usb*

[Network]
DHCP=yes

Then I rebooted and got this behavior:

bracka@p99:~ $ ifconfig -a
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 25 bytes 3458 (3.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 25 bytes 3458 (3.3 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.19.181.99 netmask 255.255.255.0 broadcast 172.19.181.255
inet6 fe80::4cf2:15b6:ff74:1232 prefixlen 64 scopeid 0x20<link>
ether 00:22:82:ff:ff:63 txqueuelen 1000 (Ethernet)
RX packets 216 bytes 58034 (56.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 51 bytes 11740 (11.4 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.199 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::bb6f:a3ea:b066:cec6 prefixlen 64 scopeid 0x20<link>
ether b8:27:eb:cf:a5:ec txqueuelen 1000 (Ethernet)
RX packets 177 bytes 52144 (50.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 101 bytes 14901 (14.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

bracka@p99:~ $ sudo ifconfig usb0 down
bracka@p99:~ $ sudo ifconfig usb0 up

bracka@p99:~ $ ifconfig -a usb0
usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.99 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::4cf2:15b6:ff74:1232 prefixlen 64 scopeid 0x20<link>
ether 00:22:82:ff:ff:63 txqueuelen 1000 (Ethernet)
RX packets 820 bytes 205473 (200.6 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 132 bytes 26609 (25.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

The last 3 times I rebooted to try to reproduce the error (getting the fallback address), DHCP provided the correct address, so I am guessing that there is an intermittent timing issue with the DHCP client. This is a relatively vanilla Pi Zero image going to a DHCP server on the net. The same sort of thing works fine with a Raspbian controller (obviously).

Console log is pretty clean:

[ 33.044454] NFSD: Using UMH upcall client tracking operations.
[ 33.044497] NFSD: starting 90-second grace period (net f0000000)
[ 33.137500] usb0: HOST MAC 00:22:82:ff:fe:63
[ 33.137548] usb0: MAC 00:22:82:ff:ff:63
[ 33.138062] dwc2 20980000.usb: bound driver configfs-gadget.ClusterCTRL
[ 33.170520] dwc2 20980000.usb: new device is high-speed
[ 33.438496] dwc2 20980000.usb: new device is high-speed
[ 33.528495] dwc2 20980000.usb: new device is high-speed
[ 33.594911] dwc2 20980000.usb: new address 13
[ 34.471825] brcmfmac: F1 signature read @0x18000000=0x1541a9a6
[ 34.868022] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
[ 34.869672] usbcore: registered new interface driver brcmfmac
[ 35.026940] Console: switching to colour dummy device 80x30
[ 35.095069] vc4-drm soc:gpu: bound 20400000.hvs (ops vc4_hvs_ops [vc4])
[ 35.153214] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43430/1 wl0: Jul 19 2021 03:24:18 version 7.45.98 (TOB) (56df937 CY) FWID 01-8e14b897
[ 35.165751] Registered IR keymap rc-cec
[ 35.166618] rc rc0: vc4-hdmi as /devices/platform/soc/20902000.hdmi/rc/rc0
[ 35.166939] input: vc4-hdmi as /devices/platform/soc/20902000.hdmi/rc/rc0/input0
[ 35.258987] vc4-drm soc:gpu: bound 20902000.hdmi (ops vc4_hdmi_ops [vc4])
[ 35.266363] vc4-drm soc:gpu: bound 20004000.txp (ops vc4_txp_ops [vc4])
[ 35.271095] vc4-drm soc:gpu: bound 20206000.pixelvalve (ops vc4_crtc_ops [vc4])
[ 35.282137] vc4-drm soc:gpu: bound 20207000.pixelvalve (ops vc4_crtc_ops [vc4])
[ 35.288100] vc4-drm soc:gpu: bound 20807000.pixelvalve (ops vc4_crtc_ops [vc4])
[ 35.305363] vc4-drm soc:gpu: bound 20c00000.v3d (ops vc4_v3d_ops [vc4])
[ 35.392506] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
[ 35.394227] vc4-drm soc:gpu: [drm] Cannot find any crtc or sizes
[ 37.445400] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 37.445442] Bluetooth: BNEP filters: protocol multicast
[ 37.445475] Bluetooth: BNEP socket layer initialized
[ 37.496215] Bluetooth: MGMT ver 1.22
[ 37.590244] NET: Registered PF_ALG protocol family
[ 39.206091] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[ 40.683876] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[ 42.715340] ICMPv6: process `dhcpcd' is using deprecated sysctl (syscall) net.ipv6.neigh.wlan0.retrans_time - use net.ipv6.neigh.wlan0.retrans_time_ms instead
bracka@p99:~$

Controller Interfaces (seem to be working):

[bracka@coruscant ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 32:cf:2b:6d:e8:e8 brd ff:ff:ff:ff:ff:ff permaddr b8:27:eb:ba:96:50
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 00:22:82:ff:fe:63 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.198/24 brd 192.168.1.255 scope global dynamic noprefixroute br0
valid_lft 6023sec preferred_lft 6023sec
inet6 fe80::33d7:cac7:b14e:bd1f/64 scope link noprefixroute
valid_lft forever preferred_lft forever
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP group default qlen 1000
link/ether 00:e0:4c:68:05:9e brd ff:ff:ff:ff:ff:ff
6: ethpi99: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UNKNOWN group default qlen 1000
link/ether 00:22:82:ff:fe:63 brd ff:ff:ff:ff:ff:ff

Thanks,
Tony

Chris Burton

Apr 4, 2023, 3:49:18 PM

to ClusterHAT

Hi,

bracka@p99:~ $ cat /etc/systemd/network/dhcp.network
[Match]
Name=usb*

[Network]
DHCP=yes

I don't know about the setup above but on the ClusterCTRL images you can increase the "reboot 15" line in /etc/dhcpcd.conf which will increase the time before it uses the fallback IP. If you're setting up the bridge manually you might not be doing it quick enough so it uses the fallback IP (once it uses the fallback IP dhcpcd won't try again until the service is restarted).

If you're not using dhcpcd maybe what you're using has a way of specifying how long it should try to get an IP using DHCP before using a fallback IP.
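As a rough illustration of the dhcpcd change described above, here is a minimal sketch. It assumes the file uses the usual dhcpcd.conf syntax with a line such as "reboot 15"; the helper name set_reboot_timeout is hypothetical (it is not part of dhcpcd or the ClusterCTRL images), and 60 is only an example value.

```shell
# Sketch: raise the dhcpcd "reboot" timeout in a dhcpcd.conf-style file.
# set_reboot_timeout is a hypothetical helper name used for illustration only.
set_reboot_timeout() {
    conf="$1"      # path to the config file, e.g. /etc/dhcpcd.conf
    seconds="$2"   # new timeout in seconds, e.g. 60
    if grep -q '^reboot[[:space:]]' "$conf"; then
        # An existing "reboot N" line is rewritten in place (GNU sed -i).
        sed -i "s/^reboot[[:space:]].*/reboot $seconds/" "$conf"
    else
        # No existing line: append one at the end of the file.
        printf 'reboot %s\n' "$seconds" >> "$conf"
    fi
}

# Possible usage, run as root on the client:
#   set_reboot_timeout /etc/dhcpcd.conf 60
#   systemctl restart dhcpcd
```

Since dhcpcd does not retry once it has taken the fallback IP, the service restart (or a reboot) is needed before the new value has any effect.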

Chris.

Tony Brack

Apr 6, 2023, 3:04:48 PM

to ClusterHAT

Hi Chris,

I set what I am assuming is a timeout to 60 (which should be massive overkill) and I am now reliably getting the desired DHCP provided configuration out of an otherwise vanilla Raspbian client system (thanks). Yes: I suppose I can tweak it down, but it is working reliably, so why mess with it?

Quick progress report:

1. On the RedHat controller, I used the instructions from the other thread and then all I had to do was set up the bridge using nmtui/nmcli and I have a fully functional cluster. I'm not sure the internal network (brint) is necessary since all of the hosts, even bridged, are inherently adjacent anyway.

2. The client, you already know about: working fine.
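For anyone following along, the bridge setup mentioned in item 1 might look roughly like the sketch below. This is not the exact procedure used here: the interface names br0, eth0 and ethpi99 are taken from the "ip a" output earlier in the thread, the connection names are made up, and setup_bridge and the NMCLI override are hypothetical conveniences for a dry run.

```shell
# Sketch: build a NetworkManager bridge and attach ports to it with nmcli.
# NMCLI can be overridden (e.g. NMCLI=echo) to print the commands instead of running them.
NMCLI="${NMCLI:-nmcli}"

setup_bridge() {
    bridge="$1"; shift   # bridge name, e.g. br0; remaining arguments are port interfaces
    # Create the bridge itself and let it obtain its address via DHCP.
    $NMCLI connection add type bridge ifname "$bridge" con-name "$bridge"
    $NMCLI connection modify "$bridge" ipv4.method auto
    # Attach each port (e.g. eth0, ethpi99) to the bridge.
    for port in "$@"; do
        $NMCLI connection add type bridge-slave ifname "$port" master "$bridge" con-name "$bridge-$port"
    done
    $NMCLI connection up "$bridge"
}

# Dry run (prints the nmcli arguments only):
#   NMCLI=echo; setup_bridge br0 eth0 ethpi99
```

Printing the commands first makes it easy to check them against what nmtui would create before touching the live network configuration.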

I can reverse engineer the gadget configuration on the client, but I am curious ... Is there a place I can look for docs on the "internals" on how it is configured? I don't think the usual g_serial, g_ether or g_multi is as functionally clean as your implementation. I am still plugging away with using NetworkManager exclusively, but it is no longer as urgent.

Again, thanks for the help!

Tony

Chris Burton

Apr 10, 2023, 4:39:23 AM

to ClusterHAT

Hi,

1. On the RedHat controller, I used the instructions from the other thread and then all I had to do was set up the bridge using nmtui/nmcli and I have a fully functional cluster. I'm not sure the internal network (brint) is necessary since all of the hosts, even bridged, are inherently adjacent anyway.

The brint is only used to bridge to a VLAN interface on ethpiX when using USB boot (booting Pi Zeros without SD cards) for the NFS server which serves the root filesystems as I wanted to keep this internal to the Controller Pi and the Pi Zeros directly connected to it.

I can reverse engineer the gadget configuration on the client, but I am curious ... Is there a place I can look for docs on the "internals" on how it is configured? I don't think the usual g_serial, g_ether or g_multi is as functionally clean as your implementation. I am still plugging away with using NetworkManager exclusively, but it is no longer as urgent.

g_ether/g_multi was classed as deprecated last I looked which is why I went with the libcomposite way of configuring it.

https://openwrt.org/docs/guide-user/hardware/usb_gadget looks to have fairly good docs on configfs setup, which will look similar to what's in https://github.com/burtyb/clusterhat-image/blob/master/files/usr/sbin/composite-clusterctrl
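To give a feel for what a libcomposite/configfs setup involves, here is a minimal sketch of an RNDIS-only gadget. It is not the composite-clusterctrl script linked above, which remains the reference. The helper name make_rndis_gadget, the USB IDs, and the product string are placeholders; the configfs attribute names (idVendor, idProduct, host_addr, dev_addr) are the standard kernel ones.

```shell
# Sketch: lay out a single-function RNDIS gadget under a configfs gadget directory.
# make_rndis_gadget is a hypothetical helper; IDs and strings below are placeholders.
make_rndis_gadget() {
    gadget="$1"     # absolute path, e.g. /sys/kernel/config/usb_gadget/example
    host_mac="$2"   # MAC the host (controller) side will see
    dev_mac="$3"    # MAC used on the gadget (Pi Zero) side
    mkdir -p "$gadget/strings/0x409" "$gadget/configs/c.1" "$gadget/functions/rndis.usb0"
    echo 0x1d6b > "$gadget/idVendor"     # placeholder vendor ID
    echo 0x0104 > "$gadget/idProduct"    # placeholder product ID
    echo "Example RNDIS gadget" > "$gadget/strings/0x409/product"
    # Fixing both MAC addresses is what lets each end identify the other.
    echo "$host_mac" > "$gadget/functions/rndis.usb0/host_addr"
    echo "$dev_mac" > "$gadget/functions/rndis.usb0/dev_addr"
    # Linking the function into a configuration makes it part of the gadget.
    ln -s "$gadget/functions/rndis.usb0" "$gadget/configs/c.1/"
}

# On a real Pi Zero (as root, with libcomposite loaded) the gadget is then bound
# by writing a controller name from /sys/class/udc into "$gadget/UDC".
```

The MAC addresses in the console log earlier in the thread (host 00:22:82:ff:fe:63, device 00:22:82:ff:ff:63) are the kind of values passed as the second and third arguments. Windows driver matching usually needs extra OS descriptor entries, which are left out of this sketch.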

I chose RNDIS as I wanted to be able to just plug a zero running a Px image into a windows machine and have it work there and I also needed to be able to set the MAC address on both sides so I could work out which device it was when it pops up as a random usbX interface to rename it to the right ethpiX based on the Px number.
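The renaming idea can be sketched as follows. This rests on an assumption drawn from the output earlier in the thread, where the host-side MAC 00:22:82:ff:fe:63 shows up on ethpi99 and hex 63 is decimal 99: that the last octet of the MAC encodes the Px number. The helper mac_to_ethpi is hypothetical and is not how the ClusterCTRL images actually do the rename.

```shell
# Sketch: derive an ethpiX interface name from the last octet of a MAC address.
# Assumes the last octet is the Px number in hex (an inference, not a documented rule).
mac_to_ethpi() {
    last="${1##*:}"                 # keep only the text after the final colon
    printf 'ethpi%d\n' "0x$last"    # convert the hex octet to decimal
}

# Possible usage on the controller (interface must be down to rename it):
#   ip link set usb0 down
#   ip link set usb0 name "$(mac_to_ethpi 00:22:82:ff:fe:63)"
```

A persistent setup would more likely match on the MAC address in a udev rule or a systemd .link file rather than renaming by hand.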

Chris.

Tony Brack

Apr 28, 2023, 4:41:13 PM

to ClusterHAT

Thanks Chris!

- much appreciated

FYI:

It all works great, except for the fact that the controller falls flat on its face when I start up the 4th or sometimes 3rd Pi Zero. I am guessing it's because I used a Pi3A+ and I'm sucking up all of memory (... or it may be a CPU issue since this matches the number of available CPUs). I tried monitoring, but it wedged before I could determine anything and powering off and cold starting was the only remedy.

Looking for an alternative board with more memory (or CPUs) that is compatible. I may hijack a Pi4B from another project for testing though, but I can't keep working around the supply chain crap. (sigh) and NOT paying $300+ either.

Peter Cross

Apr 28, 2023, 4:51:09 PM

to clust...@googlegroups.com

Speaking to your supply chain issues: I've had luck buying 5 Raspberry Pis in the last 3 months using this website: https://rpilocator.com/.

You have to be religious about checking it. I've gotten a 4B 4GB, 3B, 3B+, 3A, and a Pi Zero 2 W.

Cheers!

Peter J. Cross
San Antonio, TX

"Experience has taught mankind the necessity of auxiliary precautions"
-James Madison, Federalist Paper No. 51

Chris Burton

Apr 30, 2023, 10:18:15 AM

to ClusterHAT

Hi,

It all works great, except for the fact that the controller falls flat on its face when I start up the 4th or sometimes 3rd Pi Zero. I am guessing it's because I used a Pi3A+ and I'm sucking up all of memory (... or it may be a CPU issue since this matches the number of available CPUs). I tried monitoring, but it wedged before I could determine anything and powering off and cold starting was the only remedy.

There shouldn't be a problem using a Pi3A+ as the controller (it's what's used in the ClusterCTRL A+6, which has 5 node Pis connected to a controller Pi3A+).

My first guess would be power - after turning a few nodes on I'd advise checking the output of "vcgencmd get_throttled" is still 0x0 and watch the power LED on the controller Pi to see if there's any flickering which would also indicate a power issue.
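If the value is not 0x0, the individual bits say what happened. Below is a small sketch of a decoder, based on the bit meanings given in the Raspberry Pi documentation for get_throttled (bits 0 to 3 are current conditions, bits 16 to 19 record that the condition has occurred since boot). The function names decode_throttled and _throttle_flag are hypothetical.

```shell
# Sketch: decode the bitmask printed by "vcgencmd get_throttled".
# _throttle_flag prints a message when the given mask bit is set in $bits.
_throttle_flag() {
    if [ $(( bits & $1 )) -ne 0 ]; then echo "$2"; fi
}

decode_throttled() {
    value="${1#throttled=}"     # accept "throttled=0x50005" or plain "0x50005"
    bits=$(( $value ))          # convert the hex string to a number
    if [ "$bits" -eq 0 ]; then
        echo "ok: no flags set"
        return 0
    fi
    _throttle_flag 0x1     "under-voltage detected now"
    _throttle_flag 0x2     "ARM frequency capped now"
    _throttle_flag 0x4     "currently throttled"
    _throttle_flag 0x8     "soft temperature limit active"
    _throttle_flag 0x10000 "under-voltage has occurred"
    _throttle_flag 0x20000 "ARM frequency capping has occurred"
    _throttle_flag 0x40000 "throttling has occurred"
    _throttle_flag 0x80000 "soft temperature limit has occurred"
    return 0
}

# Possible usage on the controller Pi:
#   decode_throttled "$(vcgencmd get_throttled)"
```

The "has occurred" bits are the useful ones here, since they survive after a brief dip that would be easy to miss by watching the LED.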

Chris.

Tony Brack

Apr 30, 2023, 1:10:49 PM

to ClusterHAT

Hi Chris,

Thanks for the advice. I don't think power is the issue, but I'll give your suggestion(s) a stab. The first thing I am going to do is to create a standard 64 bit Raspbian image and try running with that to see if everything comes up properly. I can also take the almalinux image and put it in an existing Pi4 cluster controller temporarily to see if it experiences the same problems. So far I'm not seeing anything physically unusual when it wedges.

-- -- --

Thanks Peter:
I have been using rpilocator. I have seen a few Pi3bs and very few Pi4s, but getting them is another story. Everyone either flat out will not ship to Canada or charges ridiculous fees from the States (UK seems OK). The usual suppliers are only able to come up with 512MB boards of various types, which I have plenty of. Banana Pi Zeros are readily available and are at least as reliable as Raspberry Zero 2Ws. I have lots of real Pi Zeros anyway, but am short one or two Pi3b+ boards (ideally). I may try a Banana Pi, Orange Pi or RADXA board and sacrifice some software support for a more capable board (trying to keep the same form factor and HAT support).

Thanks,

Tony

Tony Brack

May 1, 2023, 1:50:38 AM

to ClusterHAT

Hi Again Chris,

I built a microSD for Raspbian Bullseye to test this cluster with and it seems to be working fine. I had concerns about the USB Hub/Ethernet, but it seems to be OK after having needed to be bounced once.

bracka@coruscant:/usr/local/etc $ vcgencmd get_throttled
throttled=0x0

Tony Brack

May 3, 2023, 12:36:21 PM

to ClusterHAT

Hi Again,

I think I have found it. Enabling logins on the console port seems to have caused an echo loop between the controller and the client machines, which eventually brings the whole shooting match down. Disabling echo/typeahead on the console ports on the terminals has made the configuration work again.

I'm eventually moving on to using physical UARTs hanging off the GPIO (USB Console Stub or equivalent) as consoles (using right angled headers) instead as this is a cleaner solution and logs startup problems more cleanly.