Crash on usbboot


Sebastian Gsänger

Jan 13, 2025, 6:52:46 PM
to ClusterHAT
Hi all,

I've recently set up a ClusterHAT v2.5 system with an RPi 4 host, 2 Pi Zeros and 2 Pi Zero 2s, with usbboot to Bookworm (current 2024-07-04 images) on all nodes.

On the first try everything worked fine: the Zeros booted and were accessible via SSH.
Since the second try, however, I can't get _any_ of the nodes to usbboot anymore.

I've noticed multiple issues:
- Serial output on the modified Zero2 shows a kernel panic:
[   32.881810] skbuff: skb_over_panic: text:ffffffe9127cbe7c len:-522752 put:-522752 head:ffffff8002e70c00 data:ffffff8002e70c40 tail:0xfff80640 end:0x640 dev:usb0
[   32.896552] kernel BUG at net/core/skbuff.c:192!
[   32.901258] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[   32.908166] Modules linked in: cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep brcmfmac_wcc vc4 snd_soc_hdmi_codec drm_display_helper cec drm_dma_helper drm_kms_helper brcmfmac brcmutil snd_soc_core cfg80211 hci_uart snd_compress btbcm snd_pcm_dmaengine bluetootho
[   32.990421] CPU: 2 PID: 615 Comm: dhclient-script Tainted: G         C         6.6.31+rpt-rpi-v8 #1  Debian 1:6.6.31-1+rpt1
[   33.001747] Hardware name: Raspberry Pi Zero 2 W Rev 1.0 (DT)
[   33.007590] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   33.014675] pc : skb_panic+0x54/0x60
[   33.018329] lr : skb_panic+0x54/0x60
[   33.021971] sp : ffffffc080013cf0
[   33.025343] x29: ffffffc080013d00 x28: ffffff8002589080 x27: 0000000000000000
[   33.032619] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000801000
[   33.039893] x23: ffffff8003466990 x22: ffffff8002c42c80 x21: ffffff8003466940
[   33.047167] x20: ffffff8002940e00 x19: 0000000000000000 x18: 0000000000000006
[   33.054442] x17: ffffff96f2fbe000 x16: ffffffe927ccebe8 x15: ffffffc080013760
[   33.061715] x14: 0000000000000003 x13: 666666663a747865 x12: 74203a63696e6170
[   33.068989] x11: 7265766f5f626b73 x10: ffffffe9288a3710 x9 : ffffffe9273ff768
[   33.076262] x8 : 00000000ffffefff x7 : ffffffe9288a3710 x6 : 80000000fffff000
[   33.083537] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   33.090809] x2 : 0000000000000000 x1 : ffffff8003068000 x0 : 0000000000000094
[   33.098083] Call trace:
[   33.100573]  skb_panic+0x54/0x60
[   33.103864]  skb_put+0x74/0x80
[   33.106979]  rx_complete+0xec/0x270 [u_ether]
[   33.111446]  usb_gadget_giveback_request+0x34/0xe8
[   33.116328]  dwc2_hsotg_complete_request+0x88/0x178 [dwc2]
[   33.121962]  dwc2_hsotg_handle_outdone+0xc4/0x1d8 [dwc2]
[   33.127415]  dwc2_hsotg_epint+0x9ac/0xe90 [dwc2]
[   33.132160]  dwc2_hsotg_irq+0x8f0/0xea8 [dwc2]
[   33.136729]  __handle_irq_event_percpu+0x60/0x230
[   33.141529]  handle_irq_event+0x54/0xc0
[   33.145442]  handle_level_irq+0xc8/0x1b0
[   33.149442]  generic_handle_domain_irq+0x34/0x58
[   33.154149]  bcm2836_chained_handle_irq+0x30/0x58
[   33.158948]  generic_handle_domain_irq+0x34/0x58
[   33.163655]  bcm2836_arm_irqchip_handle_irq+0x64/0x80
[   33.168800]  call_on_irq_stack+0x24/0x58
[   33.172800]  do_interrupt_handler+0x88/0x98
[   33.177065]  el1_interrupt+0x34/0x68
[   33.180713]  el1h_64_irq_handler+0x18/0x28
[   33.184890]  el1h_64_irq+0x64/0x68
[   33.188355]  finish_task_switch.isra.0+0x7c/0x258
[   33.193151]  __schedule+0x380/0xd60
[   33.196705]  schedule+0x64/0x108
[   33.199993]  do_wait+0x15c/0x2f8
[   33.203283]  kernel_wait4+0xa8/0x198
[   33.206925]  __do_sys_wait4+0xe8/0x108
[   33.210744]  __arm64_sys_wait4+0x2c/0x40
[   33.214739]  invoke_syscall+0x50/0x128
[   33.218564]  el0_svc_common.constprop.0+0x48/0xf0
[   33.223359]  do_el0_svc+0x24/0x38
[   33.226740]  el0_svc+0x40/0xe8
[   33.229856]  el0t_64_sync_handler+0x100/0x130
[   33.234297]  el0t_64_sync+0x190/0x198
[   33.238035] Code: 29572107 a90027e8 91346000 97d8dd96 (d4210000)
[   33.244244] ---[ end trace 0000000000000000 ]---
[   33.248954] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[   33.256479] SMP: stopping secondary CPUs
[   33.260478] Kernel Offset: 0x28a7200000 from 0xffffffc080000000
[   33.266498] PHYS_OFFSET: 0x0
[   33.269427] CPU features: 0x0,0000000d,00020000,0000421b
[   33.274832] Memory Limit: none
[   33.277945] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---

- Occasionally, a netdev watchdog is triggered on the host:
Jan 10 18:24:27 cbridge kernel: ------------[ cut here ]------------
Jan 10 18:24:27 cbridge kernel: NETDEV WATCHDOG: ethupi2 (rndis_host): transmit queue 0 timed out 5572 ms
Jan 10 18:24:27 cbridge kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x2a8/0x2b8
Jan 10 18:24:27 cbridge kernel: Modules linked in: 8021q garp rndis_wlan rndis_host cdc_ether cdc_acm bridge stp llc cmac algif_hash aes_arm64 aes_generic algif_skcipher af_alg bnep nft_chain_nat xt_MASQUERADE vc4 nf_nat xt_conntrack brcmfmac_wcc nf_conntrack hci_uart snd_soc_hdmi_c>
Jan 10 18:24:27 cbridge kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G         C         6.6.31+rpt-rpi-v8 #1  Debian 1:6.6.31-1+rpt1
Jan 10 18:24:27 cbridge kernel: Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
Jan 10 18:24:27 cbridge kernel: pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jan 10 18:24:27 cbridge kernel: pc : dev_watchdog+0x2a8/0x2b8
Jan 10 18:24:27 cbridge kernel: lr : dev_watchdog+0x2a8/0x2b8
Jan 10 18:24:27 cbridge kernel: sp : ffffffc080013db0
Jan 10 18:24:27 cbridge kernel: x29: ffffffc080013db0 x28: ffffffec5c758b18 x27: ffffffc080013ee0
Jan 10 18:24:27 cbridge kernel: x26: ffffffec5ce94008 x25: 00000000000015c4 x24: ffffffec5d226000
Jan 10 18:24:27 cbridge kernel: x23: 0000000000000000 x22: ffffff80444483dc x21: ffffff8044448000
Jan 10 18:24:27 cbridge kernel: x20: ffffff8043977400 x19: ffffff8044448488 x18: ffffffffffffffff
Jan 10 18:24:27 cbridge kernel: x17: 756f2064656d6974 x16: 2030206575657571 x15: 2074696d736e6172
Jan 10 18:24:27 cbridge kernel: x14: 74203a2974736f68 x13: 736d203237353520 x12: 74756f2064656d69
Jan 10 18:24:27 cbridge kernel: x11: 7420302065756575 x10: ffffffec5d2a3710 x9 : ffffffec5bd1da8c
Jan 10 18:24:27 cbridge kernel: x8 : 00000000ffffefff x7 : ffffffec5d2a3710 x6 : 80000000fffff000
Jan 10 18:24:27 cbridge kernel: x5 : 0000000000000000 x4 : 0000000000000040 x3 : 0000000000000004
Jan 10 18:24:27 cbridge kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80402bdc40
Jan 10 18:24:27 cbridge kernel: Call trace:
Jan 10 18:24:27 cbridge kernel:  dev_watchdog+0x2a8/0x2b8
Jan 10 18:24:27 cbridge kernel:  call_timer_fn+0x3c/0x1c8
Jan 10 18:24:27 cbridge kernel:  __run_timers+0x25c/0x330
Jan 10 18:24:27 cbridge kernel:  run_timer_softirq+0x28/0x50
Jan 10 18:24:27 cbridge kernel:  __do_softirq+0x118/0x384
Jan 10 18:24:27 cbridge kernel:  ____do_softirq+0x18/0x30
Jan 10 18:24:27 cbridge kernel:  call_on_irq_stack+0x24/0x58
Jan 10 18:24:27 cbridge kernel:  do_softirq_own_stack+0x24/0x38
Jan 10 18:24:27 cbridge kernel:  irq_exit_rcu+0x8c/0xd0
Jan 10 18:24:27 cbridge kernel:  el1_interrupt+0x38/0x68
Jan 10 18:24:27 cbridge kernel:  el1h_64_irq_handler+0x18/0x28
Jan 10 18:24:27 cbridge kernel:  el1h_64_irq+0x64/0x68
Jan 10 18:24:27 cbridge kernel:  default_idle_call+0x5c/0x170
Jan 10 18:24:27 cbridge kernel:  do_idle+0x204/0x238
Jan 10 18:24:27 cbridge kernel:  cpu_startup_entry+0x3c/0x50
Jan 10 18:24:27 cbridge kernel:  secondary_start_kernel+0x128/0x150
Jan 10 18:24:27 cbridge kernel:  __secondary_switched+0xb8/0xc0
Jan 10 18:24:27 cbridge kernel: ---[ end trace 0000000000000000 ]---

- The USB serial connection gives no output on any of the nodes, maybe because it only comes up immediately before a potential kernel panic?

As standalone devices, the Zeros seem to boot just fine.
Does anyone have an idea what I could have messed up?

Thanks,
Sebastian

Chris Burton

Jan 25, 2025, 10:15:33 AM
to ClusterHAT
Hi, 
> I've recently set up a ClusterHAT v2.5 system with an RPi 4 host, 2 Pi Zeros and 2 Pi Zero 2s, with usbboot to Bookworm (current 2024-07-04 images) on all nodes.

Did you run any updates on the host between it working and not working?

Do you have ModemManager installed/running? It has somehow caused this error previously (check with "systemctl status ModemManager").
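
If it is easier to script the ModemManager check across several hosts, something along these lines might do. It is only a rough sketch: the function name modemmanager_state and the injectable run parameter are made up for illustration, and it relies on "systemctl is-active" printing the unit state (for example "active" or "inactive").

```python
import subprocess


def modemmanager_state(run=subprocess.run):
    """Return the state printed by `systemctl is-active ModemManager`.

    `run` is injectable (an assumption for this sketch) so the function
    can be exercised on machines without systemd.
    """
    try:
        result = run(
            ["systemctl", "is-active", "ModemManager"],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit just means "not active"
        )
    except FileNotFoundError:
        # systemctl itself is missing, e.g. on a non-systemd system
        return "systemctl-not-found"
    return result.stdout.strip() or "unknown"


if __name__ == "__main__":
    state = modemmanager_state()
    print(f"ModemManager: {state}")
    if state == "active":
        # Suggestion only: stopping the service temporarily is one way
        # to rule it out before retrying usbboot.
        print("To rule it out: sudo systemctl stop ModemManager")
```

Anything other than "active" would suggest ModemManager is not the culprit here.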

Are you running the desktop (full/std) or lite version, and have you manually installed anything that might be interfering with the serial ports?

Chris. 