diskless clusterctrl images freeze

Peter Saveliev

May 16, 2020, 11:57:08 AM
to ClusterHAT
Hello!

I've got a strange issue with a diskless ClusterHAT setup: the diskless blades freeze at random. The kernel keeps running, and so do all running processes, but any disk read of a file that isn't already cached hangs. Sometimes it recovers, sometimes not.

Hardware:

* Pi 4
* ClusterHAT v2.4
* Pi Zero v1.3
* Pi Zero W v1.1

Software:


No additional software is installed yet, only what ships with the images.

The blades are diskless; no SD cards are used. No NFS errors are logged. The freeze happens at random, anywhere from 1 minute to 30 minutes of uptime.

I will be very thankful for any tips.

Trace samples (on blades):

[  870.119809] mmc0: timeout waiting for hardware interrupt.
[  870.119833] [2a99a4c3] CMD  37 0
[  870.119842] [2a99a4c3] REQ> d8963d38 0
[  870.119848] [2a99a4cf] TSK< d8963d38 0
[  870.119853] [2a99a4d6] TSK> d8963d38 0
[  870.119859] [2a99a4e9] REQ< d8963d38 10800
[  870.119864] [2a99a4ea] CMD< 37 0
[  870.119869] [2a99a4ef] FCM< d8963d38 d8963d98
[  870.119874] [2a99aafc] CMD  37 0
[  870.119879] [2a99aafc] REQ> d8963d38 0
[  870.119884] [2a99ab09] TSK< d8963d38 0
[  870.119890] [2a99ab0f] TSK> d8963d38 0
[  870.119895] [2a99ab1e] IOS< 30d40 0
[  870.119901] [2a99ab25] REQ< d8963e00 10800
[  870.119906] [2a99ab26] CMD< 1 0
[  870.119911] [2a99ab2a] FCM< d8963e00 d8963e60
[  870.119916] [2a99b139] CMD  1 0
[  870.119921] [2a99b13a] REQ> d8963e00 0
[  870.119927] [2a99b146] TSK< d8963e00 0
[  870.119932] [2a99b14e] TSK> d8963e00 0
[  870.119937] [2a99b15e] IOS< 0 0
[  870.119942] [2a99b87a] IOS< 0 0
[  870.119947] [2a99eb82] IOS< 1dd11 0
[  870.119952] [2a9a1eac] RST< 0 0
[  870.119957] [2a9a6cd9] REQ< d8963e08 10800
[  870.119963] [2a9a6cda] CMD< 34 c00
[  870.119968] [2a9a6ce1] FCM< d8963e08 d8963e68
[  870.119973] [2a9a76aa] CMD  34 0
[  870.119978] [2a9a76ab] REQ> d8963e08 0
[  870.119984] [2a9a76bb] TSK< d8963e08 0
[  870.119989] [2a9a76c4] TSK> d8963e08 0
[  870.119994] [2a9a76db] REQ< d8963e08 10800
[  870.119999] [2a9a76dc] CMD< 34 80000c08
[  870.120005] [2a9a76e3] FCM< d8963e08 d8963e68
[  870.120010] [2a9a80ca] CMD  34 0
[  870.120015] [2a9a80cb] REQ> d8963e08 0
[  870.120020] [2a9a80d7] TSK< d8963e08 0
[  870.120026] [2a9a80de] TSK> d8963e08 0
[  870.120031] [2a9a80ed] IOS< 1dd11 0
[  870.120036] [2a9a880d] REQ< d8963e30 10800
[  870.120041] [2a9a880e] CMD< 0 0
[  870.120047] [2a9a8815] FCM< d8963e30 d8963e90
[  870.120052] [2a9a8a23] FCM> d8963e30 0
[  870.120057] [2a9a8a24] CMD  0 0
[  870.120062] [2a9a8a24] REQ> d8963e30 0
[  870.120067] [2a9a8a34] TSK< d8963e30 0
[  870.120072] [2a9a8a3c] TSK> d8963e30 0
[  870.120077] [2a9a8f4d] IOS< 1dd11 0
[  870.120083] [2a9a9452] REQ< d8963e30 10800
[  870.120088] [2a9a9452] CMD< 8 1aa
[  870.120094] [2a9a9457] FCM< d8963e30 d8963e90
[  870.120099] [2a9a9e3f] CMD  8 0
[  870.120104] [2a9a9e40] REQ> d8963e30 0
[  870.120109] [2a9a9e4e] TSK< d8963e30 0
[  870.120114] [2a9a9e55] TSK> d8963e30 0
[  870.120120] [2a9a9e6a] REQ< d8963dd8 10800
[  870.120125] [2a9a9e6b] CMD< 5 0
[  870.120130] [2a9a9e70] FCM< d8963dd8 d8963e38
[  870.120135] [2a9aa85c] CMD  5 0
[  870.120140] [2a9aa85c] REQ> d8963dd8 0
[  870.120145] [2a9aa86a] TSK< d8963dd8 0
[  870.120151] [2a9aa86e] TSK> d8963dd8 0
[  870.120157] [2a9aa87c] REQ< d8963dd8 10800
[  870.120162] [2a9aa87d] CMD< 5 0
[  870.120167] [2a9aa882] FCM< d8963dd8 d8963e38
[  870.120172] [2a9ab258] CMD  5 0
[  870.120178] [2a9ab258] REQ> d8963dd8 0
[  870.120183] [2a9ab266] TSK< d8963dd8 0
[  870.120188] [2a9ab26c] TSK> d8963dd8 0
[  870.120193] [2a9ab27b] REQ< d8963dd8 10800
[  870.120198] [2a9ab27c] CMD< 5 0
[  870.120203] [2a9ab281] FCM< d8963dd8 d8963e38
[  870.120208] [2a9abc69] CMD  5 0
[  870.120213] [2a9abc69] REQ> d8963dd8 0
[  870.120219] [2a9abc74] TSK< d8963dd8 0
[  870.120224] [2a9abc78] TSK> d8963dd8 0
[  870.120230] [2a9abc84] REQ< d8963dd8 10800
[  870.120235] [2a9abc85] CMD< 5 0
[  870.120240] [2a9abc89] FCM< d8963dd8 d8963e38
[  870.120245] [2a9ac64d] CMD  5 0
[  870.120250] [2a9ac64d] REQ> d8963dd8 0
[  870.120255] [2a9ac657] TSK< d8963dd8 0
[  870.120261] [2a9ac661] TSK> d8963dd8 0
[  870.120266] [2a9ac676] REQ< d8963d38 10800
[  870.120271] [2a9ac677] CMD< 37 0
[  870.120276] [2a9ac67b] FCM< d8963d38 d8963d98
[  870.120282] [2a9ad060] CMD  37 0
[  870.120287] [2a9ad060] REQ> d8963d38 0
[  870.120292] [2a9ad06c] TSK< d8963d38 0
[  870.120297] [2a9ad074] TSK> d8963d38 0
[  870.120303] [2a9ad085] REQ< d8963d38 10800
[  870.120308] [2a9ad086] CMD< 37 0
[  870.120313] [2a9ad08a] FCM< d8963d38 d8963d98
[  870.120318] [2a9ada61] CMD  37 0
[  870.120324] [2a9ada62] REQ> d8963d38 0
[  870.120329] [2a9ada6d] TSK< d8963d38 0
[  870.120334] [2a9ada73] TSK> d8963d38 0
[  870.120339] [2a9ada84] REQ< d8963d38 10800
[  870.120344] [2a9ada84] CMD< 37 0
[  870.120350] [2a9ada88] FCM< d8963d38 d8963d98
[  870.120355] [2a9ae454] CMD  37 0
[  870.120360] [2a9ae455] REQ> d8963d38 0
[  870.120365] [2a9ae460] TSK< d8963d38 0
[  870.120370] [2a9ae466] TSK> d8963d38 0
[  870.120376] [2a9ae477] REQ< d8963d38 10800
[  870.120381] [2a9ae478] CMD< 37 0
[  870.120386] [2a9ae47c] FCM< d8963d38 d8963d98
[  870.120391] [2a9aee86] CMD  37 0
[  870.120397] [2a9aee86] REQ> d8963d38 0
[  870.120402] [2a9aee91] TSK< d8963d38 0
[  870.120407] [2a9aee98] TSK> d8963d38 0
[  870.120412] [2a9aeea5] IOS< 1dd11 0
[  870.120418] [2a9aeeac] REQ< d8963e00 10800
[  870.120422] [2a9aeead] CMD< 1 0
[  870.120428] [2a9aeeb1] FCM< d8963e00 d8963e60
[  870.120433] [2a9af88d] CMD  1 0
[  870.120438] [2a9af88d] REQ> d8963e00 0
[  870.120443] [2a9af898] TSK< d8963e00 0
[  870.120449] [2a9af89e] TSK> d8963e00 0
[  870.120453] [2a9af8ae] IOS< 0 0
[  870.120459] [2aab4968] IOS< 0 0
[  870.120463] [2aab7c6a] IOS< 61a80 0
[  870.120469] [2aabaf78] RST< 0 0
[  870.120474] [2aabfda2] REQ< d8963e08 10800
[  870.120480] [2aabfda4] CMD< 34 c00
[  870.120485] [2aabfdab] FCM< d8963e08 d8963e68
[  870.120490] [2aac0772] CMD  34 0
[  870.120495] [2aac0773] REQ> d8963e08 0
[  870.120501] [2aac0785] TSK< d8963e08 0
[  870.120506] [2aac078e] TSK> d8963e08 0
[  870.120511] [2aac07a6] REQ< d8963e08 10800
[  870.120516] [2aac07a7] CMD< 34 80000c08
[  870.120522] [2aac07ac] FCM< d8963e08 d8963e68
[  870.120527] [2aac118a] CMD  34 0
[  870.120532] [2aac118a] REQ> d8963e08 0
[  870.120538] [2aac119a] TSK< d8963e08 0
[  870.120543] [2aac11a2] TSK> d8963e08 0
[  870.120548] [2aac11b1] IOS< 61a80 0
[  870.120553] [2aac18d3] REQ< d8963e30 10800
[  870.120558] [2aac18d4] CMD< 0 0
[  870.120564] [2aac18db] FCM< d8963e30 d8963e90
[  870.120569] [2aac1978] FCM> d8963e30 0
[  870.120574] [2aac1978] CMD  0 0
[  870.120579] [2aac1979] REQ> d8963e30 0
[  870.120585] [2aac1989] TSK< d8963e30 0
[  870.120590] [2aac1990] TSK> d8963e30 0
[  870.120595] [2aac1eaa] IOS< 61a80 0
[  870.120600] [2aac23bc] REQ< d8963e30 10800
[  870.120606] [2aac23bc] CMD< 8 1aa
[  870.120611] [2aac23c1] FCM< d8963e30 d8963e90
[  870.120616] [2aac2714] CMD  8 0
[  870.120621] [2aac2714] REQ> d8963e30 0
[  870.120626] [2aac2721] TSK< d8963e30 0
[  870.120631] [2aac2729] TSK> d8963e30 0
[  870.120637] [2aac273d] REQ< d8963dd8 10800
[  870.120642] [2aac273e] CMD< 5 0
[  870.120647] [2aac2743] FCM< d8963dd8 d8963e38
[  870.120652] [2aac2a43] CMD  5 0
[  870.120657] [2aac2a43] REQ> d8963dd8 0
[  870.120663] [2aac2a4f] TSK< d8963dd8 0
[  870.120669] [2aac2a54] TSK> d8963dd8 0
[  870.120674] [2aac2a63] REQ< d8963dd8 10800
[  870.120679] [2aac2a64] CMD< 5 0
[  870.120684] [2aac2a68] FCM< d8963dd8 d8963e38
[  870.120689] [2aac2d92] CMD  5 0
[  870.120694] [2aac2d92] REQ> d8963dd8 0
[  870.120700] [2aac2d9f] TSK< d8963dd8 0
[  870.120705] [2aac2da4] TSK> d8963dd8 0
[  870.120710] [2aac2db0] REQ< d8963dd8 10800
[  870.120715] [2aac2db1] CMD< 5 0
[  870.120721] [2aac2db5] FCM< d8963dd8 d8963e38
[  870.120726] [2aac30de] CMD  5 0
[  870.120731] [2aac30de] REQ> d8963dd8 0
[  870.120736] [2aac30ea] TSK< d8963dd8 0
[  870.120741] [2aac30ef] TSK> d8963dd8 0
[  870.120746] [2aac30fc] REQ< d8963dd8 10800
[  870.120751] [2aac30fd] CMD< 5 0
[  870.120757] [2aac3101] FCM< d8963dd8 d8963e38
[  870.120762] [2aac340b] CMD  5 0
[  870.120767] [2aac340c] REQ> d8963dd8 0
[  870.120772] [2aac3419] TSK< d8963dd8 0
[  870.120777] [2aac3421] TSK> d8963dd8 0
[  870.120783] [2aac3438] REQ< d8963d38 10800
[  870.120788] [2aac3438] CMD< 37 0
[  870.120794] [2aac343d] FCM< d8963d38 d8963d98
[  870.120799] [2aac3761] CMD  37 0
[  870.120804] [2aac3761] REQ> d8963d38 0
[  870.120809] [2aac376e] TSK< d8963d38 0
[  870.120814] [2aac3775] TSK> d8963d38 0
[  870.120820] [2aac3789] REQ< d8963d38 10800
[  870.120825] [2aac3789] CMD< 37 0
[  870.120830] [2aac378e] FCM< d8963d38 d8963d98
[  870.120835] [2aac3aac] CMD  37 0
[  870.120840] [2aac3aac] REQ> d8963d38 0
[  870.120845] [2aac3ab7] TSK< d8963d38 0
[  870.120851] [2aac3abe] TSK> d8963d38 0
[  870.120856] [2aac3ad0] REQ< d8963d38 10800
[  870.120861] [2aac3ad1] CMD< 37 0
[  870.120867] [2aac3ad6] FCM< d8963d38 d8963d98
[  870.120872] [2aac3df7] CMD  37 0
[  870.120877] [2aac3df7] REQ> d8963d38 0
[  870.120882] [2aac3e03] TSK< d8963d38 0
[  870.120887] [2aac3e0a] TSK> d8963d38 0
[  870.120893] [2aac3e1d] REQ< d8963d38 10800
[  870.120898] [2aac3e1d] CMD< 37 0
[  870.120903] [2aac3e22] FCM< d8963d38 d8963d98
[  870.120908] [2aac4140] CMD  37 0
[  870.120913] [2aac4140] REQ> d8963d38 0
[  870.120919] [2aac414c] TSK< d8963d38 0
[  870.120924] [2aac4153] TSK> d8963d38 0
[  870.120930] [2aac4162] IOS< 61a80 0
[  870.120935] [2aac4169] REQ< d8963e00 10800
[  870.120940] [2aac416a] CMD< 1 0
[  870.120945] [2aac416f] FCM< d8963e00 d8963e60
[  870.120950] [2aac448f] CMD  1 0
[  870.120955] [2aac448f] REQ> d8963e00 0
[  870.120961] [2aac449b] TSK< d8963e00 0
[  870.120966] [2aac44a2] TSK> d8963e00 0
[  870.120971] [2aac44b2] IOS< 0 0
[  870.120976] [2aac4bca] IOS< 0 0
[  870.120982] [2aac7f01] IOS< 493e0 0
[  870.120987] [2aaca8f7] RST< 0 0
[  870.120992] [2aacf724] REQ< d8963e08 10800
[  870.120997] [2aacf725] CMD< 34 c00
[  870.121002] [2aacf72a] FCM< d8963e08 d8963e68
[  870.121007] [2aad00ff] CMD  34 0
[  870.121013] [2aad00ff] REQ> d8963e08 0
[  870.121018] [2aad010f] TSK< d8963e08 0
[  870.121023] [2aad0119] TSK> d8963e08 0
[  870.121028] [2aad012c] REQ< d8963e08 10800
[  870.121034] [2aad012d] CMD< 34 80000c08
[  870.121039] [2aad0132] FCM< d8963e08 d8963e68
[  870.121044] [2aad0b12] CMD  34 0
[  870.121050] [2aad0b13] REQ> d8963e08 0
[  870.121055] [2aad0b1f] TSK< d8963e08 0
[  870.121060] [2aad0b27] TSK> d8963e08 0
[  870.121065] [2aad0b35] IOS< 493e0 0
[  870.121070] [2aad123c] REQ< d8963e30 10800
[  870.121076] [2aad123d] CMD< 0 0
[  870.121081] [2aad1245] FCM< d8963e30 d8963e90
[  870.121086] [2aad1321] FCM> d8963e30 0
[  870.121091] [2aad1322] CMD  0 0
[  870.121096] [2aad1322] REQ> d8963e30 0
[  870.121101] [2aad132f] TSK< d8963e30 0
[  870.121107] [2aad1337] TSK> d8963e30 0
[  870.121112] [2aad184a] IOS< 493e0 0
[  870.121118] [2aad1d52] REQ< d8963e30 10800
[  870.121123] [2aad1d53] CMD< 8 1aa
[  870.121128] [2aad1d58] FCM< d8963e30 d8963e90
[  870.121133] [2aad2164] CMD  8 0
[  870.121138] [2aad2165] REQ> d8963e30 0
[  870.121143] [2aad2175] TSK< d8963e30 0
[  870.121149] [2aad217f] TSK> d8963e30 0
[  870.121154] [2aad2194] REQ< d8963dd8 10800
[  870.121159] [2aad2196] CMD< 5 0
[  870.121164] [2aad219b] FCM< d8963dd8 d8963e38
[  870.121170] [37844c6b] TIM< 0 0
[  870.121183] mmc0:>cmd op 5 arg 0x0 flags 0x2e1 - resp 00000000 00000000 00000000 00000000, err 0
[  870.121187] mmc0: =========== REGISTER DUMP ===========
[  870.121192] mmc0: SDCMD  0x00004005
[  870.121195] mmc0: SDARG  0x00000000
[  870.121199] mmc0: SDTOUT 0x00024978
[  870.121203] mmc0: SDCDIV 0x00000340
[  870.121207] mmc0: SDRSP0 0xffffffff
[  870.121211] mmc0: SDRSP1 0x0000ff7f
[  870.121214] mmc0: SDRSP2 0xffffffff
[  870.121218] mmc0: SDRSP3 0xffffffff
[  870.121222] mmc0: SDHSTS 0x00000040
[  870.121226] mmc0: SDVDD  0x00000001
[  870.121230] mmc0: SDEDM  0x00010800
[  870.121235] mmc0: SDHCFG 0x0000040a
[  870.121239] mmc0: SDHBCT 0x00000000
[  870.121243] mmc0: SDHBLC 0x00000000
[  870.121246] mmc0: ===========================================

[ 6598.673618] Unable to handle kernel NULL pointer dereference at virtual address 00000020
[ 6598.684956] pgd = c1b4c605
[ 6598.689186] [00000020] *pgd=00000000
[ 6598.694322] Internal error: Oops: 817 [#1] ARM
[ 6598.700367] Modules linked in: cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic raspberrypi_hwmon brcmfmac hwmon brcmutil bcm2835_codec(C) snd_bcm2835(C) bcm2835_v4l2(C) v4l2_mem2mem snd_pcm bcm2835_mmal_vchiq(C) v4l2_common snd_timer videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 snd videobuf2_common videodev media vc_sm_cma(C) sha256_generic cfg80211 rfkill nft_chain_nat_ipv4 ipt_MASQUERADE nf_nat_ipv4 nf_nat nft_counter xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink i2c_dev ip_tables x_tables usb_f_rndis ipv6 usb_f_ecm u_ether usb_f_acm libcomposite u_serial 8021q garp stp llc dwc2 udc_core i2c_bcm2835 fixed uio_pdrv_genirq uio
[ 6598.780482] CPU: 0 PID: 44 Comm: kworker/0:2 Tainted: G         C        4.19.97+ #1294
[ 6598.793119] Hardware name: BCM2835
[ 6598.798824] Workqueue: events_freezable mmc_rescan
[ 6598.805957] PC is at bcm2835_sdhost_finish_command+0x1ec/0x620
[ 6598.814188] LR is at trace_hardirqs_off+0x50/0x124
[ 6598.821362] pc : [<c057889c>]    lr : [<c00d5454>]    psr: 40000093
[ 6598.829984] sp : d8963c30  ip : c0a2741c  fp : d8963c7c
[ 6598.837517] r10: 60000013  r9 : d8954b00  r8 : 00010800
[ 6598.845026] r7 : 00000000  r6 : c0a37900  r5 : d8963ca0  r4 : 00000040
[ 6598.853951] r3 : ffffff92  r2 : 00004037  r1 : 00000020  r0 : 00000000
[ 6598.862828] Flags: nZcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 6598.872468] Control: 00c5387d  Table: 19f88008  DAC: 00000055
[ 6598.880563] Process kworker/0:2 (pid: 44, stack limit = 0xfc8da3fd)
[ 6598.889176] Stack: (0xd8963c30 to 0xd8964000)
[ 6598.895774] 3c20:                                     d8963c7c d8963c40 c0577efc c00778cc
[ 6598.908302] 3c40: a0000093 d8963d38 d8963c7c 70802d62 c0577118 d8954800 d8963d38 c0a27028
[ 6598.920771] 3c60: d8954b00 00010800 c0a27028 00000003 d8963cd4 d8963c80 c0579f14 c05786bc
[ 6598.933222] 3c80: d8963ca4 d8963c90 c057c778 c0405714 c057c6f4 da9df724 d8963cb4 d8963ca8
[ 6598.945824] 3ca0: 60000013 70802d62 d8963cc4 d8954800 d8963d38 d8963d38 00000000 00000000
[ 6598.958674] 3cc0: c0a27028 00000003 d8963cf4 d8963cd8 c0557880 c0579a6c 00000022 d8954800
[ 6598.971679] 3ce0: d8963d38 00000000 d8963d14 d8963cf8 c0558608 c0557814 c0557a80 d8963d38
[ 6598.984827] 3d00: c0a27028 d8954800 d8963d34 d8963d18 c05586a0 c0558580 00000000 d8963d98
[ 6598.998161] 3d20: c0a27028 d8954800 d8963d94 d8963d38 c0558790 c0558644 00000000 d8963d98
[ 6599.011666] 3d40: 00000000 00000000 00000001 d8963d4c d8963d4c 00000000 d8963d58 d8963d58
[ 6599.025290] 3d60: c0557a80 00000000 00000000 00000000 00000000 70802d62 d8954800 c0a27028
[ 6599.039040] 3d80: d8963e60 00000003 d8963de4 d8963d98 c05627bc c0558724 00000037 00000000
[ 6599.052826] 3da0: 00000000 00000000 00000000 00000000 000000f5 00000000 ffffff92 00000000
[ 6599.066560] 3dc0: 00000000 00000000 d8963d38 70802d62 ffffff92 d8954800 d8963e5c d8963de8
[ 6599.080317] 3de0: c056289c c056270c 00000000 d8963e68 00000000 00000000 00000000 00000000
[ 6599.094138] 3e00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 6599.107935] 3e20: 00000000 00000000 00000000 70802d62 c0564c88 00000064 d8954800 00000000
[ 6599.121742] 3e40: c0a27028 d8963ec0 00000000 d8954a2c d8963ebc d8963e60 c0562a78 c056281c
[ 6599.135520] 3e60: 00000029 00000000 00000000 00000000 00000000 00000000 000000e1 00000000
[ 6599.149266] 3e80: 00000000 00000000 00000000 00000000 00000000 70802d62 00000000 d8954800
[ 6599.163035] 3ea0: 000493e0 c0a27028 c0749c34 c0749c3c d8963ee4 d8963ec0 c05625bc c05629cc
[ 6599.176851] 3ec0: 000493e0 70802d62 c0749c34 d8954a28 000493e0 d8954800 d8963f0c d8963ee8
[ 6599.190657] 3ee0: c055aa9c c0562584 c055a654 d8954a28 d893c600 c0a2f810 00000000 dba6b500
[ 6599.204472] 3f00: d8963f44 d8963f10 c003d6e0 c055a660 00000008 d8940858 d8963f44 c0a2f810
[ 6599.218250] 3f20: d893c614 c0a2f824 c0a37900 00000008 d8940858 d893c600 d8963f7c d8963f48
[ 6599.231989] 3f40: c003d984 c003d5bc d8963f7c d8963f58 c0042d40 d8940840 d8942880 00000000
[ 6599.245771] 3f60: d893c600 c003d950 d8940858 da8fde88 d8963fac d8963f80 c00434cc c003d95c
[ 6599.259577] 3f80: ffffffff d8942880 c00433b0 00000000 00000000 00000000 00000000 00000000
[ 6599.273379] 3fa0: 00000000 d8963fb0 c00090ac c00433bc 00000000 00000000 00000000 00000000
[ 6599.287194] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 6599.300959] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 6599.314740] [<c057889c>] (bcm2835_sdhost_finish_command) from [<c0579f14>] (bcm2835_sdhost_request+0x4b4/0x684)
[ 6599.330552] [<c0579f14>] (bcm2835_sdhost_request) from [<c0557880>] (__mmc_start_request+0x78/0x164)
[ 6599.345386] [<c0557880>] (__mmc_start_request) from [<c0558608>] (mmc_start_request+0x94/0xc4)
[ 6599.359675] [<c0558608>] (mmc_start_request) from [<c05586a0>] (mmc_wait_for_req+0x68/0xe0)
[ 6599.373695] [<c05586a0>] (mmc_wait_for_req) from [<c0558790>] (mmc_wait_for_cmd+0x78/0xb4)
[ 6599.387631] [<c0558790>] (mmc_wait_for_cmd) from [<c05627bc>] (mmc_app_cmd+0xbc/0x110)
[ 6599.401154] [<c05627bc>] (mmc_app_cmd) from [<c056289c>] (mmc_wait_for_app_cmd+0x8c/0xfc)
[ 6599.414931] [<c056289c>] (mmc_wait_for_app_cmd) from [<c0562a78>] (mmc_send_app_op_cond+0xb8/0x148)
[ 6599.429663] [<c0562a78>] (mmc_send_app_op_cond) from [<c05625bc>] (mmc_attach_sd+0x44/0x188)
[ 6599.443776] [<c05625bc>] (mmc_attach_sd) from [<c055aa9c>] (mmc_rescan+0x448/0x498)
[ 6599.457105] [<c055aa9c>] (mmc_rescan) from [<c003d6e0>] (process_one_work+0x130/0x3a0)
[ 6599.470697] [<c003d6e0>] (process_one_work) from [<c003d984>] (worker_thread+0x34/0x530)
[ 6599.484401] [<c003d984>] (worker_thread) from [<c00434cc>] (kthread+0x11c/0x158)
[ 6599.497391] [<c00434cc>] (kthread) from [<c00090ac>] (ret_from_fork+0x14/0x28)
[ 6599.507565] Exception stack(0xd8963fb0 to 0xd8963ff8)
[ 6599.515473] 3fa0:                                     00000000 00000000 00000000 00000000
[ 6599.529175] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 6599.542838] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 6599.552287] Code: e5d930a4 e3130010 1a0000ea e3e0306d (e5803020) 
[ 6599.561132] ---[ end trace a28a793731ca13b5 ]---
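
For anyone trying to tell whether a blade is hitting the same problem, here is a minimal Python sketch that scans saved kernel log output (for example from `dmesg > dmesg.txt`) for the two signatures in the traces above: the "mmc0: timeout waiting for hardware interrupt" message and the oops at bcm2835_sdhost_finish_command. The script name check_mmc.py and the function find_sdhost_events are hypothetical, and only these two messages are matched; other SD host errors are ignored.

```python
"""Scan saved kernel log output for the SD host errors seen in this thread (sketch)."""
import re
import sys
from typing import Iterable, List, Optional, Tuple

# Signatures copied from the traces above; other mmc errors are not matched.
PATTERNS = {
    "timeout": re.compile(r"mmc0: timeout waiting for hardware interrupt"),
    "oops": re.compile(r"PC is at bcm2835_sdhost_finish_command"),
}
# Lines in the traces above start with "[seconds.micros]" since boot.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_sdhost_events(lines: Iterable[str]) -> List[Tuple[Optional[float], str]]:
    """Return (seconds_since_boot, kind) for every matching log line."""
    events = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                match = TIMESTAMP.match(line)
                events.append((float(match.group(1)) if match else None, kind))
    return events


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print(f"usage: {argv[0]} DMESG_FILE")
        return
    with open(argv[1]) as log:
        for ts, kind in find_sdhost_events(log):
            print(f"{ts}\t{kind}")


if __name__ == "__main__":
    main(sys.argv)
```

Running `python3 check_mmc.py dmesg.txt` prints one line per match with its seconds-since-boot timestamp, which can be lined up against the times the blade froze.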

Chris Burton

May 16, 2020, 2:16:05 PM
to ClusterHAT
Hi, 

I haven't seen that oops cause lasting issues before.

If you don't need an SD card in the Pi Zeros, you can add "dtparam=sd_poll_once=on" to the end of each Pi Zero's config.txt (/var/lib/clusterctrl/nfs/p1/boot/config.txt for P1, etc.). Let me know whether that helps.
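
Since each Pi Zero has its own config.txt, the line has to go into every node's file. Below is a minimal Python sketch, assuming the /var/lib/clusterctrl/nfs/pN/boot/config.txt layout described above and nodes named p1 to p4 (adjust NODES for your setup); the helper name add_param is hypothetical. It appends the line only if it is not already present, so it is safe to run more than once. As I understand the Raspberry Pi overlays README, sd_poll_once makes the kernel look for a card only once after boot instead of polling continuously, which fits the traces above, where the crash comes from the periodic mmc_rescan work.

```python
"""Append dtparam=sd_poll_once=on to each Pi Zero's config.txt (sketch).

Assumes the clusterctrl NFS layout /var/lib/clusterctrl/nfs/pN/boot/config.txt
and nodes p1..p4; both are assumptions to adjust for your setup.
"""
from pathlib import Path

PARAM = "dtparam=sd_poll_once=on"
NFS_ROOT = Path("/var/lib/clusterctrl/nfs")
NODES = ["p1", "p2", "p3", "p4"]  # assumption: one entry per Pi Zero


def add_param(config: Path, param: str = PARAM) -> bool:
    """Append `param` to `config` unless an identical line is already there.

    Returns True if the file was modified, False if it was left unchanged.
    """
    text = config.read_text()
    if any(line.strip() == param for line in text.splitlines()):
        return False
    # Start the new setting on its own line even if the file lacks a final newline.
    if text and not text.endswith("\n"):
        text += "\n"
    config.write_text(text + param + "\n")
    return True


def main() -> None:
    for node in NODES:
        config = NFS_ROOT / node / "boot" / "config.txt"
        if not config.exists():
            print(f"{node}: {config} not found, skipping")
            continue
        status = "added" if add_param(config) else "already present"
        print(f"{node}: {status}")


if __name__ == "__main__":
    main()
```

Run it as root on the controller Pi; each Pi Zero picks up the change on its next boot. Appending at the very end follows the advice above, but if a config.txt ends inside a conditional section such as [pi4], the new line falls under that filter, so check the tail of the file first.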

If you do need an SD card at some point after boot, feel free to add to the "mmc0: timeout waiting for hardware interrupt (without SD card)" issue I opened last year.

Chris.

Peter Saveliev

May 16, 2020, 6:32:50 PM
to ClusterHAT
It seems "dtparam=sd_poll_once=on" has helped: uptime is over 1 hour and the blades are still working.

I don't plan to use SD cards on the blades, so it won't be an issue.

Thanks a lot!