On both B and C revisions of Beaglebone black, our application (running under either the shipped Angstrom or Debian distros) locks up after random amounts of time. The debug port reports the following problems:
[ 752.038977] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:15]
[ 752.063631] BUG: scheduling while atomic: kworker/0:1/15/0x40010100
[ 754.935632] Kernel panic - not syncing: softlockup: hung tasks
[ 754.946347] [<c0010443>] (unwind_backtrace+0x1/0x8a) from [<c0455ced>] (panic
+0x51/0x148)
[ 754.960814] [<c0455ced>] (panic+0x51/0x148) from [<c00678df>] (watchdog_timer
_fn+0xc3/0xf8)
[ 754.975585] [<c00678df>] (watchdog_timer_fn+0xc3/0xf8) from [<c00421f3>] (__r
un_hrtimer+0xb3/0x140)
[ 754.991496] [<c00421f3>] (__run_hrtimer+0xb3/0x140) from [<c004295b>] (hrtime
r_interrupt+0xd7/0x1c8)
[ 755.007625] [<c004295b>] (hrtimer_interrupt+0xd7/0x1c8) from [<c001c8ed>] (om
ap2_gp_timer_interrupt+0x11/0x1c)
[ 755.025164] [<c001c8ed>] (omap2_gp_timer_interrupt+0x11/0x1c) from [<c0067fa9
>] (handle_irq_event_percpu+0x4d/0x174)
[ 755.043502] [<c0067fa9>] (handle_irq_event_percpu+0x4d/0x174) from [<c00680f9
>] (handle_irq_event+0x29/0x3c)
[ 755.060694] [<c00680f9>] (handle_irq_event+0x29/0x3c) from [<c0069b17>] (hand
le_level_irq+0x6f/0x84)
[ 755.076699] [<c0069b17>] (handle_level_irq+0x6f/0x84) from [<c0067bff>] (gene
ric_handle_irq+0x13/0x1c)
[ 755.092998] [<c0067bff>] (generic_handle_irq+0x13/0x1c) from [<c000c8cb>] (ha
ndle_IRQ+0x3b/0x5c)
[ 755.108422] [<c000c8cb>] (handle_IRQ+0x3b/0x5c) from [<c0008551>] (omap3_intc
_handle_irq+0x39/0x58)
[ 755.124243] [<c0008551>] (omap3_intc_handle_irq+0x39/0x58) from [<c045b95b>]
(__irq_svc+0x3b/0x5c)
[ 755.139692] Exception stack(0xdf117cf8 to 0xdf117d40)
[ 755.148646] 7ce0: 00000
000 c0849ac0
[ 755.162967] 7d00: 00000000 00000000 c07a2080 00000000 df116000 00000002 df117
db0 00000003
[ 755.177335] 7d20: c0812610 df116000 edb88320 df117d40 c0031241 c0030f4e 40000
133 ffffffff
[ 755.191758] [<c045b95b>] (__irq_svc+0x3b/0x5c) from [<c0030f4e>] (__do_softir
q+0x46/0x174)
[ 755.206357] [<c0030f4e>] (__do_softirq+0x46/0x174) from [<c0031241>] (irq_exi
t+0x29/0x50)
[ 755.220773] [<c0031241>] (irq_exit+0x29/0x50) from [<c000c8cf>] (handle_IRQ+0
x3f/0x5c)
[ 755.234735] [<c000c8cf>] (handle_IRQ+0x3f/0x5c) from [<c0008551>] (omap3_intc
_handle_irq+0x39/0x58)
[ 755.250549] [<c0008551>] (omap3_intc_handle_irq+0x39/0x58) from [<c045b95b>]
(__irq_svc+0x3b/0x5c)
[ 755.265995] Exception stack(0xdf117db0 to 0xdf117df8)
[ 755.274974] 7da0: 00000002 00000000 00007
d00 00000000
[ 755.289328] 7dc0: c07c81d0 c07c81d0 c07c75dc 00007d02 0000007d 00000003 c0812
610 de5f4b40
[ 755.303615] 7de0: 00000100 df117df8 c0025b2d c0025bea 00000133 ffffffff
[ 755.315416] [<c045b95b>] (__irq_svc+0x3b/0x5c) from [<c0025bea>] (omap3_nonco
re_dpll_set_rate+0x1f2/0x330)
[ 755.332303] [<c0025bea>] (omap3_noncore_dpll_set_rate+0x1f2/0x330) from [<c03
83273>] (clk_change_rate+0x1b/0x52)
[ 755.350016] [<c0383273>] (clk_change_rate+0x1b/0x52) from [<c03832fb>] (clk_s
et_rate+0x51/0x72)
[ 755.365316] [<c03832fb>] (clk_set_rate+0x51/0x72) from [<c034ba29>] (cpu0_set
_target+0xf9/0x198)
[ 755.380787] [<c034ba29>] (cpu0_set_target+0xf9/0x198) from [<c0348c5d>] (__cp
ufreq_driver_target+0x4d/0x70)
[ 755.397849] [<c0348c5d>] (__cpufreq_driver_target+0x4d/0x70) from [<c034b33b>
] (dbs_check_cpu+0x123/0x134)
[ 755.414765] [<c034b33b>] (dbs_check_cpu+0x123/0x134) from [<c034ad31>] (od_db
s_timer+0x4d/0xb0)
[ 755.430064] [<c034ad31>] (od_dbs_timer+0x4d/0xb0) from [<c003c8c5>] (process_
one_work+0x1b5/0x2c0)
[ 755.445787] [<c003c8c5>] (process_one_work+0x1b5/0x2c0) from [<c003cca3>] (wo
rker_thread+0x19b/0x258)
[ 755.461985] [<c003cca3>] (worker_thread+0x19b/0x258) from [<c003fb8f>] (kthre
ad+0x67/0x74)
[ 755.476573] [<c003fb8f>] (kthread+0x67/0x74) from [<c000c0dd>] (ret_from_fork
+0x11/0x34)
[ 755.490617] drm_kms_helper: panic occurred, switching back to text console
[ 696.382258] CAUTION: musb: Babble Interrupt Occurred
[ 696.422949] CAUTION: musb: Babble Interrupt Occurred
[ 696.483459] gadget: high-speed config #1: Multifunction with RNDIS
[ 702.376300] musb_g_ep0_irq 710: SetupEnd came in a wrong ep0stage wait
[ 775.658309] CAUTION: musb: Babble Interrupt Occurred
The application performs a modest amount of IO using FTDI through a USB hub. The more USB devices we service, the faster the freeze appears, though it can happen soon after the code runs, or as much as 16 hours later.
This very much appears to be a kernel or driver issue. Can it be diagnosed, and is there a way of fixing it?
On both B and C revisions of Beaglebone black, our application (running under either the shipped Angstrom or Debian distros) locks up after random amounts of time. The debug port reports the following problems:
[ 752.038977] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:15]
[ 752.063631] BUG: scheduling while atomic: kworker/0:1/15/0x40010100
[ 754.935632] Kernel panic - not syncing: softlockup: hung tasks
[ 754.946347] [<c0010443>] (unwind_backtrace+0x1/0x8a) from [<c0455ced>] (panic
+0x51/0x148)
[ 754.960814] [<c0455ced>] (panic+0x51/0x148) from [<c00678df>] (watchdog_timer
_fn+0xc3/0xf8)
Wouldn't it make sense to raise the priority of the timer or whatever
code was hitting the WDT? It seems like a bug if usb or other
interrupts could cause the watchdog to fail to be serviced given that
the system is still working properly.
How is the wdt serviced by the kernel? Is it easy to raise its priority?
Jan 19 21:12:10 osbo2 kernel: Modules linked in: binfmt_misc ctr ccm usb_f_acm u_serial g_serial libcomposite arc4 ath9k_htc ath9k_common ath9k_hw ath mac80211 cfg80211 omap_sham omap_aes
Jan 19 21:12:10 osbo2 kernel: CPU: 0 PID: 9958 Comm: kworker/0:5 Tainted: G L 3.19.0-rc4-bone2 #1
Jan 19 21:12:10 osbo2 kernel: Hardware name: Generic AM33XX (Flattened Device Tree)
Jan 19 21:12:10 osbo2 kernel: Workqueue: events od_dbs_timer
Jan 19 21:12:10 osbo2 kernel: task: dd1ff980 ti: de2da000 task.ti: de2da000
Jan 19 21:12:10 osbo2 kernel: PC is at run_timer_softirq+0x126/0x19c
Jan 19 21:12:10 osbo2 kernel: LR is at internal_add_timer+0x25/0x50
Jan 19 21:12:10 osbo2 kernel: pc : [<c00633fe>] lr : [<c0062359>] psr: 600f0133\x0asp : de2dbc78 ip : 00000000 fp : de5823a0
Jan 19 21:12:10 osbo2 kernel: r10: 00000000 r9 : 00000000 r8 : de2dbc90
Jan 19 21:12:10 osbo2 kernel: r7 : bf8cbf65 r6 : 00000002 r5 : dd52dc84 r4 : c0af8400
Jan 19 21:12:10 osbo2 kernel: r3 : 00000019 r2 : 00200200 r1 : dd52dc84 r0 : dd52dc84
Jan 19 21:12:10 osbo2 kernel: Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment kernel
Jan 19 21:12:10 osbo2 kernel: Control: 50c5387d Table: 9d218019 DAC: 00000015
Jan 19 21:12:10 osbo2 kernel: CPU: 0 PID: 9958 Comm: kworker/0:5 Tainted: G L 3.19.0-rc4-bone2 #1
Jan 19 21:12:10 osbo2 kernel: Hardware name: Generic AM33XX (Flattened Device Tree)
Jan 19 21:12:10 osbo2 kernel: Workqueue: events od_dbs_timer
Jan 19 21:12:10 osbo2 kernel: [<c0012441>] (unwind_backtrace) from [<c00101dd>] (show_stack+0x11/0x14)
Jan 19 21:12:10 osbo2 kernel: [<c00101dd>] (show_stack) from [<c009025d>] (watchdog_timer_fn+0x105/0x12c)
Jan 19 21:12:10 osbo2 kernel: [<c009025d>] (watchdog_timer_fn) from [<c0064005>] (__run_hrtimer+0x45/0x11c)
Jan 19 21:12:10 osbo2 kernel: [<c0064005>] (__run_hrtimer) from [<c0064551>] (hrtimer_interrupt+0xc1/0x20c)
Jan 19 21:12:10 osbo2 kernel: [<c0064551>] (hrtimer_interrupt) from [<c0020103>] (omap2_gp_timer_interrupt+0x23/0x28)
Jan 19 21:12:10 osbo2 kernel: [<c0020103>] (omap2_gp_timer_interrupt) from [<c005c7ff>] (handle_irq_event_percpu+0x67/0x12c)
Jan 19 21:12:10 osbo2 kernel: [<c005c7ff>] (handle_irq_event_percpu) from [<c005c8e5>] (handle_irq_event+0x21/0x2c)
Jan 19 21:12:10 osbo2 kernel: [<c005c8e5>] (handle_irq_event) from [<c005e1c1>] (handle_level_irq+0x55/0x90)
Jan 19 21:12:10 osbo2 kernel: [<c005e1c1>] (handle_level_irq) from [<c005c1ef>] (generic_handle_irq+0x23/0x2c)
Jan 19 21:12:10 osbo2 kernel: [<c005c1ef>] (generic_handle_irq) from [<c005c383>] (__handle_domain_irq+0x3b/0x80)
Jan 19 21:12:10 osbo2 kernel: [<c005c383>] (__handle_domain_irq) from [<c000847d>] (omap_intc_handle_irq+0x85/0x8c)
Jan 19 21:12:10 osbo2 kernel: [<c000847d>] (omap_intc_handle_irq) from [<c05e729b>] (__irq_svc+0x3b/0x5c)
Jan 19 21:12:10 osbo2 kernel: Exception stack(0xde2dbc30 to 0xde2dbc78)
Jan 19 21:12:10 osbo2 kernel: bc20: dd52dc84 dd52dc84 00200200 00000019
Jan 19 21:12:10 osbo2 kernel: bc40: c0af8400 dd52dc84 00000002 bf8cbf65 de2dbc90 00000000 00000000 de5823a0
Jan 19 21:12:10 osbo2 kernel: bc60: 00000000 de2dbc78 c0062359 c00633fe 600f0133 ffffffff
Jan 19 21:12:10 osbo2 kernel: [<c05e729b>] (__irq_svc) from [<c00633fe>] (run_timer_softirq+0x126/0x19c)
Jan 19 21:12:10 osbo2 kernel: [<c00633fe>] (run_timer_softirq) from [<c0032a09>] (__do_softirq+0xb1/0x1c0)
Jan 19 21:12:10 osbo2 kernel: [<c0032a09>] (__do_softirq) from [<c0032ce1>] (irq_exit+0x79/0xb0)
Jan 19 21:12:10 osbo2 kernel: [<c0032ce1>] (irq_exit) from [<c005c387>] (__handle_domain_irq+0x3f/0x80)
Jan 19 21:12:10 osbo2 kernel: [<c005c387>] (__handle_domain_irq) from [<c000847d>] (omap_intc_handle_irq+0x85/0x8c)
Jan 19 21:12:10 osbo2 kernel: [<c000847d>] (omap_intc_handle_irq) from [<c05e729b>] (__irq_svc+0x3b/0x5c)
Jan 19 21:12:10 osbo2 kernel: Exception stack(0xde2dbd40 to 0xde2dbd88)
Jan 19 21:12:10 osbo2 kernel: bd40: de0360c0 00000000 ffffffff 00000000 de016880 00000000 de034980 00000000
Jan 19 21:12:10 osbo2 kernel: bd60: de016880 c0a5da5c 000f4240 11e1a300 00000000 de2dbd88 c002a721 c002a740
Jan 19 21:12:10 osbo2 kernel: bd80: 400f0033 ffffffff
Jan 19 21:12:10 osbo2 kernel: [<c05e729b>] (__irq_svc) from [<c002a740>] (_omap3_wait_dpll_status+0x38/0xe4)
Jan 19 21:12:10 osbo2 kernel: [<c002a740>] (_omap3_wait_dpll_status) from [<c002a91b>] (_omap3_noncore_dpll_bypass+0x3b/0x70)
Jan 19 21:12:10 osbo2 kernel: [<c002a91b>] (_omap3_noncore_dpll_bypass) from [<c002ab23>] (omap3_noncore_dpll_set_rate+0x63/0x3b4)
Jan 19 21:12:10 osbo2 kernel: [<c002ab23>] (omap3_noncore_dpll_set_rate) from [<c04fc295>] (clk_change_rate+0xa9/0xe4)
Jan 19 21:12:10 osbo2 kernel: [<c04fc295>] (clk_change_rate) from [<c04fc361>] (clk_set_rate+0x91/0x98)
Jan 19 21:12:10 osbo2 kernel: [<c04fc361>] (clk_set_rate) from [<c04c4469>] (set_target+0xf5/0x2b8)
Jan 19 21:12:10 osbo2 kernel: [<c04c4469>] (set_target) from [<c04c065b>] (__cpufreq_driver_target+0x133/0x25c)
Jan 19 21:12:10 osbo2 kernel: [<c04c065b>] (__cpufreq_driver_target) from [<c04c3d1f>] (dbs_check_cpu+0xaf/0xfc)
Jan 19 21:12:10 osbo2 kernel: [<c04c3d1f>] (dbs_check_cpu) from [<c04c2f53>] (od_dbs_timer+0x53/0xb4)
Jan 19 21:12:10 osbo2 kernel: [<c04c2f53>] (od_dbs_timer) from [<c003e9a1>] (process_one_work+0xb9/0x278)
Jan 19 21:12:10 osbo2 kernel: [<c003e9a1>] (process_one_work) from [<c003f27b>] (worker_thread+0xdb/0x37c)
Jan 19 21:12:10 osbo2 kernel: [<c003f27b>] (worker_thread) from [<c0041ed3>] (kthread+0x9b/0xb0)
Jan 19 21:12:10 osbo2 kernel: [<c0041ed3>] (kthread) from [<c000d739>] (ret_from_fork+0x11/0x38)