Loading hal_arm335xQEP fails with bus error


Jacek Radzikowski

Sep 13, 2020, 3:33:16 AM
to Machinekit
Hello,

Loading hal_arm335xQEP with loadrt causes rtapi_app to crash with a bus error:

Sep 13 07:22:29 machinekit rtapi:0: 1:rtapi_app:5330:user signal 7 - 'Bus error' received, dumping core (current dir=/home/machinekit)
Sep 13 07:22:29 machinekit kernel: [  440.186026] Unhandled fault: external abort on non-linefetch (0x1818) at 0xb6f041a8
Sep 13 07:22:29 machinekit kernel: [  440.186041] pgd = bf066d95
Sep 13 07:22:29 machinekit kernel: [  440.186045] [b6f041a8] *pgd=9c782831, *pte=48300343, *ppte=48300833
Sep 13 07:22:29 machinekit rtapi:0: 1:rtapi_app:5330:user  --- rtapi_app backtrace: ---
Sep 13 07:22:29 machinekit rtapi:0: 1:rtapi_app:5330:user ERROR decoding backtrace: no debug info in ELF executable (-1)
Sep 13 07:22:29 machinekit rtapi:0: 1:rtapi_app:5330:user ERROR decoding backtrace: no debug info in ELF executable (-1)
Sep 13 07:22:29 machinekit rtapi:0: 1:rtapi_app:5330:user ERROR decoding backtrace: no debug info in ELF executable (-1)
Sep 13 07:22:29 machinekit rtapi:0: 1:rtapi_app:5330:user ERROR decoding backtrace: no debug info in ELF executable (-1)
Sep 13 07:22:29 machinekit rtapi:0: 1:rtapi_app:5330:user  --------------------
Sep 13 07:22:30 machinekit msgd:0: rtapi_app exit detected - scheduled shutdown
Sep 13 07:22:32 machinekit msgd:0: msgd shutting down
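The kernel fault report also shows which physical address was being accessed. A minimal sketch, assuming the mapping uses 4 KiB small pages: the page frame is in the upper bits of the printed *pte value, and the page offset is the low 12 bits of the faulting address, so combining them gives the physical address. For the values above this gives 0x483001a8, which, going by the AM335x memory map (worth double-checking against the TRM), falls inside the PWMSS0 block that contains eQEP0. The helper name fault_physical_address is purely illustrative.

```python
# Minimal sketch: recover the physical address of a faulting access from
# the "*pte=" value in an ARM kernel fault report.
# Assumption: the mapping uses 4 KiB small pages, so the page frame is in
# bits 31..12 of the PTE and the page offset is the low 12 bits of the
# virtual address.

PAGE_OFFSET_MASK = 0xFFF  # low 12 bits: offset inside a 4 KiB page


def fault_physical_address(virt_addr: int, pte: int) -> int:
    """Combine the PTE's page frame with the virtual address's page offset."""
    return (pte & ~PAGE_OFFSET_MASK) | (virt_addr & PAGE_OFFSET_MASK)


if __name__ == "__main__":
    # Values copied from the kernel log above.
    fault_addr = 0xB6F041A8
    pte = 0x48300343
    print(f"0x{fault_physical_address(fault_addr, pte):08x}")  # prints 0x483001a8
```

The *ppte value (0x48300833) carries the same page frame, so it gives the same result.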


I used the following commands to load the module:
$ halrun
msgd:0 stopped
rtapi:0 stopped
rtapi_msgd command:  /usr/libexec/linuxcnc/rtapi_msgd --instance=0 --rtmsglevel=1 --usrmsglevel=1 --debug=1 --halsize=524288
rtapi_app command:  /usr/libexec/linuxcnc/rtapi_app_rt-preempt --instance=0 --debug=1
halcmd: loadrt hal_arm335xQEP encoders=eQEP0
<stdin>:1: insmod failed, returned -1:
rtapi_rpc(): reply timeout
halcmd:

Pin modes are set up by U-Boot loading bone_eqep0-00A0.dtbo, and the audio overlay (which uses conflicting pins) is disabled. The system was installed from bone-debian-9.12-machinekit-armhf-2020-05-02-4gb.img, and the kernel was updated to 4.19.135-bone-rt-r55.

I searched for the problem but couldn't find any relevant information.
I would appreciate any ideas on how to make the module work.

Thank you,
Jacek.

Charles Steinkuehler

Sep 13, 2020, 9:26:02 AM
to machi...@googlegroups.com
Double-check your device tree setup. The error "external abort on
non-linefetch" almost always means the underlying hardware is not
responding, typically because it has not been set up (taken out of reset,
clock enabled, etc.) by the kernel.
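The fault status value in the kernel log, 0x1818, can be decoded to check this reading. A rough sketch, assuming the ARMv7 short-descriptor fault status layout used by the 32-bit ARM kernel (FS[3:0] in bits 3..0, FS[4] in bit 10, the write flag in bit 11): the status comes out as 8, which the kernel's table in arch/arm/mm/fsr-2level.c names "external abort on non-linefetch" and reports as SIGBUS, signal 7 on ARM Linux. That matches the "signal 7 - 'Bus error'" in the rtapi_app log, and the write flag shows the faulting access was a write. The name table is deliberately cut down to that one entry, and the helper names are illustrative.

```python
# Rough sketch: decode the data fault status value ("(0x1818)") printed by
# the 32-bit ARM kernel, assuming the ARMv7 short-descriptor format.
# The name table is a one-entry excerpt, not the kernel's full list.

FSR_FS3_0 = 0xF       # FS[3:0] in bits 3..0
FSR_FS4 = 1 << 10     # FS[4] in bit 10
FSR_WRITE = 1 << 11   # WnR: set when the faulting access was a write

# Excerpt of the fault names in arch/arm/mm/fsr-2level.c.
FAULT_NAMES = {8: "external abort on non-linefetch"}


def fault_status(fsr: int) -> int:
    """Build the 5-bit FS field by moving FS[4] from bit 10 down to bit 4."""
    return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6)


def describe_fsr(fsr: int) -> str:
    fs = fault_status(fsr)
    name = FAULT_NAMES.get(fs, f"fault status {fs} (not in this excerpt)")
    access = "write" if fsr & FSR_WRITE else "read"
    return f"{name} ({access})"


if __name__ == "__main__":
    print(describe_fsr(0x1818))  # prints: external abort on non-linefetch (write)
```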
--
Charles Steinkuehler
cha...@steinkuehler.net

Jacek Radzikowski

Sep 14, 2020, 3:53:31 AM
to Charles Steinkuehler, machi...@googlegroups.com
Charles,

Thank you very much for the suggestion. Checking /sys/kernel/debug/pinctrl/44e10800.pinmux-pinctrl-single/pins showed that, indeed, the pinmuxes were not set up correctly (although those pins had been removed from the set of pins the universal cape can control).
After RTFSing the universal cape overlay, I figured out the names of the pins to change (two of the BBB header pins used by QEP0 are each connected to two processor pins), and I now set the muxes using config-pin. Querying the pins with config-pin shows the correct configuration, but the muxes in /sys/kernel/debug/pinctrl/44e10800.pinmux-pinctrl-single/pins still don't look right. The pins used by QEP0 are P9_25 (117), P9_27 (115), P9_91 (116) and P9_92 (114); the numbers in parentheses are GPIO numbers.
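When reading the debugfs listing, it may help to know how pinctrl-single numbers pads: as far as I know, "pin N" is simply the Nth 32-bit pad control register counted from the start of the pinmux region (44e10800 here). That index follows register offsets rather than GPIO numbering, so a GPIO number and a pinctrl-single pin index for the same pad need not coincide. A small sketch under that assumption; the base address is taken from the debugfs path, and pin_to_address and address_to_pin are made-up helper names.

```python
# Sketch: convert between a pinctrl-single "pin N" index and the address of
# its pad control register.
# Assumptions: the pinmux region starts at 0x44e10800 (from the debugfs
# path) and every pad has one 32-bit register, so N = offset / 4.
# These indices are pinctrl-single's numbering, not GPIO numbers.

PINMUX_BASE = 0x44E10800
REG_WIDTH = 4  # bytes per pad control register


def pin_to_address(pin: int) -> int:
    return PINMUX_BASE + pin * REG_WIDTH


def address_to_pin(addr: int) -> int:
    offset = addr - PINMUX_BASE
    if offset < 0 or offset % REG_WIDTH:
        raise ValueError(f"0x{addr:08x} is not a pad register in this region")
    return offset // REG_WIDTH


if __name__ == "__main__":
    for pin in (114, 115, 116, 117):
        print(pin, f"{pin_to_address(pin):08x}")
```

The addresses it prints for pins 114 to 117 (44e109c8 through 44e109d4) match the debugfs rows quoted further down, which is consistent with the offset/4 rule. Mapping a GPIO number to its pad still needs the AM335x pin tables.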
Here are the pin modes reported by config-pin after setting them up:
$ for p in P9_25 P9_27 P9_91 P9_92; do config-pin -q $p;done
P9_25 Mode: qep
P9_27 Mode: qep
P9_91 Mode: qep
P9_92 Mode: qep

Here are the muxes read from the kernel debug interface:
pin 114 (PIN114) 44e109c8 00000028 pinctrl-single
pin 115 (PIN115) 44e109cc 00000028 pinctrl-single
pin 116 (PIN116) 44e109d0 00000030 pinctrl-single
pin 117 (PIN117) 44e109d4 00000030 pinctrl-single
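The last column in that listing is the raw pad control register value. A hedged sketch of decoding it, assuming the AM335x control-module pad register layout (MMODE in bits 2:0, PUDEN in bit 3, PUTYPESEL in bit 4, RXACTIVE in bit 5, SLEWCTRL in bit 6); the regex assumes exactly the line format shown above, and the function names are illustrative. Decoded this way, 0x28 is mode 0 with the receiver on and the pull disabled, and 0x30 is mode 0 with the receiver on and a pull-up.

```python
# Hedged sketch: decode AM335x pad control register values as shown in
# /sys/kernel/debug/pinctrl/44e10800.pinmux-pinctrl-single/pins.
# Assumed layout (AM335x control module conf_<pad> registers):
#   bits 2:0  MMODE      mux mode 0-7
#   bit 3     PUDEN      0 = pull enabled, 1 = pull disabled
#   bit 4     PUTYPESEL  0 = pull-down, 1 = pull-up
#   bit 5     RXACTIVE   1 = input receiver enabled
#   bit 6     SLEWCTRL   0 = fast, 1 = slow
import re

# Matches lines like: pin 114 (PIN114) 44e109c8 00000028 pinctrl-single
LINE_RE = re.compile(r"pin (\d+) \(\S+\) ([0-9a-f]+) ([0-9a-f]+) pinctrl-single")


def decode_conf(value: int) -> dict:
    pull_enabled = not ((value >> 3) & 1)
    if pull_enabled:
        pull = "up" if (value >> 4) & 1 else "down"
    else:
        pull = "off"
    return {
        "mode": value & 0x7,
        "pull": pull,
        "rx_enabled": bool((value >> 5) & 1),
        "slew": "slow" if (value >> 6) & 1 else "fast",
    }


def parse_line(line: str):
    """Return (pin, address, decoded fields), or None if the line doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16), decode_conf(int(m.group(3), 16))


if __name__ == "__main__":
    listing = """\
pin 114 (PIN114) 44e109c8 00000028 pinctrl-single
pin 116 (PIN116) 44e109d0 00000030 pinctrl-single"""
    for line in listing.splitlines():
        pin, addr, fields = parse_line(line)
        print(pin, hex(addr), fields)
```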


All modes (the lowest 3 bits) are set to 0, which they shouldn't be. I also noticed that early in the boot process the kernel prints a warning with a backtrace, which seems to come from the initialization code, probably while processing the device tree. The full boot log is in the attachment; here is the relevant part:
[    0.281700] ------------[ cut here ]------------
[    0.281731] WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:828 clk_core_disable_lock+0x15/0x1c
[    0.281774] l4_per_cm:clk:00d4:0 already disabled
[    0.281781] Modules linked in:
[    0.281798] CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.106-bone-rt-r49 #1stretch
[    0.281805] Hardware name: Generic AM33XX (Flattened Device Tree)
[    0.281844] [<c010ce9d>] (unwind_backtrace) from [<c010a81d>] (show_stack+0x11/0x14)
[    0.281861] [<c010a81d>] (show_stack) from [<c01256bf>] (__warn+0xb3/0xc4)
[    0.281874] [<c01256bf>] (__warn) from [<c0125703>] (warn_slowpath_fmt+0x33/0x48)
[    0.281888] [<c0125703>] (warn_slowpath_fmt) from [<c057e319>] (clk_core_disable_lock+0x15/0x1c)
[    0.281908] [<c057e319>] (clk_core_disable_lock) from [<c011a39f>] (_disable_clocks+0x23/0x7c)
[    0.281925] [<c011a39f>] (_disable_clocks) from [<c011bfe1>] (omap_hwmod_deassert_hardreset+0x81/0xf0)
[    0.281940] [<c011bfe1>] (omap_hwmod_deassert_hardreset) from [<c011c803>] (_omap_device_notifier_call+0x1ff/0x340)
[    0.281956] [<c011c803>] (_omap_device_notifier_call) from [<c013d9b3>] (notifier_call_chain+0x4b/0x60)
[    0.281970] [<c013d9b3>] (notifier_call_chain) from [<c013dc15>] (__blocking_notifier_call_chain+0x2d/0x3c)
[    0.281983] [<c013dc15>] (__blocking_notifier_call_chain) from [<c013dc3b>] (blocking_notifier_call_chain+0x17/0x1c)
[    0.281998] [<c013dc3b>] (blocking_notifier_call_chain) from [<c0600543>] (device_add+0x2a3/0x498)
[    0.282021] [<c0600543>] (device_add) from [<c0732653>] (of_platform_device_create_pdata+0x73/0xa0)
[    0.282038] [<c0732653>] (of_platform_device_create_pdata) from [<c07327b9>] (of_platform_bus_create+0x12d/0x27c)
[    0.282051] [<c07327b9>] (of_platform_bus_create) from [<c0732803>] (of_platform_bus_create+0x177/0x27c)
[    0.282064] [<c0732803>] (of_platform_bus_create) from [<c0732a57>] (of_platform_populate+0x67/0xe4)
[    0.282087] [<c0732a57>] (of_platform_populate) from [<c0d09549>] (pdata_quirks_init+0x5d/0x6c)
[    0.282102] [<c0d09549>] (pdata_quirks_init) from [<c0d094e3>] (omap_generic_init+0x15/0x1e)
[    0.282126] [<c0d094e3>] (omap_generic_init) from [<c0d02499>] (customize_machine+0x19/0x1c)
[    0.282145] [<c0d02499>] (customize_machine) from [<c0102929>] (do_one_initcall+0x45/0x17c)
[    0.282160] [<c0102929>] (do_one_initcall) from [<c0d00e39>] (kernel_init_freeable+0x1a7/0x242)
[    0.282182] [<c0d00e39>] (kernel_init_freeable) from [<c08a3805>] (kernel_init+0xd/0xdc)
[    0.282197] [<c08a3805>] (kernel_init) from [<c0101101>] (ret_from_fork+0x11/0x30)
[    0.282205] Exception stack(0xdc115fb0 to 0xdc115ff8)
[    0.282215] 5fa0:                                     00000000 00000000 00000000 00000000
[    0.282228] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    0.282238] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    0.282246] ---[ end trace 0000000000000001 ]---
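When comparing boot logs from different kernels (for example the stock 4.19.106 image versus 4.19.135), it can help to pull out just the clock named in this warning. A small sketch; reading the name "l4_per_cm:clk:00d4:0" as provider, clkctrl register offset and bit index is my assumption based on the shape of the string, and working out which module owns that offset would need the CM_PER register table in the AM335x TRM. The function name is hypothetical.

```python
# Sketch: list the clocks named in "already disabled" warnings in a boot log.
# Assumption: a name like "l4_per_cm:clk:00d4:0" is read as
# provider : "clk" : clkctrl register offset (hex) : bit index.
import re

CLK_RE = re.compile(r"(\w+):clk:([0-9a-f]+):(\d+) already disabled")


def find_disabled_clocks(log_text: str) -> list:
    """Return (provider, register offset, bit) for each matching warning."""
    return [
        (m.group(1), int(m.group(2), 16), int(m.group(3)))
        for m in CLK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = "[    0.281774] l4_per_cm:clk:00d4:0 already disabled"
    print(find_disabled_clocks(sample))  # prints [('l4_per_cm', 212, 0)]
```

To scan the attached machinekit-boot.txt, pass the file's full contents instead of the sample string.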

All of this was with the board running the stock kernel from the image:
Linux machinekit 4.19.106-bone-rt-r49 #1stretch PREEMPT RT Wed Mar 11 10:50:28 UTC 2020 armv7l GNU/Linux

And I still get bus errors when loading the module.
Does this look like something you've encountered before?

Thank you,
Jacek.



--
website: http://www.machinekit.io blog: http://blog.machinekit.io github: https://github.com/machinekit
---
You received this message because you are subscribed to the Google Groups "Machinekit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to machinekit+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/machinekit/8ae2751e-9c8f-df95-5ffc-b49a4ba6ba7c%40steinkuehler.net.


--
Given a choice between two theories, take the one which is funnier
machinekit-boot.txt

Charles Steinkuehler

Sep 15, 2020, 9:07:03 AM
to machi...@googlegroups.com
I haven't seen anything like that personally, but I haven't worked much
with the eqep modules and anything I have done would be on much older
kernels.

I recommend asking for help on the Beagleboard group. RCN hangs out
here somewhat, but there's a much larger pool of Beagle-specific
knowledge on their group. I think if you can get the kernel properly
loading the eqep driver, things should begin working.
Charles Steinkuehler
cha...@steinkuehler.net

Jacek Radzikowski

Sep 15, 2020, 11:28:15 AM
to Charles Steinkuehler, machi...@googlegroups.com
OK, thanks. I will ask there.

Jacek.

