Hi,
When attempting to boot a locally built letux 5.10.16 kernel (
https://download.goldelico.com/letux-kernel/letux-5.10.16-ci20/) on a ci20 board running a Debian bullseye rootfs (
https://download.goldelico.com/letux-debian-rootfs/20201123-bullseye-11.sid-mipsel-xfce4.tbz), the kernel panics during boot while probing cmpxchg_futex_value_locked:
[ 0.111749] futex hash table entries: 256 (order: -1, 3072 bytes, linear)
[ 0.118615] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 801b97b4, ra == 80d894a0
[ 0.129461] Oops[#1]:
[ 0.131714] CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.16-clang-ci20+ #1
[ 0.138687] $ 0 : 00000000 00000000 00000000 4b68f057
[ 0.143981] $ 4 : 84099ce0 00000000 00000000 00000000
[ 0.149278] $ 8 : 00000001 84099acc 00000000 00000036
[ 0.154575] $12 : 000000ef 00000000 00000000 00000001
[ 0.159873] $16 : 80cb0000 80cb0000 80cb0000 00000000
[ 0.165171] $20 : 80cc0000 80cb9e40 00000000 00000000
[ 0.170469] $24 : 00000000 80635110
[ 0.175766] $28 : 84098000 84099cd0 00000000 80d894a0
[ 0.181063] Hi : 00000000
[ 0.183974] Lo : 000000a0
[ 0.186903] epc : 801b97b4 cmpxchg_futex_value_locked+0x30/0x6c
[ 0.193081] ra : 80d894a0 futex_detect_cmpxchg+0x34/0x74
[ 0.111341] Status: 10000403 KERNEL EXL IE
[ 0.115578] Cause : 00800008 (ExcCode 02)
[ 0.119638] BadVA : 00000000
[ 0.122551] PrId : 3ee1024f (Ingenic XBurst)
[ 0.126966] Modules linked in:
[ 0.130059] Process swapper (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[ 0.138119] Stack : 80cb0000 00000000 80cc0000 80cb9e40 ffffffff 4b68f057 80cb0000 80d8940c
[ 0.146609] 84097980 801a66c0 84099d48 80d8742c 00000000 84099d18 00000000 00000100
[ 0.155098] 00000100 80d88c68 00000008 4b68f057 80d8938c 84099d48 80df0000 80100c68
[ 0.163587] 00000cc0 8408ee80 00000000 802a95e0 8408ee80 00000dc0 ffffffff ffffffff
[ 0.172075] ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[ 0.180566] ...
[ 0.183009] Call Trace:
[ 0.185486] [<801b97b4>] cmpxchg_futex_value_locked+0x30/0x6c
[ 0.191318] [<80d8940c>] futex_init+0x80/0xe0
[ 0.195730] [<80100c68>] do_one_initcall+0x150/0x308
[ 0.113385] [<80d7db68>] do_initcall_level+0x128/0x160
[ 0.118590] [<80d7d9f8>] do_initcalls+0x60/0xa8
[ 0.123181] [<80d7d98c>] do_basic_setup+0x28/0x34
[ 0.127949] [<80d7d834>] kernel_init_freeable+0x6c/0xa8
[ 0.133249] [<80a0f4ac>] kernel_init+0x10/0x128
[ 0.137838] [<801030a0>] ret_from_kernel_thread+0x10/0x18
[ 0.143311]
[ 0.144803] Code: 00000000 24020000 0000000f <c0a30000> 14660005 00000000 00e00825 e0a10000 1020fff9
[ 0.154725]
[ 0.156343] ---[ end trace 45d2be734d847193 ]---
Looking at the code in futex_detect_cmpxchg, cmpxchg_futex_value_locked is expected to return -EFAULT because an address of 0 is passed. The exception handler isn't handling this properly, so the fault ends in an oops instead. No modifications were made to the kernel source code, but the letux precompiled kernel is built with gcc-4.9 rather than the gcc-10 in bullseye. I did come across others encountering this problem:
https://github.com/neilbrown/gnubee-tools/issues/25
https://groups.google.com/u/2/g/gnubee/c/sh9lqCDkO2I
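For anyone not familiar with the probe, here is a rough userspace model of what futex_detect_cmpxchg is expecting. It is only a sketch, not the kernel code: the model_* names are made up for illustration, and the NULL check stands in for the exception-table fixup that is supposed to turn the faulting access into a -EFAULT return.

```c
/* Userspace model of the boot-time futex cmpxchg probe (not kernel code). */
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the arch cmpxchg routine. In the kernel the
 * access to address 0 faults and the fixup path returns -EFAULT; the
 * explicit NULL check below models that fixup.
 */
static int model_cmpxchg_futex_value_locked(uint32_t *curval, uint32_t *uaddr,
                                            uint32_t oldval, uint32_t newval)
{
    if (uaddr == NULL)
        return -EFAULT;         /* what the fixup should produce */

    *curval = *uaddr;           /* report the value that was found */
    if (*uaddr == oldval)
        *uaddr = newval;        /* swap only when the old value matches */
    return 0;
}

/*
 * Same shape as the probe: deliberately pass NULL and treat -EFAULT as
 * proof that the cmpxchg implementation and its fault handling work.
 */
static int model_futex_detect_cmpxchg(void)
{
    uint32_t curval;

    return model_cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT;
}
```

In the failing boot the equivalent of the NULL branch never gets a chance to return: the fault at BadVA 00000000 is reported as an oops rather than being fixed up.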
As a workaround I disabled the problematic check by patching the config to select CONFIG_HAVE_FUTEX_CMPXCHG. This allowed the locally built kernel to boot. I was also playing around with getting systemtap to work again on MIPS and noticed it had problems with things that expected the EFAULT exceptions to be triggered.
I suspect that there is some asm code using a register that gets clobbered by gcc-generated code, or vice versa. The older gcc-4.9 compiler presumably didn't use that register, so the problem wasn't encountered.
-Will