[HELP NEEDED] Multiple bugs cause the system to hang completely after a while if depth 16 is used

Tom Li

Jun 1, 2014, 1:49:33 PM
to loongs...@googlegroups.com
The performance of SM712 graphics is really poor. But even worse,
depth 16, which is a faster choice, is broken. We have to use
depth 32 or fbdev, both slower than depth 16.

Multiple bugs in xf86-video-siliconmotion cause many lockup issues[0]
when using depth 16. This is the reason the YeeLoong 8089D CRASHES EVERY DAY
when using this driver.

Some of the issues have been resolved, but others have not.
I have been working on the issue these days and figured out that

x11perf -copywinpix100

is a good way to hang the system completely.
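
Since the machine locks up hard, it helps to know how many runs completed before the hang. The following is a minimal sketch, not part of the original report: it runs a command repeatedly and forces a log line to disk before each run, so the last line that survives a reboot shows where the hang happened. The function name and log file name are hypothetical, and the example invocation assumes the single-dash option spelling that x11perf documents.

```python
import os
import subprocess
import time


def run_until_failure(cmd, log_path, max_runs=100):
    """Run cmd repeatedly, syncing a log line to disk before each run.

    Returns the number of runs that exited with status 0.
    """
    completed = 0
    with open(log_path, "a") as log:
        for i in range(1, max_runs + 1):
            log.write(f"{time.time():.3f} starting run {i}: {' '.join(cmd)}\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the line survives a hard hang
            result = subprocess.run(cmd)
            if result.returncode != 0:
                break
            completed += 1
    return completed


# Example (assumed invocation, run inside the X session under test):
# run_until_failure(["x11perf", "-copywinpix100"], "x11perf-hang.log")
```

After a hang and reboot, the highest run number in the log file is the run that did not finish.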

The last debug log output from the driver:

> SMI_SetupForSolidFill
color=0000FFFF rop=03
DPR14 = 0000FFFF
DPR34 = FFFFFFFF
DPR38 = FFFFFFFF
< SMI_SetupForSolidFill
> SMI_SubsequentSolidFillRect
x=3 y=0 w=600 h=600
DPR04 = 00030000
DPR08 = 02580258
DPR0C = 800000F0
< SMI_SubsequentSolidFillRect
> SMI_AccelSync
< SMI_AccelSync
(complete hang)
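
The register writes in the log above look consistent with the rectangle the driver was asked to fill. As a quick sanity check, here is a small sketch that splits DPR04 and DPR08 into 16-bit halves. The packing (first coordinate in the high half, second in the low half) is an assumption inferred from the logged values, not taken from the SM712 datasheet.

```python
def unpack_hi_lo(value):
    """Split a 32-bit register value into (high 16 bits, low 16 bits)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


# Values copied from the debug log above.
x, y = unpack_hi_lo(0x00030000)  # DPR04
w, h = unpack_hi_lo(0x02580258)  # DPR08

print(f"DPR04 -> x={x} y={y}")  # DPR04 -> x=3 y=0
print(f"DPR08 -> w={w} h={h}")  # DPR08 -> w=600 h=600
```

Both results match the `x=3 y=0 w=600 h=600` line logged by SMI_SubsequentSolidFillRect, so at least the destination and size written to the engine agree with the request.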

I checked SetupForSolidFill, SubsequentSolidFillRect
and AccelSync in the XAA implementation.

But I realized I was on the wrong track:
neither setting NoAccel nor using XAA gets rid of the crashes.

EXA is even stranger. It has been broken for a long time:
it renders corrupted fonts and hangs completely after a while.
But for this x11perf test it works without any issue. Really strange.


(I won't repeat all the details here;
please read the original bug report[0].)

------

I found the problem really hard to figure out.

In a nutshell, when using depth 16, the computer hangs completely when
running the above x11perf test. But the driver doesn't break things
and hang the computer immediately or directly; the hang occurs at some
later point that involves memory operations (e.g. pointer dereference, memmove).

Before the hang, the kernel receives spurious 8259A interrupts on seemingly random IRQ lines:

[ 168.972000] spurious 8259A interrupt: IRQ3.
[ 251.516000] spurious 8259A interrupt: IRQ13.
[ 254.968000] spurious 8259A interrupt: IRQ3.
[ 254.968000] spurious 8259A interrupt: IRQ10.
[ 260.968000] spurious 8259A interrupt: IRQ6.
[ 46.704000] spurious 8259A interrupt: IRQ13.
[ 47.940000] spurious 8259A interrupt: IRQ10.
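
To see whether the spurious interrupts favor particular lines, the messages can be tallied from saved dmesg output. This is only a sketch: the regular expression assumes the exact message format shown above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "[ 168.972000] spurious 8259A interrupt: IRQ3."
SPURIOUS = re.compile(r"\[\s*\d+\.\d+\]\s+spurious 8259A interrupt: IRQ(\d+)\.")


def tally_spurious_irqs(dmesg_text):
    """Count how often each IRQ line shows up in spurious-interrupt messages."""
    return Counter(int(m.group(1)) for m in SPURIOUS.finditer(dmesg_text))


sample = """\
[ 168.972000] spurious 8259A interrupt: IRQ3.
[ 251.516000] spurious 8259A interrupt: IRQ13.
[ 254.968000] spurious 8259A interrupt: IRQ3.
[ 254.968000] spurious 8259A interrupt: IRQ10.
[ 260.968000] spurious 8259A interrupt: IRQ6.
[ 46.704000] spurious 8259A interrupt: IRQ13.
[ 47.940000] spurious 8259A interrupt: IRQ10.
"""

print(sorted(tally_spurious_irqs(sample).items()))
# [(3, 2), (6, 1), (10, 2), (13, 2)]
```

In the excerpt above the messages are spread over IRQ 3, 6, 10 and 13, which fits the impression that no single interrupt line is responsible.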

If you are lucky enough, a kernel panic occurs instead of a
mysterious lockup:

[ 1556.616000] spurious 8259A interrupt: IRQ0.
[ 1556.620000] CPU 0 Unable to handle kernel paging request at virtual
address 000000000000a400, epc == 000000000000a400, ra ==
ffffffff80279ae4
[ 1556.620000] Oops[#1]:
[ 1556.620000] CPU: 0 PID: 4661 Comm: X Not tainted 3.14.4-yeeloong-gaizi+ #10
[ 1556.620000] task: 98000000b86dba80 ti: 98000000bfef8000 task.ti:
98000000bfef8000
[ 1556.620000] $ 0 : 0000000000000000 ffffffffcfffffff
000000000000a400 0000000000000000
[ 1556.620000] $ 4 : 0000000000000008 98000000bf360000
0000000000000020 0000000000000006
[ 1556.620000] $ 8 : 0000000000000001 000000000000ffff
000000000000ffff 00000000764cf0d0
[ 1556.620000] $12 : 00000000140044e0 000000001000001f
0000000076474a76 ffffffffffffffff
[ 1556.620000] $16 : 0000000000000000 0000000000000000
0000000000000008 ffffffff80b61cc0
[ 1556.620000] $20 : 000000000000ffff ffffffff80adef58
0000000000000008 ffffffff80ba0000
[ 1556.620000] $24 : 00000000000004b0 0000000000000800
[ 1556.620000] $28 : 98000000bfef8000 98000000bfefbe00
98000000bf345080 ffffffff80279ae4
[ 1556.620000] Hi : 0000000000000000
[ 1556.620000] Lo : 0000000000000000
[ 1556.620000] epc : 000000000000a400 0xa400
[ 1556.620000] Not tainted
[ 1556.620000] ra : ffffffff80279ae4 handle_irq_event_percpu+0x6c/0x220
[ 1556.620000] Status: 140044e2 KX SX UX KERNEL EXL
[ 1556.620000] Cause : 10008408
[ 1556.620000] BadVA : 000000000000a400
[ 1556.620000] PrId : 00006303 (ICT Loongson-2)
[ 1556.620000] Modules linked in: ctr ccm netconsole configfs arc4
rtl8187 eeprom_93cx6 led_class mac80211 cfg80211 psmouse
loongson2_cpufreq rfkill 8139too mii snd_cs5535audio snd_ac97_codec
ac97_bus snd_pcm snd_timer snd soundcore ipv6
[ 1556.620000] Process X (pid: 4661, threadinfo=98000000bfef8000,
task=98000000b86dba80, tls=00000000773e74a0)
[ 1556.620000] Stack : ffffffff80b61cc0 000000000000ffff
000000000000ffff 000000000000ffff
000000000000ffff ffff00000000ffff 000000000000ffff 00000000102e4848
0000000000000000 ffffffff80279d04 000000000000ffff 000000000000ffff
ffffffff80b61cc0 ffffffff8027d278 0000000000000000 ffffffff80279154
0000000000000000 ffffffff80209310 0000000000000008 ffffffff80203f38
0000000000000000 ffffffff80206f40 0000000000000000 ffffffffcfffffff
00000000764ce992 0000000076474688 00000000764ce8d2 00000000764745c8
00000000000000c6 0000000000000006 0000000000000001 000000000000ffff
000000000000ffff 00000000764cf0d0 000000000000002e 00000000000000c8
0000000076474a76 ffffffffffffffff 0000000000000000 000000000000ffff
...
[ 1556.620000] Call Trace:
[ 1556.620000] [<ffffffff80279d04>] handle_irq_event+0x6c/0xa8
[ 1556.620000] [<ffffffff8027d278>] handle_level_irq+0xb0/0x170
[ 1556.620000] [<ffffffff80279154>] generic_handle_irq+0x5c/0x80
[ 1556.620000] [<ffffffff80209310>] do_IRQ+0x18/0x28
[ 1556.620000] [<ffffffff80203f38>] mach_irq_dispatch+0x50/0x78
[ 1556.620000] [<ffffffff80206f40>] ret_from_irq+0x0/0x4
[ 1556.620000]
[ 1556.620000]
Code: (Bad address in epc)
[ 1556.620000]
[ 1556.624000] ---[ end trace 6928418bef65e208 ]---
[ 1556.624000] Kernel panic - not syncing: Fatal exception in interrupt
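
One detail in the oops above is worth pulling out: epc and BadVA are the same address, which means the fault happened while fetching an instruction at 0xa400, i.e. the kernel jumped to a bogus address. The sketch below extracts those fields and decodes the ExcCode field of the MIPS Cause register (bits 6..2). The helper name is hypothetical, and reading this as a corrupted function pointer reached from handle_irq_event_percpu is my guess, not something the log proves.

```python
import re

# A few MIPS ExcCode values (Cause register bits 6..2).
EXC_NAMES = {0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS", 4: "AdEL", 5: "AdES"}


def summarize_oops(text):
    """Extract epc, BadVA and Cause from a MIPS oops and decode ExcCode."""
    epc = int(re.search(r"epc\s*:\s*([0-9a-f]+)", text).group(1), 16)
    badva = int(re.search(r"BadVA\s*:\s*([0-9a-f]+)", text).group(1), 16)
    cause = int(re.search(r"Cause\s*:\s*([0-9a-f]+)", text).group(1), 16)
    exc_code = (cause >> 2) & 0x1F
    return {
        "epc": epc,
        "badva": badva,
        "exc_code": exc_code,
        "exc_name": EXC_NAMES.get(exc_code, "other"),
        # epc == BadVA means the faulting access was the instruction fetch.
        "fetch_fault": epc == badva,
    }


sample = """\
[ 1556.620000] epc : 000000000000a400 0xa400
[ 1556.620000] Cause : 10008408
[ 1556.620000] BadVA : 000000000000a400
"""

print(summarize_oops(sample))
```

For the values in this report, ExcCode decodes to 2 (TLBL, a TLB miss on a load or instruction fetch) and fetch_fault is true, so the CPU really did try to execute code at 0xa400.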


That the driver corrupts some part of memory is a reasonable explanation,
or is it a hardware bug? It is a really mysterious bug.

I'm not an X Window or hardware expert. I don't know what to do next.
Any ideas are welcome.

BTW, I am offering a $20 Amazon gift card as a reward. If anyone fixes
the issue properly (without just adding dirty hacks or commenting out
code) and
successfully gets the fix sent upstream and merged, please email me :)

[0] https://bugs.freedesktop.org/show_bug.cgi?id=21622


---
Thanks,
Tom Li