Serial hang: Testing Paul's patches


Dirk Behme

Aug 1, 2008, 2:15:44 AM
To: pa...@pwsan.com, Beagle Board

Testing the patches

http://www.pwsan.com/omap/gptimer_workaround_3.tar.gz
http://www.pwsan.com/omap/read_die_ids.patch

Boot log output:

<7>OMAP_TAP_IDCODE 0x0b7ae02f REV 0 HAWKEYE 0xb7ae MANF 017
<7>OMAP_TAP_DIE_ID_0: 0x00000000
<7>OMAP_TAP_DIE_ID_1: 0x00000000 DEV_REV: 0
<7>OMAP_TAP_DIE_ID_2: 0x00000000
<7>OMAP_TAP_DIE_ID_3: 0x00000000
<7>OMAP_TAP_PROD_ID_0: 0x00000000 DEV_TYPE: 0

This board has the serial hang without patches applied.

I tried patch series 3 in two kernel configurations: without and with
CONFIG_DEBUG_LL enabled. My default is without DEBUG_LL. I then
wondered why I don't get the OMAP_TAP output at boot, only via
dmesg, so for a second try I enabled DEBUG_LL.

Result:

- With the older patch series *2*, yesterday I got a lot of "Timer
workaround" messages while typing at the serial console (and, after some
time, the serial hang)

- With this patch series (*3*) I don't see any "*** GPTIMER missed
match interrupt!" messages, regardless of whether DEBUG_LL is enabled (not
sure if I missed anything or if this is intended).

- *Without* DEBUG_LL enabled I get the serial hang (with patch series 3
applied) in < 10 min doing something like what is in the attachment

- *With* DEBUG_LL enabled, I couldn't get the serial hang within ~20 min
doing similar stuff to the attachment. Maybe it simply takes longer
to hang with DEBUG_LL enabled. Or I was lucky.

Most probably I did something wrong here; sorry if so. But maybe it helps.

Anyway, many thanks to Paul for looking at this!

Dirk

serial_hang_test.txt

Koen Kooi

Aug 1, 2008, 12:40:58 PM
To: Beagle Board

Koen Kooi

Aug 1, 2008, 1:10:40 PM
To: Beagle Board


On 1 aug, 18:40, Koen Kooi <koen.k...@gmail.com> wrote:
> On 1 aug, 08:15, Dirk Behme <dirk.be...@googlemail.com> wrote:
>
> > Testing the patches
>
> > http://www.pwsan.com/omap/gptimer_workaround_3.tar.gz
> > http://www.pwsan...
>
> uImage with those patches applied:
>
> http://amethyst.openembedded.net/~koen/beagleboard/demo/uImage-2.6.26...

Result:

----
OMAP_TAP_IDCODE 0x0b7ae02f REV 0 HAWKEYE 0xb7ae MANF 017
OMAP_TAP_DIE_ID_0: 0x00000000
OMAP_TAP_DIE_ID_1: 0x00000000 DEV_REV: 0
OMAP_TAP_DIE_ID_2: 0x00000000
OMAP_TAP_DIE_ID_3: 0x00000000
OMAP_TAP_PROD_ID_0: 0x00000000 DEV_TYPE: 0
----

This is not from a serial hang, but this scary debug output appeared out of
nowhere; the system seems to continue running fine:

------------[ cut here ]------------
WARNING: at arch/arm/mach-omap2/timer-gp.c:73 omap2_gp_timer_interrupt+0x44/0x78()
Modules linked in: ipv6
[<c0037b88>] (dump_stack+0x0/0x14) from [<c0053de0>] (warn_on_slowpath+0x4c/0x68)
[<c0053d94>] (warn_on_slowpath+0x0/0x68) from [<c003db74>] (omap2_gp_timer_interrupt+0x44/0x78)
r6:00000000 r5:c04059f8 r4:00000002
[<c003db30>] (omap2_gp_timer_interrupt+0x0/0x78) from [<c0079dac>] (handle_IRQ_event+0x3c/0x74)
r5:00000000 r4:c03ff300
[<c0079d70>] (handle_IRQ_event+0x0/0x74) from [<c007b67c>] (handle_level_irq+0xd4/0xf0)
r7:c7c74c60 r6:00000000 r5:00000025 r4:c04086ec
[<c007b5a8>] (handle_level_irq+0x0/0xf0) from [<c0033048>] (__exception_text_start+0x48/0x64)
r5:c04086ec r4:00000025
[<c0033000>] (__exception_text_start+0x0/0x64) from [<c00336b0>] (__irq_svc+0x30/0x80)
Exception stack(0xc7db1cc8 to 0xc7db1d10)
1cc0: 00000001 00000000 000b689f 0004c6a4 c72ec780 00000016
1ce0: c7ccdce0 c7c74c60 deea4357 00000177 c7ccdf50 c7db1d44 c0429380 c7db1d10
1d00: 00000410 c031a38c 00000013 ffffffff
r7:c7c74c60 r6:c7ccdce0 r5:d8200000 r4:ffffffff
[<c031a09c>] (schedule+0x0/0x3f8) from [<c031a648>] (schedule_timeout+0x20/0xb8)
[<c031a628>] (schedule_timeout+0x0/0xb8) from [<c0319f7c>] (wait_for_common+0xdc/0x168)
r7:c7db0000 r6:7fffffff r5:c7ccdce0 r4:c7db1da8
[<c0319ea0>] (wait_for_common+0x0/0x168) from [<c031a098>] (wait_for_completion+0x18/0x1c)
[<c031a080>] (wait_for_completion+0x0/0x1c) from [<c021c658>] (mmc_wait_for_req+0x104/0x114)
[<c021c554>] (mmc_wait_for_req+0x0/0x114) from [<c021c6d4>] (mmc_wait_for_cmd+0x6c/0x7c)
r6:c7cbfc00 r5:c7db1f48 r4:c7db1e1c
[<c021c668>] (mmc_wait_for_cmd+0x0/0x7c) from [<c02219cc>] (mmc_blk_issue_rq+0x290/0x550)
r7:c7d4aa00 r6:c7db0000 r5:c7d794e4 r4:c7db1ec4
[<c022173c>] (mmc_blk_issue_rq+0x0/0x550) from [<c0222484>] (mmc_queue_thread+0xb4/0xdc)
[<c02223d0>] (mmc_queue_thread+0x0/0xdc) from [<c0066d00>] (kthread+0x54/0x80)
r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c02223d0
r4:c7d794e4
[<c0066cac>] (kthread+0x0/0x80) from [<c0056fec>] (do_exit+0x0/0x604)
r5:00000000 r4:00000000
---[ end trace 7286ff11b7be1fa3 ]---

Paul Walmsley

Aug 1, 2008, 2:31:23 PM
To: Dirk Behme, Beagle Board
Hello Dirk,

On Fri, 1 Aug 2008, Dirk Behme wrote:

> Result:


>
> - *Without* debug ll enabled I get serial hang (with patch series 3 applied)
> in < 10min doing something like in attachment
>
> - *With* debug ll enabled, I couldn't get serial hang within ~20min doing
> similiar stuff like in attachment. Maybe it simply will take longer with debug
> ll enabled to hang. Or I had luck.
>
> Most probably I did something wrong here, sorry then. But maybe it helps.

It does help, very much. Dirk, when your Beagle serial hangs again, could
you please send a SysRq-q (break + q on the serial console) and e-mail me the
GPTIMER register dump at the bottom? It will look something like this:

<3>GPT TCRR: ffff9c66
GPT TCRR: ffff9c66
<3>GPT TMAT: ffffbfff
GPT TMAT: ffffbfff
<3>GPT TISR: 00000000
GPT TISR: 00000000
<3>GPT TIER: 00000003
GPT TIER: 00000003
<3>GPT TCLR: 00000041
GPT TCLR: 00000041
<3>GPT TOCR: 00000000
GPT TOCR: 00000000
<3>GPT TOWR: 00000000
GPT TOWR: 00000000


thank you for the TAP data and the testing help,

- Paul

Koen Kooi

Aug 2, 2008, 9:07:15 AM
To: Beagle Board
On 1 aug, 20:31, Paul Walmsley <p...@pwsan.com> wrote:

> It does help, very much.  Dirk, when your Beagle serial hangs again, could
> you please send a Sysrq-q (break + q on serial console) and E-mail me the
> GPTIMER register dump at the bottom?  It will look something like this:
>
> <3>GPT TCRR: ffff9c66
> GPT TCRR: ffff9c66
> <3>GPT TMAT: ffffbfff
> GPT TMAT: ffffbfff
> <3>GPT TISR: 00000000
> GPT TISR: 00000000
> <3>GPT TIER: 00000003
> GPT TIER: 00000003
> <3>GPT TCLR: 00000041
> GPT TCLR: 00000041
> <3>GPT TOCR: 00000000
> GPT TOCR: 00000000
> <3>GPT TOWR: 00000000
> GPT TOWR: 00000000
>
> thank you for the TAP data and the testing help,

After 4 hours and 32 minutes:

GPT TCRR: 20a06241
GPT TMAT: ffffbfff
GPT TISR: 00000000
GPT TIER: 00000003
GPT TCLR: 00000041
GPT TOCR: 00000000
GPT TOWR: 00000000

regards,

Koen

Dirk Behme

Aug 3, 2008, 8:01:52 AM
To: Paul Walmsley, beagl...@googlegroups.com

After Paul explained to me that I have to use 'ctrl-a f q' in minicom to
send SysRq-q (thanks!), I got the serial hang with DEBUG_LL disabled after
~27 min:

-- cut --
root@beagleboard:~# SysRq : Show Pending Timers
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 1724156066894 nsecs

cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1216686567818359375 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <c03f3d98>, hrtimer_wakeup, S:01, do_nanosleep, xkbd/1580
# expires at 1382835683593 nsecs [in -341320383301 nsecs]
#1: <c03f3d98>, hrtimer_wakeup, S:01, do_nanosleep, ipaq-sleep/1571
# expires at 1383926635742 nsecs [in -340229431152 nsecs]
#2: <c03f3d98>, tick_sched_timer, S:01,
tick_nohz_restart_sched_tick, swapper/0
# expires at 1656406250000 nsecs [in -67749816894 nsecs]
#3: <c03f3d98>, it_real_fn, S:01, do_setitimer, Xfbdev/1498
# expires at 1656428782958 nsecs [in -67727283936 nsecs]
.expires_next : 1382835683593 nsecs
.hres_active : 1
.nr_events : 146275
.nohz_mode : 2
.idle_tick : 1382828125000 nsecs
.tick_stopped : 0
.idle_jiffies : 138601
.idle_calls : 686178
.idle_sleeps : 668180
.idle_entrytime : 1722605987548 nsecs
.idle_waketime : 1656403564453 nsecs
.idle_exittime : 1656403747558 nsecs
.idle_sleeptime : 1663773041876 nsecs
.last_jiffies : 173619
.next_jiffies : 173620
.idle_expires : 1382835937500 nsecs
jiffies: 173619


Tick Device: mode: 1
Clock Event Device: gp timer
max_delta_ns: 2147483647
min_delta_ns: 30517
mult: 140737
shift: 32
mode: 3
next_event: 1382835683593 nsecs
set_next_event: omap2_gp_timer_set_next_event
set_mode: omap2_gp_timer_set_mode
event_handler: hrtimer_interrupt


GPT TCRR: 00aabe07
GPT TMAT: ffffbfff
GPT TISR: 00000000
GPT TIER: 00000003
GPT TCLR: 00000041
GPT TOCR: 00000000
GPT TOWR: 00000000

-- cut --

No "*** GPTIMER missed match interrupt!", though.

When I repeat 'ctrl-a f q' several times, "now at XXX nsecs" and "GPT TCRR:
ZZZZ" still increase. The other GPT values stay the same.

Dirk


Koen Kooi

Aug 3, 2008, 8:14:06 AM
To: Beagle Board
On 3 aug, 14:01, Dirk Behme <dirk.be...@googlemail.com> wrote:

> GPT TCRR: 00aabe07
> GPT TMAT: ffffbfff
> GPT TISR: 00000000
> GPT TIER: 00000003
> GPT TCLR: 00000041
> GPT TOCR: 00000000
> GPT TOWR: 00000000
> -- cut --
>
> No "*** GPTIMER missed match interrupt!", though.
>
> Doing  'ctrl-a f q' several times, "now at XXX nsecs" and GPT TCRR:
> ZZZZ still increases. The other GPT values stay the same.

Across all the hangs I've seen, only TCRR differs; TMAT, TIER and TCLR are
always the same when hanging.

regards,

Koen

Dirk Behme

Aug 4, 2008, 11:10:29 AM
To: Paul Walmsley, beagl...@googlegroups.com, Philip

Yesterday, Philip had this with Koen's image:

*** GPTIMER missed match interrupt! last load: ffff76e1


------------[ cut here ]------------
WARNING: at arch/arm/mach-omap2/timer-gp.c:73 omap2_gp_timer_interrupt+0x44/0x78()
Modules linked in: pegasus ipv6
[<c0037b88>] (dump_stack+0x0/0x14) from [<c0053de0>] (warn_on_slowpath+0x4c/0x68)
[<c0053d94>] (warn_on_slowpath+0x0/0x68) from [<c003db74>] (omap2_gp_timer_interrupt+0x44/0x78)
r6:00000000 r5:c04059f8 r4:00000002
[<c003db30>] (omap2_gp_timer_interrupt+0x0/0x78) from [<c0079dac>] (handle_IRQ_event+0x3c/0x74)
r5:00000000 r4:c03ff300
[<c0079d70>] (handle_IRQ_event+0x0/0x74) from [<c007b67c>] (handle_level_irq+0xd4/0xf0)
r7:00000ab6 r6:00000000 r5:00000025 r4:c04086ec
[<c007b5a8>] (handle_level_irq+0x0/0xf0) from [<c0033048>] (__exception_text_start+0x48/0x64)
r5:c04086ec r4:00000025
[<c0033000>] (__exception_text_start+0x0/0x64) from [<c00336b0>] (__irq_svc+0x30/0x80)
Exception stack(0xc03fbf08 to 0xc03fbf50)
bf00: a0000013 e671ecb1 220cb6b1 00000000 198e134f a0000013
bf20: 00000000 00000ab6 19254d38 00000ab5 0000004a c03fbfa4 c03fbee8 c03fbf50
bf40: c0069908 c006fdbc 60000013 ffffffff
r7:00000ab6 r6:00000000 r5:d8200000 r4:ffffffff
[<c006fa60>] (tick_nohz_stop_sched_tick+0x0/0x390) from [<c0034a78>] (cpu_idle+0x44/0x80)
[<c0034a34>] (cpu_idle+0x0/0x80) from [<c0318f20>] (rest_init+0x58/0x6c)
r5:c042804c r4:c043fe8c
[<c0318ec8>] (rest_init+0x0/0x6c) from [<c0008b64>] (start_kernel+0x24c/0x2a4)
[<c0008918>] (start_kernel+0x0/0x2a4) from [<80008034>] (0x80008034)
---[ end trace 4118c6862fc8eec1 ]---


Koen Kooi

Aug 4, 2008, 12:55:24 PM
To: Beagle Board


On 4 aug, 17:10, Dirk Behme <dirk.be...@googlemail.com> wrote:
> Koen Kooi wrote:
> > On 3 aug, 14:01, Dirk Behme <dirk.be...@googlemail.com> wrote:
>
> >>GPT TCRR: 00aabe07
> >>GPT TMAT: ffffbfff
> >>GPT TISR: 00000000
> >>GPT TIER: 00000003
> >>GPT TCLR: 00000041
> >>GPT TOCR: 00000000
> >>GPT TOWR: 00000000
> >>-- cut --
>
> >>No "*** GPTIMER missed match interrupt!", though.
>
> >>Doing 'ctrl-a f q' several times, "now at XXX nsecs" and GPT TCRR:
> >>ZZZZ still increases. The other GPT values stay the same.
>
> > With all the hangs I see only TCRR differs, TMAT, TIET and TCLR are
> > always the same when hanging.
>
> Yesterday, Philip had this with Koen's image:
>
> *** GPTIMER missed match interrupt! last load: ffff76e1

<snip>

Khasim reports that switching from the 32k timer to the MPU timer eliminates
the hang.

regards,

Koen

Dirk Behme

Aug 5, 2008, 12:44:13 PM
To: beagl...@googlegroups.com

Update from Khasim about the 32k timer weirdness:

http://www.beagleboard.org/irclogs/index.php?date=2008-08-05#T16:27:58

Dirk
