Allwinner H3: CPU hang with the ondemand cpufreq governor

Perr Zhang

Sep 12, 2021, 2:53:55 AM
to linux-sunxi
CPU:
Allwinner H3 (Cortex-A7)

Phenomenon:
The problem occurs only when the ondemand cpufreq governor (dynamic frequency scaling) is enabled; with frequency scaling disabled it does not appear.
The CPU stalls for a while at the location described below and then recovers on its own.
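
To check whether a stall coincides with a frequency change, the active governor and the current frequency can be logged from sysfs while reproducing the problem. This is a minimal sketch, assuming the standard cpufreq sysfs files `scaling_governor` and `scaling_cur_freq`; the function name and the `root` parameter are illustrative and not part of the original report:

```python
from pathlib import Path

# Standard cpufreq sysfs directory; overridable so the helper can be tested
# against a fake directory tree.
CPU_SYSFS_ROOT = "/sys/devices/system/cpu"


def read_cpufreq_state(cpu=0, root=CPU_SYSFS_ROOT):
    """Return (governor name, current frequency in kHz) for one CPU."""
    cpufreq_dir = Path(root) / f"cpu{cpu}" / "cpufreq"
    governor = (cpufreq_dir / "scaling_governor").read_text().strip()
    cur_khz = int((cpufreq_dir / "scaling_cur_freq").read_text().strip())
    return governor, cur_khz
```

Calling `read_cpufreq_state(0)` periodically and printing the result with a timestamp would make it possible to line up frequency transitions with the perf sample timestamps.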

Code location:
https://elixir.bootlin.com/linux/v4.19.206/source/drivers/usb/core/hcd.c#L1771

According to the perf output, the samples taken inside __usb_hcd_giveback_urb() land at these addresses:
c04d35bc __usb_hcd_giveback_urb+0x60 ([kernel.kallsyms]): 1586 times
c04d35f4 __usb_hcd_giveback_urb+0x98 ([kernel.kallsyms]): 1 time
c04d3590 __usb_hcd_giveback_urb+0x34 ([kernel.kallsyms]): 4 times
c04d3574 __usb_hcd_giveback_urb+0x18 ([kernel.kallsyms]): 1 time
c04d3580 __usb_hcd_giveback_urb+0x24 ([kernel.kallsyms]): 1 time
c04d3598 __usb_hcd_giveback_urb+0x3c ([kernel.kallsyms]): 1 time
The addresses with only a few samples are ignored below.

0000084c <__usb_hcd_giveback_urb>:
84c: e92d4070 push {r4, r5, r6, lr}
850: e3a03000 mov r3, #0
854: e590103c ldr r1, [r0, #60] ; 0x3c
858: e1a04000 mov r4, r0
85c: e5902028 ldr r2, [r0, #40] ; 0x28
860: e3110001 tst r1, #1
864: e5905024 ldr r5, [r0, #36] ; 0x24
868: e5906010 ldr r6, [r0, #16]
86c: e592003c ldr r0, [r2, #60] ; 0x3c
870: e5843004 str r3, [r4, #4]
874: 1a00001b bne 8e8 <__usb_hcd_giveback_urb+0x9c>
878: e1a01004 mov r1, r4
87c: ebffffe1 bl 808 <unmap_urb_for_dma>
880: e1a00005 mov r0, r5
884: ebfffffe bl 0 <usb_anchor_suspend_wakeups>
888: e1a00004 mov r0, r4
88c: ebfffffe bl 0 <usb_unanchor_urb>
890: e5846038 str r6, [r4, #56] ; 0x38
894: e10f6000 mrs r6, CPSR
898: f10c0080 cpsid i
89c: e5943078 ldr r3, [r4, #120] ; 0x78
8a0: e1a00004 mov r0, r4
8a4: e12fff33 blx r3
8a8: e121f006 msr CPSR_c, r6 <----------corresponding statement local_irq_restore(flags);
8ac: e1a00005 mov r0, r5 <-------------__usb_hcd_giveback_urb+0x60
8b0: ebfffffe bl 0 <usb_anchor_resume_wakeups>
8b4: e2843008 add r3, r4, #8
8b8: f593f000 pldw [r3]
8bc: e1932f9f ldrex r2, [r3]
8c0: e2422001 sub r2, r2, #1
8c4: e1831f92 strex r1, r2, [r3]
8c8: e3310000 teq r1, #0
8cc: 1afffffa bne 8bc <__usb_hcd_giveback_urb+0x70>
8d0: e594300c ldr r3, [r4, #12]
8d4: e3530000 cmp r3, #0
8d8: 1a00000c bne 910 <__usb_hcd_giveback_urb+0xc4>
8dc: e1a00004 mov r0, r4
8e0: e8bd4070 pop {r4, r5, r6, lr}
8e4: eafffffe b 0 <usb_free_urb>
8e8: e5941058 ldr r1, [r4, #88] ; 0x58
8ec: e5942054 ldr r2, [r4, #84] ; 0x54
8f0: e1510002 cmp r1, r2
8f4: 23a03000 movcs r3, #0
8f8: 33a03001 movcc r3, #1
8fc: e3560000 cmp r6, #0
900: 13a03000 movne r3, #0
904: e3530000 cmp r3, #0
908: 13e06078 mvnne r6, #120 ; 0x78
90c: eaffffd9 b 878 <__usb_hcd_giveback_urb+0x2c>
910: e3000000 movw r0, #0
914: e3a03000 mov r3, #0
918: e3400000 movt r0, #0
91c: e3a02001 mov r2, #1
920: e3a01003 mov r1, #3
924: ebfffffe bl 0 <__wake_up>
928: e1a00004 mov r0, r4
92c: e8bd4070 pop {r4, r5, r6, lr}
930: eafffffe b 0 <usb_free_urb>

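perf sampling relies on interrupts, so no sample can be taken while interrupts are disabled.
If interrupts stay disabled for a long time, the pending perf event fires as soon as they are re-enabled, which is why almost all samples land on the instruction right after local_irq_restore().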
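
The mapping between the perf addresses and the objdump listing above can be checked with a little arithmetic. This is only a sketch: the addresses and counts are copied from this post, the helper names are illustrative, and it assumes the listing was built from the same source as the running kernel.

```python
# Values copied from the perf output and the objdump listing in this post.
PERF_SAMPLES = {  # runtime kernel address -> number of samples
    0xC04D35BC: 1586,
    0xC04D35F4: 1,
    0xC04D3590: 4,
    0xC04D3574: 1,
    0xC04D3580: 1,
    0xC04D3598: 1,
}
HOT_ADDR, HOT_OFFSET = 0xC04D35BC, 0x60  # __usb_hcd_giveback_urb+0x60
OBJDUMP_FUNC_START = 0x84C  # start of the function in the object file


def function_base(sample_addr, symbol_offset):
    """Runtime start address of the function containing the sample."""
    return sample_addr - symbol_offset


def to_objdump_addr(sample_addr, base):
    """Translate a runtime address into an address in the objdump listing."""
    return OBJDUMP_FUNC_START + (sample_addr - base)


base = function_base(HOT_ADDR, HOT_OFFSET)
for addr, count in sorted(PERF_SAMPLES.items(), key=lambda kv: -kv[1]):
    print(f"{addr:#x} -> objdump {to_objdump_addr(addr, base):#x}: {count} samples")

# Fraction of all listed samples that hit the single hot address.
hot_share = PERF_SAMPLES[HOT_ADDR] / sum(PERF_SAMPLES.values())
```

With these numbers the hot address maps to 0x8ac in the listing, the instruction directly after `msr CPSR_c, r6`, and it accounts for 1586 of the 1594 listed samples.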

With the local_irq_save() and local_irq_restore() calls here commented out, the stall still occurs.
The perf flame graph then shows that most of the CPU time is spent in mmiocpy().


sample call stack:
swapper 0 [000] 938.366296: 1642083 cycles:ppp:
c070ad4c mmiocpy+0x4c ([kernel.kallsyms])
bf025558 uvc_video_decode_data.constprop.5+0x38 (/lib/modules/4.19.173/kernel/drivers/media/usb/uvc/uvcvideo.ko)
bf025610 uvc_video_decode_isoc+0x6c (/lib/modules/4.19.173/kernel/drivers/media/usb/uvc/uvcvideo.ko)
bf0242a0 uvc_video_complete+0xb0 (/lib/modules/4.19.173/kernel/drivers/media/usb/uvc/uvcvideo.ko)
c04d35b0 __usb_hcd_giveback_urb+0x54 ([kernel.kallsyms])
c04d36e4 usb_giveback_urb_bh+0xac ([kernel.kallsyms])
c0122070 tasklet_action_common.constprop.3+0x64 ([kernel.kallsyms])
c01021fc __softirqentry_text_start+0x114 ([kernel.kallsyms])
c0122400 irq_exit+0xcc ([kernel.kallsyms])
c0162634 __handle_domain_irq+0x60 ([kernel.kallsyms])
c0390a34 gic_handle_irq+0x4c ([kernel.kallsyms])
c0101a0c __irq_svc+0x6c ([kernel.kallsyms])
c01087e8 arch_cpu_idle+0x38 ([kernel.kallsyms])
c0145d78 do_idle+0xe4 ([kernel.kallsyms])
c0146070 cpu_startup_entry+0x18 ([kernel.kallsyms])
c0a00cb8 start_kernel+0x3dc ([kernel.kallsyms])

Sample counts by offset within mmiocpy (count, location):

42 mmiocpy+0x31c
42 mmiocpy+0xf0
210 mmiocpy+0x2b8
229 mmiocpy+0x208
235 mmiocpy+0x2ac
247 mmiocpy+0x1fc
271 mmiocpy+0x158
277 mmiocpy+0x14c
380 mmiocpy+0x4c
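
Counts like these could be produced by tallying the innermost frame of every sample in the `perf script` text output. The sketch below is one possible way to do that, not necessarily how the numbers above were obtained; the function name and the regular expression are assumptions based on the call-stack format shown in this post.

```python
import re
from collections import Counter

# A stack frame line looks like:
#   c070ad4c mmiocpy+0x4c ([kernel.kallsyms])
FRAME_RE = re.compile(r"^\s*([0-9a-f]{8,16})\s+(\S+)\s+\(.+\)\s*$")


def count_leaf_frames(perf_script_text):
    """Count how often each symbol+offset is the innermost frame of a sample."""
    counts = Counter()
    expect_leaf = False
    for line in perf_script_text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            # A non-blank, non-frame line is treated as a sample header,
            # so the next frame line is the innermost (leaf) frame.
            expect_leaf = bool(line.strip())
            continue
        if expect_leaf:
            counts[match.group(2)] += 1
            expect_leaf = False
    return counts
```

For example, `count_leaf_frames(open("perf.txt").read()).most_common(10)` would list the ten hottest locations, assuming `perf.txt` holds the saved `perf script` output.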

The most frequent offsets in this list are the instructions immediately after an ldm r1!, {...}, so the CPU is stalling while loading registers from memory.
This suggests a problem between the CPU core and the memory controller rather than in the USB code itself.
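
The claim about ldm can be checked against the memcpy disassembly in this post. The sketch below assumes that mmiocpy and memcpy share the same offsets, and that a sample is attributed to the instruction following the one that stalled; the counts and ldm offsets are copied from this post and the names are illustrative.

```python
# Sample counts per offset inside mmiocpy, copied from the list above.
HOT_OFFSETS = {
    0x31C: 42, 0xF0: 42, 0x2B8: 210, 0x208: 229, 0x2AC: 235,
    0x1FC: 247, 0x158: 271, 0x14C: 277, 0x4C: 380,
}
# Offsets of every "ldm r1!, {...}" in the memcpy disassembly of this post.
LDM_OFFSETS = {0x48, 0x148, 0x154, 0x1F8, 0x204, 0x2A8, 0x2B4}
INSN_SIZE = 4  # ARM (A32) instructions are 4 bytes long


def split_by_preceding_ldm(hot_offsets, ldm_offsets):
    """Split the hot offsets into those directly after an ldm and the rest."""
    after_ldm = {off: n for off, n in hot_offsets.items()
                 if off - INSN_SIZE in ldm_offsets}
    other = {off: n for off, n in hot_offsets.items() if off not in after_ldm}
    return after_ldm, other


after_ldm, other = split_by_preceding_ldm(HOT_OFFSETS, LDM_OFFSETS)
total = sum(HOT_OFFSETS.values())
print(f"{sum(after_ldm.values())} of {total} samples directly follow an ldm")
print("other offsets:", [hex(off) for off in sorted(other)])
```

With the numbers from this post, 1849 of the 1933 listed samples fall directly after an ldm. The two remaining offsets, +0xf0 and +0x31c, also sit next to single-register loads (ldrb at 0xec, ldr at 0x314), which is consistent with the stall being on the memory read side.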

mmiocpy is the same code as memcpy. Disassembly:

arch/arm/lib/memcpy.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <memcpy>:
0: e92d4011 push {r0, r4, lr}
4: e2522004 subs r2, r2, #4
8: ba00002b blt bc <memcpy+0xbc>
c: e210c003 ands ip, r0, #3
10: f5d1f000 pld [r1]
14: 1a000030 bne dc <memcpy+0xdc>
18: e211c003 ands ip, r1, #3
1c: 1a00003a bne 10c <memcpy+0x10c>
20: e252201c subs r2, r2, #28
24: e92d01e0 push {r5, r6, r7, r8}
28: ba00000c blt 60 <memcpy+0x60>
2c: f5d1f000 pld [r1]
30: e2522060 subs r2, r2, #96 ; 0x60
34: f5d1f01c pld [r1, #28]
38: ba000002 blt 48 <memcpy+0x48>
3c: f5d1f03c pld [r1, #60] ; 0x3c
40: f5d1f05c pld [r1, #92] ; 0x5c
44: f5d1f07c pld [r1, #124] ; 0x7c
48: e8b151f8 ldm r1!, {r3, r4, r5, r6, r7, r8, ip, lr} <------------------Load the register list from [r1]
4c: e2522020 subs r2, r2, #32
50: e8a051f8 stmia r0!, {r3, r4, r5, r6, r7, r8, ip, lr}
54: aafffffa bge 44 <memcpy+0x44>
58: e3720060 cmn r2, #96 ; 0x60
5c: aafffff9 bge 48 <memcpy+0x48>
60: e212c01c ands ip, r2, #28
64: e26cc020 rsb ip, ip, #32
68: 108ff00c addne pc, pc, ip
6c: ea000011 b b8 <memcpy+0xb8>
70: e320f000 nop {0}
74: e4913004 ldr r3, [r1], #4
78: e4914004 ldr r4, [r1], #4
7c: e4915004 ldr r5, [r1], #4
80: e4916004 ldr r6, [r1], #4
84: e4917004 ldr r7, [r1], #4
88: e4918004 ldr r8, [r1], #4
8c: e491e004 ldr lr, [r1], #4
90: e08ff00c add pc, pc, ip
94: e320f000 nop {0}
98: e320f000 nop {0}
9c: e4803004 str r3, [r0], #4
a0: e4804004 str r4, [r0], #4
a4: e4805004 str r5, [r0], #4
a8: e4806004 str r6, [r0], #4
ac: e4807004 str r7, [r0], #4
b0: e4808004 str r8, [r0], #4
b4: e480e004 str lr, [r0], #4
b8: e8bd01e0 pop {r5, r6, r7, r8}
bc: e1b02f82 lsls r2, r2, #31
c0: 14d13001 ldrbne r3, [r1], #1
c4: 24d14001 ldrbcs r4, [r1], #1
c8: 24d1c001 ldrbcs ip, [r1], #1
cc: 14c03001 strbne r3, [r0], #1
d0: 24c04001 strbcs r4, [r0], #1
d4: 24c0c001 strbcs ip, [r0], #1
d8: e8bd8011 pop {r0, r4, pc}
dc: e26cc004 rsb ip, ip, #4
e0: e35c0002 cmp ip, #2
e4: c4d13001 ldrbgt r3, [r1], #1
e8: a4d14001 ldrbge r4, [r1], #1
ec: e4d1e001 ldrb lr, [r1], #1
f0: c4c03001 strbgt r3, [r0], #1
f4: a4c04001 strbge r4, [r0], #1
f8: e052200c subs r2, r2, ip
fc: e4c0e001 strb lr, [r0], #1
100: baffffed blt bc <memcpy+0xbc>
104: e211c003 ands ip, r1, #3
108: 0affffc4 beq 20 <memcpy+0x20>
10c: e3c11003 bic r1, r1, #3
110: e35c0002 cmp ip, #2
114: e491e004 ldr lr, [r1], #4
118: 0a00002c beq 1d0 <memcpy+0x1d0>
11c: ca000057 bgt 280 <memcpy+0x280>
120: e252201c subs r2, r2, #28
124: ba00001f blt 1a8 <memcpy+0x1a8>
128: e92d03e0 push {r5, r6, r7, r8, r9}
12c: f5d1f000 pld [r1]
130: e2522060 subs r2, r2, #96 ; 0x60
134: f5d1f01c pld [r1, #28]
138: ba000002 blt 148 <memcpy+0x148>
13c: f5d1f03c pld [r1, #60] ; 0x3c
140: f5d1f05c pld [r1, #92] ; 0x5c
144: f5d1f07c pld [r1, #124] ; 0x7c
148: e8b100f0 ldm r1!, {r4, r5, r6, r7} <------------------Load the register list from [r1]
14c: e1a0342e lsr r3, lr, #8
150: e2522020 subs r2, r2, #32
154: e8b15300 ldm r1!, {r8, r9, ip, lr} <------------------Load the register list from [r1]
158: e1833c04 orr r3, r3, r4, lsl #24
15c: e1a04424 lsr r4, r4, #8
160: e1844c05 orr r4, r4, r5, lsl #24
164: e1a05425 lsr r5, r5, #8
168: e1855c06 orr r5, r5, r6, lsl #24
16c: e1a06426 lsr r6, r6, #8
170: e1866c07 orr r6, r6, r7, lsl #24
174: e1a07427 lsr r7, r7, #8
178: e1877c08 orr r7, r7, r8, lsl #24
17c: e1a08428 lsr r8, r8, #8
180: e1888c09 orr r8, r8, r9, lsl #24
184: e1a09429 lsr r9, r9, #8
188: e1899c0c orr r9, r9, ip, lsl #24
18c: e1a0c42c lsr ip, ip, #8
190: e18ccc0e orr ip, ip, lr, lsl #24
194: e8a013f8 stmia r0!, {r3, r4, r5, r6, r7, r8, r9, ip}
198: aaffffe9 bge 144 <memcpy+0x144>
19c: e3720060 cmn r2, #96 ; 0x60
1a0: aaffffe8 bge 148 <memcpy+0x148>
1a4: e8bd03e0 pop {r5, r6, r7, r8, r9}
1a8: e212c01c ands ip, r2, #28
1ac: 0a000005 beq 1c8 <memcpy+0x1c8>
1b0: e1a0342e lsr r3, lr, #8
1b4: e491e004 ldr lr, [r1], #4
1b8: e25cc004 subs ip, ip, #4
1bc: e1833c0e orr r3, r3, lr, lsl #24
1c0: e4803004 str r3, [r0], #4
1c4: cafffff9 bgt 1b0 <memcpy+0x1b0>
1c8: e2411003 sub r1, r1, #3
1cc: eaffffba b bc <memcpy+0xbc>
1d0: e252201c subs r2, r2, #28
1d4: ba00001f blt 258 <memcpy+0x258>
1d8: e92d03e0 push {r5, r6, r7, r8, r9}
1dc: f5d1f000 pld [r1]
1e0: e2522060 subs r2, r2, #96 ; 0x60
1e4: f5d1f01c pld [r1, #28]
1e8: ba000002 blt 1f8 <memcpy+0x1f8>
1ec: f5d1f03c pld [r1, #60] ; 0x3c
1f0: f5d1f05c pld [r1, #92] ; 0x5c
1f4: f5d1f07c pld [r1, #124] ; 0x7c
1f8: e8b100f0 ldm r1!, {r4, r5, r6, r7} <------------------Load the register list from [r1]
1fc: e1a0382e lsr r3, lr, #16
200: e2522020 subs r2, r2, #32
204: e8b15300 ldm r1!, {r8, r9, ip, lr}
208: e1833804 orr r3, r3, r4, lsl #16
20c: e1a04824 lsr r4, r4, #16
210: e1844805 orr r4, r4, r5, lsl #16
214: e1a05825 lsr r5, r5, #16
218: e1855806 orr r5, r5, r6, lsl #16
21c: e1a06826 lsr r6, r6, #16
220: e1866807 orr r6, r6, r7, lsl #16
224: e1a07827 lsr r7, r7, #16
228: e1877808 orr r7, r7, r8, lsl #16
22c: e1a08828 lsr r8, r8, #16
230: e1888809 orr r8, r8, r9, lsl #16
234: e1a09829 lsr r9, r9, #16
238: e189980c orr r9, r9, ip, lsl #16
23c: e1a0c82c lsr ip, ip, #16
240: e18cc80e orr ip, ip, lr, lsl #16
244: e8a013f8 stmia r0!, {r3, r4, r5, r6, r7, r8, r9, ip}
248: aaffffe9 bge 1f4 <memcpy+0x1f4>
24c: e3720060 cmn r2, #96 ; 0x60
250: aaffffe8 bge 1f8 <memcpy+0x1f8>
254: e8bd03e0 pop {r5, r6, r7, r8, r9}
258: e212c01c ands ip, r2, #28
25c: 0a000005 beq 278 <memcpy+0x278>
260: e1a0382e lsr r3, lr, #16
264: e491e004 ldr lr, [r1], #4
268: e25cc004 subs ip, ip, #4
26c: e183380e orr r3, r3, lr, lsl #16
270: e4803004 str r3, [r0], #4
274: cafffff9 bgt 260 <memcpy+0x260>
278: e2411002 sub r1, r1, #2
27c: eaffff8e b bc <memcpy+0xbc>
280: e252201c subs r2, r2, #28
284: ba00001f blt 308 <memcpy+0x308>
288: e92d03e0 push {r5, r6, r7, r8, r9}
28c: f5d1f000 pld [r1]
290: e2522060 subs r2, r2, #96 ; 0x60
294: f5d1f01c pld [r1, #28]
298: ba000002 blt 2a8 <memcpy+0x2a8>
29c: f5d1f03c pld [r1, #60] ; 0x3c
2a0: f5d1f05c pld [r1, #92] ; 0x5c
2a4: f5d1f07c pld [r1, #124] ; 0x7c
2a8: e8b100f0 ldm r1!, {r4, r5, r6, r7}
2ac: e1a03c2e lsr r3, lr, #24
2b0: e2522020 subs r2, r2, #32
2b4: e8b15300 ldm r1!, {r8, r9, ip, lr}
2b8: e1833404 orr r3, r3, r4, lsl #8
2bc: e1a04c24 lsr r4, r4, #24
2c0: e1844405 orr r4, r4, r5, lsl #8
2c4: e1a05c25 lsr r5, r5, #24
2c8: e1855406 orr r5, r5, r6, lsl #8
2cc: e1a06c26 lsr r6, r6, #24
2d0: e1866407 orr r6, r6, r7, lsl #8
2d4: e1a07c27 lsr r7, r7, #24
2d8: e1877408 orr r7, r7, r8, lsl #8
2dc: e1a08c28 lsr r8, r8, #24
2e0: e1888409 orr r8, r8, r9, lsl #8
2e4: e1a09c29 lsr r9, r9, #24
2e8: e189940c orr r9, r9, ip, lsl #8
2ec: e1a0cc2c lsr ip, ip, #24
2f0: e18cc40e orr ip, ip, lr, lsl #8
2f4: e8a013f8 stmia r0!, {r3, r4, r5, r6, r7, r8, r9, ip}
2f8: aaffffe9 bge 2a4 <memcpy+0x2a4>
2fc: e3720060 cmn r2, #96 ; 0x60
300: aaffffe8 bge 2a8 <memcpy+0x2a8>
304: e8bd03e0 pop {r5, r6, r7, r8, r9}
308: e212c01c ands ip, r2, #28
30c: 0a000005 beq 328 <memcpy+0x328>
310: e1a03c2e lsr r3, lr, #24
314: e491e004 ldr lr, [r1], #4
318: e25cc004 subs ip, ip, #4
31c: e183340e orr r3, r3, lr, lsl #8
320: e4803004 str r3, [r0], #4
324: cafffff9 bgt 310 <memcpy+0x310>
328: e2411001 sub r1, r1, #1
32c: eaffff62 b bc <memcpy+0xbc>
