I'll try to be brief.
Symptom: I get a very hard-to-reproduce, random crash when using the GStreamer video player in Kivy on Windows. I have a large Kivy program that occasionally crashes as soon as video playback in it begins. It happens totally randomly, BUT multiple machines reproduce it, and the crash always occurs in the same module at the same offset. The crash only occurs in a PyInstaller-created package.
I use Kivy built from source, gstreamer dependency 0.1.12, Python 3.4, Windows.
Where I'm at right now:
I'm currently knee-deep in multiple WinDbg sessions and, with huge effort, have traced where the crash occurs, but I need some help, as I'm not well-versed in this kind of low-level debugging. gdb can't be attached to the process: it trips over an internal error and asks me to send a bug report. So I took the WinDbg route and attached that instead.
The crash is in libgstlibav.dll at offset 0x1582. It happens immediately when GStreamer tries to load the libgstlibav.dll plugin; the GST_DEBUG logs I captured earlier confirm that.
Using the attached WinDbg, I listed the disassembly at that address (the "uf libgstlibav+0x1582" command). Then I loaded libgstlibav.dll in gdb without running anything, figured out its base address there, and matched the WinDbg address up with gdb. The function that crashes is plugin_init().
In gdb, "disas 0x65780000+0x1582" gives:
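Matching the WinDbg address with gdb amounts to a rebase: the module-relative offset stays the same wherever the DLL is mapped, so subtracting the runtime load base and adding the base gdb uses gives the static address. A minimal sketch; the helper name rebase is hypothetical, 0x65780000 is the gdb base from the session below, and the runtime bases 0x75710000 and 0xe2b10000 are read off the WinDbg listings later in this post (listed address minus the libgstlibav+0x1567 offset).

```python
def rebase(addr: int, runtime_base: int, static_base: int) -> int:
    """Map an address inside a module from one load base to another.

    The offset (addr - runtime_base) does not depend on where the DLL
    was mapped, so it can be re-applied to any other base.
    """
    offset = addr - runtime_base
    return static_base + offset


GDB_BASE = 0x65780000  # base of libgstlibav.dll in the static gdb session

# Runtime bases seen in WinDbg (good and crashing instances).
for runtime_base in (0x75710000, 0xE2B10000):
    crash_addr = runtime_base + 0x1582
    print(hex(rebase(crash_addr, runtime_base, GDB_BASE)))  # 0x65781582
```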
(gdb) disas 0x65780000+0x1582
Dump of assembler code for function plugin_init:
0x0000000065781550 <+0>: push %rsi
0x0000000065781551 <+1>: push %rbx
0x0000000065781552 <+2>: sub $0x68,%rsp
0x0000000065781556 <+6>: cmpq $0x0,0xeaaaba(%rip) # 0x6662c018 <ffmpeg_debug>
0x000000006578155e <+14>: mov %rcx,%rbx
0x0000000065781561 <+17>: je 0x65781693 <plugin_init+323>
0x0000000065781567 <+23>: lea 0xbcdaa8(%rip),%rcx # 0x6634f016
0x000000006578156e <+30>: callq 0x6624f3c8 <_gst_debug_get_category>
0x0000000065781573 <+35>: mov %rax,0xeaaa96(%rip) # 0x6662c010 <CAT_PERFORMANCE>
0x000000006578157a <+42>: callq 0x66220e70 <avutil_version>
0x000000006578157f <+47>: movzbl %al,%esi
0x0000000065781582 <+50>: cmpl $0x4,0x1601717(%rip) # 0x66d82ca0 <__imp__gst_debug_min>
0x0000000065781589 <+57>: jbe 0x657815d9 <plugin_init+137>
The crashing instruction is the one at <+50> (0x65781582). It basically does a log-level check. Using gdb to find the source line:
(gdb) info line *0x65780000+0x1582
Line 51 of "gstav.c" starts at address 0x6578157f <plugin_init+47> and ends at 0x657815d9 <plugin_init+137>.
So the crash is here:
https://github.com/GStreamer/gst-libav/blob/ae3a80eec7129bc9f6d812ecfbe857ccd5b6c74f/ext/libav/gstav.c#L51
So far so good; now comes the fun part!
I was able to attach WinDbg to both a crashing AND a correctly running instance of the program, and there is a difference that I cannot explain.
Here are the WinDbg disassemblies around the crashing offset 0x1582 in both instances:
Good instance:
libgstlibav+0x1567:
00000000`75711567 488d0da8dabc00 lea rcx,[libgstlibav!gst_plugin_desc+0x8c016 (00000000`762df016)]
00000000`7571156e e855deac00 call libgstlibav+0xacf3c8 (00000000`761df3c8)
00000000`75711573 48890596aaea00 mov qword ptr [libgstlibav!gst_plugin_desc+0x369010 (00000000`765bc010)],rax
00000000`7571157a e8f1f8a900 call libgstlibav+0xaa0e70 (00000000`761b0e70)
00000000`7571157f 0fb6f0 movzx esi,al
00000000`75711582 833d178076d504 cmp dword ptr [libgstreamer_1_0_0!gst_debug_min (00000000`4ae795a0)],4
00000000`75711589 764e jbe libgstlibav+0x15d9 (00000000`757115d9) Branch
Crashing instance:
libgstlibav+0x1567:
00000000`e2b11567 488d0da8dabc00 lea rcx,[libgstlibav!gst_plugin_desc+0x8c016 (00000000`e36df016)]
00000000`e2b1156e e855deac00 call libgstlibav+0xacf3c8 (00000000`e35df3c8)
00000000`e2b11573 48890596aaea00 mov qword ptr [libgstlibav!gst_plugin_desc+0x369010 (00000000`e39bc010)],rax
00000000`e2b1157a e8f1f8a900 call libgstlibav+0xaa0e70 (00000000`e35b0e70)
00000000`e2b1157f 0fb6f0 movzx esi,al
00000000`e2b11582 833d17801f6804 cmp dword ptr [00000001`4ad095a0],4
00000000`e2b11589 764e jbe libgstlibav+0x15d9 (00000000`e2b115d9) Branch
See the difference? In the crashing instance the address is bogus: its high dword is 00000001 instead of 00000000!
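Both listings show the same 7-byte instruction, 83 3D disp32 imm8, i.e. "cmp dword ptr [rip+disp32], 4". In 64-bit mode the operand address is the address of the next instruction plus the sign-extended 32-bit displacement, so the two targets WinDbg prints can be reproduced from the raw bytes. A minimal sketch; the helper name is hypothetical, and the bytes and addresses are copied from the dumps above.

```python
import struct

MASK64 = (1 << 64) - 1


def rip_relative_target(insn_addr: int, insn: bytes) -> int:
    """Operand address of cmp dword ptr [rip+disp32], imm8 (83 3D disp32 imm8)."""
    # 0x83 /7 is CMP r/m32, imm8; ModRM 0x3D = mod 00, reg 7, rm 101,
    # which in 64-bit mode means [RIP + disp32].
    if len(insn) != 7 or insn[0] != 0x83 or insn[1] != 0x3D:
        raise ValueError("not a cmp dword ptr [rip+disp32], imm8")
    disp32 = struct.unpack_from("<i", insn, 2)[0]  # little-endian, signed
    next_rip = insn_addr + len(insn)                # RIP = end of this instruction
    return (next_rip + disp32) & MASK64


good = rip_relative_target(0x75711582, bytes.fromhex("833d178076d504"))
bad = rip_relative_target(0xE2B11582, bytes.fromhex("833d17801f6804"))
print(f"good: {good:#018x}")  # 0x000000004ae795a0
print(f"bad:  {bad:#018x}")   # 0x000000014ad095a0
```

So the bogus high dword comes straight from the displacement bytes (17801f68 vs 178076d5), not from a corrupted register. Note also that the good instance's displacement (0xd5768017) differs from the one gdb shows for the file on disk (0x1601717, which points at __imp__gst_debug_min), so the displacement is evidently rewritten when the DLL is loaded.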
HOW IS THIS POSSIBLE?
Running "ln 0x4ad095a0" in the crashing instance shows:
(00000000`4ad095a0) libgstreamer_1_0_0!gst_debug_min | (00000000`4ad095a4) libgstreamer_1_0_0!gst_debug_enabled
Exact matches:
libgstreamer_1_0_0!gst_debug_min (<no parameter info>)
So indeed, if the high dword were zero, it would correctly reference gst_debug_min, just as in the good case.
This is where I need help!
I've never really done this sort of debugging, and I don't want to close either instance right now, as I'm not sure I can reproduce the crash again.
Basically, how is it possible that an imported address reference ends up bogus (how does 00000000 turn into 00000001)?
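One way to make the 00000000 to 00000001 jump concrete: compute the displacement each instance would need to reach gst_debug_min and check it against the signed 32-bit range a RIP-relative operand can encode (about ±2 GiB). Assuming the displacement is written back keeping only its low 32 bits (an assumption, not something verified in the DLL), the crashing instance's stored bytes and its bogus target fall out exactly. Helper names are hypothetical; addresses come from the dumps and the "ln" output above.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def required_disp(next_rip: int, target: int) -> int:
    """Signed displacement needed for [rip+disp32] to reach target."""
    return target - next_rip


def fits_disp32(disp: int) -> bool:
    return INT32_MIN <= disp <= INT32_MAX


def truncate_disp32(disp: int) -> int:
    """ASSUMPTION: only the low 32 bits are kept, reinterpreted as signed."""
    raw = disp & 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


# (address after the cmp instruction, address of gst_debug_min) per instance
instances = {
    "good": (0x75711589, 0x4AE795A0),
    "crashing": (0xE2B11589, 0x4AD095A0),
}

for name, (next_rip, target) in instances.items():
    disp = required_disp(next_rip, target)
    stored = truncate_disp32(disp)
    stored_bytes = (stored & 0xFFFFFFFF).to_bytes(4, "little").hex()
    reached = next_rip + stored
    print(f"{name}: need {disp:#x}, fits disp32: {fits_disp32(disp)}, "
          f"stored bytes {stored_bytes}, reaches {reached:#x}")
```

The crashing instance needs a displacement of roughly -2.37 GiB, outside the encodable range; keeping its low 32 bits yields exactly the 17 80 1f 68 bytes in the crashing dump and a target exactly 2^32 above gst_debug_min, while the good instance fits comfortably. One mechanism known to rewrite such displacements at load time is the MinGW-w64 runtime pseudo-relocation used for auto-imported data; whether that is what happens here is an assumption, not something established above.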
And it's not a hardware issue (at least not one specific to a single machine), as multiple machines reproduce the crash at exactly the same offset.
In case it helps: the crash only occurs with the Intel graphics driver; I never get it with the Nvidia driver (this is a switchable-graphics laptop with both Intel and Nvidia graphics). The crash occurs on Intel-only machines too.
So if any wizards are out there, I'd be very grateful for their help! :)
Balázs