Windows Emacs (built from r110828) was idle for a few minutes while I made a
coffee. On returning, emacs was completely unresponsive and redisplay
was not drawing anything.
Windows XP SP3
MinGW gcc 4.7.2
Backtrace from "thread apply all bt full":
Thread 6 (Thread 8744.0x2080):
#0 0x7c90120f in ntdll!DbgUiConnectToDbg () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c952119 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#2 0x00000005 in ?? ()
No symbol table info available.
#3 0x00000004 in ?? ()
No symbol table info available.
#4 0x00000001 in ?? ()
No symbol table info available.
#5 0x5ad3ffd0 in ?? ()
No symbol table info available.
#6 0x00000000 in ?? ()
No symbol table info available.
Thread 5 (Thread 8744.0x5ec):
#0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c90d9da in ntdll!ZwReadFile () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#2 0x7c801879 in ReadFile () from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3 0x00000610 in ?? ()
No symbol table info available.
#4 0x00000000 in ?? ()
No symbol table info available.
Thread 4 (Thread 8744.0x241c):
#0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c90df5a in ntdll!ZwWaitForSingleObject () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#2 0x7c8025db in WaitForSingleObjectEx () from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3 0x000005d0 in ?? ()
No symbol table info available.
#4 0x00000000 in ?? ()
No symbol table info available.
Thread 3 (Thread 8744.0x21d4):
#0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7e4191be in USER32!GetProcessWindowStation () from C:\WINDOWS\system32\user32.dll
No symbol table info available.
#2 0x7e4191f1 in USER32!GetMessageW () from C:\WINDOWS\system32\user32.dll
No symbol table info available.
#3 0x0114875d in w32_msg_pump (msg_buf=0x5b5aff54) at w32fns.c:2386
msg = {
hwnd = 0x3f9500bc,
message = 0xf,
wParam = 0x0,
lParam = 0x0,
time = 0x6cc97f02,
pt = {
x = 0xdd,
y = 0x2f
}
}
result = 0x0
focus_window = 0x403
#4 0x0114899b in w32_msg_worker@4 (arg=0x0) at w32fns.c:2612
msg = {
hwnd = 0x86af0b20,
message = 0x80502fc0,
wParam = 0x0,
lParam = 0x0,
time = 0x0,
pt = {
x = 0x80502fc8,
y = 0x8dedcca0
}
}
dummy_buf = {
next = 0x0,
w32msg = {
msg = {
hwnd = 0x0,
message = 0x0,
wParam = 0x0,
lParam = 0x0,
time = 0x0,
pt = {
x = 0x0,
y = 0x0
}
},
dwModifiers = 0x0,
rect = {
left = 0x0,
top = 0x0,
right = 0x0,
bottom = 0x0
}
},
result = 0x0,
completed = 0x0
}
#5 0x7c80b729 in KERNEL32!GetModuleFileNameA () from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#6 0x00000000 in ?? ()
No symbol table info available.
Thread 2 (Thread 8744.0x1840):
#0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c90d21a in ntdll!ZwDelayExecution () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#2 0x7c8023f1 in SleepEx () from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3 0x00000000 in ?? ()
No symbol table info available.
Thread 1 (Thread 8744.0xf50):
#0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c90d2aa in ntdll!ZwDuplicateObject () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#2 0x7c80df03 in KERNEL32!DuplicateHandle () from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3 0xffffffff in ?? ()
No symbol table info available.
#4 0xfffffffe in ?? ()
No symbol table info available.
#5 0xffffffff in ?? ()
No symbol table info available.
#6 0x0162acb8 in real_itimer ()
No symbol table info available.
#7 0x00000000 in ?? ()
No symbol table info available.
> Date: Thu, 08 Nov 2012 12:57:18 +0000
> From: Andy Moreton <andrewjmore...@gmail.com>
> Windows Emacs (built from r110828) was idle for a few minutes while I made a
> coffee. On returning, emacs was completely unresponsive and redisplay
> was not drawing anything.
If this recurs, maybe try bisecting to find the problematic commit.
Do you know what was the previous revision which you used?
Also, how was Emacs unresponsive -- did it consume any CPU cycles at
all? Were all the threads locked up, or just some? If you detach
from it, then attach again, do you see exactly the same backtrace?
> #3 0x0114875d in w32_msg_pump (msg_buf=0x5b5aff54) at w32fns.c:2386
> msg = {
> hwnd = 0x3f9500bc,
> message = 0xf,
> wParam = 0x0,
> lParam = 0x0,
> time = 0x6cc97f02,
> pt = {
> x = 0xdd,
> y = 0x2f
> }
> }
> result = 0x0
> focus_window = 0x403
This looks like the input thread is waiting for GetMessageW to return,
which is normal -- it means there's no input, and we are waiting for
it.
> Thread 1 (Thread 8744.0xf50):
> #0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
> No symbol table info available.
> #1 0x7c90d2aa in ntdll!ZwDuplicateObject () from C:\WINDOWS\system32\ntdll.dll
> No symbol table info available.
> #2 0x7c80df03 in KERNEL32!DuplicateHandle () from > C:\WINDOWS\system32\kernel32.dll
> No symbol table info available.
> #3 0xffffffff in ?? ()
> No symbol table info available.
> #4 0xfffffffe in ?? ()
> No symbol table info available.
> #5 0xffffffff in ?? ()
> No symbol table info available.
> #6 0x0162acb8 in real_itimer ()
> No symbol table info available.
> #7 0x00000000 in ?? ()
> No symbol table info available.
This is the main thread, but its backtrace looks like the stack is
smashed, and real_itimer is a variable, not a function.
>> Date: Thu, 08 Nov 2012 12:57:18 +0000
>> From: Andy Moreton <andrewjmore...@gmail.com>
>> Windows Emacs (built from r110828) was idle for a few minutes while I made a
>> coffee. On returning, emacs was completely unresponsive and redisplay
>> was not drawing anything.
> If this recurs, maybe try bisecting to find the problematic commit.
> Do you know what was the previous revision which you used?
I rebuild emacs every day from trunk, but only do a full bootstrap when necessary. I have updated the Mingw compiler this week though, so that could be an issue. I'll try bisecting (and downgrading the compiler) if I see this again.
> Also, how was Emacs unresponsive -- did it consume any CPU cycles at
> all? Were all the threads locked up, or just some? If you detach
> from it, then attach again, do you see exactly the same backtrace?
Emacs was not consuming any cycles - the system was completely idle.
>> Thread 1 (Thread 8744.0xf50):
>> #0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
>> No symbol table info available.
>> #1 0x7c90d2aa in ntdll!ZwDuplicateObject () from C:\WINDOWS\system32\ntdll.dll
>> No symbol table info available.
>> #2 0x7c80df03 in KERNEL32!DuplicateHandle () from
>> C:\WINDOWS\system32\kernel32.dll
>> No symbol table info available.
>> #3 0xffffffff in ?? ()
>> No symbol table info available.
>> #4 0xfffffffe in ?? ()
>> No symbol table info available.
>> #5 0xffffffff in ?? ()
>> No symbol table info available.
>> #6 0x0162acb8 in real_itimer ()
>> No symbol table info available.
>> #7 0x00000000 in ?? ()
>> No symbol table info available.
> This is the main thread, but its backtrace looks like the stack is
> smashed, and real_itimer is a variable, not a function.
> We need more data points on this.
I'll see if this is reproduceable and try to get more info.
> Date: Thu, 08 Nov 2012 18:33:11 +0000
> From: Andy Moreton <andrewjmore...@gmail.com>
> CC: 12...@debbugs.gnu.org
> I rebuild emacs every day from trunk, but only do a full bootstrap when > necessary. I have updated the Mingw compiler this week though, so that could > be an issue.
Was the build optimized? (I'm guessing not, but I want to be sure.)
> I'll try bisecting (and downgrading the compiler) if I see this > again.
Thanks.
> > Also, how was Emacs unresponsive -- did it consume any CPU cycles at
> > all? Were all the threads locked up, or just some? If you detach
> > from it, then attach again, do you see exactly the same backtrace?
> Emacs was not consuming any cycles - the system was completely idle.
OK. Any idea why you had so many threads? Normally, Emacs 24.3.50
should have only 3: the main thread, the input thread, and a thread
that runs atimers (Emacs arranges for a timer to fire every 2 seconds,
to check whether any new input has arrived.) Yet another thread, the
4th one, is automatically started by the OS when you attach a debugger
to Emacs, and this is it:
Thread 6 (Thread 8744.0x2080):
#0 0x7c90120f in ntdll!DbgUiConnectToDbg () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c952119 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
But what are the other 2 threads you have, namely:
Thread 5 (Thread 8744.0x5ec):
#0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c90d9da in ntdll!ZwReadFile () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#2 0x7c801879 in ReadFile () from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3 0x00000610 in ?? ()
No symbol table info available.
#4 0x00000000 in ?? ()
No symbol table info available.
Thread 4 (Thread 8744.0x241c):
#0 0x7c90e514 in ntdll!LdrAccessResource () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#1 0x7c90df5a in ntdll!ZwWaitForSingleObject () from C:\WINDOWS\system32\ntdll.dll
No symbol table info available.
#2 0x7c8025db in WaitForSingleObjectEx () from C:\WINDOWS\system32\kernel32.dll
No symbol table info available.
#3 0x000005d0 in ?? ()
No symbol table info available.
#4 0x00000000 in ?? ()
No symbol table info available.
One of them appears to be reading something, the other is waiting for
some event. Did you have some subprocess running or some network
connection active at that time? Or maybe your routine operation has
some subprocesses (a speller, perhaps?) and/or network connections
active?
> > We need more data points on this.
> I'll see if this is reproduceable and try to get more info.
>> Date: Thu, 08 Nov 2012 18:33:11 +0000
>> From: Andy Moreton <andrewjmore...@gmail.com>
>> CC: 12...@debbugs.gnu.org
>> I rebuild emacs every day from trunk, but only do a full bootstrap when
>> necessary. I have updated the Mingw compiler this week though, so that could
>> be an issue.
> Was the build optimized? (I'm guessing not, but I want to be sure.)
system-configuration-options includes --no-opt, so unoptimized.
>> Emacs was not consuming any cycles - the system was completely idle.
> OK. Any idea why you had so many threads? Normally, Emacs 24.3.50
> should have only 3: the main thread, the input thread, and a thread
> that runs atimers (Emacs arranges for a timer to fire every 2 seconds,
> to check whether any new input has arrived.) Yet another thread, the
> 4th one, is automatically started by the OS when you attach a debugger
> to Emacs, and this is it:
> Thread 6 (Thread 8744.0x2080):
> #0 0x7c90120f in ntdll!DbgUiConnectToDbg () from C:\WINDOWS\system32\ntdll.dll
> No symbol table info available.
> #1 0x7c952119 in ntdll!KiIntSystemCall () from C:\WINDOWS\system32\ntdll.dll
> No symbol table info available.
> But what are the other 2 threads you have, namely:
[snipped]
> One of them appears to be reading something, the other is waiting for
> some event. Did you have some subprocess running or some network
> connection active at that time? Or maybe your routine operation has
> some subprocesses (a speller, perhaps?) and/or network connections
> active?
I have some files open via tramp (pscp method) on another machine. This uses putty for ssh (plink.exe) and scp (pscp.exe) for file tranfers. At the time of the lockup, emacs had a CMD.EXE subprocess running plink.exe.
> Date: Fri, 09 Nov 2012 10:48:31 +0000
> From: Andy Moreton <andrewjmore...@gmail.com>
> CC: 12...@debbugs.gnu.org
> I have some files open via tramp (pscp method) on another machine. This uses > putty for ssh (plink.exe) and scp (pscp.exe) for file tranfers. At the time of > the lockup, emacs had a CMD.EXE subprocess running plink.exe.
Thanks. I wanted to know this to try to reproduce the lockup here. I
will try that as soon as the problem with crashes on Windows is
resolved.
> Date: Fri, 09 Nov 2012 13:14:45 +0200
> From: Eli Zaretskii <e...@gnu.org>
> Cc: 12...@debbugs.gnu.org
> > Date: Fri, 09 Nov 2012 10:48:31 +0000
> > From: Andy Moreton <andrewjmore...@gmail.com>
> > CC: 12...@debbugs.gnu.org
> > I have some files open via tramp (pscp method) on another machine. This uses > > putty for ssh (plink.exe) and scp (pscp.exe) for file tranfers. At the time of > > the lockup, emacs had a CMD.EXE subprocess running plink.exe.
> Thanks. I wanted to know this to try to reproduce the lockup here. I
> will try that as soon as the problem with crashes on Windows is
> resolved.
The latest trunk is up and running for more than 3 hours with no
problems. I have a shell buffer running plink and a remote file
fetched with pscp. I also have display-time active.
Any other features you have on that could trigger periodic processing
of any kind?
>> Date: Fri, 09 Nov 2012 13:14:45 +0200
>> From: Eli Zaretskii <e...@gnu.org>
>> Cc: 12...@debbugs.gnu.org
>> > Date: Fri, 09 Nov 2012 10:48:31 +0000
>> > From: Andy Moreton <andrewjmore...@gmail.com>
>> > CC: 12...@debbugs.gnu.org
>> > I have some files open via tramp (pscp method) on another machine. This uses
>> > putty for ssh (plink.exe) and scp (pscp.exe) for file tranfers. At the time of
>> > the lockup, emacs had a CMD.EXE subprocess running plink.exe.
>> Thanks. I wanted to know this to try to reproduce the lockup here. I
>> will try that as soon as the problem with crashes on Windows is
>> resolved.
> The latest trunk is up and running for more than 3 hours with no
> problems. I have a shell buffer running plink and a remote file
> fetched with pscp. I also have display-time active.
I have seen this once again, after doing a full bootstrap from tip this morning to ensure I wasn't seeing artifacts of a broken build. It seems to happen about once a day.
> Any other features you have on that could trigger periodic processing
> of any kind?
Not that I know of. The emacs-server is running, but I not used anything that connects to it, so it should be doing anything. I do use gnus to read news, but quit gnus after I've read some groups (the lockup was several hours later).
The compiler update may be at issue, so I'll try downgrading back to MinGW gcc 4.7.0 and doing a full bootstrap.
> Thread 7 (Thread 6696.0x4b8):
> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #1 0x7c91df5a in ntdll!ZwWaitForSingleObject () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #2 0x7199402b in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
> #3 0x719957c9 in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
> #4 0x719f67de in WSACancelAsyncRequest () from /cygdrive/c/WINDOWS/system32/Ws2_32.dll
> #5 0x0108d925 in _sys_read_ahead (fd=4) at w32.c:6079
> #6 0x01033127 in reader_thread (arg=0x167dc98) at w32proc.c:838
> #7 0x7c80b729 in KERNEL32!GetModuleFileNameA () from /cygdrive/c/WINDOWS/system32/kernel32.dll
> #8 0x00000000 in ?? ()
> Thread 5 (Thread 6696.0x2344):
> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #1 0x7c91df5a in ntdll!ZwWaitForSingleObject () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #2 0x7199402b in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
> #3 0x719957c9 in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
> #4 0x719f67de in WSACancelAsyncRequest () from /cygdrive/c/WINDOWS/system32/Ws2_32.dll
> #5 0x0108d925 in _sys_read_ahead (fd=5) at w32.c:6079
> #6 0x01033127 in reader_thread (arg=0x167dc40) at w32proc.c:838
> #7 0x7c80b729 in KERNEL32!GetModuleFileNameA () from /cygdrive/c/WINDOWS/system32/kernel32.dll
> #8 0x00000000 in ?? ()
> Thread 2 (Thread 6696.0x1788):
> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #1 0x7c91d21a in ntdll!ZwDelayExecution () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #2 0x7c8023f1 in SleepEx () from /cygdrive/c/WINDOWS/system32/kernel32.dll
> #3 0x00000000 in ?? ()
> Thread 1 (Thread 6696.0x4d0):
> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #1 0x7e3eceba in USER32!SetInternalWindowPos () from /cygdrive/c/WINDOWS/system32/USER32.dll
> #2 0x7e3cf408 in USER32!SetMenu () from /cygdrive/c/WINDOWS/system32/USER32.dll
> #3 0x012c6395 in set_frame_menubar (f=0x3926840 <__register_frame_info+59926592>, first_time=false, deep_p=false) at w32menu.c:610
> #4 0x01200075 in update_menu_bar (f=0x3926840 <__register_frame_info+59926592>, save_match_data=0, hooks_run=1) at xdisp.c:11327
> #5 0x011ffa95 in prepare_menu_bars () at xdisp.c:11205
> #6 0x012055fa in redisplay_internal () at xdisp.c:13081
> #7 0x012034a1 in redisplay () at xdisp.c:12653
> #8 0x0103b2a2 in read_char (commandflag=1, nmaps=3, maps=0x82f9b0, prev_event=57358362, used_mouse_menu=0x82fa83, end_time=0x0)
> at keyboard.c:2428
> #9 0x0104eef4 in read_key_sequence (keybuf=0x82fc00, bufsize=30, prompt=57358362, dont_downcase_last=false,
> can_return_switch_frame=true, fix_current_buffer=true) at keyboard.c:9230
> #10 0x010385c4 in command_loop_1 () at keyboard.c:1458
> #11 0x01010e86 in internal_condition_case (bfun=0x10380de <command_loop_1>, handlers=57408946, hfun=0x10378fd <cmd_error>)
> at eval.c:1288
> #12 0x01037d57 in command_loop_2 (ignore=57358362) at keyboard.c:1167
> #13 0x010108e3 in internal_catch (tag=57398802, func=0x1037d33 <command_loop_2>, arg=57358362) at eval.c:1059
> #14 0x01037d11 in command_loop () at keyboard.c:1146
> #15 0x010372cb in recursive_edit_1 () at keyboard.c:778
> #16 0x010375f8 in Frecursive_edit () at keyboard.c:842
> #17 0x01002920 in main (argc=1, argv=0xa44480) at emacs.c:1552
This backtrace is more informative. I'm beginning to think that
there's some deadlock between threads that use a critical section,
because all of the threads are parked at the same interface:
ntdll!LdrAccessResource, and at least one of them waits for a critical
section:
> Thread 3 (Thread 6696.0xc28):
> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #1 0x7c91df5a in ntdll!ZwWaitForSingleObject () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #2 0x7c929b23 in ntdll!RtlpWaitForCriticalSection () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #3 0x7c911046 in ntdll!RtlEnumerateGenericTableLikeADirectory () from /cygdrive/c/WINDOWS/system32/ntdll.dll
> #4 0x006811a0 in ?? ()
> #5 0x012e871e in post_msg (lpmsg=0x5b8cfa94) at w32xfns.c:279
> #6 0x01147b48 in my_post_msg (wmsg=0x5b8cfa94, hwnd=0x2cec0092, msg=0, wParam=103, lParam=2228225) at w32fns.c:1942
Fabrice, what bzr revision did you compile, and with what version of
GCC?
>> Thread 7 (Thread 6696.0x4b8):
>> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #1 0x7c91df5a in ntdll!ZwWaitForSingleObject () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #2 0x7199402b in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
>> #3 0x719957c9 in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
>> #4 0x719f67de in WSACancelAsyncRequest () from /cygdrive/c/WINDOWS/system32/Ws2_32.dll
>> #5 0x0108d925 in _sys_read_ahead (fd=4) at w32.c:6079
>> #6 0x01033127 in reader_thread (arg=0x167dc98) at w32proc.c:838
>> #7 0x7c80b729 in KERNEL32!GetModuleFileNameA () from /cygdrive/c/WINDOWS/system32/kernel32.dll
>> #8 0x00000000 in ?? ()
>> Thread 5 (Thread 6696.0x2344):
>> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #1 0x7c91df5a in ntdll!ZwWaitForSingleObject () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #2 0x7199402b in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
>> #3 0x719957c9 in ?? () from /cygdrive/c/WINDOWS/System32/mswsock.dll
>> #4 0x719f67de in WSACancelAsyncRequest () from /cygdrive/c/WINDOWS/system32/Ws2_32.dll
>> #5 0x0108d925 in _sys_read_ahead (fd=5) at w32.c:6079
>> #6 0x01033127 in reader_thread (arg=0x167dc40) at w32proc.c:838
>> #7 0x7c80b729 in KERNEL32!GetModuleFileNameA () from /cygdrive/c/WINDOWS/system32/kernel32.dll
>> #8 0x00000000 in ?? ()
>> Thread 2 (Thread 6696.0x1788):
>> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #1 0x7c91d21a in ntdll!ZwDelayExecution () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #2 0x7c8023f1 in SleepEx () from /cygdrive/c/WINDOWS/system32/kernel32.dll
>> #3 0x00000000 in ?? ()
>> Thread 1 (Thread 6696.0x4d0):
>> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #1 0x7e3eceba in USER32!SetInternalWindowPos () from /cygdrive/c/WINDOWS/system32/USER32.dll
>> #2 0x7e3cf408 in USER32!SetMenu () from /cygdrive/c/WINDOWS/system32/USER32.dll
>> #3 0x012c6395 in set_frame_menubar (f=0x3926840
>> <__register_frame_info+59926592>, first_time=false, deep_p=false) at
>> w32menu.c:610
>> #4 0x01200075 in update_menu_bar (f=0x3926840
>> <__register_frame_info+59926592>, save_match_data=0, hooks_run=1) at
>> xdisp.c:11327
>> #5 0x011ffa95 in prepare_menu_bars () at xdisp.c:11205
>> #6 0x012055fa in redisplay_internal () at xdisp.c:13081
>> #7 0x012034a1 in redisplay () at xdisp.c:12653
>> #8 0x0103b2a2 in read_char (commandflag=1, nmaps=3, maps=0x82f9b0,
>> prev_event=57358362, used_mouse_menu=0x82fa83, end_time=0x0)
>> at keyboard.c:2428
>> #9 0x0104eef4 in read_key_sequence (keybuf=0x82fc00, bufsize=30, prompt=57358362, dont_downcase_last=false,
>> can_return_switch_frame=true, fix_current_buffer=true) at keyboard.c:9230
>> #10 0x010385c4 in command_loop_1 () at keyboard.c:1458
>> #11 0x01010e86 in internal_condition_case (bfun=0x10380de <command_loop_1>,
>> handlers=57408946, hfun=0x10378fd <cmd_error>)
>> at eval.c:1288
>> #12 0x01037d57 in command_loop_2 (ignore=57358362) at keyboard.c:1167
>> #13 0x010108e3 in internal_catch (tag=57398802, func=0x1037d33 <command_loop_2>, arg=57358362) at eval.c:1059
>> #14 0x01037d11 in command_loop () at keyboard.c:1146
>> #15 0x010372cb in recursive_edit_1 () at keyboard.c:778
>> #16 0x010375f8 in Frecursive_edit () at keyboard.c:842
>> #17 0x01002920 in main (argc=1, argv=0xa44480) at emacs.c:1552
> This backtrace is more informative. I'm beginning to think that
> there's some deadlock between threads that use a critical section,
> because all of the threads are parked at the same interface:
> ntdll!LdrAccessResource, and at least one of them waits for a critical
> section:
>> Thread 3 (Thread 6696.0xc28):
>> #0 0x7c91e514 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #1 0x7c91df5a in ntdll!ZwWaitForSingleObject () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #2 0x7c929b23 in ntdll!RtlpWaitForCriticalSection () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #3 0x7c911046 in ntdll!RtlEnumerateGenericTableLikeADirectory () from /cygdrive/c/WINDOWS/system32/ntdll.dll
>> #4 0x006811a0 in ?? ()
>> #5 0x012e871e in post_msg (lpmsg=0x5b8cfa94) at w32xfns.c:279
>> #6 0x01147b48 in my_post_msg (wmsg=0x5b8cfa94, hwnd=0x2cec0092, msg=0,
>> wParam=103, lParam=2228225) at w32fns.c:1942
That build was from revno 110618 (2012-10-22), so the GCC version
should be 4.7.0, since MinGW didn't upgrade to 4.7.2 until 2012-11-05.
(My builds thereafter were with GCC 4.7.2)
> I tried suspending and then resuming each thread in turn from Process > Explorer. Resuming thread 212604 unblocked emacs and it started working again.
Was that the only thread whose resumption unlocks Emacs? If so, can
you find out what thread was that? Process Explorer can show that
call-stack, and you should be able to find out what functions were
referenced by using the "info line" command inside GDB. Like this:
(gdb) info line *0x11c3d40
Line 863 of "sysdep.c" starts at address 0x11c3d40 <init_sys_modes+7>
and ends at 0x11c3d4a <init_sys_modes+17>.
(Note the asterisk before the address.)
WrUserRequest seems to indicate that the thread was suspended by the
application itself, which would point the blaming finger at my
implementation of SIGALRM (see w32proc.c), whereby when the timer
expires, the thread which runs the timer code suspends the main
thread, invokes the signal handler, and then resumes the main thread.
If my guess is correct, this would mean that the thread whose state is
WrUserRequest is the main (a.k.a. "Lisp") thread.
Another possibility is that this is the input thread, the one that
calls GetMessage. But then I don't understand why it is blocked
forever until manually resumed. Hmm...
If you attach GDB, do you again see garbled backtrace, like in the
original report? Or do you see something more informative?
>> I tried suspending and then resuming each thread in turn from Process
>> Explorer. Resuming thread 212604 unblocked emacs and it started working again.
> Was that the only thread whose resumption unlocks Emacs? If so, can
> you find out what thread was that? Process Explorer can show that
> call-stack, and you should be able to find out what functions were
I tried to get call stacks from process Explorer, but that made it die :-(
The fatc that thread 236140 is at ThreadStartRoutine makes me wonder if this is related to the perils of DllMain (i.e. the loader lock).
> If you attach GDB, do you again see garbled backtrace, like in the
> original report? Or do you see something more informative?
Sorry, I don't have the original process running any more - it's hard to get anything done with an unresponsive app of the screen. I'll try to dig out more info the next time I get a lockup.
The one thing that does seem completely consistent is that the lockup happens after several minutes of being idle (i.e. no keyboard or mouse input).
>> Date: Tue, 13 Nov 2012 16:00:30 +0000
>> From: Andy Moreton <andrewjmore...@gmail.com>
>> CC: dmonc...@gmail.com, f...@missioncriticalit.com, 12...@debbugs.gnu.org
>> The fact that thread 236140 is at ThreadStartRoutine makes me wonder if this
>> is related to the perils of DllMain (i.e. the loader lock).
> Sorry, I don't follow. Can you say more about this problem, or point
> me to some accessible documentation about it?
The DllMain notifications for process and thread create/destroy are called with the (system internal) loader lock held. This means that anything called from these routines should not use locks, or deadlock is possible. So I was wondering if the thread manipulation for timer handling is interacting with those mechanisms.
Of course I don't know nearly enough about Win32 to actually say much useful here, so the actual problem is probably something else entirely.
Can you please try the patch below and see if it prevents the
lock-ups?
=== modified file 'src/w32proc.c'
--- src/w32proc.c 2012-11-05 03:18:32 +0000
+++ src/w32proc.c 2012-11-13 16:59:53 +0000
@@ -431,13 +431,24 @@ timer_loop (LPVOID arg)
/* Simulate a signal delivered to the thread which installed
the timer, by suspending that thread while the handler
runs. */
- DWORD result = SuspendThread (itimer->caller_thread);
+ DWORD result;
+
+ if (dwMainThreadId)
+ enter_crit ();
+ result = SuspendThread (itimer->caller_thread);
if (result == (DWORD)-1)
- return 2;
+ {
+ if (dwMainThreadId)
+ leave_crit ();
+ return 2;
+ }
> Date: Tue, 13 Nov 2012 16:40:29 +0000
> From: Andy Moreton <andrewjmore...@gmail.com>
> CC: dmonc...@gmail.com, f...@missioncriticalit.com, 12...@debbugs.gnu.org
> On 13/11/2012 16:35, Eli Zaretskii wrote:
> >> Date: Tue, 13 Nov 2012 16:00:30 +0000
> >> From: Andy Moreton <andrewjmore...@gmail.com>
> >> CC: dmonc...@gmail.com, f...@missioncriticalit.com, 12...@debbugs.gnu.org
> >> The fact that thread 236140 is at ThreadStartRoutine makes me wonder if this
> >> is related to the perils of DllMain (i.e. the loader lock).
> > Sorry, I don't follow. Can you say more about this problem, or point
> > me to some accessible documentation about it?
> The DllMain notifications for process and thread create/destroy are called > with the (system internal) loader lock held. This means that anything called > from these routines should not use locks, or deadlock is possible. So I was > wondering if the thread manipulation for timer handling is interacting with > those mechanisms.
Thanks for the explanation.
> Of course I don't know nearly enough about Win32 to actually say much useful > here, so the actual problem is probably something else entirely.
Don't assume I know more than you do ;-)
Anyway, I actually don't understand why some thread is at
ThreadStartRoutine, if that fact really means that a thread is being
created. The timer thread is created during startup, and is not shut
down until Emacs shuts down. And we don't create any other threads in
the middle of a session (unless the user invokes profiler).
So the only way I can understand this ThreadStartRoutine business is
that somehow the timer thread exited due to an error (look for a line
saying "return 2;" in timer_loop), and then the next time Emacs sets
up the 2-sec atimer, a new thread will be started.
So could you please set a breakpoint at line 575 in w32proc.c, which
is this:
/* Start a new thread. */
itimer->terminate = 0;
itimer->type = which; <<<<<<<<<<<<<<<<<<<<<<<<<<<
run Emacs under GDB, and see if this breakpoint breaks more than once
when you run Emacs as usual? It should break one time during startup,
and never thereafter. To make sure you don't disrupt the timers,
define commands for this breakpoint that simply continue, like this:
(gdb) break w32proc.c:575
(gdb) commands
> continue
> end
> I applied this to emacs-24 branch (r110866) this morning. So far I've not seen
> a lockup, but I'll to run it for a day or two to be sure.
After longer uptime, it seems this patch is not successful. I haven't had a complete lockup, but I have seen a couple of glitches where it froze but then recovered a short while later.
The unfreeze may have been due to capturing a stack trace with Process Explorer (I have upgraded to the latest version which is less buggy).
The patched emacs-24 does seem to leak handles: at the moment Process Explorer report that emacs has 50805 handles in all, most of which are thread handles. The number of handles seems to increase at a rate of 2 to 4 per second.
I have no time to build Emacs now and have not had it for a very long
time. However I have seen Emacs freeze a lot of times when it is idle.
This started to get more common when I moved to a 64-bit windows 7
(from windows xp, 32-bit).
I have always thought that this must be bad handling of the messages
from the operating systems, but I have had no chance to pin this down.
Reading here I wonder if data are accessed somewhere in a system time
thread without critical section handling.
>> I applied this to emacs-24 branch (r110866) this morning. So far I've not
>> seen
>> a lockup, but I'll to run it for a day or two to be sure.
> After longer uptime, it seems this patch is not successful. I haven't had a
> complete lockup, but I have seen a couple of glitches where it froze but
> then recovered a short while later.
> The unfreeze may have been due to capturing a stack trace with Process
> Explorer (I have upgraded to the latest version which is less buggy).
> The patched emacs-24 does seem to leak handles: at the moment Process
> Explorer report that emacs has 50805 handles in all, most of which are
> thread handles. The number of handles seems to increase at a rate of 2 to 4
> per second.
> Date: Wed, 14 Nov 2012 16:29:53 +0000
> From: Andy Moreton <andrewjmore...@gmail.com>
> CC: dmonc...@gmail.com, f...@missioncriticalit.com, 12...@debbugs.gnu.org
> After longer uptime, it seems this patch is not successful. I haven't had a > complete lockup, but I have seen a couple of glitches where it froze but then > recovered a short while later.
OK, that was a stab in the dark anyway.
I found a few problems in the code and fixed them in revision 110867
on the emacs-24 branch. Please try the latest branch (without the
patch) and see if the problem persists.
> The patched emacs-24 does seem to leak handles: at the moment Process Explorer > report that emacs has 50805 handles in all, most of which are thread handles. > The number of handles seems to increase at a rate of 2 to 4 per second.
Yes, that's one of the problems I fixed in r110867. Shame on me.
(I think the fact that the handle to the caller thread was constantly
modified could be the reason for the lockup, if the change happened
between SuspendThread and ResumeThread calls.)