Cannot find user-level thread for LWP 4532: generic error
(gdb) backtrace
#0 0x00002b955c467747 in kill () from /lib/libc.so.6
Cannot fetch general-purpose registers for thread 47920578785888: generic error
(gdb) bt
#0 0x00002b955c467747 in kill () from /lib/libc.so.6
I'm using gcc 4.1 and Matlab 2006a (or 2006b) on a Linux amd64
platform. A trivial example MEX file (yprimef) works and does not stop
Matlab with a SIGALRM. Others have had similar problems (see link
below); Chris reports offline that his issue remains unresolved with
g95 but does not occur with the PGI compilers.
Has anyone seen this issue, or have a guess where I can look to
troubleshoot further? I plan to try an older version of gcc, and perhaps
gfortran for comparison. I'll also try writing a signal handler for the
Fortran code to see if I can get any more information.
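A minimal sketch of that kind of diagnostic, written in C because signal dispositions are a POSIX-level interface: it queries, without changing anything, whether SIGALRM and SIGINT are at their default action, ignored, or caught by a handler. Calling something like the hypothetical `dump_dispositions()` helper at the start and end of the MEX gateway (for example from a small C shim, which is an assumption here) would show whether the Fortran runtime replaced Matlab's handlers. The helper names are illustrative, not part of g95 or the MEX API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Report how `sig` is currently handled, without changing it: passing NULL
 * as the new action turns sigaction() into a pure query. */
static const char *signal_disposition(int sig)
{
    struct sigaction current;

    if (sigaction(sig, NULL, &current) != 0)
        return "error";
    if (current.sa_flags & SA_SIGINFO)
        return "handler";            /* three-argument sa_sigaction handler */
    if (current.sa_handler == SIG_DFL)
        return "default";
    if (current.sa_handler == SIG_IGN)
        return "ignored";
    return "handler";
}

/* Hypothetical helper: print the dispositions discussed in this thread so
 * that two calls (before and after the Fortran code runs) can be compared. */
static void dump_dispositions(const char *label)
{
    fprintf(stderr, "%s: SIGALRM=%s SIGINT=%s\n", label,
            signal_disposition(SIGALRM), signal_disposition(SIGINT));
}
```

If the "after" line reports a handler where the "before" line reported Matlab's own (or the default), the runtime has replaced it.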
Bill
Update: the same code compiled with gfortran does not cause Matlab to
exit.
Bill
Andy
From my past experience, it's Matlab's polling that raises a SIGALRM,
and the g95 MEX functions seem to abort on this signal.
Here's a Matlab session with a backtrace of the SIGALRM:
[chulbert@mellin ~]$ cat test_g95_mex.f90
SUBROUTINE MEXFUNCTION(nlhs,plhs,nrhs,prhs)
INTEGER :: nlhs,nrhs
INTEGER(kind=8) :: plhs(nlhs),prhs(nrhs)
END SUBROUTINE MEXFUNCTION
[chulbert@mellin ~]$ g95 -shared -o test_g95_mex.mexa64 test_g95_mex.f90
[chulbert@mellin ~]$ /usr/local/bin/matlab -Dgdb
GNU gdb Red Hat Linux (6.3.0.0-1.122rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) handle SIGALRM stop
Signal        Stop  Print  Pass to program  Description
SIGALRM       Yes   Yes    Yes              Alarm clock
(gdb) r -nojvm
Starting program: /usr/local/matlab/R2006b/bin/glnxa64/MATLAB -nojvm
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread 46912601027440 (LWP 24182)]
[New Thread 1084229952 (LWP 24187)]
[Thread 1084229952 (LWP 24187) exited]
< M A T L A B >
Copyright 1984-2006 The MathWorks, Inc.
Version 7.3.0.298 (R2006b)
August 03, 2006
Detaching after fork from child process 24188.
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
>> test_g95_mex
>>
>> clear all
>>
Program received signal SIGALRM, Alarm clock.
[Switching to Thread 46912601027440 (LWP 24182)]
0x00000030c2cc3086 in poll () from /lib64/libc.so.6
(gdb) bt
#0 0x00000030c2cc3086 in poll () from /lib64/libc.so.6
#1 0x00000030c862d40a in _XtWaitForSomething () from /usr/lib64/libXt.so.6
#2 0x00000030c862e543 in XtAppNextEvent () from /usr/lib64/libXt.so.6
#3 0x00002aaaaeba1a5b in UIX_AssertXThread_internal ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwuix.so
#4 0x00002aaaaeba1d0e in UIX_AssertXThread_internal ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwuix.so
#5 0x00002aaaab123330 in ioGetCharNoEcho ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwservices.so
#6 0x00002aaaab47215a in iolib::IOProxy::ReportIqmRequest ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwbridge.so
#7 0x00002aaaab4733a0 in ioCmdLineEditLoad ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwbridge.so
#8 0x00002aaaab47a40b in mnGetExecStatusAsInt ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwbridge.so
#9 0x00002aaaab47a52a in mnGetExecStatusAsInt ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwbridge.so
#10 0x00002aaaab47ac7b in mnParser ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwbridge.so
#11 0x00002aaaab5af503 in mcrInstance::mnParser ()
from /usr/local/matlab/R2006b/bin/glnxa64/libmwmcr.so
#12 0x000000000040159a in ?? ()
#13 0x00000030c2c1c784 in __libc_start_main () from /lib64/libc.so.6
#14 0x00000000004013da in ?? ()
#15 0x00007fffeeec3e98 in ?? ()
#16 0x0000000000000000 in ?? ()
Give it a try now. The library initialization was setting up a
handler for SIGALRM that was never being used; it's used on some
platforms to implement checkfiles. Although adding that capability to
x86_64 is in the works, I am going to arrange things so that it is
disabled when you are running from a non-Fortran main program. Let me
know how it goes.
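The change described above amounts to the runtime leaving SIGALRM alone unless it owns the process. A minimal C sketch of that pattern, assuming a hypothetical `runtime_init()` that is told whether the main program is Fortran; this is an illustration of the idea, not g95's actual initialization code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for a runtime's periodic-work (e.g. checkpoint) handler. */
static volatile sig_atomic_t alarm_seen = 0;
static void runtime_alarm_handler(int sig) { (void)sig; alarm_seen = 1; }

/* Hypothetical runtime initializer. `fortran_main` is nonzero only when the
 * runtime owns the process; when it is loaded into a host such as Matlab,
 * the host's SIGALRM disposition is left untouched.
 * Returns 1 if the handler was installed, 0 if skipped, -1 on error. */
static int runtime_init(int fortran_main)
{
    struct sigaction sa;

    if (!fortran_main)
        return 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = runtime_alarm_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    return 1;
}
```

With `fortran_main` set to 0, as it would be for a MEX file, whatever handler the host installed stays in place.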
Andy
Using the latest g95 source, I'm getting the same results that Chris
posted: a SIGALRM with the same stack trace. The last few lines from
strace:
18264 16:18:05.379524 poll([{fd=4, events=POLLIN}, {fd=0, events=POLLIN|POLLPRI}], 2, 499) = -1 EINTR (Interrupted system call)
18264 16:18:05.698031 --- SIGALRM (Alarm clock) @ 0 (0) ---
18264 16:18:05.702710 +++ killed by SIGALRM +++
18324 16:18:05.703331 <... read resumed> "", 4) = 0
18324 16:18:05.703595 close(3) = 0
18324 16:18:05.703651 exit_group(0) = ?
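In strace output, `+++ killed by SIGALRM +++` means the process was terminated by the signal itself rather than by calling exit. That is what happens when SIGALRM arrives while its disposition is the default action, or while it is handled by code that restores the default and re-raises the signal. The C sketch below reproduces that outcome in isolation with a forked child. It does not show which component armed the timer inside Matlab.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that arms a one-second alarm while SIGALRM is at its default
 * action, then waits for it. Returns the signal that terminated the child,
 * 0 if it exited normally, or -1 on error. */
static int child_killed_by_default_alarm(void)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        sigset_t alrm;
        sigemptyset(&alrm);
        sigaddset(&alrm, SIGALRM);
        sigprocmask(SIG_UNBLOCK, &alrm, NULL);  /* make sure it can be delivered */
        signal(SIGALRM, SIG_DFL);               /* default action: terminate */
        alarm(1);
        pause();                                /* the alarm ends the process first */
        _exit(0);                               /* not reached */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Running a program that calls this under `strace -f` should end the child's trace with a similar `+++ killed by SIGALRM +++` line.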
Bill
Maybe I didn't think this through. I was telling gdb to stop on
SIGALRM, which Matlab apparently generates, so gdb will stop there
whether or not g95 has redefined the SIGALRM handler. I'm now running
without gdb and have seen no unhandled SIGALRMs ("Alarm clock") so far.
If I see any, I'll report back.
Bill
However, there is the related issue of SIGINT. After running a
g95-compiled MEX file in Matlab, a Ctrl-C causes Matlab to exit
immediately. Normally Matlab intercepts the SIGINT, interrupts whatever
Matlab computation is running, and returns to the Matlab command line.
Can a fix similar to the SIGALRM one be applied to SIGINT, so that
Matlab's SIGINT handler is not redefined?
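On the caller's side, one way to stop a library from clobbering a host's SIGINT handler is to save the disposition with sigaction() before the library's initialization runs and restore it afterwards. The C sketch below shows that pattern. `clobbering_library_init()` is a hypothetical stand-in for a runtime that installs its own handler.

The workaround has two limits:

- It only helps if the initialization happens inside a call you control. A handler installed by a shared-library constructor at load time would already be in place.
- It is not safe against other threads changing handlers at the same moment.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical library initializer that unconditionally replaces the
 * SIGINT handler with its own. */
static void library_sigint_handler(int sig) { (void)sig; }
static void clobbering_library_init(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = library_sigint_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
}

/* Run `init` while preserving the host's SIGINT disposition: save the
 * current action, let `init` do whatever it does, then restore the saved one.
 * Returns 0 on success, -1 if either sigaction() call fails. */
static int call_preserving_sigint(void (*init)(void))
{
    struct sigaction saved;

    if (sigaction(SIGINT, NULL, &saved) != 0)
        return -1;
    init();
    return sigaction(SIGINT, &saved, NULL);
}
```

A runtime-side change that skips installing the handler when the main program is not Fortran, like the one described for SIGALRM, would remove the need for this kind of wrapper.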
Bill
Zack