[Boost-users] Auto unit test suite hangs due to locked mutex

Richard Newman

Aug 10, 2005, 5:15:03 PM
to boost...@lists.boost.org
I have searched the net for some indication of what I might be dealing
with but have come up empty.

This is on a Fedora Core 2 system running GCC 3.3.3. Boost was
installed via RPM:
# rpm -q boost
boost-1.32.0-3.1.2
#

I have a reasonably small auto unit test suite via Boost with 48 small
discrete tests. It has been running fine previously. There are no
tests that are known to invoke mutexes, Boost or otherwise.

When we previously built Boost manually, this was not an issue; we're
now trying to use the standard RPMs to make distribution/release
building simpler. There is possibly an issue in how we're linking to
libraries in our automake since those had to be changed to use the RPM
installed libraries rather than our previously home-built ones.
However, the make (compile, link, etc.) throws no errors, etc., and so
we have no indication other than this hang that this is indeed where the
problem lies. I had these makefiles working fine on a similarly
installed system that instead ran FC4 with gcc 4.0.1, where Boost was
also installed as an RPM but at boost-1.32.0-6. When I copied them to
the FC2 system and set it up, the issue appeared.

The lone symptom is that the test suite seems to complete but then hangs
afterwards, apparently deadlocked on a mutex.

When I run the test program under --log_level=all, I get the following
output:

------------
$ mytestsuite --log_level=all
Running 48 test cases...
Entering test suite "Auto Unit Test"
Entering test case "CounterConstruction"
[...specific test output...]
Leaving test case "CounterConstruction"

[...more test cases...]

Entering test case "DataNodeEqualityTest"
[...specific test output...]
Leaving test case "DataNodeEqualityTest"
Leaving test suite "Auto Unit Test"


*** No errors detected
--------------
...and then it just hangs without returning to the command line until I
ctrl-C.

When I run it under gdb to get a stack trace, I get the following session:
--------------
$ libtool gdb mytestsuite
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require -mode=MODE be specified.
GNU gdb Red Hat Linux (6.0post-0.20040223.19rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host
libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /home/build/<ourpath>/.libs/lt-mytestsuite
Error while mapping shared library sections:
: Success.
Error while reading shared library symbols:
: No such file or directory.
[Thread debugging using libthread_db enabled]
[New Thread -150709728 (LWP 13105)]
Error while reading shared library symbols:
: No such file or directory.
Error while reading shared library symbols:
: No such file or directory.
Running 48 test cases...

*** No errors detected

Program received signal SIGINT, Interrupt.
[Switching to Thread -150709728 (LWP 13105)]
0x00272402 in ?? ()
(gdb) backtrace
#0 0x00272402 in ?? ()
#1 0x0064dcbe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x0064ac84 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
#3 0x00000000 in ?? ()
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
--------------

Finally, a library listing for the test suite executable gives the
following:
--------------
$ ldd ./.libs/mytestsuite
linux-gate.so.1 => (0x001fb000)
mylibrary.0 => not found [...related to libtool use?]
libxslt.so.1 => /usr/lib/libxslt.so.1 (0x03ec2000)
libxml2.so.2 => /usr/lib/libxml2.so.2 (0x03a8c000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00644000)
libz.so.1 => /usr/lib/libz.so.1 (0x00557000)
libxmlwrapp.so.5 => /home/build/<somepath>/dep/lib/libxmlwrapp.so.5
(0x00dcc000)
libboost_unit_test_framework.so.1 =>
/usr/lib/libboost_unit_test_framework.so.1 (0x003bf000)
libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x0080c000)
libm.so.6 => /lib/tls/libm.so.6 (0x0052c000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00763000)
libc.so.6 => /lib/tls/libc.so.6 (0x0040f000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x003f2000)
$
--------------

Any advice on how to proceed and repair this? We'd very much like to be
able to use these RPMs rather than reverting to our previous homegrown
approach.

Thank you and kind regards,

Richard Newman
Crowley Davis Research, Inc.

_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Gennadiy Rozental

Aug 10, 2005, 11:09:14 PM
to boost...@lists.boost.org
> Any advice to proceed and repair? We'd like very much to be able to use
> these RPMs rather than reverting back to our previous homegrown approach.

I don't think I can be of much help here. A couple of notes, though. All
Boost libraries can be built in both single-threaded and multi-threaded
mode. Could you choose one to use? Which one are you using now? Does the
hang have anything to do with what your tests are doing? Did you try
running a trivial single-test-case module? Why do you need to link with
pthreads at all?

Gennadiy

Richard Newman

Aug 11, 2005, 1:18:41 PM
to boost...@lists.boost.org
I appreciate your reply.

With it coming from an RPM, I think I can only tell whether it was built
in multi-thread mode by looking at its dependencies (there are no
_mt_gcc type suffixes on the libraries since they are just copied into
/usr/lib by the RPM). The dependencies include libpthread, so I'm going
to assume from that that multi-threaded mode was used to build the
libraries for the package. When we built Boost directly, we generally
used the _mt_gcc version.

Nominally, we don't need pthreads right now in our test case as none of
the tests seem to directly invoke mutexes. However, the library we are
building the test suite to cover does include mutex support. (We
recently began to use the Boost test framework to automate unit testing
and so we have been adding tests as we touch code during refactors, etc.
so our test coverage is not at all complete yet). In any case, linking
with pthread on my FC4/gcc4.0.1/boost1.32.0-6rpm worked fine. I do
think though that a library reference might be to blame, I just don't
know where to look.

I could certainly do a trivial single test case. However, I had avoided
such an approach because to compare apples to apples, the best is to
leave everything in place and comment out all the unit tests, adding
back in a single one. From there though, I could start to tear apart
the make files and note when things changed. I'm probably left with
that as the only real diagnostic now; I was hoping the symptoms were
indicative of some library, etc., missing that I was ignorant of.

Thanks,
Richard

Richard Newman

Aug 11, 2005, 2:22:53 PM
to boost...@lists.boost.org
When I reduced the test set to the following test, it still hangs but
says that one test passed (as the 48 I used before did).

BOOST_AUTO_UNIT_TEST(TrivialTest)
{
    int x = 0;
    x = 1;
}

When I remove this one test and so have no tests, no hang occurs. It
simply says no errors and terminates normally. I guess I can review the
macro represented by BOOST_AUTO_UNIT_TEST to see what might be invoked
here.
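
For orientation, auto-registration macros of this kind typically work by
defining a static object whose constructor adds the test function to a
global registry before main() runs. The sketch below illustrates that
general technique only; it is not the actual expansion of
BOOST_AUTO_UNIT_TEST, and every name in it (DEMO_AUTO_TEST,
demo_registrar, demo_registry, demo_run_all) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; this is NOT Boost.Test's implementation.
typedef void (*test_fn)();

struct demo_test_case {
    std::string name;
    test_fn fn;
};

// A function-local static sidesteps static-initialization-order problems:
// the vector is constructed on first use, whichever registrar runs first.
inline std::vector<demo_test_case>& demo_registry() {
    static std::vector<demo_test_case> tests;
    return tests;
}

// Constructing one of these at namespace scope registers a test
// before main() starts.
struct demo_registrar {
    demo_registrar(const char* name, test_fn fn) {
        demo_test_case tc = { name, fn };
        demo_registry().push_back(tc);
    }
};

// Declares the test function, registers it, then opens its definition.
#define DEMO_AUTO_TEST(test_name)                                         \
    static void test_name();                                              \
    static demo_registrar test_name##_registrar(#test_name, &test_name);  \
    static void test_name()

DEMO_AUTO_TEST(TrivialTest)
{
    int x = 0;
    x = 1;
    (void)x;  // silence unused-variable warnings
}

// Runs every registered test and returns how many were run.
inline std::size_t demo_run_all() {
    std::size_t ran = 0;
    for (std::size_t i = 0; i < demo_registry().size(); ++i) {
        assert(demo_registry()[i].fn != 0);
        demo_registry()[i].fn();
        ++ran;
    }
    return ran;
}
```

Calling demo_run_all() from main() would run the single registered test.
In a real framework the registered test cases are also torn down after
the run, which is the phase where the hang is observed here (after
"No errors detected" is printed).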

The make session for both cases was the same (excepting of course the
test output):
--------------------
$ make
if /bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H
-I. -I/home/<somepath>/project/src/mylibrary -I../..
-I/home/<somepath>/project/src -I/home/<somepath>/construct/dep/include
-D__CSGA_REVISION__='"2057M"' -pthread -Werror -Wall -g -O2 -MT
function.lo -MD -MP -MF ".deps/function.Tpo" -c -o function.lo
function.cpp; \
then mv -f ".deps/function.Tpo" ".deps/function.Plo"; else rm -f
".deps/function.Tpo"; exit 1; fi
g++ -DHAVE_CONFIG_H -I. -I/home/<somepath>/project/src/mylibrary
-I../.. -I/<somepath>/project/src -I/<somepath>/dep/include
-D__CSGA_REVISION__=\"2057M\" -pthread -Werror -Wall -g -O2 -MT
function.lo -MD -MP -MF .deps/function.Tpo -c function.cpp -fPIC -DPIC
-o .libs/function.o
/bin/sh ../../libtool --tag=CXX --mode=link g++ -pthread -Werror -Wall
-g -O2 -o mylibrary.la -rpath /<somepath>/release/product/lib
<several .lo files listed> -L/<somepath>/dep/lib -lxslt -lxml2 -lxmlwrapp
rm -fr .libs/mylibrary.0 .libs/mylibrary.0.0.0 .libs/mylibrary.la
.libs/mylibrary.lai
g++ -shared -nostdlib
/usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../crti.o
/usr/lib/gcc-lib/i386-redhat-linux/3.3.3/crtbeginS.o
<several .o files listed under .libs/> -L/usr/lib -L/<somepath>/dep/lib
/usr/lib/libxslt.so /usr/lib/libxml2.so -lxmlwrapp
-L/usr/lib/gcc-lib/i386-redhat-linux/3.3.3
-L/usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../.. -lstdc++ -lm -lc
-lgcc_s /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/crtendS.o
/usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../crtn.o -pthread
-Werror -Wall -g -O2 -Wl,-soname -Wl,mylibrary.0 -o .libs/mylibrary.0.0.0
(cd .libs && rm -f mylibrary.0 && ln -s mylibrary.0.0.0 mylibrary.0)
(cd .libs && rm -f mylibrary && ln -s mylibrary.0.0.0 mylibrary)
creating mylibrary.la
(cd .libs && rm -f mylibrary.la && ln -s ../mylibrary.la mylibrary.la)
/bin/sh ../../libtool --tag=CXX --mode=link g++ -pthread -Werror -Wall
-g -O2 -o mytestsuite -R/<somepath>/dep/lib mytestsuite.o
../../src/mylibrary/mylibrary.la -L/<somepath>/dep/lib
-lboost_unit_test_framework
g++ -pthread -Werror -Wall -g -O2 -o .libs/mytestsuite mytestsuite.o
../../src/mylibrary/.libs/mylibrary -L/<somepath>/dep/lib -L/usr/lib
/usr/lib/libxslt.so /usr/lib/libxml2.so -lm -lpthread -lz -lxmlwrapp
-lboost_unit_test_framework -Wl,--rpath
-Wl,/<somepath>/release/product/lib -Wl,--rpath -Wl,/<somepath>/dep/lib
creating mytestsuite
../../src/mylibrary/mytestsuite

*** No errors detected
--------------------


Richard

Richard Newman

Aug 11, 2005, 4:12:50 PM
to boost...@lists.boost.org
I took the .cpp file where I wrote the TrivialTest and used gcc (3.3.3)
to produce the expanded listing (using -E). I then substituted that
expanded version back as the .cpp file and rebuilt via automake (using
-g3 -O0) for debug. Now my backtrace is more informative:

--------------------------
(gdb) cont
Running 1 test case...

*** No errors detected

Program received signal SIGINT, Interrupt.
0x008a4402 in ?? ()
(gdb) backtrace
#0 0x008a4402 in ?? ()
#1 0x0064dcbe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x0064ac84 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
#3 0x003fd840 in _dl_runtime_resolve () from /lib/ld-linux.so.2
#4 0x0036e8ea in scoped_lock (this=0x8aab67c, m=@0x3fd840) at
lwm_pthreads.hpp:72
#5 0x0036e8ea in scoped_lock (this=0xfeedd910, m=@0x8aab67c) at
lwm_pthreads.hpp:72
#6 0x0036e7dc in boost::detail::sp_counted_base::release
(this=0x8aab670) at shared_count.hpp:140
#7 0x0036e170 in ~shared_count (this=0x8aab650) at shared_count.hpp:378
#8 0x00377392 in ~shared_ptr (this=0x8aab64c) at unit_test_suite.hpp:60
#9 0x003772eb in ~test_case (this=0x8aab630) at unit_test_suite.hpp:60
#10 0x0037719c in ~function_test_case (this=0x8aab630) at
call_traits.hpp:103
#11 0x00b45d45 in boost::unit_test::ut_detail::normalize_test_case_name
() from /usr/lib/libboost_unit_test_framework.so.1
#12 0x00b46018 in
std::for_each<std::_List_iterator<boost::unit_test::test_case*,
boost::unit_test::test_case*&, boost::unit_test::test_case**>, void
(*)(boost::unit_test::test_case*)> () from
/usr/lib/libboost_unit_test_framework.so.1
#13 0x00b459cb in boost::unit_test::test_suite::~test_suite () from
/usr/lib/libboost_unit_test_framework.so.1
#14 0x00b46ddc in main () from /usr/lib/libboost_unit_test_framework.so.1
#15 0x00423ad4 in __libc_start_main () from /lib/tls/libc.so.6
#16 0x08048a3d in _start ()
(gdb)
--------------------------

It looks like the Boost unit test framework is possibly unable to release
a reference to the unit test and is blocked on the mutex that guards the
reference count.

Are there recent bug fixes in this area or known issues with regard to FC2?

Richard

Gennadiy Rozental

Aug 11, 2005, 11:02:24 PM
to boost...@lists.boost.org
Unfortunately (and IMO it's a critical design flaw) smart_ptr chooses its
threading mode at the preprocessor-define level. Still, you should be able
to build the unit test framework in single-threaded mode, in which case
shared_ptr wouldn't be using any mutexes.

Gennadiy
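
To illustrate why a define-level threading choice can matter: if a class
gains a mutex member only when a threading macro is defined, then code
compiled with the macro and code compiled without it disagree about the
object's layout, and mixing the two in one program is undefined
behaviour. The sketch below is an illustration under stated assumptions,
not Boost's source: demo_counted_base is a hypothetical stand-in for a
reference-count block, a template parameter stands in for the
preprocessor define so that both variants can be compared in one
program, and std::mutex (C++11) stands in for the pthread-based
lightweight mutex seen in the backtrace.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for a reference-count control block.
// In a real library the switch would be a preprocessor define; a bool
// template parameter is used here only so both layouts can coexist.
template <bool Threaded>
struct demo_counted_base;

// "Single-threaded" layout: counters only.
template <>
struct demo_counted_base<false> {
    long use_count;
    long weak_count;
};

// "Multi-threaded" layout: same counters plus a mutex member.
template <>
struct demo_counted_base<true> {
    long use_count;
    long weak_count;
    std::mutex mtx;
};

// True only if both variants have the same size.
inline bool demo_layouts_match() {
    return sizeof(demo_counted_base<false>) ==
           sizeof(demo_counted_base<true>);
}

// Number of extra bytes the threaded variant occupies.
inline std::size_t demo_extra_bytes() {
    assert(sizeof(demo_counted_base<true>) >
           sizeof(demo_counted_base<false>));
    return sizeof(demo_counted_base<true>) -
           sizeof(demo_counted_base<false>);
}
```

Here demo_layouts_match() returns false, because the threaded variant
carries extra bytes for the mutex. Whether a mismatch of this kind
actually exists between the RPM-built library and the test program
compiled with -pthread is not established in this thread; it is one
possibility worth ruling out.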
