
Tcl Thread

Njett

Oct 20, 2008, 4:01:13 PM
I've been emailing back and forth with Jeff Hobbs on the tcl-threads
mailing list, trying to figure out an issue with the Tcl Thread
extension. Basically, the problem script is:

package require Thread
set workers 10
set poolId [tpool::create -minworkers $workers -maxworkers $workers]

When I run this on Windows XP, I have no trouble starting 300+
threads. When I try it on my Gentoo Linux machine, it hangs on the last
line. I can start 2 or 3 threads, but it starts to hang if I increase
the count, and by the time I reach 10 threads it hangs
consistently. Jeff Hobbs confirmed that it works fine for him on Linux
using the ActiveState binaries, so I tried them on my Gentoo box and
got the same result. A co-worker also recently tried it on Ubuntu
and has the same problem.

See http://sourceforge.net/mailarchive/forum.php?thread_name=20081007151818.pvcn26xrs4k00ck0%40webmail.realnets.com&forum_name=tcl-threads
for the original conversation.

Anyone have any ideas here? I'm using Tcl/Tk 8.5.1 and Thread 2.6.5.
I've also tried with Tcl/Tk 8.4.18 with the same results.

Thanks for your help.
Nathan

Alexandre Ferrieux

Oct 20, 2008, 5:31:48 PM
On Oct 20, 10:01 pm, Njett <nj...@realnets.com> wrote:
>
> See http://sourceforge.net/mailarchive/forum.php?thread_name=200810071518...
> for the original conversation.
>
> Anyone have any ideas here?  I'm using Tcl/Tk 8.5.1 and Thread 2.6.5.
> I've also tried with Tcl/Tk 8.4.18 with the same results.

I can at least answer this one:

> Is there any way to determine what Tcl and/or the Thread extension
> is/are doing when they hang?

Yes: strace, pstack, gdb.

First attach with strace to see whether anything is still alive in
there. Depending on the OS thread implementation, threads may or
may not show up as individual processes with different pids. If they
do, give strace multiple '-p' options in order to attach to
all of them.
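On Linux (NPTL), each thread of a process has its own thread ID, listed as a subdirectory of /proc/<pid>/task; these are the same "LWP" numbers gdb prints. Here is a minimal sketch, assuming Linux and a readable /proc, of collecting those IDs into the multiple '-p' options for strace. strace_args is a hypothetical helper, not part of strace or Tcl.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): write "-p TID" for every thread of
 * process `pid` into buf by listing /proc/<pid>/task.
 * Returns the number of thread IDs written, or -1 if the directory
 * cannot be opened (no such process, no permission, no /proc). */
static int strace_args(pid_t pid, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    size_t used = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')      /* skip "." and ".." */
            continue;
        int n = snprintf(buf + used, len - used, "%s-p %s",
                         count > 0 ? " " : "", ent->d_name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';           /* drop the truncated entry */
            break;                      /* buffer full: stop early */
        }
        used += (size_t)n;
        count++;
    }
    closedir(dir);
    return count;
}
```

Printing the filled buffer after "strace " gives a command line with one -p per thread of the hung tclsh.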

At this point you may see some threads still showing activity. Stop
strace, re-attach with gdb, and use 'bt' to see their backtraces. Call
them the 'live threads' (if any).

Then run pstack on the process and post the output here.
Alternatively, you can attach with gdb and issue 'thread apply all bt'.
Both will give you a stack trace of each thread. To make this more
readable, it might be good to reproduce with Tcl compiled
with -g (comment/uncomment the two CFLAGS= lines in the Makefile generated by
'configure').

From this output you can weed out the live threads, since you have
already isolated them. The remaining ones are more interesting. I'd
bet they are all in some variant of pthread_mutex_lock(). Then the
difficult deadlock-hunt starts: we must identify a cycle in the mutex
"hierarchy" (which is no longer one since there's a cycle).
Unfortunately the pthread library doesn't offer something as useful as
"who is holding that mutex", so a bit of RTFS is in order. So let's
hope the trace is small enough to fit here ;-)
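Once you have reconstructed, by hand, which thread is blocked on which mutex and which thread holds each mutex, finding the cycle amounts to walking a wait-for graph. The sketch below assumes each thread waits on at most one mutex and each mutex has at most one holder; find_deadlock, NONE, and the two tables are hypothetical, and nothing here reads real pthread state.

```c
#include <assert.h>

#define NONE (-1)   /* "waits on nothing" / "held by nobody" */

/* Hypothetical wait-for check. waits_on[t] is the mutex thread t is
 * blocked on (or NONE); held_by[m] is the thread holding mutex m (or NONE).
 * Both tables must be filled in by hand from backtraces and source reading.
 * Starting at thread `start`, follow thread -> mutex -> holder edges.
 * Returns 1 if the chain never ends (a cycle is reachable: deadlock),
 * 0 if it reaches a thread or mutex that is free to make progress. */
static int find_deadlock(int start, const int *waits_on,
                         const int *held_by, int nthreads) {
    assert(start >= 0 && start < nthreads);
    int t = start;
    /* With nthreads threads, a chain of more than nthreads hops must
     * revisit some thread, i.e. it has entered a cycle. */
    for (int hops = 0; hops <= nthreads; hops++) {
        int m = waits_on[t];
        if (m == NONE)
            return 0;               /* t is not blocked on any mutex */
        int owner = held_by[m];
        if (owner == NONE)
            return 0;               /* mutex is free, t can take it */
        t = owner;
    }
    return 1;
}
```

For the classic two-lock case (thread 0 holds mutex 0 and waits on mutex 1, thread 1 holds mutex 1 and waits on mutex 0), starting from either thread reports a deadlock.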

-Alex

Njett

Oct 21, 2008, 11:12:25 AM
Thanks for your reply, Alex. I opened tclsh and ran the script from my
original post; below is the output from strace and gdb.

nathan@NJPC ~ $ strace -p 6218
Process 6218 attached - interrupt to quit
futex(0x905dcdc, FUTEX_WAIT, 25, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x9076360, FUTEX_WAKE, 1) = 1
mmap2(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb4420000
mprotect(0xb4420000, 4096, PROT_NONE) = 0
clone(child_stack=0xb4c204b4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0xb4c20bd8, {entry_number:6, base_addr:0xb4c20b90, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb4c20bd8) = 17759
futex(0xb4c20d84, FUTEX_WAKE, 1) = 1
futex(0x9076360, FUTEX_WAKE, 1) = 1
futex(0x905dcd8, FUTEX_WAKE, 1) = 1
futex(0x905dcdc, FUTEX_WAIT, 35, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x905dcdc, FUTEX_WAIT, 36, NULL^C <unfinished ...>
Process 6218 detached

nathan@NJPC ~ $ gdb
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) attach 6218
Attaching to process 6218
Reading symbols from /usr/bin/tclsh8.5...(no debugging symbols found)...done.
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from /usr/lib/libtcl8.5.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libtcl8.5.so
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0xb7c5a6c0 (LWP 6218)]
[New Thread 0xb5c22b90 (LWP 6223)]
[New Thread 0xb6423b90 (LWP 6222)]
[New Thread 0xb6c24b90 (LWP 6221)]
[New Thread 0xb7425b90 (LWP 6220)]
[New Thread 0xb7c59b90 (LWP 6219)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/lib/thread2.6.5/libthread2.6.5.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/thread2.6.5/libthread2.6.5.so

(no debugging symbols found)
0xffffe424 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 6 (Thread 0xb7c59b90 (LWP 6219)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7d167d1 in select () from /lib/libc.so.6
#2 0xb7ec4dbd in ?? () from /usr/lib/libtcl8.5.so

Thread 5 (Thread 0xb7425b90 (LWP 6220)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7dbb576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7ec3bb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 4 (Thread 0xb6c24b90 (LWP 6221)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7dbb576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7ec3bb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 3 (Thread 0xb6423b90 (LWP 6222)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7dbb576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7ec3bb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 2 (Thread 0xb5c22b90 (LWP 6223)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7dbb576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7ec3bb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 1 (Thread 0xb7c5a6c0 (LWP 6218)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7dbb576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7ec3bb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so
#0 0xffffe424 in __kernel_vsyscall ()


Let me know if I can provide any additional info.
Thanks,
Nathan

Alexandre Ferrieux

Oct 21, 2008, 11:55:00 AM

Hmmm, so you're getting this deadlock with only 6 threads...
interesting!
However, I see that thread 6 is in select(), so it should be
responsive to external events.
Are you sure you attached at a point where everything was really
frozen?

-Alex

Njett

Oct 21, 2008, 12:41:42 PM
Alex,

I'm pretty certain. I started the script, and when I went to run
strace and gdb I found they were not installed, so I had to compile
and install them and then come back and run them. The script had been
stuck for at least 5-7 minutes before I ran strace and gdb. I
realized I had not compiled Tcl with the debug flag, so I just
recompiled it. Below is the strace and gdb output; again, I left the
script running for about 5 minutes before running strace and gdb.

nathan@NJPC ~ $ strace -p 15967
Process 15967 attached - interrupt to quit
futex(0x991bcdc, FUTEX_WAIT, 35, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x991bcdc, FUTEX_WAIT, 36, NULL^C <unfinished ...>
Process 15967 detached


nathan@NJPC ~ $ gdb
GNU gdb 6.7.1
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) attach 15967
Attaching to process 15967
Reading symbols from /usr/bin/tclsh8.5...(no debugging symbols found)...done.
Using host libthread_db library "/lib/libthread_db.so.1".
Reading symbols from /usr/lib/libtcl8.5.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libtcl8.5.so
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0xb7d026c0 (LWP 15967)]
[New Thread 0xb4cc8b90 (LWP 15974)]
[New Thread 0xb54c9b90 (LWP 15973)]
[New Thread 0xb5ccab90 (LWP 15972)]
[New Thread 0xb64cbb90 (LWP 15971)]
[New Thread 0xb6cccb90 (LWP 15970)]
[New Thread 0xb74cdb90 (LWP 15969)]
[New Thread 0xb7d01b90 (LWP 15968)]

Thread 8 (Thread 0xb7d01b90 (LWP 15968)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7dbe7d1 in select () from /lib/libc.so.6
#2 0xb7f6cdbd in ?? () from /usr/lib/libtcl8.5.so

Thread 7 (Thread 0xb74cdb90 (LWP 15969)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7e63576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7f6bbb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 6 (Thread 0xb6cccb90 (LWP 15970)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7e63576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7f6bbb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 5 (Thread 0xb64cbb90 (LWP 15971)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7e63576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7f6bbb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 4 (Thread 0xb5ccab90 (LWP 15972)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7e63576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7f6bbb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 3 (Thread 0xb54c9b90 (LWP 15973)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7e63576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7f6bbb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

Thread 2 (Thread 0xb4cc8b90 (LWP 15974)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7e65c2e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 0xb7e61a93 in _L_mutex_lock_50 () from /lib/libpthread.so.0
#3 0xb7e6147d in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0xb7f6bc42 in Tcl_MutexLock () from /usr/lib/libtcl8.5.so
#5 0x09934360 in ?? ()
#6 0x0994d010 in ?? ()
#7 0xb4cc84b8 in ?? ()
#8 0xb7e92ff4 in ?? () from /usr/lib/thread2.6.5/libthread2.6.5.so
#9 0xbfbaa2d8 in ?? ()
#10 0xbfbaa2d8 in ?? ()
#11 0x0992f518 in ?? ()
#12 0xb7e8c1b4 in TpoolWorker () from /usr/lib/thread2.6.5/libthread2.6.5.so
#13 0xb7f89ff4 in ?? () from /usr/lib/libtcl8.5.so
#14 0xbfbaa2d8 in ?? ()
#15 0xb7e931ec in listMutex () from /usr/lib/thread2.6.5/libthread2.6.5.so
#16 0xb7f89ff4 in ?? () from /usr/lib/libtcl8.5.so
#17 0xb7f89ff4 in ?? () from /usr/lib/libtcl8.5.so
#18 0x099b0400 in ?? ()
#19 0x0994d010 in ?? ()
#20 0xb7f50e38 in TclpFree () from /usr/lib/libtcl8.5.so
#21 0x00000000 in ?? ()

Thread 1 (Thread 0xb7d026c0 (LWP 15967)):
#0 0xffffe424 in __kernel_vsyscall ()
#1 0xb7e63576 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb7f6bbb3 in Tcl_ConditionWait () from /usr/lib/libtcl8.5.so

#0 0xffffe424 in __kernel_vsyscall ()

This time it looks like it got 8 threads going before it hung.

Nathan

Alexandre Ferrieux

Oct 21, 2008, 3:46:18 PM

This time there's an outlier (thread 2), grinding its teeth on a
mutex.
Can you confirm that this trace was obtained with:

package require Thread
set workers 10
set poolId [tpool::create -minworkers $workers -maxworkers $workers]

But unless I'm missing something obvious, this code just creates workers
ready to receive jobs and doesn't actually submit anything. So how
do you detect that it's frozen? Is there other code?

-Alex

Njett

Oct 21, 2008, 4:03:57 PM
Alex,

That is correct. I am starting tclsh and manually running each line
one at a time. When I run the tpool::create command, it should start
the threads and report back the thread pool ID. If I start 2
threads, it reports back an ID immediately; when I try 10, it hangs
indefinitely (it never returns the tpool ID and never returns to a %
prompt to enter another command). On Windows XP, if I try to start 100,
it takes roughly 10 seconds and then returns the tpool ID. This section
of code is part of a larger project I'm working on; I was able to
isolate it to reliably reproduce the issue I'm having. The
app freezes before the tpool is handed any actual jobs to process.

Nathan

Alexandre Ferrieux

Oct 21, 2008, 4:33:21 PM

OK. Thanks for making all this clear. I think you have now gathered
the right material to open a very useful bug report. I'm not familiar
with tpools and so must pass this on to the Thread extension developers,
but such a tiny test case with such violent effects is a bug hunter's
dream!

-Alex

GPS

Oct 21, 2008, 6:22:53 PM
Njett wrote:

> I've been emailing back and forth with Jeff Hobbs on the tcl thread
> mailing list trying to figure out an issue with the Tcl thread
> extension. Basically the problem script is:
>
> package require Thread
> set workers 10
> set poolId [tpool::create -minworkers $workers -maxworkers $workers]

Thread pools seem to be inherently broken. I don't know why. Several
people (including me) have experienced deadlocks with tpool on Linux.

Thread pools deadlock in a FUTEX_WAIT.


> When I run this from Windows XP, I have no trouble starting 300+
> threads. When I try from my Gentoo Linux machine it hangs on the last
> line. I can start 2 or 3 threads but it starts to hang if I increase
> it, and by the time I increase it to 10 threads it hangs
> consistently. Jeff Hobbs confirmed that it work fine for him on Linux
> using the Active State binaries, so I tried them on my Gentoo box and
> got the same result. I also recently had a co-worker try it on Ubuntu
> and he has the same problem.

I suspect it's actually a bug in the Thread package.

> See http://sourceforge.net/mailarchive/forum.php?thread_name=20081007151818.pvcn26xrs4k00ck0%40webmail.realnets.com&forum_name=tcl-threads
> for the original conversation.
>
> Anyone have any ideas here? I'm using Tcl/Tk 8.5.1 and Thread 2.6.5.
> I've also tried with Tcl/Tk 8.4.18 with the same results.

I've tried it with 8.5.3 (IIRC) and 8.6 on Ubuntu.

My test case is on a disassembled box, but it's generally quite easy to
deadlock a tpool in Linux, as you seem to have found.


George

Alexandre Ferrieux

Oct 22, 2008, 3:20:08 AM
On Oct 22, 12:22 am, GPS <georg...@xmission.com> wrote:
>
> Thread pools seem to be inherently broken.  I don't know why.  Several
> people have experienced deadlocks (including me) with tpool in Linux.

Update: Nathan has found an existing bug report [2005794]:

http://sourceforge.net/tracker/?func=detail&atid=110894&aid=2005794&group_id=10894

and we have followed up there.
For those interested, I have proposed a fix (which still needs Zoran's
validation ;-) that seems to solve the deadlock problem.
The problem was simply a lack of locking around the loop that starts the
minworkers, so the fix is just to wrap that loop in
Tcl_MutexLock(&tpoolPtr->mutex) / Tcl_MutexUnlock(&tpoolPtr->mutex).
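The patch itself lives in the Thread extension's C sources and is not reproduced here. As a hedged illustration of the pattern it enforces, here is a plain-pthreads sketch (not the extension's code; Pool, worker, pool_start, and pool_stop are hypothetical names) in which one mutex guards both the pool state and the start-up handshake, and the creator holds that mutex across the entire loop that launches the minimum number of workers, releasing it only inside pthread_cond_wait. POSIX also leaves it undefined to use different mutexes for concurrent waits on the same condition variable; this layout avoids that by construction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical pool state; not the Thread extension's real struct. */
typedef struct {
    pthread_mutex_t mutex;  /* guards every field below */
    pthread_cond_t  cond;   /* broadcast whenever a field changes */
    int ready;              /* workers that have finished starting */
    int stop;               /* set to ask idle workers to exit */
} Pool;

static void *worker(void *arg) {
    Pool *p = arg;
    pthread_mutex_lock(&p->mutex);    /* same mutex the creator waits with */
    p->ready++;
    pthread_cond_broadcast(&p->cond); /* wake the creator (and idle workers) */
    while (!p->stop)                  /* idle until shutdown */
        pthread_cond_wait(&p->cond, &p->mutex);
    pthread_mutex_unlock(&p->mutex);
    return NULL;
}

/* Start n workers while holding p->mutex for the whole loop, waiting for
 * each one to check in before creating the next. Assumes a freshly
 * initialised pool (ready == 0). Returns the number of workers started. */
static int pool_start(Pool *p, pthread_t *tids, int n) {
    int started = 0;
    pthread_mutex_lock(&p->mutex);
    for (; started < n; started++) {
        if (pthread_create(&tids[started], NULL, worker, p) != 0)
            break;
        while (p->ready < started + 1)    /* predicate loop: spurious wakeups */
            pthread_cond_wait(&p->cond, &p->mutex);
    }
    pthread_mutex_unlock(&p->mutex);
    return started;
}

/* Ask all workers to exit, then join the first n of them. */
static void pool_stop(Pool *p, pthread_t *tids, int n) {
    pthread_mutex_lock(&p->mutex);
    p->stop = 1;
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->mutex);
    for (int i = 0; i < n; i++) {
        int rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
}
```

Because the creator and the idle workers share one condition variable in this sketch, the worker uses pthread_cond_broadcast: a plain pthread_cond_signal could wake an idle worker instead of the creator, and that worker would simply go back to sleep, leaving pool_start waiting forever.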

As a side note, one funny thing is that the Thread package also comes
with a script-only implementation of [tpool], which exhibits exactly
the same bug.
The equivalent fix there should be to add a tsv::lock around the same
loop (in Tcl).

-Alex


Alexandre Ferrieux

Oct 22, 2008, 8:12:01 AM
On Oct 22, 9:20 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com> wrote:

> For those interested, I have proposed a fix (that still needs Zoran's
> validation ;-) which seems to solve the deadlock problem.

2nd update: Zoran has committed the patch (along with further
cleanup). Grab the HEAD.

-Alex

Njett

Oct 22, 2008, 12:14:23 PM
I can confirm the patch works and tpools function properly on Linux
now.

> 2nd update: Zoran has committed the patch (along with further
> cleanup). Grab the HEAD.

Now we just need a new version of the Thread package released. The
current version, 2.6.5, is two years old, and if you look at the
changelogs, a number of bugs have been fixed since then.

Nathan

Donal K. Fellows

Oct 22, 2008, 6:36:12 PM
Njett wrote:
> Now we just need a new version of the thread package released, the
> current version 2.6.5 is two years old, and if you look at the
> changelogs there have been a number of bugs fixed since then.

Sounds like we need a maintenance release, even if nothing else. Time
to twist some arms (gently...)

Donal.
