Can I join a canceled undetached thread

Hungyu Lin

Sep 28, 1996, 3:00:00 AM

In this program I let the main thread create two slave threads. Slave 0
cancels Slave 1 and then exits. The main thread then joins the two
threads and retrieves the exit_values. The call to join Slave 0 returns
successfully and gets the correct exit_value. The call to join Slave 1
also returns successfully. However, the exit_value from Slave 1 is not
correct and causes a segmentation fault when I try to get the value of
*exit_value. Does this mean that a canceled undetached thread cannot
pass information to the joining thread? Your help is appreciated. Here
is the output and the program. (The program was run on AIX 4.2, or
possibly 4.1.)

Hungyu Lin
California State University, San Marcos

slave 0 starts
slave 1 starts
slave 0 cancel 1
thread 0 befor exit
join slave 0 normally
Segmentation fault(coredump)

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#define slaves 2
#define check(status, string) if (status != 0) perror (string)

pthread_t threads[slaves];

void *
thread_task (void *arg)
{
    int count;
    int my_number;
    int status;
    int not_done = 1;
    int old;

    my_number = (int) arg;

    printf("slave %d starts\n", my_number);
    status = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
    sleep(rand()%5);

    while (not_done) {
        if (my_number == 0) {
            not_done = 0;
            status = pthread_setcancelstate (PTHREAD_CANCEL_DISABLE,
                                             &old);
            for (count = 0; count < slaves; count++)
                if (count != my_number){
                    status = pthread_cancel(threads[count]);
                    if (status == 0)
                        printf("slave %d cancel %d\n", my_number, count);
                }
            status = pthread_setcancelstate (PTHREAD_CANCEL_ENABLE,
                                             &old);
        }
        pthread_testcancel();
    }
    printf("thread %d befor exit\n", my_number);
    pthread_exit((void *) &my_number);
}

main() {
    int slave_no;
    int *exit_value;
    int status;
    pthread_attr_t attr;

    if(pthread_attr_init(&attr) != 0)
        printf("init errori %d\n", status);
    if(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_UNDETACHED)
       !=0)
        printf("set error\n");

    for (slave_no = 0; slave_no < slaves; slave_no++)
        status = pthread_create (
            &threads[slave_no],
            &attr,
            thread_task,
            (void *) slave_no);

    for (slave_no = 0; slave_no < slaves; slave_no++) {
        status = pthread_join (
            threads[slave_no],
            (void **) &(exit_value));
        if (status == 0)
            printf(" join slave %d normally\n", *exit_value);

    }

}

Dave Butenhof

Sep 30, 1996, 3:00:00 AM

> In this program I let the main thread create two slave threads.
> Slave 0 cancels Slave 1 and then exits. The main thread then
> joins the two threads and retrieves the exit_values. The call to
> join Slave 0 returns successfully and gets the correct exit_value.
> The call to join Slave 1 also returns successfully. However, the
> exit_value from Slave 1 is not correct and causes a segmentation
> fault when I try to get the value of *exit_value. Does this mean
> that a canceled undetached thread cannot pass information to the
> joining thread? Your help is appreciated. Here is the output and
> the program. (The program was run on AIX 4.2, or possibly 4.1.)
>
> Hungyu Lin
> California State University, San Marcos

I have a couple of comments on this program, but first, the answer. No,
a cancelled thread cannot return an exit value to pthread_join. How
would it do so? It was cancelled, so the thread will never execute the
final "return" or pthread_exit() call. (And it's not legal to call
pthread_exit() from within a cancellation cleanup handler, either.) I
don't remember exactly what draft 7 (the API that's on AIX) said, but
POSIX threads says that joining with a cancelled thread always returns
the special value PTHREAD_CANCELED.

However, there is a way. Remember that pthread_join is just a minor
convenience, not a necessity. It's perfectly reasonable (and often
preferable) to code your own "custom" version. It's easy, too. You just
need a place to store the return value, a mutex, and a condition
variable. At exit, the thread locks the mutex, sets the value, signals
(or broadcasts) the condition variable, and unlocks the mutex. Instead
of calling pthread_join, the joiner locks the mutex and waits on the
condition variable while the value is 0. (Of course, the value must be
pre-initialized to 0 before creating the thread, or you can use a
separate flag to indicate termination if you want to be able to return
a value of 0.) The thread can also lock the mutex, set the value, and
signal/broadcast the condition variable from inside a cleanup handler,
so a cancelled thread can do whatever is appropriate to communicate its
state.

Commentary:

> my_number = (int) arg;

This is "questionable" ANSI C. While we specifically intended that the
thread's start argument be used that way, we were really wrong. ANSI C
does not guarantee that it's OK to cast an int to a void* and back,
although you can get away with it on (at least) "nearly all" existing
implementations of UNIX.

> printf("slave %d starts\n", my_number);
> status = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);

This call to pthread_setcancelstate isn't needed: threads ALWAYS start
up with cancelation state PTHREAD_CANCEL_ENABLE and cancelation type
PTHREAD_CANCEL_DEFERRED.

> status = pthread_setcancelstate (PTHREAD_CANCEL_DISABLE,
>                                  &old);
> for (count = 0; count < slaves; count++)
>     if (count != my_number){
>         status = pthread_cancel(threads[count]);
>         if (status == 0)
>             printf("slave %d cancel %d\n", my_number, count);
>     }
> status = pthread_setcancelstate (PTHREAD_CANCEL_ENABLE,
>                                  &old);

There's no point in disabling cancellation in this loop, because your
only call (to pthread_cancel) isn't a cancellation point. Only your outer
loop can be cancelled, and only at the pthread_testcancel() call.

> pthread_exit((void *) &my_number);

In general, it is a very bad idea to return a stack address as the
thread value. Remember that you are EXITING the thread's start routine
frame, and therefore "orphaning" that location. It may very well be
overwritten during the termination of the thread! You could pull the
same (loose ANSI C) trick of casting my_number to (void*), instead of
returning the address of my_number, or you could stuff the return status
somewhere else. (For example, make the "threads" array elements a
structure rather than just a pthread_t -- with a pthread_t member and a
void *return_value member.)

/---[ Dave Butenhof ]-----------------------[ bute...@zko.dec.com ]---\
| Digital Equipment Corporation 110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120 Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/

Fred A Kulack

Sep 30, 1996, 3:00:00 AM

Hungyu Lin wrote:
>
> In this program I let the main thread create two slave threads. Slave 0
> cancels Slave 1 and then exits. The main thread then
> joins the two threads and retrieves the exit_values. The call to
> join Slave 0 returns successfully and gets the correct exit_value.
> The call to join Slave 1 also returns successfully. However, the
> exit_value from Slave 1 is not correct and causes a segmentation
> fault when I try to get the value of *exit_value. Does this mean
> that a canceled undetached thread cannot pass information to the
> joining thread? Your help is appreciated. Here is the output and
> the program. (The program was run on AIX 4.2, or possibly 4.1.)

Think about how you specified the return value for slave 1.... You
didn't.

POSIX specifies that if a thread is cancelled via pthread_cancel(),
the exit status will be set to PTHREAD_CANCELED (which is neither NULL
nor a valid pointer).

Check the exit_value against NULL and PTHREAD_CANCELED before you
grab the status.

--

"Isn't sanity just a one trick pony any way? I mean, all you get is one
trick... rational thinking... But when you're good and crazy,
anything goes!" -- The Tick

Fred Kulack    Open Systems Enablement - AS/400 PThreads & Unix APIs
--------------------------------------------------------------------------
IBM in Rochester, MN (Phone: 507.253.5982 IBM Tie line 553-5982)
IBM: mailto:kul...@vnet.ibm.com Personal: mailto:kul...@infonet.isl.net
#define DISCLAIMER (The views expressed are mine, not IBM's)

Fred A Kulack

Oct 1, 1996, 3:00:00 AM

On AIX 4.1, thread-specific data is bound to a key.
When an attempt is made to use pthread_key_delete() on a key that is in
use (at least one thread has non-NULL thread-specific data associated
with that key), pthread_key_delete() fails with EBUSY.

Obviously, it's trying to prevent the case where you call
pthread_key_delete() followed by a pthread_key_create() that returns the
same key. If threads still have a TLS pointer for that key, they are
going to be unpleasantly surprised when possibly the wrong destructor
runs for it.

Although the spec says nothing about this behavior, I'm wondering if any
other platforms provide this behavior or if they just say 'tough beans'
and require the user to not delete until they are truly and for surely
done with the key.

Dave Butenhof

Oct 2, 1996, 3:00:00 AM

Fred Kulack <kul...@rchland.ibm.com> wrote:
> On AIX 4.1, thread-specific data is bound to a key.
> When an attempt is made to use pthread_key_delete() on a key that is in
> use (at least one thread has non-NULL thread-specific data associated
> with that key), pthread_key_delete() fails with EBUSY.
>
> Obviously, it's trying to prevent the case where you call
> pthread_key_delete() followed by a pthread_key_create() that returns the
> same key. If threads still have a TLS pointer for that key, they are
> going to be unpleasantly surprised when possibly the wrong destructor
> runs for it.
>
> Although the spec says nothing about this behavior, I'm wondering if any
> other platforms provide this behavior or if they just say 'tough beans'
> and require the user to not delete until they are truly and for surely
> done with the key.

In fact, the "spec" does say something about this behavior. It is
expressly allowed, even encouraged, by the draft 7 version that AIX
implements. That is, the application MUST ensure that all threads have
NULL values for a key before deleting the key -- and the result of
attempting to delete a key with non-NULL values is undefined. There is
an error (EBUSY) defined for use if the implementation chooses to
detect this error.

However, POSIX threads (that is, the real standard) removed this
loophole. The implementation MUST allow you to delete keys that have
non-NULL values, without failing. The rationale for the change was
simply that it's hard, or even impossible, to control whether some
thread may still have a value. Because the standard provides no
mechanism to ensure that there are no non-NULL values, it was felt that
it was not reasonable to allow the delete operation to fail in that
case.

I'm not convinced that I agree with the logic, but I didn't object to
the change. (At the time, I thought the whole idea of being able to
delete a key was silly, so I perhaps didn't pay enough attention to the
details.) Of course, at the time, my implementation provided essentially
infinite TSD (thread-specific data) keys, and there was no need to
delete any. For a lot of reasons, I later decided to take advantage of
the fixed limit (_PTHREAD_KEYS_MAX is 255 on Digital UNIX 4.0), and that
makes key deletion meaningful.

Unfortunately, key deletion, without the ability to destroy context in
other threads, can cause all sorts of trouble if that key is later
reused. The risk can be minimized by reallocating deleted keys only
AFTER passing out the last "pristine" key -- but the risk still exists.

There's no perfect solution. An implementation could maintain enough
state, for example, a "generation count" on each key (AND on each TSD
value for each thread) to recognize that the TSD value for a thread is
"stale" when pthread_getspecific() is called on a key that was deleted
and later re-allocated -- but that's expensive (it's also not 100%
reliable, since the generation count could, in theory, wrap). POSIX
doesn't even allow the vastly more expensive and architecturally messy
solution of calling destructors in all active threads (whew).

The thread library could also, on pthread_key_delete, simply clear the
TSD value for all active threads. That'd be "safe", and, I believe,
legal -- the application can no longer get a value for that key in any
thread, since the key is now invalid, and destructors will not be run.
Of course it's a memory leak, since whatever the TSD value points to
won't be freed: but the memory leak is unavoidable at that point, and
it solves the stale-data problem. On the other hand, no application
could count on that behavior, since it isn't portable -- meaning any
code that creates a key, and runs in a process where some code deletes
a key, is "on its own".

I've worked out a viable (though perhaps not ideal) solution, which will
be in a future version of Digital UNIX (although whether it's documented
for general use is still an open issue -- heck, I just checked in the
code today). This isn't really the right place to get into details, but
if any other POSIX thread implementors are interested in the problem
(along with other TSD issues such as the memory leak across fork), a
"semi-portable" solution would be better than each going their own way.
Maybe we should talk, off-line.