Threads and Signals

Sarat Babu Kamisetty

Feb 1, 2002, 1:09:57 AM
Hi,

I have a very simple program (pasted below) where a main thread creates two
other threads and sleeps for a second. These two threads install a signal
handler for the same signal, SIGURG, and then block by calling select(). The
main thread then sends a signal to each of those threads using thr_kill(). For
some reason, only one thread gets the signal and then the program core dumps.
My code, a sample run and gdb analysis are below. It looks like delivering
the second signal dereferences a NULL pointer in the kernel. I think that
after delivering the first signal the kernel resets the signal handler
pointer to NULL, so delivering the second signal crashes. Does anyone have
an explanation for this behavior? Both of the following approaches avoid
the crash:

1) Changing the default_signal_handler() to call signal() function again to
re-install the signal handler AND by putting sleep(1) between the two calls
to thr_kill() in main avoids the crash.
2) Each thread installs signal handler for DIFFERENT signals

Coming to the actual problem: I want to create multiple threads, each of
which installs a signal handler for the SAME signal. All of them call select()
at one point or another, either to wait for socket data or for signals
from other threads. When some active thread (say A) wants to signal other
threads (B, C, D), it calls thr_kill() with each of those thread IDs (in a
loop). According to the observations described earlier, it looks like only
one of B/C/D will get the signal and the rest stay blocked. How do I wake up
all of those threads? The Unix interface seems strange: you can install a
signal handler for a given signal only for the whole process (not per
thread), but you can send signals by thread ID. Can anyone help me out here?
Since I have a large number of threads, I cannot go with approach 2).
Approach 1) does not seem acceptable because it would introduce many delays.
I also have to use select() (not something like condition variables or
mutexes), since I have to watch for socket data as well as signals.

Thanks,
Sarat

************* Code ***************
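#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/select.h>
#include <thread.h>

/* printf() is not async-signal-safe; it is used here only for the demo. */
void default_signal_handler(int signo) {
    printf("tid = %d in signal handler\n", (int)thr_self());
}

/* Thread start routine: install the handler, then block in select(). */
void *func(void *arg) {
    printf("thread tid = %d created\n", (int)thr_self());
    signal(SIGURG, default_signal_handler);
    select(0, NULL, NULL, NULL, NULL);   /* no fds, no timeout: wait for a signal */
    while (1);
}

int main(void) {
    thread_t tid1, tid2;

    thr_create(NULL, 0, func, NULL, THR_DETACHED, &tid1);
    thr_create(NULL, 0, func, NULL, THR_DETACHED, &tid2);

    sleep(1);   /* give both threads time to reach select() */

    thr_kill(tid1, SIGURG);
    thr_kill(tid2, SIGURG);

    while (1);
}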

Sample Output:

>gcc -g thr.c -lpthread
>a.out
thread tid = 4 created
thread tid = 5 created
tid = 4 in signal handler
Segmentation fault (core dumped)

GDB Analysis:

>gdb a.out
(gdb) core core
Core was generated by `a.out'.
Program terminated with signal 9, Killed.
Reading symbols from /usr/lib/libpthread.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from
/usr/platform/SUNW,Ultra-Enterprise/lib/libc_psr.so.1...
done.
Reading symbols from /usr/lib/libthread.so.1...done.
#0 0x0 in ?? ()
(gdb) bt
#0 0x0 in ?? ()
#1 0xef656658 in __sighndlr ()
#2 <signal handler called>
#3 0xef6b743c in poll ()
#4 0x11ad8 in func ()


David Butenhof

Feb 1, 2002, 6:17:52 AM
Sarat Babu Kamisetty wrote:

> I have a very simple program (pasted below) where a main thread creates
> two other threads and sleeps for a second. These two threads install a
> signal handler for the same signal, SIGURG, and then block by calling
> select(). The main thread then sends a signal to each of those threads
> using thr_kill(). For some reason, only one thread gets the signal and
> then the program core dumps. My code, a sample run and gdb analysis are
> below. It looks like delivering the second signal dereferences a NULL
> pointer in the kernel. I think that after delivering the first signal the
> kernel resets the signal handler pointer to NULL, so delivering the
> second signal crashes. Does anyone have an explanation for this
> behavior? Both of the following approaches avoid the crash:
>
> 1) Changing the default_signal_handler() to call signal() function again
> to re-install the signal handler AND by putting sleep(1) between the two
> calls to thr_kill() in main avoids the crash.

Right. Because handlers installed by signal() go away when the signal is
delivered. That also means your second thr_kill(), for the other thread,
won't find a handler (which is why you need the sleep() in addition to
reloading the handler). Use sigaction() instead, which won't reset the
signal action to SIG_DFL before delivering it (unless you tell it to,
which, presumably, you wouldn't. ;-) )
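
A minimal sketch of the sigaction() suggestion, assuming a POSIX system. The
names install_persistent_handler(), deliver_twice() and urg_count are
hypothetical and exist only for illustration. The handler just counts
deliveries, and raise() stands in for the two thr_kill() calls:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Counts handler invocations; sig_atomic_t is safe to update in a handler. */
static volatile sig_atomic_t urg_count = 0;

static void on_sigurg(int signo) {
    (void)signo;
    urg_count++;            /* no printf() here: it is not async-signal-safe */
}

/* Install a process-wide SIGURG handler that stays installed after delivery.
 * Returns 0 on success, -1 on failure (errno is set by sigaction()). */
int install_persistent_handler(void) {
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigurg;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;        /* SA_RESETHAND deliberately not set */
    return sigaction(SIGURG, &sa, NULL);
}

/* Hypothetical helper: send SIGURG to the calling thread twice and return
 * how many times the handler ran. */
int deliver_twice(void) {
    int rc;

    urg_count = 0;
    rc = raise(SIGURG);
    assert(rc == 0);
    rc = raise(SIGURG);
    assert(rc == 0);
    return (int)urg_count;
}
```

Because the disposition is not reset, the second delivery finds the same
handler without any re-installation or sleep() in between.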

> 2) Each thread installs signal handler for DIFFERENT signals

Yes, because a signal handler is a PROCESS resource. There's only one for
the address space, not one for each thread. When two threads install
separate handlers for the same signal, that's no different from a
traditional single threaded program calling signal() (or sigaction()) twice
in a row for the same signal number. "There can be only one."

Two threads CAN wait in sigwait() for the same signal, and then do whatever
they want (independently of each other) when they receive the signal. You
can thr_kill() or pthread_kill() whichever thread you want; but of course
they need to be waiting. You could also POLL for signals, by running with
the signal blocked, and calling sigtimedwait() when you want to check. Of
course that won't help if you want to block in select().
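
A rough sketch of the sigwait() approach, written against the POSIX
pthread_* interfaces rather than the UI thr_* calls. The function
wake_all_waiters() and the other names are made up for this example. It
assumes the signal is blocked before the threads are created, so that each
thread inherits the blocked mask and a signal sent early simply stays
pending for that thread:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

#define NWAITERS 2

static int received[NWAITERS];   /* written by each waiter, read after join */

static void noop_handler(int signo) { (void)signo; }

/* Each waiter blocks in sigwait() until SIGURG is directed at it. */
static void *waiter(void *arg) {
    int idx = *(int *)arg;
    sigset_t set;
    int signo = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    if (sigwait(&set, &signo) == 0)
        received[idx] = signo;
    return NULL;
}

/* Start the waiters, signal each one by thread ID, and return how many of
 * them reported SIGURG. */
int wake_all_waiters(void) {
    pthread_t tid[NWAITERS];
    int idx[NWAITERS];
    struct sigaction sa;
    sigset_t set;
    int i, rc, woke = 0;

    /* SIGURG is ignored by default, and POSIX leaves it unspecified whether
     * a blocked signal whose action is "ignore" stays pending, so give it a
     * handler. sigwait() consumes the signal; the handler never runs here. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGURG, &sa, NULL);
    assert(rc == 0);

    /* Block SIGURG now so the new threads inherit the blocked mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    rc = pthread_sigmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);

    for (i = 0; i < NWAITERS; i++) {
        idx[i] = i;
        received[i] = 0;
        rc = pthread_create(&tid[i], NULL, waiter, &idx[i]);
        assert(rc == 0);
    }
    for (i = 0; i < NWAITERS; i++)
        pthread_kill(tid[i], SIGURG);   /* directed at one thread each */
    for (i = 0; i < NWAITERS; i++) {
        pthread_join(tid[i], NULL);
        woke += (received[i] == SIGURG);
    }
    return woke;
}
```

Every waiter gets its own signal because the signal is consumed by
sigwait() in the targeted thread rather than by a shared handler.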

Either work out a solution that uses a single process-wide signal handler,
or work out something entirely different to wake your select(), such as
adding a special fd that's not normally used for anything... and simply
write to it when you want your select() to awaken.
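
One common way to build such a "special fd" is a pipe whose read end is
added to the select() set. The sketch below is only an illustration; the
names wake_init(), wake_post() and wait_for_socket_or_wake() are invented
here, and a real program would likely keep one pipe per waiting thread:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical wake-up channel: wake_fd[0] is watched by select(),
 * wake_fd[1] is written by whoever wants to wake the waiter. */
static int wake_fd[2] = { -1, -1 };

/* Create the pipe. Returns 0 on success, -1 on failure. */
int wake_init(void) { return pipe(wake_fd); }

/* Called from any thread to wake the select() below. */
int wake_post(void) {
    char byte = 'w';
    return write(wake_fd[1], &byte, 1) == 1 ? 0 : -1;
}

/* Wait for data on sock_fd (pass -1 for "no socket") or for a wake-up.
 * Returns 1 if woken through the pipe, 0 if only the socket is ready,
 * -1 on error. */
int wait_for_socket_or_wake(int sock_fd) {
    fd_set rfds;
    int maxfd = wake_fd[0];
    char byte;

    assert(wake_fd[0] >= 0);            /* wake_init() must have succeeded */
    FD_ZERO(&rfds);
    FD_SET(wake_fd[0], &rfds);
    if (sock_fd >= 0) {
        FD_SET(sock_fd, &rfds);
        if (sock_fd > maxfd)
            maxfd = sock_fd;
    }
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
        return -1;
    if (FD_ISSET(wake_fd[0], &rfds)) {
        if (read(wake_fd[0], &byte, 1) != 1)   /* drain one wake-up token */
            return -1;
        return 1;
    }
    return 0;
}
```

A wake-up posted before the waiter reaches select() is not lost, because
the byte stays in the pipe until it is read.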

For the purpose you describe, it wouldn't be hard to use a single handler.
In fact, there's no reason it needs to do anything at all except to return,
which would cause your select() calls to return with EINTR. You'd need to
be sure that each thread blocked that signal except around calls to
select(), though, or else you'd risk interrupting some other syscall.
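
A sketch of that single-handler arrangement, using POSIX calls and
hypothetical names (wake_signal_setup(), select_interruptible(), wakeups).
The counter in the handler is not needed for the wake-up itself; it is there
only to make visible when a signal was delivered. Note the window marked in
the comments: a signal that arrives after the unblock but before select()
starts is handled without interrupting select().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>

static volatile sig_atomic_t wakeups = 0;

/* The single process-wide handler: all it really has to do is return. */
static void on_wake(int signo) { (void)signo; wakeups++; }

/* Install the handler and block SIGURG in the calling thread; threads
 * created afterwards inherit the blocked mask. Returns 0 on success. */
int wake_signal_setup(void) {
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wake;
    sigemptyset(&sa.sa_mask);   /* SA_RESTART left unset so select() reports EINTR */
    if (sigaction(SIGURG, &sa, NULL) != 0)
        return -1;
    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Unblock SIGURG only around select(). Returns 1 if select() was
 * interrupted, 0 if it timed out or an fd became ready, -1 on other errors. */
int select_interruptible(int nfds, fd_set *rfds, struct timeval *timeout) {
    sigset_t set, old;
    int rc, saved;

    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    rc = pthread_sigmask(SIG_UNBLOCK, &set, &old);
    assert(rc == 0);
    /* Window: a signal delivered here runs the handler before select()
     * starts, so select() is not interrupted by it. */
    rc = select(nfds, rfds, NULL, NULL, timeout);
    saved = errno;
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    errno = saved;
    if (rc < 0)
        return errno == EINTR ? 1 : -1;
    return 0;
}
```

If a signal is already pending when select_interruptible() is called, the
handler runs at the unblock and the select() call then waits out its full
timeout, which is exactly the "too early" case.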

/------------------[ David.B...@compaq.com ]------------------\
| Compaq Computer Corporation POSIX Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-----[ http://home.earthlink.net/~anneart/family/dave.html ]-----/

Alexander Terekhov

Feb 1, 2002, 9:05:30 AM
David Butenhof <David.B...@compaq.com> wrote in message news:<QWu68.315$am1....@news.cpqcorp.net>...

Sorry, I am surely missing something because I fail
to understand how to prevent/avoid a "race condition"
with respect to select and NOOP signal handler...

How could one ensure that by the time a thread will
process this NOOP signal it will indeed be in the
state of "select shall return [EINTR]"?

Why could it (i.e. signal processing) NOT happen
too early or too late wrt select->EINTR and therefore
just have no effect whatsoever?

I also have a question about threads and *pselect*
which does have "const sigset_t *restrict sigmask"
argument:

"31190 If sigmask is not a null pointer, then the pselect()
function shall replace the signal mask of the
31191 process by the set of signals pointed to by sigmask
      ^^^^^^^ ?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?
before examining the descriptors, and shall
31192 restore the signal mask of the process before
                                     ^^^^^^^ ?!?!?!?!?!?!
returning."

Thanks!

regards,
alexander.

David Butenhof

Feb 1, 2002, 10:21:39 AM
Alexander Terekhov wrote:

> David Butenhof <David.B...@compaq.com> wrote in message

>>

>> For the purpose you describe, it wouldn't be hard to use a single
>> handler. In fact, there's no reason it needs to do anything at all except
>> to return, which would cause your select() calls to return with EINTR.
>> You'd need to be sure that each thread blocked that signal except around
>> calls to select(), though, or else you'd risk interrupting some other
>> syscall.
>
> Sorry, I am surely missing something because I fail
> to understand how to prevent/avoid a "race condition"
> with respect to select and NOOP signal handler...
>
> How could one ensure that by the time a thread will
> process this NOOP signal it will indeed be in the
> state of "select shall return [EINTR]"?
>
> Why could it (i.e. signal processing) NOT happen
> too early or too late wrt select->EINTR and therefore
> just have no effect whatsoever?

And your point is? I never said trying to do this with signals was a good
idea. I merely gave you some pointers if you really want to try. Telling
people not to try to do what they want to do usually doesn't work very
well; though helping them to realize the complications sometimes will.

Your sample code, of course, has the same race, even if it worked as you'd
like. A signal could arrive between your signal() call and the select()
call, you know.

Perhaps you should explain a little more about what you intend to
accomplish? Are you expecting that your thr_kill() will somehow kill the
thread rather than merely breaking out of a select() with EINTR? (It won't;
nor is there anything more you can do in a signal handler, since functions
like thr_exit/pthread_exit aren't allowed at signal level.) The sample just
goes into a compute bound loop on return from select(), without even
testing the status, so I can't infer what you'd expect "real code" to do at
that point.

> I also have a question about threads and *pselect*
> which does have "const sigset_t *restrict sigmask"
> argument:
>
> "31190 If sigmask is not a null pointer, then the pselect()
> function shall replace the signal mask of the
> 31191 process by the set of signals pointed to by sigmask
>       ^^^^^^^ ?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?!?
> before examining the descriptors, and shall
> 31192 restore the signal mask of the process before
>                                      ^^^^^^^ ?!?!?!?!?!?!
> returning."

Sounds like pselect() would solve your race by allowing you to enable
SIGURG for the thread only within the pselect() call.
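
A minimal sketch of that idea, assuming a system that provides pselect().
The names wake_setup() and wait_or_wake() are hypothetical. SIGURG stays
blocked in the thread at all times, and the mask passed to pselect() lifts
the block only for the duration of the call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

static void on_wake(int signo) { (void)signo; }   /* only has to return */

/* Install the no-op handler and block SIGURG in the calling thread.
 * Returns 0 on success. */
int wake_setup(void) {
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_wake;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGURG, &sa, NULL) != 0)
        return -1;
    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Wait in pselect() with SIGURG unblocked only inside the call.
 * Returns 1 if SIGURG interrupted the call (including one that was already
 * pending), 0 on timeout or fd readiness, -1 on other errors. */
int wait_or_wake(int nfds, fd_set *rfds, const struct timespec *timeout) {
    sigset_t during;
    int rc;

    /* Start from the thread's current mask and remove SIGURG from the copy. */
    rc = pthread_sigmask(SIG_SETMASK, NULL, &during);
    assert(rc == 0);
    sigdelset(&during, SIGURG);

    rc = pselect(nfds, rfds, NULL, NULL, timeout, &during);
    if (rc < 0)
        return errno == EINTR ? 1 : -1;
    return 0;
}
```

Since the mask change and the wait happen inside one call, a signal sent
before the thread reaches pselect() remains pending and interrupts the call
instead of being handled too early.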

The fact that the man page says "process" is certain to be an error in the
man page. On any OS with anything approaching real thread support
(including Solaris) it's virtually impossible to affect other threads'
state as the text implies. There is no such thing as a process signal mask.

Of course, pselect(), while interesting, is a completely nonstandard and
nonportable function. If that's important to you. (Though the UI thread
interface you're using isn't very portable either.)

Alexander Terekhov

Feb 1, 2002, 2:05:46 PM
David Butenhof <David.B...@compaq.com> wrote in message news:<pvy68.331$am1....@news.cpqcorp.net>...
> Alexander Terekhov wrote:
^^^^^^^^^^^^^^^^^^
(OP in this thread, who is using UI thread interfaces, etc is
Sarat Babu Kamisetty; NOT me! ;-)

>
> > David Butenhof <David.B...@compaq.com> wrote in message
>
> >>
> >> For the purpose you describe, it wouldn't be hard to use a single
> >> handler. In fact, there's no reason it needs to do anything at all except
> >> to return, which would cause your select() calls to return with EINTR.
> >> You'd need to be sure that each thread blocked that signal except around
> >> calls to select(), though, or else you'd risk interrupting some other
> >> syscall.
> >
> > Sorry, I am surely missing something because I fail
> > to understand how to prevent/avoid a "race condition"
> > with respect to select and NOOP signal handler...
> >
> > How could one ensure that by the time a thread will
> > process this NOOP signal it will indeed be in the
> > state of "select shall return [EINTR]"?
> >
> > Why could it (i.e. signal processing) NOT happen
> > too early or too late wrt select->EINTR and therefore
> > just have no effect whatsoever?
>
> And your point is? I never said trying to do this with signals was a good
> idea. I merely gave you some pointers if you really want to try. Telling
> people not to try to do what they want to do usually doesn't work very
> well; though helping them to realize the complications sometimes will.

*My* point is that perhaps *Sarat Babu Kamisetty* should take a look
at thread *cancellation* or an extra/special fd, already suggested by
you!

The quotes were taken from my copy of the Final
Draft 7 (IEEE Std 1003.1-2001)! Also, how about
this:

http://www.opengroup.org/onlinepubs/007904975/functions/pselect.html

> On any OS with anything approaching real thread support
> (including Solaris) it's virtually impossible to affect other threads'
> state as the text implies. There is no such thing as a process signal mask.
>
> Of course, pselect(), while interesting, is a completely nonstandard and
> nonportable function. If that's important to you. (Though the UI thread
> interface you're using isn't very portable either.)

"nonstandard and nonportable" given the link above? Or you
just mean that there are not so many SUS/Version *3* systems
currently available out there? ;-)

regards,
alexander.

David Butenhof

Feb 4, 2002, 8:55:34 AM
Alexander Terekhov wrote:

> David Butenhof <David.B...@compaq.com> wrote in message
> news:<pvy68.331$am1....@news.cpqcorp.net>...
>> Alexander Terekhov wrote:
> ^^^^^^^^^^^^^^^^^^
> (OP in this thread, who is using UI thread interfaces, etc is
> Sarat Babu Kamisetty; NOT me! ;-)

Correction noted, Alexander. Sometimes this business of quoting multiple
levels of reply can get confusing. It's too easy to start replying to
various paragraphs without noting from which level of quote it originated.

So, sorry. ;-)

>> > David Butenhof <David.B...@compaq.com> wrote in message
>>
>> >> For the purpose you describe, it wouldn't be hard to use a single
>> >> handler. In fact, there's no reason it needs to do anything at all
>> >> except to return, which would cause your select() calls to return with
>> >> EINTR. You'd need to be sure that each thread blocked that signal
>> >> except around calls to select(), though, or else you'd risk
>> >> interrupting some other syscall.
>> >
>> > Sorry, I am surely missing something because I fail
>> > to understand how to prevent/avoid a "race condition"
>> > with respect to select and NOOP signal handler...
>> >
>> > How could one ensure that by the time a thread will
>> > process this NOOP signal it will indeed be in the
>> > state of "select shall return [EINTR]"?
>> >
>> > Why could it (i.e. signal processing) NOT happen
>> > too early or too late wrt select->EINTR and therefore
>> > just have no effect whatsoever?
>>
>> And your point is? I never said trying to do this with signals was a good
>> idea. I merely gave you some pointers if you really want to try. Telling
>> people not to try to do what they want to do usually doesn't work very
>> well; though helping them to realize the complications sometimes will.
>
> *My* point is that perhaps *Sarat Babu Kamisetty* should take a look
> at thread *cancellation* or an extra/special fd, already suggested by
> you!

I decided not to diverge into cancellation, but, yeah, you're right;
cancellation MAY be a viable alternative, depending on the actual intent.
Since you didn't actually MENTION the word "cancel" anywhere in the reply I
quoted, I'm not quite sure how anyone was to have inferred that noble
intent, though.

(And, again, I slid into replying to you as if you had posted the original
question, and I apologize for the lapse. ;-) )

Irrelevant note: as with POSIX, be careful about the subtle but critical
distinctions between "final draft" and "standard".

However, in this case, two things are evident:

1) This newfangled pselect() went into the specification entirely without
my notice. Interesting. The manpage format of the specification makes it
easier to overlook, as it's combined with select(). In reviewing such a
massive stack of documentation it's easy to miss something new combined
with something tediously familiar.

2) You've uncovered a bug in the specification that should be repaired in
the corrigenda. There is no such thing as a "process signal mask", and even
if it existed pselect() would have no business altering it. It needs to be
fixed to specify the THREAD signal mask.

>> On any OS with anything approaching real thread support
>> (including Solaris) it's virtually impossible to affect other threads'
>> state as the text implies. There is no such thing as a process signal
>> mask.
>>
>> Of course, pselect(), while interesting, is a completely nonstandard and
>> nonportable function. If that's important to you. (Though the UI thread
>> interface you're using isn't very portable either.)
>
> "nonstandard and nonportable" given the link above? Or you
> just mean that there are not so many SUS/Version *3* systems
> currently available out there? ;-)

OK, so you've got me there. It's "standard". Fine.

However; even though you're reading from the document designated as UNIX
2001, POSIX 1003.1-2001, and soon ISO/IEC 9945-1:2001, which is about as
"standard" as you can get, don't let that lead you to make any unfounded
assumptions about portability. The new functions in this standard aren't
likely to be widely available for quite some time.

Alexander Terekhov

Feb 5, 2002, 4:56:01 AM
David Butenhof <David.B...@compaq.com> wrote in message news:<Uww78.387$am1....@news.cpqcorp.net>...
[...]

> > *My* point is that perhaps *Sarat Babu Kamisetty* should take a look
> > at thread *cancellation* or an extra/special fd, already suggested by
> > you!
>
> I decided not to diverge into cancellation, but, yeah, you're right;
> cancellation MAY be a viable alternative, depending on the actual intent.
> Since you didn't actually MENTION the word "cancel" anywhere in the reply I
> quoted, I'm not quite sure how anyone was to have inferred that noble
> intent, though.

Yeah, I should have been more specific with respect
to cancel, sorry. Partly that is just because I got
concerned/puzzled with this race condition/interrupting
select() call in the context of a thread cancellation
discussion[1] not so long ago and just forgot
(did not realize) that it might not be clear to others.

[...]


> However, in this case, two things are evident:
>
> 1) This newfangled pselect() went into the specification entirely without
> my notice. Interesting. The manpage format of the specification makes it
> easier to overlook, as it's combined with select(). In reviewing such a
> massive stack of documentation it's easy to miss something new combined
> with something tediously familiar.
>
> 2) You've uncovered a bug in the specification that should be repaired in
> the corrigenda. There is no such thing as a "process signal mask", and even
> if it existed pselect() would have no business altering it. It needs to be
> fixed to specify the THREAD signal mask.

I've just sent you the list referring to more "process
signal mask"/"signal mask of the process"/etc appearances
in the POSIX.1-2001/SUSv3 standard PDFs[2] as "Reply to
Sender Only".

BTW, it would be nice to "commit" the following
pthread_once-"in"/pthread_cond_signal-
pthread_cond_broadcast-"out" correction as well:

http://groups.google.com/groups?as_umsgid=c29b5e33.0202010305.5c12381d%40posting.google.com
(unless, of course, I've got something wrong here ;-)

regards,
alexander.

[1] http://groups.google.com/groups?as_umsgid=3C2CC94A.503FB8E2%40web.de

[2] http://www.opengroup.org/publications/mem-online/c950/c950.pdf
http://www.opengroup.org/publications/mem-online/c951/c951.pdf
http://www.opengroup.org/publications/mem-online/c952/c952.pdf
http://www.opengroup.org/publications/mem-online/c953/c953.pdf
http://www.opengroup.org/publications/mem-online/c610/c610.pdf

Registration and free membership to get access:

http://www.opengroup.org/austin

David Butenhof

Feb 6, 2002, 7:56:00 AM
Alexander Terekhov wrote:

> I've just sent you the list referring to more "process
> signal mask"/"signal mask of the process"/etc appearances
> in the POSIX.1-2001/SUSv3 standard PDFs[2] as "Reply to
> Sender Only".

That's nice, but I have no control over the specification, really. I'm just
one of the people who talks about it a lot. ;-)

Of course, I can enter an Aardvark (formal SUS problem report) pointing out
the problem. But, by the same token, so can you. I would encourage anyone
who notices a problem like this to report it. Don't worry too much about
whether you're "absolutely provably right". Others, often including me,
will get involved in discussing the problem reports if there's any question
at all, and a decision will be made.

> BTW, it would be nice to "commit" the following
> pthread_once-"in"/pthread_cond_signal-
> pthread_cond_broadcast-"out" correction as well:

Yes, it's quite true that applications can have no useful dependencies on
signal or broadcast for data coherency. The defined application data
involved is the PREDICATE, and other shared data protected by the
associated mutex. The condition signal/broadcast really affects only the
condition variable (and its queue of waiting threads), which is internal
data. In a way, the inclusion of those functions is more a matter of
pragmatics; the thread library WILL require data coherency support within
those functions, or they cannot operate correctly. Therefore, the question
isn't really whether an application requires knowledge of that coherency;
but really whether the knowledge should be hidden from it. We listed them
because we figured the application developer might as well know the simple
and inevitable truth. ;-)
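
To illustrate the point about the predicate, here is a small sketch with
invented names (ready, payload, consume()). The data the application cares
about is written and read only while holding the mutex; the condition
variable is used purely to wake the waiter:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;       /* the predicate */
static int payload = 0;     /* shared data protected by `lock` */

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    payload = 42;
    ready = 1;              /* predicate changes under the mutex */
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&cond);
    return NULL;
}

/* Start a producer, wait for the predicate, and return the payload. */
int consume(void) {
    pthread_t tid;
    int value;
    int rc = pthread_create(&tid, NULL, producer, NULL);

    assert(rc == 0);
    pthread_mutex_lock(&lock);
    while (!ready)                      /* re-test after every wake-up */
        pthread_cond_wait(&cond, &lock);
    value = payload;
    pthread_mutex_unlock(&lock);
    pthread_join(tid, NULL);
    return value;
}
```

The waiter's view of payload is guaranteed by locking and unlocking the
mutex, not by the call to pthread_cond_signal().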

However, pthread_once() should certainly join them. It IS, and must be,
sufficient merely to call pthread_once() for a given init routine in each
thread, without requiring the application to resort to additional
synchronization. Or else what would be the point of the silly function? I
find it hard to get too excited about this, though. I had trouble agreeing
with the requirement in draft 5 for static mutex initialization; until
someone pointed out that it made pthread_once() irrelevant. I liked that;
and ever since I've been telling people to avoid pthread_once() unless they
really want to use it and it does EXACTLY what they want. (Which I think is
rarely the case.)
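
For comparison, a sketch of both styles of one-time initialization. All
names here (init_table(), once_init_count(), mutex_init_count() and so on)
are hypothetical, and the "initialization" is just a counter so the effect
can be observed:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

/* Variant 1: pthread_once(). Every thread just calls it; no additional
 * synchronization is used to see the effects of init_table(). */
static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static int once_calls = 0;

static void init_table(void) { once_calls++; }

static void *use_once(void *arg) {
    (void)arg;
    pthread_once(&table_once, init_table);
    return NULL;
}

/* Variant 2: a statically initialized mutex guarding a flag. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table_ready = 0;
static int mutex_calls = 0;

static void *use_mutex(void *arg) {
    (void)arg;
    pthread_mutex_lock(&table_lock);
    if (!table_ready) {
        mutex_calls++;              /* the one-time work goes here */
        table_ready = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return NULL;
}

/* Run `start` in n threads and wait for all of them. */
static void run_threads(void *(*start)(void *), int n) {
    pthread_t tid[MAX_THREADS];
    int i, rc;

    assert(n > 0 && n <= MAX_THREADS);
    for (i = 0; i < n; i++) {
        rc = pthread_create(&tid[i], NULL, start, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
}

/* Each returns how many times the one-time work actually ran. */
int once_init_count(int nthreads)  { run_threads(use_once, nthreads);  return once_calls; }
int mutex_init_count(int nthreads) { run_threads(use_mutex, nthreads); return mutex_calls; }
```

The mutex variant is slightly longer, but it can carry arguments and error
handling that an init routine taking no parameters cannot.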

So, if I don't see anyone else point out the "process signal mask" bug
soon, I'll write it up. I see no point in removing signal and broadcast
from the memory list, and I'm not going to champion the cause. I'm not at
all sure I'd support it, because I could imagine arcane applications that
might be broken by a change -- if I actually thought any implementation
could take advantage of the change by making the operations FAIL to
synchronize memory. As for pthread_once()? Yeah, it ought to be added. If I
do click over to the problem report page, I'd probably write that up, too.
