pthread_atfork handlers - what can they do?

Ivan Skytte Jørgensen

Jul 16, 2003, 1:48:34 PM

What are the handler functions specified in pthread_atfork() allowed to
do?

I was using the child handler of pthread_atfork() to re-launch a few
daemon threads but the results were not what I expected.

The documentation of pthread_atfork() does say that the handlers can
only use async-signal-safe functions because the handlers must at
least be able to lock and unlock mutexes. But I get varying results
when calling pthread_create() from the handlers.

The following test program illustrates the differences:
---snip---
#include <pthread.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <stdio.h>

extern "C" {
// Thread body: print, wait one second via select(), print again.
static void *thread_function(void *pv) {
    printf("Hello from thread\n"); fflush(stdout);
    timeval tv;
    tv.tv_sec=1;
    tv.tv_usec=0;
    select(0,NULL,NULL,NULL,&tv);  // used purely as a 1-second timeout
    printf("Hello again from thread\n"); fflush(stdout);

    return 0;
}

// Child fork handler: tries to start a new thread in the child process.
static void func_after_child() {
    pthread_t tid;
    pthread_create(&tid,0,thread_function,0);
}
} //extern C

int main(void) {
    // No prepare or parent handler, only a child handler.
    pthread_atfork(0,0,func_after_child);

    if(fork()==0) {
        printf("Hello from main in child\n"); fflush(stdout);
        sleep(10);
        printf("Hello again from main in child\n"); fflush(stdout);
    }

    // The parent does not wait for the child; it exits right away.
    return 0;
}
---snip---

And the results are:
Linuxthreads 0.9:
Hello from main in child
Hello again from thread
Hello again from main in child
<<Hangs - does not even return to prompt>>

Solaris 8 (n:m scheduler)
Hello from main in child
Hello from thread
<<returns to prompt>>
Hello again from thread
Hello again from main in child

Solaris 8 (1:1 scheduler)
Hello from main in child
Hello from thread
Hello again from thread
Hello again from main in child

HP-UX 11.0
Hello from thread
Hello from main in child
<<returns to prompt>>
Hello again from main in child
(but the simple test program uses 2 CPU seconds)

HP-UX 11.11
./pthread_atfork_pthread_create
Hello from thread
Hello from main in child
<<returns to prompt>>
Hello again from main in child
(but the simple test program uses 8 CPU seconds)

So my question is:
What is a pthread_atfork() handler allowed to do?

Patrick TJ McPhee

Jul 17, 2003, 12:00:24 PM

In article <3F158FF2...@image.dk>,
Ivan Skytte Jørgensen <isj@dont_forget_to_remove_this.image.dk.invalid> wrote:

% What are the handler functions specified in pthread_atfork() allowed to
% do?

Call async-signal-safe functions.

% I was using the child handler of pthread_atfork() to re-launch a few
% daemon threads but the results were not what I expected.

Which cannot be done using async-signal-safe functions.

% The documentation of pthread_atfork() does say that the handlers can
% only use async-signal-safe functions because the handlers must at

And since you knew this, it's surprising that you didn't get the results
you expected.

You are advised not to use fork() in a multi-threaded program the
way you would use it in a single-threaded one. Don't try to carry on,
just exec a new program.

Since you've decided to ignore that advice, I'll suggest creating the
new threads in the child process after the return from fork(). Don't
try to do it in the handlers, that's not what they're for.
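
As a rough illustration of the "just exec a new program" advice above, a
minimal sketch could look like the following. It assumes a POSIX system
with /bin/sh; the helper name run_shell_command() is hypothetical and
does not come from this thread. The child does nothing between fork()
and the exec call.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `cmd` through /bin/sh in a child process and
// return the child's exit status, or -1 on failure.
int run_shell_command(const char *cmd) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 // fork failed
    if (pid == 0) {
        // Child: go straight to exec, do nothing else first.
        execl("/bin/sh", "sh", "-c", cmd, (char *)nullptr);
        _exit(127);                // reached only if exec failed
    }
    // Parent: wait for the child and report how it ended.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would use it as, for example, run_shell_command("exit 3"), and
get the shell's exit status back.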
--

Patrick TJ McPhee
East York Canada
pt...@interlog.com

Loic Domaigne

Jul 17, 2003, 8:47:26 PM

Hi!

[ Ivan Skytte Jørgensen wrote: ]

> > What are the handler functions specified in pthread_atfork() allowed to
> > do?

Oh yes, that's a *good* question...


> > The documentation of pthread_atfork() does say that the handlers can
> > only use async-signal-safe functions because the handlers must at
> > least be able to lock and unlock mutexes.

This description is not logical: to lock/unlock mutexes, you must be
able to use things like pthread_mutex_lock() and
pthread_mutex_unlock(), and these functions (like all Posix.1c functions,
actually) are precisely not async-signal-safe...

Maybe a copy-paste problem??


[ Patrick TJ McPhee wrote: ]

> % What are the handler functions specified in pthread_atfork() allowed to
> % do?
>
> Call async-signal-safe functions.

That's correct, but not the complete list. At least
pthread_mutex_lock() and pthread_mutex_unlock() are allowed (refer e.g.
to Programming with POSIX Threads, §6.1.1 pp 199-203).

The motivation behind pthread_atfork() was to maintain state
consistency across a fork(). The solution may e.g. require locking
mutex variables during the fork().
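
For readers who have not seen the pattern, the mutex use case can be
sketched roughly as follows. This is an illustration only: state_lock,
the handler names, install_fork_handlers() and fork_and_check() are made
up for the example, and, as the rest of this thread discusses, the
standard does not actually guarantee that touching the mutex in the
child is legal.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical lock guarding some application state.
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

extern "C" {
// Runs in the parent before fork(): take the lock so that no other
// thread is in the middle of an update while the process is copied.
static void prepare_handler() { pthread_mutex_lock(&state_lock); }
// Runs in the parent after fork(): release the lock again.
static void parent_handler()  { pthread_mutex_unlock(&state_lock); }
// Runs in the child after fork(): release the child's copy of the lock.
static void child_handler()   { pthread_mutex_unlock(&state_lock); }
}

// Register the three handlers; returns 0 on success.
int install_fork_handlers() {
    return pthread_atfork(prepare_handler, parent_handler, child_handler);
}

// Fork once; the child reports (via its exit status) whether it could
// take the lock again. Returns 0 if it could, 1 if not, -1 on error.
int fork_and_check() {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_trylock(&state_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Whether the child half of this behaves is implementation-specific; a
result from one system says nothing about another.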

Now, here comes the trouble: fork() is an async-signal-safe function.
This would mean that some functions that are not async-signal-safe,
like pthread_mutex_lock() or pthread_mutex_unlock(), might be called in
a signal handler, because the fork handlers run inside the
signal-catching function that called fork(). That's where the
recommendation "call only async-signal-safe functions in fork handlers"
comes from...

AFAIK, that inconsistency causes a lot of headaches for the POSIX guys.
IEEE Std 1003.1, 2003 Edition states:

| While the fork() function is async-signal-safe, there is no way for
| an implementation to determine whether the fork handlers established
| by pthread_atfork() are async-signal-safe. The fork handlers may
| attempt to execute portions of the implementation that are not
| async-signal-safe, such as those that are protected by mutexes,
| leading to a deadlock condition. It is therefore undefined for the
| fork handlers to execute functions that are not async-signal-safe
| when fork() is called from a signal handler.

When you call fork() outside a signal-catching function,
pthread_mutex_lock() and pthread_mutex_unlock() can be used in fork
handlers. I believe it might also be safe to call the following
functions:

pthread_mutex_trylock()
pthread_mutex_timedlock()
pthread_cond_wait()
pthread_cond_timedwait()

This list is _WITHOUT ANY GUARANTEE_ (IEEE Std 1003.1, 2003 Edition
speaks about "locking and releasing mutexes", and these functions are
associated with these operations. So I would expect them to be present.
However, I have only seen examples using pthread_mutex_lock() and
pthread_mutex_unlock()).

Neither Posix.1c nor IEEE Std 1003.1, 2003 Edition talks explicitly
(well, at least from what I have read/understood) about other Posix.1c
functions that might safely be used in fork handlers. This might mean:
"Using other Pthreads functions is undefined".


> % I was using the child handler of pthread_atfork() to re-launch a few
> % daemon threads but the results were not what I expected.
>
> Which cannot be done using async-signal-safe functions.

According to the previous explanations, using pthread_create() in a fork
handler might have undefined behavior...


> You are advised not to use fork() in a multi-threaded program the
> way you would use it in a single-threaded one. Don't try to carry on,
> just exec a new program.
>
> Since you've decided to ignore that advice, I'll suggest creating the
> new threads in the child process after the return from fork(). Don't
> try to do it in the handlers, that's not what they're for.

Yes, Yes and Yes!


Loic.

Alexander Terekhov

Jul 18, 2003, 5:50:27 AM

Loic Domaigne wrote:
[...]

> This description is not logical: to lock/unlock mutexes, you must be
> able to use things like pthread_mutex_lock() and
> pthread_mutex_unlock(), and these functions (like all Posix.1c functions,
> actually) are precisely not async-signal-safe...
>
> Maybe a copy-paste problem??

Loic, it's all in the {google} archives of this newsgroup. Really.

The "right way" to do a synchronous fork is to have all threads safely
meet each other at "fork points" and replicate address space with all
threads.

http://google.com/groups?q=pthread_forkall_point

The problem is that the standard insists on async-signal-safe forking,
to begin with. ;-)

regards,
alexander.

David Butenhof

Jul 23, 2003, 9:24:26 AM

Ivan Skytte Jørgensen wrote:

> What are the handler functions specified in pthread_atfork() allowed to
> do?

The real answer is that pthread_atfork() is a completely useless and stupid
mechanism that was a well intentioned but ultimately pointless attempt to
carve a "back door" solution out of an inherently insoluble design
conflict.

Forget even about signals and async-signal-safety for a moment, though that
definitely piles an extra few tons of straw on the broken camel's back.
(Where DID the silly expression "the straw that broke the camel's back"
ever originate, anyway? Why the heck would anyone even imagine such a
thing!?)

In a threaded process, fork() is an asynchronous event that cannot
practically be synchronized. Yet it captures the complete state of a
process and attempts to continue execution in a new process as if nothing
had happened. There were two proposed models for handling this: for the new
process to start up single-threaded, so that it knows what happened; or for
the new process to start up with clones of all the threads in the parent
and no way to inform any but the caller of fork() that this has happened.
("forkone" vs "forkall".) Both models have severe, fundamental, and
unresolvable problems, so POSIX went with the simpler. The real solution
would be to drop fork(), but that just wasn't practical. POSIX tried to
address this later by adding posix_spawn()... but it was really too late
even aside from the fact that ADDING something doesn't solve the existing
problem.
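
For comparison, a minimal sketch of starting a program with
posix_spawn() instead of fork() might look like this. It assumes a
system that provides <spawn.h> and /bin/sh; the helper name
spawn_shell_command() is hypothetical and not from this thread.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern "C" char **environ;  // the calling process's environment

// Hypothetical helper: run `cmd` through /bin/sh using posix_spawn()
// and return the child's exit status, or -1 on failure.
int spawn_shell_command(const char *cmd) {
    char *const argv[] = {
        const_cast<char *>("sh"),
        const_cast<char *>("-c"),
        const_cast<char *>(cmd),
        nullptr
    };
    pid_t pid;
    // No file actions and no spawn attributes: default behaviour.
    // posix_spawn() returns 0 on success and an error number otherwise.
    int rc = posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ);
    if (rc != 0)
        return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this interface the calling program never runs any code of its own
in the new process, so the question of what is safe to call after
fork() does not come up.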

On the whole, it might have been better simply to disallow fork() when there
are any other threads running or ready or when any locks are held. But
that'd be hard to implement and hard to use.

And when we added pthread_atfork(), we completely neglected to fix up a
critical statement in fork(), that the child process is allowed to call
only async-signal-safe functions between the return from fork() and a call
to an exec*() family function. That is, POSIX declares it illegal to do
anything involving threads or synchronization in the child anyway. Not only
can you not unlock any mutexes you might have locked in PREPARE handlers;
you can't do anything that would require them to be locked (or unlocked) in
the first place.

I noticed this sometime later and brought it up in the POSIX interpretations
committee. As we investigated the consequences, we came to realize that
pthread_atfork() is, at best, an unreliable and incomplete mechanism that
nevertheless MIGHT sometimes be useful for SOME applications, but could
never be perfect or complete without fundamental and broad changes to the
POSIX architecture. Our final conclusion was that we might have omitted
pthread_atfork() entirely if we'd considered this early enough; but at the
time of the discussion it was wonderful blind luck that we'd overlooked
that little bit of text saving implementations from having to try to
guarantee the impossible.

A bit ironically, you CAN legally and constructively use pthread_atfork();
but not to protect THREAD resources across fork. You could use atfork
handlers to open or close files, to read from or write to files (using
read() and write(), of course, not stdio!). Anything, in fact, in the list
of async-signal-safe functions. You just can't legally use it for any of
the purposes for which it was intended. ;-)
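
A small sketch of that kind of use, assuming a POSIX system: a child
handler that does nothing but write() a fixed message to a descriptor.
The names note_fd, child_note_handler(), install_note_handler() and
fork_and_collect_note() are invented for the example; the pipe is only
there so the parent can observe that the handler ran.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int note_fd = -1;  // hypothetical: where the child handler writes

extern "C" {
// Child fork handler restricted to write(), which is async-signal-safe.
static void child_note_handler() {
    static const char msg[] = "child handler ran\n";
    if (note_fd >= 0) {
        ssize_t ignored = write(note_fd, msg, sizeof msg - 1);
        (void)ignored;  // nothing useful to do on failure here
    }
}
}

// Register the child handler once; returns 0 on success.
int install_note_handler() {
    return pthread_atfork(nullptr, nullptr, child_note_handler);
}

// Fork once and return how many bytes the child's handler wrote into
// buf, or -1 on error.
ssize_t fork_and_collect_note(char *buf, size_t len) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    note_fd = fds[1];
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        note_fd = -1;
        return -1;
    }
    if (pid == 0)
        _exit(0);        // the handler already ran before fork() returned
    note_fd = -1;
    close(fds[1]);       // parent keeps only the read end
    ssize_t n = read(fds[0], buf, len);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n;
}
```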

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\----[ http://homepage.mac.com/dbutenhof/Threads/Threads.html ]---/

Ivan Skytte Jørgensen

Jul 26, 2003, 5:03:25 AM

Ivan Skytte Jørgensen wrote:
>
> The documentation of pthread_atfork() does say that the handlers can
> only use async-signal-safe functions because the handlers must at
> least be able to lock and unlock mutexes.

I made a slight error there. The documentation does NOT say that
pthread_atfork handlers are limited to async-signal-safe functions.

The rationale for pthread_atfork() explicitly says "The expected usage
is that the prepare handler acquires all mutex locks and the other two
fork handlers release them".

The response from Loic Domaigne shows that "the POSIX guys" know this
problem.

David Butenhof goes on and provides more details of why
pthread_atfork() is inconsistent.


It basically boils down to this: what pthread_atfork() handlers are
intended to do, they cannot do (portably/reliably).

My experiments show that pthread_atfork handlers can do
pthread_mutex_lock/unlock when the fork() is done in a controlled
manner, but you cannot assume much beyond that.

The thing that initially caught my attention was that it almost worked
on HP-UX - the only symptom was that any system call that specified a
timeout would use 100% CPU until the timeout expired.


Anyway, the issue is solved for my program because I luckily could
modify the main program to call a reinitialize-after-fork function
that spawns the background threads again.


David Butenhof wrote:
> Where DID the silly expression "the straw that broke the camel's back"
> ever originate, anyway?

You may find http://www.jimloy.com/logic/camel.htm amusing.


Regards,
Ivan

David Butenhof

Jul 28, 2003, 8:56:33 AM

Ivan Skytte Jørgensen wrote:

> Ivan Skytte Jørgensen wrote:
>>
>> The documentation of pthread_atfork() does say that the handlers can
>> only use async-signal-safe functions because the handlers must at
>> least be able to lock and unlock mutexes.
>
> I made a slight error there. The documentation does NOT say that
> pthread_atfork handlers are limited to async-signal-safe functions.

Yes, it does. Or at least, the CHILD handlers are, because they occur in the
child after forking, and are therefore subject to fork's async-safety rule.
Now, there's no rule that CHILD handlers need to be the converse of PREPARE
handlers, so in theory you might lock mutexes in the PREPARE handler and
just ignore them in the CHILD handler. There would be some value in program
consistency (at least in some cases) to having PREPARE and PARENT handlers
that locked and unlocked mutexes to ensure that the child's state is
consistent. Unfortunately, there's no legal/portable/reliable way for the
child process to unlock. But it CAN depend on having consistent application
state, if you've properly used PREPARE/PARENT handlers and can make the
child code paths avoid the mutexes.
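
A sketch of that PREPARE/PARENT-only arrangement, under the assumption
that the child only reads the state and never touches the mutex. All
names here (pair_lock, bump_pair(), install_prepare_parent_handlers(),
fork_and_check_pair()) are hypothetical and exist only for the example.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical application state: both counters must always match.
static pthread_mutex_t pair_lock = PTHREAD_MUTEX_INITIALIZER;
static int first_count = 0;
static int second_count = 0;

// Normal code path: update both counters under the lock.
void bump_pair() {
    pthread_mutex_lock(&pair_lock);
    ++first_count;
    ++second_count;
    pthread_mutex_unlock(&pair_lock);
}

extern "C" {
// PREPARE: hold the lock across fork() so no update is half done.
static void lock_before_fork() { pthread_mutex_lock(&pair_lock); }
// PARENT: release the lock once fork() has completed.
static void unlock_in_parent() { pthread_mutex_unlock(&pair_lock); }
}

// No CHILD handler is registered; returns 0 on success.
int install_prepare_parent_handlers() {
    return pthread_atfork(lock_before_fork, unlock_in_parent, nullptr);
}

// Fork once; the child checks the invariant without using the mutex.
// Returns 0 if the child saw consistent state, 1 if not, -1 on error.
int fork_and_check_pair() {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        // pair_lock is still locked in the child and is left alone.
        _exit(first_count == second_count ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child's copy of pair_lock stays locked for the rest of its life, so
this only makes sense when the child goes on to exec or exit.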



> The rationale for pthread_atfork() explicitly say "The expected usage
> is that the prepare handler acquires all mutex locks and the other two
> fork handlers release them"

Unfortunately, RATIONALE is explicitly not part of the standard. It's just
commentary, history, and miscellaneous ramblings. Nowhere in THE STANDARD
is it suggested that one may/should/could manipulate mutexes in the atfork
handlers. Weird, but true.

> It basically boils down to this: what pthread_atfork() handlers are
> intended to do, they cannot do (portably/reliably).
>
> My experiments show that pthread_atfork handlers can do
> pthread_mutex_lock/unlock when the fork() is done in a controlled
> manner, but you cannot assume much beyond that.

The earlier paragraph suggests you know this, but just to accentuate it for
everyone... Your experiments can only show that YOUR particular use
HAPPENS to work in the exact circumstances you've tested, on a PARTICULAR
version of a particular system. Don't read too much into that. When the
standard says that a standard function cannot be used for the intended
purpose, you're starting out from a rather bad position. ;-)

> The thing that initially caught my attention was that it almost worked
> on HP-UX - the only symptom was that any system call that specified a
> timeout would use 100% CPU until the timeout expired.

Nice that was the only symptom. I suppose there are even situations where
one might not consider that a serious problem. But somehow this reminds me
of a line from a book, where the character describes having seen a
competition offroad motorcycle fall from a cliff and suffer no damage other
than a slightly bent fork and a dead rider... ;-)

> Anyway, the issue is solved for my program because I luckily could
> modify the main program to call a reinitialize-after-fork function
> that spawns the background threads again.

Again, while this may work on a particular system under specific
circumstances, the standard says you can only call async-signal-safe
functions in the forked child prior to a call to an exec*() function. That
means you cannot "reinitialize" most things, and you certainly cannot
create new threads.

Yes, I like that. "The straw that smashed the camel flat." I'll have to
remember to start using that. It's so much more, uh, "graphic". ;-)
