I want to create a small C++ abstraction on top of win32 and
POSIX process creation. I want the API to have a way to join/wait for
the child, and I want a way to obtain the child's exit code after a
successful join/wait.
I'm wondering what best practices there are for cleanup on the error
code paths. It's possible that your child is badly behaved and won't
finish up short of sending it a SIGTERM or SIGKILL. Should you
prearrange for it to be re-parent-able with the fork+fork trick?
Should your error code paths send SIGTERM by default? A
SIGTERM followed by a SIGKILL if it doesn't play nicely and die in a
short period of time? Is this too much of a vague question to even
answer?
Also, I was somewhat dismayed at finding that there is no easy way to
"detach" a child. After a fork, the parent must die or wait/waitpid on
the child, or you will have a resource leak aka a zombie process.
Wouldn't it be a sensible addition to the kernel to add some function
which says "I don't care if my child with this pid is running anymore,
and I don't care about its exit code. Once it finishes, remove it from
the process table as if I had called wait/waitpid. Equivalently: re-
parent it to init right now."
Note that the signal handler approach doesn't work AFAIK. If you have
a SIGCHLD signal handler that just waits on any child, then you
cannot (easily) get child exit codes as the signal handler may have
"wait"ed on it before you. Right?
I'm just trying to figure out what might work as a nice general-
purpose process creation API that allows no-fail no-leak "non-
blocking" error code paths. However, I'm afraid I'm asking for magic.
The only approach that seems guaranteed to work is to send a SIGKILL
on the error code paths, but of course this won't catch any children
of the children. Still, given the limitations, my gut says it seems
nice to give the user of my API the option to "detach" a running child
process, but I must jump through some hoops and allocate an additional
process to make this work (à la the fork+fork trick). I'm not even sure
what are the right questions to ask. Can anyone help me out, please?
Joshua Maurice wrote:
> Also, I was somewhat dismayed at finding that there is no easy way to
> "detach" a child. After a fork, the parent must die or wait/waitpid on
> the child, or you will have a resource leak aka a zombie process.
> Wouldn't it be a sensible addition to the kernel to add some function
> which says "I don't care if my child with this pid is running anymore,
> and I don't care about its exit code. Once it finishes, remove it from
> the process table as if I had called wait/waitpid. Equivalently: re-
> parent it to init right now."
Take a look at the SA_NOCLDWAIT flag for sigaction().
> Note that the signal handler approach doesn't work AFAIK. If you have
> a SIG_CHLD signal handler that just waits on any child, then you
> cannot (easily) get child exit codes as the signal handler may have
> "wait"ed on it before you. Right?
POSIX recommends that applications should never "wait for any child"
(whether in a signal handler or not). Specifically, it says:
"Calls to wait() will collect information about any child process.
This may result in interactions with other interfaces that may be
waiting for their own children (such as by use of system()).
For this and other reasons it is recommended that portable
applications not use wait(), but instead use waitpid(). For these
same reasons, the use of waitpid() with a pid argument of -1, and
the use of waitid() with the idtype argument set to P_ALL, are
also not recommended for portable applications."
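A minimal sketch of the recommendation above: wait only for the one pid you forked, then decode its status. The helper names spawn_exit_with() and join_child() are hypothetical, the child simply calls _exit() in place of a real exec, and mapping "killed by signal N" to 128+N is a shell-style convention chosen here for illustration, not something POSIX prescribes.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for fork+exec: the child exits with `code`. */
static pid_t spawn_exit_with(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);            /* child */
    return pid;                 /* parent: child's pid, or -1 on error */
}

/* Wait for exactly this child, never for "any child".
 * Returns the exit code, 128+signal if it was killed, or -1 on error. */
static int join_child(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);   /* retry if a signal interrupted us */

    if (r == -1)
        return -1;                         /* e.g. ECHILD: already reaped */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);     /* assumed convention, see above */
    return -1;
}
```

Because the status is consumed by the first successful waitpid(), a second join_child() on the same pid fails, which is exactly why a "wait for any child" handler elsewhere in the program would steal the exit code.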
In article <f9e1b966-cb56-4f26-97fc-bd4a9c49b...@r2g2000pbs.googlegroups.com>,
Joshua Maurice <joshuamaur...@gmail.com> wrote:
> Also, I was somewhat dismayed at finding that there is no easy way to
> "detach" a child. After a fork, the parent must die or wait/waitpid on
> the child, or you will have a resource leak aka a zombie process.
> Wouldn't it be a sensible addition to the kernel to add some function
> which says "I don't care if my child with this pid is running anymore,
> and I don't care about its exit code. Once it finishes, remove it from
> the process table as if I had called wait/waitpid. Equivalently: re-
> parent it to init right now."
The standard idiom for this is to fork twice. The original parent forks a child, the child forks a grandchild, and then the child exits. The parent waits for the child (no need for a SIGCHLD handler, since this should be almost instantaneous), and then the grandchild is inherited by init.
-- Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
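The fork-twice idiom described above might look roughly like this. It is only a sketch: spawn_detached() and noop_job() are hypothetical names, the job is a plain function rather than an exec'd program, and on some systems the orphaned grandchild is adopted by a designated reaper process rather than by init itself.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job used for illustration; a real caller would exec here. */
static void noop_job(void)
{
}

/* Run job() in a grandchild that the caller never has to wait for.
 * Returns 0 if the grandchild was forked, -1 otherwise. */
static int spawn_detached(void (*job)(void))
{
    int status;
    pid_t r;
    pid_t child = fork();

    if (child == -1)
        return -1;

    if (child == 0) {
        /* Intermediate child: fork the real worker, then exit at once. */
        pid_t grandchild = fork();
        if (grandchild == -1)
            _exit(1);
        if (grandchild == 0) {
            job();
            _exit(0);
        }
        _exit(0);   /* orphans the grandchild, which gets reparented */
    }

    /* Parent: this wait is short because the intermediate child exits
     * immediately after its own fork(). */
    do {
        r = waitpid(child, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After spawn_detached() returns, the caller has no child left to reap, and also no way to learn the grandchild's exit status.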
> Joshua Maurice wrote:
> > Also, I was somewhat dismayed at finding that there is no easy way to
> > "detach" a child. After a fork, the parent must die or wait/waitpid on
> > the child, or you will have a resource leak aka a zombie process.
> > Wouldn't it be a sensible addition to the kernel to add some function
> > which says "I don't care if my child with this pid is running anymore,
> > and I don't care about its exit code. Once it finishes, remove it from
> > the process table as if I had called wait/waitpid. Equivalently: re-
> > parent it to init right now."
> Take a look at the SA_NOCLDWAIT flag for sigaction().
> > Note that the signal handler approach doesn't work AFAIK. If you have
> > a SIG_CHLD signal handler that just waits on any child, then you
> > cannot (easily) get child exit codes as the signal handler may have
> > "wait"ed on it before you. Right?
> POSIX recommends that applications should never "wait for any child"
> (whether in a signal handler or not). Specifically, it says:
> "Calls to wait() will collect information about any child process.
> This may result in interactions with other interfaces that may be
> waiting for their own children (such as by use of system()).
> For this and other reasons it is recommended that portable
> applications not use wait(), but instead use waitpid(). For these
> same reasons, the use of waitpid() with a pid argument of -1, and
> the use of waitid() with the idtype argument set to P_ALL, are
> also not recommended for portable applications."
Yes, of course, sorry. Miscommunication here.
Let's see... If you use sigaction and set SA_NOCLDWAIT for SIGCHLD,
can you still call waitpid on children and get exit codes? I assume
not. If child processes leave the kernel process table as soon as they
finish, then there isn't a way to get the exit code et al. I assume
this is more likely true if the child finishes quickly before the
thread in the parent gets a chance to enter waitpid.
Again, my major question seems to be that it's not easy, for a parent
executable which may live on for a long time, to get the exit code of
a child process and be able to properly clean up on error code paths.
On the error code paths, if you don't call wait or waitpid on it, nor
use a signal handler to clean up, then you're going to have a resource
leak in the form of a zombie process. Having such a potentially long
blocking or potential-hang call to wait / waitpid on an error path
seems like a bad idea. And as far as I can tell, you can't use a
signal handler solution if you want to be able to reliably get the
exit code. Ditto for SA_NOCLDWAIT. The only reasonable solutions I
see are to always use a stub intermediary and require the app code to
waitpid on it, to require the app code to SIGKILL the child process
on error code paths, or even uglier solutions.
PS: SA_NOCLDWAIT appears to be a minor helper thing to avoid writing a
10-20 line signal handler that calls waitpid(-1, <whatever that
nonblocking flag is>) in a loop in the signal handler. Am I right? I
forget if that nonblocking flag is POSIX and/or portable.
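A small experiment for the SA_NOCLDWAIT question, as a sketch rather than a definitive answer. The helper names are hypothetical. POSIX specifies that with SA_NOCLDWAIT set for SIGCHLD, children are not turned into zombies, and a later waitpid() blocks until the children have terminated and then fails with ECHILD, so the exit status is not available. (The nonblocking flag mentioned in the PS is WNOHANG, which is part of POSIX.)

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the system to discard exit statuses instead of keeping zombies. */
static int enable_auto_reap(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sa.sa_flags = SA_NOCLDWAIT;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork a child that exits with 42 and try to collect it.
 * Returns the errno left by waitpid(), or 0 if waitpid() succeeded,
 * or -1 if fork() failed. */
static int wait_errno_after_auto_reap(void)
{
    int status;
    pid_t r;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(42);

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    return (r == -1) ? errno : 0;
}
```

With enable_auto_reap() in effect, wait_errno_after_auto_reap() is expected to report ECHILD, which matches the assumption in the post: the flag trades away the exit code.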
> In article
> <f9e1b966-cb56-4f26-97fc-bd4a9c49b...@r2g2000pbs.googlegroups.com>,
> Joshua Maurice <joshuamaur...@gmail.com> wrote:
> > Also, I was somewhat dismayed at finding that there is no easy way to
> > "detach" a child. After a fork, the parent must die or wait/waitpid on
> > the child, or you will have a resource leak aka a zombie process.
> > Wouldn't it be a sensible addition to the kernel to add some function
> > which says "I don't care if my child with this pid is running anymore,
> > and I don't care about its exit code. Once it finishes, remove it from
> > the process table as if I had called wait/waitpid. Equivalently: re-
> > parent it to init right now."
> The standard idiom for this is to fork twice. The original parent forks
> a child, the child forks a grandchild, and then the child exits. The
> parent waits for the child (no need for a SIGCHLD handler, since this
> should be almost instantaneous), and then the grandchild is inherited by
> init.
Yes, but then you can't get the exit code and the other exit
information. I would like to know if the child reported that it failed
horribly or something. I could maintain a stub intermediary whose
correctness I can guarantee to accomplish the same effect and get the
exit code et al, but that seems .. inelegant.
> On May 8, 8:13 am, Barry Margolin<bar...@alum.mit.edu> wrote:
>> In article
>> <f9e1b966-cb56-4f26-97fc-bd4a9c49b...@r2g2000pbs.googlegroups.com>,
>> Joshua Maurice<joshuamaur...@gmail.com> wrote:
>>> Also, I was somewhat dismayed at finding that there is no easy way to
>>> "detach" a child. After a fork, the parent must die or wait/waitpid on
>>> the child, or you will have a resource leak aka a zombie process.
>>> Wouldn't it be a sensible addition to the kernel to add some function
>>> which says "I don't care if my child with this pid is running anymore,
>>> and I don't care about its exit code. Once it finishes, remove it from
>>> the process table as if I had called wait/waitpid. Equivalently: re-
>>> parent it to init right now."
>> The standard idiom for this is to fork twice. The original parent forks
>> a child, the child forks a grandchild, and then the child exits. The
>> parent waits for the child (no need for a SIGCHLD handler, since this
>> should be almost instantaneous), and then the grandchild is inherited by
>> init.
> Yes, but then you can't get the exit code and the other exit
> information. I would like to know if the child reported that it failed
> horribly or something. I could maintain a stub intermediary whose
> correctness I can guarantee to accomplish the same effect and get the
> exit code et al, but that seems .. inelegant.
The best you can probably do is have the child check the grandchild is running before exiting. Still messy (how long to wait for example), but better than nothing.
If you want to know the status of the grandchild, you could use some form of shared semaphore.
Yes, but if I understand you correctly, this shared semaphore requires
cooperation with the child executable. Suppose I want to exec g++ or
the Visual Studio command line compiler, or some other executable
whose code I can't change. Your idea wouldn't work.
I think I would like a "detachprocess" call, aka "reparent my child to
init now", but I'm still not sure what I want to do by default in the
error code path for child processes. Suppose we have something like
make, which spawns processes to do jobs, and it doesn't know at all
what these jobs do. Suppose further we encounter an error in the
parent, or we want to cancel the "job", or something, but leave other
jobs running. In the cleanup code path, I think the theoretical
options are:
1- ignore the child and have a zombie process leak
2- call a system call to "detach" or "reparent to init now"
3- wait for the child to finish
4- kill child, possibly with SIGTERM before a SIGKILL to be nice
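Option 4 from the list above could be sketched as follows. This is an illustration under assumptions: terminate_child(), wait_up_to() and spawn_sleeper() are hypothetical names, the 10 ms polling tick is arbitrary, and only the direct child is signalled, so grandchildren are not covered.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll for up to `ms` milliseconds.
 * Returns 1 if reaped (*status filled), 0 if still running, -1 on error. */
static int wait_up_to(pid_t pid, int *status, long ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    long waited;

    for (waited = 0; ; waited += 10) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 1;
        if (r == -1 && errno != EINTR)
            return -1;
        if (waited >= ms)
            return 0;
        nanosleep(&tick, NULL);
    }
}

/* SIGTERM first, SIGKILL if the child is still alive after grace_ms.
 * Returns the signal that ended the child, 0 if it exited by itself,
 * or -1 on error. The child is always reaped on success. */
static int terminate_child(pid_t pid, long grace_ms)
{
    int status;
    int r;

    if (kill(pid, SIGTERM) == -1)
        return -1;
    r = wait_up_to(pid, &status, grace_ms);
    if (r == 0) {
        pid_t w;
        if (kill(pid, SIGKILL) == -1)
            return -1;
        do {
            w = waitpid(pid, &status, 0);   /* SIGKILL cannot be ignored */
        } while (w == -1 && errno == EINTR);
        r = (w == pid) ? 1 : -1;
    }
    if (r != 1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical test child: sleeps forever, optionally ignoring SIGTERM.
 * SIGTERM stays blocked across fork() so the child cannot be hit by it
 * before it has chosen its disposition. */
static pid_t spawn_sleeper(int ignore_term)
{
    sigset_t term, old;
    pid_t pid;

    sigemptyset(&term);
    sigaddset(&term, SIGTERM);
    sigprocmask(SIG_BLOCK, &term, &old);
    pid = fork();
    if (pid == 0) {
        if (ignore_term)
            signal(SIGTERM, SIG_IGN);
        sigprocmask(SIG_SETMASK, &old, NULL);
        for (;;)
            pause();
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return pid;
}
```

The blocking wait after SIGKILL is the one place this path can still stall, for example if the child is stuck in uninterruptible I/O.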
Like for a generic job server, if you want to cancel the job, should
you ignore the child processes of that job, try to kill them with
SIGTERM, actually kill them with SIGKILL, or wait possibly without end
for the child to finish? I think any approach would require user
intervention because of the lack of an effective "kill child and all
children". (Hell, never mind, I don't want to open that can of worms.
You can't nest process groups, which makes them far less useful. For
example, a similar situation exists on win32, and I cannot create a
process group for a job that includes the Visual Studio compiler
because IIRC the compiler itself tries to create a process group, and
you can't nest process groups.)
I don't like any of them. The more I think about it, the more it seems
heavily dependent on the particular application. Still, option 2 is
unavailable to me now short of some rather annoying code I'd have to
write. To get option 2 with the current POSIX API, I think I'd have to
write a separate executable for the stub process because I would need
2 threads - one to read on a pipe from the parent, and a second thread
to waitpid on the child - and you can't create threads between fork
and exec IIRC, and thus you need a separate executable. (Or reuse the
current executable but that seems excessively hacky.)
Actually, I have a new idea that I'm starting to like. My process
creation abstraction library can expose a process_detach call which
will add it to a global list. A single background thread will
periodically wake up, call nonblocking waitpid on everything in this
global list, and remove the terminated child pids. I like this
background thread more than having an extra process for each process
spawned - that would be annoying to me as a user looking at the output
of ps and trying to figure out what needs to die vs not.
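The per-wakeup work of such a background thread might look like the sketch below. It is deliberately incomplete: the thread itself, its sleep interval, and the mutex that would have to protect the list are left out, the fixed-size array is an arbitrary simplification, and the names process_detach() and reap_detached_once() are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_DETACHED 64            /* arbitrary capacity for the sketch */

static pid_t detached[MAX_DETACHED];
static size_t n_detached;

/* Hand a child over to the reaper. Returns 0, or -1 if the list is full.
 * A real implementation would take a lock here. */
static int process_detach(pid_t pid)
{
    if (n_detached == MAX_DETACHED)
        return -1;
    detached[n_detached++] = pid;
    return 0;
}

/* One pass of the background thread: nonblocking waitpid() on every
 * listed pid. Returns how many children are still being tracked. */
static size_t reap_detached_once(void)
{
    size_t i = 0;

    while (i < n_detached) {
        pid_t r = waitpid(detached[i], NULL, WNOHANG);
        if (r == 0 || (r == -1 && errno == EINTR)) {
            i++;    /* still running, or interrupted: retry next pass */
        } else {
            /* Reaped, or no longer our child (ECHILD): drop the entry
             * by moving the last one into its slot. */
            detached[i] = detached[--n_detached];
        }
    }
    return n_detached;
}
```

Since each waitpid() names a specific pid, this reaper does not interfere with other code in the process that waits on its own children.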
You could have a look and see how gmake manages its child processes.
On May 8, 3:10 pm, Ian Collins <ian-n...@hotmail.com> wrote:
> On 05/ 9/12 09:50 AM, Joshua Maurice wrote:
> You could have a look and see how gmake manages it child processes.
Make is perhaps a bad example. It isn't a "job server". It runs a
single "job". That is, an external cancel will cancel "the whole
process". There's no finer "jobs" in make. In this case, it's fine if
you have zombie processes for a little while because the make process is
about to die, and the children will get reparented to init. The
question becomes much more interesting if you have a parent process
that is doing a bunch of independent "jobs", where you can cancel one
without affecting the other. This cancel can be either user initiated
or initiated from an internal error.
I'm just trying to write a general purpose process creation API, and
I'm wondering what I should put as the Process object's destructor, or
equivalently what function(s) should I provide for error cleanup? I'd
like to allow the user of the library to make the choice whether the
child process needs to die, whether we need to wait on it, or whether
we should "detach" and forget about it. I still guess those are all
valid strategies - I think - and I'd like to offer them all.
"If a recipe fails and the `-k' or `--keep-going' option was not given (see Summary of Options), make aborts execution. If make terminates for any reason (including a signal) with child processes running, it waits for them to finish before actually exiting."
> I'm just trying to write a general purpose process creation API, and I'm
> wondering what I should put as the Process object's destructor, or
> equivalently what function(s) should I provide for error cleanup? I'd
> like to allow the user of the library to make the choice whether the
> child process needs to die, whether we need to wait on it, or whether we
> should "detach" and forget about it. I still guess those are all valid
> strategies - I think - and I'd like to offer them all.
I'll assume that the job server won't die unexpectedly before all of its children terminate.
Keep a table of all forked processes (that is, forked by the job server explicitly). Install a SIGCHLD handler that iterates over the table PID by PID, with waitpid() / WNOHANG. If some children exited, note the exit status in the table, and set a global flag. (Make the entire table volatile for visibility.)
The handler won't be reentrant per default (don't set SA_NODEFER).
Keep SIGCHLD blocked at all times, except inside the pselect() call you use for waiting. Afterwards you can check the global flag to see if the table was updated by the handler -- iterate over all entries, collect the exit statuses, call the handlers, prune the table, and reset the flag.
(The "volatile sig_atomic_t" requirement is "APPLICATION USAGE", and I do believe if you make the table volatile and only allow the handler to run while pselect() temporarily unblocks SIGCHLD, you'll be safe. Roughly, the masking plus the pselect() should make the delivery synchronous, and volatile should ensure full visibility (... inside the same thread of control, as always).)
The signal is blocked even when forking; if fork() succeeds, you can safely extend the table. It doesn't even matter if the child dies *before* you add the entry to the table, the signal will remain pending until you come around pselect() next time.
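A sketch of the masking scheme described above, in its simplest form: the handler only sets a volatile sig_atomic_t flag, and all table work would happen in normal context after pselect() returns. The function names are hypothetical, and pselect() is called with no file descriptors, which a real job server would replace with its actual descriptor sets.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t child_event;

static void on_sigchld(int signo)
{
    (void)signo;
    child_event = 1;            /* nothing else happens in the handler */
}

/* Install the handler and block SIGCHLD everywhere except in pselect().
 * *wait_mask receives the mask to pass to pselect(). */
static int setup_sigchld(sigset_t *wait_mask)
{
    struct sigaction sa;
    sigset_t chld;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;         /* SA_NODEFER deliberately not set */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &chld, wait_mask) == -1)
        return -1;
    sigdelset(wait_mask, SIGCHLD);      /* pselect() must let SIGCHLD in */
    return 0;
}

/* Sleep until SIGCHLD is delivered or the timeout expires.
 * Returns 1 if a child event was seen, 0 otherwise. */
static int wait_for_child_event(const sigset_t *wait_mask, long timeout_sec)
{
    struct timespec ts;
    int seen;

    ts.tv_sec = timeout_sec;
    ts.tv_nsec = 0;
    /* SIGCHLD is blocked here, so the flag cannot change between this
     * check and the pselect() call; a pending signal is delivered as
     * soon as pselect() swaps in wait_mask. */
    if (!child_event)
        pselect(0, NULL, NULL, NULL, &ts, wait_mask);
    seen = child_event;
    child_event = 0;
    return seen;
}
```

A child that exits before the parent reaches wait_for_child_event() just leaves SIGCHLD pending, so the event is not lost.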
Let's see how this maps to the ideas implemented by the pthread primitives (which should be comprehensive):
- PTHREAD_CREATE_DETACHED: you shouldn't have to care about this (i.e. disowning a child if that piece of information is available right at forking time). The signal handler should call waitpid() soon enough, you just won't do anything special with the exit status.
- pthread_detach(): you can set your own "don't do anything special about this process" flag in the table (or any separate handler table indexed by PID) when this info becomes known.
- pthread_join(): that's the main functionality.
- pthread_cancel(): send a SIGTERM, and schedule a SIGKILL in 5 seconds in your timer priority queue. (The "generic handler" for any child exit, after pselect() returned, should include cleaning up the timer pqueue (or invalidating entries in it) for the PIDs that have exited.) Whichever signal does the job, the handler will take care of the zombie (and if it was the SIGTERM, you'd deschedule the SIGKILL for the PID). This SIGTERM / SIGKILL combo is what init does when shutting down the system.
(BTW systemd uses cgroups AFAIK to keep processes belonging to a single "service" as a tight bunch, and when the service is stopped, cgroups allow systemd to massacre all related processes at once.)
"Ersek, Laszlo" <la...@caesar.elte.hu> writes:
> This message is in MIME format. The first part should be readable text,
> while the remaining parts are likely unreadable without MIME-aware tools.
> On Tue, 8 May 2012, Joshua Maurice wrote:
> > I'm just trying to write a general purpose process creation API, and I'm
> > wondering what I should put as the Process object's destructor, or
> > equivalently what function(s) should I provide for error cleanup? I'd
> > like to allow the user of the library to make the choice whether the
> > child process needs to die, whether we need to wait on it, or whether we
> > should "detach" and forget about it. I still guess those are all valid
> > strategies - I think - and I'd like to offer them all.
Thanks for your quite thorough message. Still, I question whether I
can write that code for the signal handler before I get the C11/C++11
atomics. That table needs to expand on demand, which means potentially
reallocating the table and copying over entries, which means I need
some basic memory visibility guarantees, which means I need C11/C++11
atomics or their functional equivalents. Perhaps I am mistaken. Could
you explain a little more thoroughly how you would code that signal
handler?
At the moment, the single background thread with sleep loop is quite
easily implementable, and it seems almost as good as the signal
handler approach.
Also, if I go the signal handler approach, then this might break
libraries which also do process spawning. I might intercept their wait/
waitpid, causing their internals to fail. If I want to write generic
code and still be able to leverage other libraries, then setting
signal handlers doesn't seem to be a very "nice" / "cooperative" way
of accomplishing that. What if their library decided to do the same
thing as me? We'd be trampling each other's signal handlers. What if
I'm trying to write a simple utility (which I am), and I don't know
who or what main() will even be doing? Again, it seems good policy to
only set signal handlers if you're the one guy in control of "main()",
and it'd be nice to have my process spawning utility not have such a
requirement (setting a [global] signal handler) for its use. Unless
you want me to chain the signal handlers, which is maybe doable... I
don't know enough to comment besides saying this seems finicky and
sketchy.
On Wed, 9 May 2012, Scott Lurndal wrote:
> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:
>> This message is in MIME format. The first part should be readable text,
>> while the remaining parts are likely unreadable without MIME-aware tools.
> Was mime necessary?
Absolutely not. Apologies.
"pine" was replaced with "alpine" on this system I use for Usenet; it may have tricked me. I just checked, I have the following option enabled:
FEATURE: Enable 8bit NNTP Posting
This feature affects Alpine's behavior when posting news.
The Internet standard for exchanging USENET news messages
(RFC-1036) specifies that USENET messages should conform to
Internet mail standards and contain only 7bit characters, but
much of the news transport software in use today is capable of
successfully sending messages containing 8bit characters. Hence,
many people believe that it is appropriate to send 8bit news
messages without any MIME encoding.
Moreover, there is no Internet standard for explicitly
negotiating 8bit transfer, as there is for Internet email.
Therefore, Alpine provides the option of posting unencoded 8bit
news messages, though not as the default. Setting this feature
will turn OFF Alpine's MIME encoding of newsgroup postings that
contain 8bit characters.
Note, articles may cross a path or pass through news transport
software that is unsafe or even hostile to 8bit characters. At
best this will only cause the posting to become garbled. The
safest way to transmit 8bit characters is to leave Alpine's MIME
encoding turned on, but recipients who lack MIME-aware tools are
often annoyed when they receive MIME-encoded messages.
I don't know why it didn't work. Sorry for the inconvenience. Perhaps postponing the message forced MIME?...
I doubt I wrote anything outside of 7-bit ASCII to begin with. Nonetheless, alpine reports my posting back to me as "ISO-8859-15" which I never use deliberately. It may have been triggered by something in Joshua's mail that I replied to. And then "news transport software" beyond my client could have forced MIME. (I hope it won't happen now!) The NNTP server I use is "news.eternal-september.org".
"Ersek, Laszlo" <la...@caesar.elte.hu> writes:
> On Wed, 9 May 2012, Scott Lurndal wrote:
>> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:
>>> This message is in MIME format. The first part should be readable text,
>>> while the remaining parts are likely unreadable without MIME-aware tools.
>> Was mime necessary?
> Absolutely not. Apologies.
> I don't know why it didn't work. Sorry for the inconvenience. Perhaps
> postponing the message forced MIME?...
It did use 8bit. But it also wrapped up the message into a
multipart/mixed container. Since the message consists of only a single
part, that's an inexplicably bizarre thing to do.
> I doubt I wrote anything outside of 7-bit ASCII to begin
> with. Nonetheless, alpine reports my posting back to me as
> "ISO-8859-15" which I never use deliberately. It may have been
> triggered by something in Joshua's mail that I replied to. And then
> "news transport software" beyond my client could have forced MIME. (I
> hope it won't happen now!) The NNTP server I use is
> "news.eternal-september.org".
Joshua's message had a non-breaking space in the attribution line.
On Tue, 8 May 2012, Joshua Maurice wrote:
> Thanks for your quite thorough message. Still, I question whether I
> can write that code for the signal handler before I get the C11/C++11
> atomics. That table needs to expand on demand, which means potentially
> reallocating the table and copying over entries, which means I need
> some basic memory visibility guarantees, which means I need C11/C++11
> atomics or their functional equivalents. Perhaps I am mistaken. Could
> you explain a little more thoroughly how you would code that signal
> handler?
What I proposed to be done in the handler may easily be overkill. Setting only the volatile sig_atomic_t flag in the handler and nothing else (and allowing it to interrupt pselect()) could be enough. That way you would only work with the table in the normal context.
Anyway the idea was something like this:
    struct child_status
    {
        pid_t pid;
        int   status,
              exited;
    };
I tried to handle EINTR with respect to other signals.
If waitpid() fails with any other errno, we made a programming mistake. If waitpid() succeeds, it returns either the exact PID we passed in, or 0 (no status available yet).
I used a separate "status" variable (with auto storage class) because passing &children.statuses[i].status directly to waitpid() would cast away "volatile" from the (volatile int *). (The explicit assignment later on keeps it.) Actually it's an incompatible pointer type and we'd have to force the cast, which is exactly what we don't want to do.
I made the "children" struct volatile (i.e. made volatile all of children.num, children.statuses (= the pointer itself), children.exited) so that one can use realloc() in the main context. (children.num doesn't have to be the size of the table, just the number of used entries in it; the allocated size of the table doesn't have to be maintained as a "global".)
I assumed SA_NOCLDSTOP was set when installing the handler, and none of SA_NODEFER, SA_NOCLDWAIT, SA_SIGINFO, SA_RESETHAND were set.
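For reference, a minimal sketch of a handler along the lines described above. It is a reconstruction from the description, not the original code: the layout of "children" and the names sigchld_handler and install_sigchld_handler are assumptions, and it further assumes that the main context blocks SIGCHLD whenever it grows or edits the table.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

struct child_status
{
    pid_t pid;
    int   status,
          exited;
};

/* Assumed layout: only the names num, statuses and exited come from the
   description; everything else here is illustrative. */
static volatile struct
{
    size_t                        num;      /* used entries, not allocated size */
    volatile struct child_status *statuses; /* may be realloc()ed in main context */
    sig_atomic_t                  exited;   /* set once any child was collected */
} children;

static void sigchld_handler(int sig)
{
    int    saved_errno = errno;  /* waitpid() may clobber errno */
    size_t i;

    (void)sig;
    for (i = 0; i < children.num; ++i) {
        volatile struct child_status *cs = &children.statuses[i];
        int   status;            /* plain int: waitpid() wants a non-volatile int * */
        pid_t ret;

        if (cs->exited)
            continue;
        do {
            ret = waitpid(cs->pid, &status, WNOHANG);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            abort();             /* any other errno is a programming mistake */
        if (ret == cs->pid) {    /* ret == 0 means no status available yet */
            cs->status = status;
            cs->exited = 1;
            children.exited = 1;
        }
    }
    errno = saved_errno;
}

/* Hypothetical helper: installs the handler with SA_NOCLDSTOP only. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Only PIDs that were put into the table are ever passed to waitpid(), so children forked by other code are left alone.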
Again, this may be overkill; simply setting "children.exited = 1" and then doing this same loop after pselect() in the main context could work the same way. I just vaguely remember some requirement (a comment from some source code) that waitpid() had to be called in the SIGCHLD handler (the child has to be reaped there); that's why my first idea was to put the loop there.
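A minimal sketch of that simpler variant, assuming SIGCHLD is kept blocked everywhere except inside pselect(). The names got_sigchld, setup_sigchld_flag and wait_in_main_loop are made up for illustration, and a real loop would of course also pass its fd sets and a timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile sig_atomic_t got_sigchld;

static void note_sigchld(int sig)
{
    (void)sig;
    got_sigchld = 1;             /* nothing else happens in signal context */
}

/* Hypothetical setup: block SIGCHLD in the normal context and install
   the flag-only handler. Call this before fork(). */
int setup_sigchld_flag(void)
{
    struct sigaction sa;
    sigset_t block;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, NULL) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical main-loop fragment that waits for one child we forked. */
int wait_in_main_loop(pid_t pid, int *status)
{
    sigset_t unblocked;
    pid_t ret;

    /* Mask to use while sleeping: the current mask minus SIGCHLD. */
    if (sigprocmask(SIG_SETMASK, NULL, &unblocked) == -1)
        return -1;
    sigdelset(&unblocked, SIGCHLD);

    for (;;) {
        /* SIGCHLD can only be delivered while pselect() sleeps, so the
           flag cannot change between the check and the sleep. */
        if (pselect(0, NULL, NULL, NULL, NULL, &unblocked) == -1
            && errno != EINTR)
            return -1;
        if (!got_sigchld)
            continue;
        got_sigchld = 0;

        ret = waitpid(pid, status, WNOHANG);
        if (ret == pid)
            return 0;
        if (ret == -1)
            return -1;
        /* ret == 0: some other child changed state; keep waiting */
    }
}
```

The table, if there is one, is then only ever touched in the normal context.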
> At the moment, the single background thread with sleep loop is quite
> easily implementable, and it seems almost as good as the signal
> handler approach.
Seems so (saying this right now without thinking about fork(), signals, and multiple threads in the same process).
> Also, if I go the signal handler approach, then this might break
> libraries which also do process spawning. I might intercept their
> wait/waitpid, causing their internals to fail.
You could interfere with their installed SIGCHLD handlers, but sigaction() can return the old handler, and you could perhaps invoke the previously installed handler inside your handler as first or last act, "chaining" it.
The example code I gave above tries to follow the recommendation Geoff cited earlier; it only calls waitpid() on the PIDs that we forked "explicitly" -- we extend the children.statuses table only with PIDs our own selves forked. If another library spawns a child, it won't get added to our table, and we won't try to collect its exit status. Plus by "chaining" the SIGCHLD handlers we give the other library a chance to reap its own zombies. (Hopefully it won't reap ours -- it could be wise to call the previously installed handler as last act.)
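A minimal sketch of such chaining, under the assumption that the previously installed handler is a plain one-argument handler (SA_SIGINFO handlers are skipped here). library_handler is only a stand-in for whatever another library may have installed; all names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static struct sigaction previous_sigchld;   /* whatever was installed before us */
static volatile sig_atomic_t our_sigchld_seen;
static volatile sig_atomic_t library_sigchld_seen;

/* Stand-in for a handler that some other library installed earlier. */
static void library_handler(int sig)
{
    (void)sig;
    library_sigchld_seen = 1;
}

static void chaining_handler(int sig)
{
    our_sigchld_seen = 1;        /* our own work first */

    /* Call the previous handler as the last act. This sketch only chains
       plain one-argument handlers, not SA_SIGINFO ones. */
    if (!(previous_sigchld.sa_flags & SA_SIGINFO)
        && previous_sigchld.sa_handler != SIG_DFL
        && previous_sigchld.sa_handler != SIG_IGN)
        previous_sigchld.sa_handler(sig);
}

static int install(void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

int install_library_handler(void)
{
    return install(library_handler);
}

int install_chaining_handler(void)
{
    /* Save the old action first, so our handler never sees it half-written. */
    if (sigaction(SIGCHLD, NULL, &previous_sigchld) == -1)
        return -1;
    return install(chaining_handler);
}
```

This does nothing about a library that installs its handler after ours and does not chain in turn.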
I agree this is messy. "Event loop" libraries want you to program "inside out": register your fd's, signals and timers with them, they'll call pselect() for you, and invoke your callbacks. Try to combine two such libraries in the same process :) At some point one has to open-code his/her own "main loop" ultimately.
"Ersek, Laszlo" wrote:
> On Wed, 9 May 2012, Scott Lurndal wrote:
>> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:
>>> This message is in MIME format. The first part should be readable text,
>>> while the remaining parts are likely unreadable without MIME-aware tools.
>> Was mime necessary?
> Absolutely not. Apologies.
> "pine" was replaced with "alpine" on this system I use for Usenet; it may
> have tricked me. I just checked, I have the following option enabled:
Joshua Maurice wrote:
> On May 8, 5:47 am, Geoff Clare <ge...@clare.See-My-Signature.invalid>
> wrote:
>> Joshua Maurice wrote:
>> > Also, I was somewhat dismayed at finding that there is no easy way to
>> > "detach" a child. After a fork, the parent must die or wait/waitpid on
>> > the child, or you will have a resource leak aka a zombie process.
>> > Wouldn't it be a sensible addition to the kernel to add some function
>> > which says "I don't care if my child with this pid is running anymore,
>> > and I don't care about its exit code. Once it finishes, remove it from
>> > the process table as if I had called wait/waitpid. Equivalently: re-
>> > parent it to init right now."
>> Take a look at the SA_NOCLDWAIT flag for sigaction().
[...]
> Let's see... If you use sigaction and set SA_NOCLDWAIT for SIGCHLD,
> can you still call waitpid on children and get exit codes? I assume
> not.
Correct, but you did say "... and I don't care about its exit code" in
the part of your post this response was to.
> Again, my major question seems to be that it's not easy, for a parent
> executable which may live on for a long time, to get the exit code of
> a child process and be able to properly clean up on error code paths.
As others have suggested elsewhere in the thread, the best practice
is probably to do what shells do.
> PS: SA_NOCLDWAIT appears to be a minor helper thing to avoid writing a
> 10-20 line signal handler that calls waitpid(-1, <whatever that
> nonblocking flag is>) in a loop in the signal handler. Am I right? I
> forget if that nonblocking flag is POSIX and/or portable.
The WNOHANG flag is in POSIX.
There are some subtle differences with SA_NOCLDWAIT. One is that
you can call wait() and it will wait for all your child processes
to terminate and then return -1 with ECHILD. Not sure how useful
that is. Another difference is that POSIX warns application
writers not to use wait() or waitpid(-1, ...) because they may
"steal" the child exit status from a child started by a function
like system(), whereas it has no such warning about SA_NOCLDWAIT.
This suggests to me that system() is required to have some internal
method of waiting for its child that is not affected by SA_NOCLDWAIT,
but I don't know whether that's a deliberate intention by the POSIX
developers.
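To illustrate the first difference, a small sketch, assuming the process has no other children at the time. The function names are made up; the calls are plain POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the system not to turn terminated children into zombies. */
int set_nocldwait(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Forks a short-lived child, then calls wait(). Returns the errno that
   wait() failed with, 0 if wait() unexpectedly collected a status, or
   -1 if fork() failed. */
int wait_errno_after_nocldwait(void)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(5);                /* child: exit code will not be collectable */

    errno = 0;
    if (wait(NULL) != -1)
        return 0;
    return errno;                /* expected: ECHILD once all children are gone */
}
```

Calling set_nocldwait() and then wait_errno_after_nocldwait() should yield ECHILD; the child's exit code 5 is lost.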
Geoff Clare <ge...@clare.See-My-Signature.invalid> writes:
> Joshua Maurice wrote:
>> Geoff Clare <ge...@clare.See-My-Signature.invalid> wrote:
>>> Joshua Maurice wrote:
>>> > Also, I was somewhat dismayed at finding that there is no easy way to
>>> > "detach" a child. After a fork, the parent must die or wait/waitpid on
>>> > the child, or you will have a resource leak aka a zombie process.
>>> > Wouldn't it be a sensible addition to the kernel to add some function
>>> > which says "I don't care if my child with this pid is running anymore,
>>> > and I don't care about its exit code. Once it finishes, remove it from
>>> > the process table as if I had called wait/waitpid. Equivalently: re-
>>> > parent it to init right now."
>>> Take a look at the SA_NOCLDWAIT flag for sigaction().
> [...]
>> Let's see... If you use sigaction and set SA_NOCLDWAIT for SIGCHLD,
>> can you still call waitpid on children and get exit codes? I assume
>> not.
> Correct, but you did say "... and I don't care about its exit code" in
> the part of your post this response was to.
The point AIUI is to be able to retroactively "detach" a subprocess when
events have rendered its eventual exit status irrelevant.
> Geoff Clare <ge...@clare.See-My-Signature.invalid> writes:
> > Joshua Maurice wrote:
> >> Geoff Clare <ge...@clare.See-My-Signature.invalid> wrote:
> >>> Joshua Maurice wrote:
> >>> > Also, I was somewhat dismayed at finding that there is no easy way to
> >>> > "detach" a child. After a fork, the parent must die or wait/waitpid on
> >>> > the child, or you will have a resource leak aka a zombie process.
> >>> > Wouldn't it be a sensible addition to the kernel to add some function
> >>> > which says "I don't care if my child with this pid is running anymore,
> >>> > and I don't care about its exit code. Once it finishes, remove it from
> >>> > the process table as if I had called wait/waitpid. Equivalently: re-
> >>> > parent it to init right now."
> >>> Take a look at the SA_NOCLDWAIT flag for sigaction().
> > [...]
> >> Let's see... If you use sigaction and set SA_NOCLDWAIT for SIGCHLD,
> >> can you still call waitpid on children and get exit codes? I assume
> >> not.
> > Correct, but you did say "... and I don't care about its exit code" in
> > the part of your post this response was to.
> The point AIUI is to be able to retroactively "detach" a subprocess when
> events have rendered its eventual exit status irrelevant.
Perhaps you could expand on the double-fork method I described.
Fork a child. The child forks a grandchild, waits for it, and exits with the same status it got.
When the original parent wishes to detach the subprocess, it kills the child and waits for it.
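A rough sketch of the scheme, for illustration only. The names spawn_with_stub and detach_stub are made up, and mapping a signal-terminated grandchild to 128 + signal number is an assumption borrowed from what shells do, not part of the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the stub's PID in the parent, or -1 if fork() failed.
   Never returns in the stub or in the grandchild. */
pid_t spawn_with_stub(const char *path, char *const argv[])
{
    pid_t stub, grandchild;
    int status;

    stub = fork();
    if (stub != 0)
        return stub;             /* original parent, or fork() failure */

    /* From here on we are the stub. */
    grandchild = fork();
    if (grandchild == -1)
        _exit(127);
    if (grandchild == 0) {
        execv(path, argv);
        _exit(127);              /* exec failed */
    }
    while (waitpid(grandchild, &status, 0) == -1)
        if (errno != EINTR)
            _exit(127);

    /* Pass the grandchild's result on to the original parent. */
    if (WIFEXITED(status))
        _exit(WEXITSTATUS(status));
    _exit(128 + WTERMSIG(status)); /* assumed convention for signals */
}

/* Detach: kill the stub and reap it. The grandchild keeps running and
   is inherited by init (or whatever reaper the system provides). */
int detach_stub(pid_t stub)
{
    if (kill(stub, SIGKILL) == -1)
        return -1;
    while (waitpid(stub, NULL, 0) == -1)
        if (errno != EINTR)
            return -1;
    return 0;
}
```

The parent gets the exit code with an ordinary waitpid() on the stub's PID, and detach_stub() never blocks for long because SIGKILL on the stub cannot be ignored.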
-- Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
On Wed, 9 May 2012, Marc wrote:
> "Ersek, Laszlo" wrote:
>> "pine" was replaced with "alpine" on this system I use for Usenet; it may
>> have tricked me. I just checked, I have the following option enabled:
>> FEATURE: Enable 8bit NNTP Posting
> Maybe this one?
> downgrade-multipart-to-text
Looks good. I've set it now; hopefully my config won't violate netiquette anymore...
> In article <87mx5h4h63....@araminta.anjou.terraraq.org.uk>,
> Richard Kettlewell <r...@greenend.org.uk> wrote:
> > Geoff Clare <ge...@clare.See-My-Signature.invalid> writes:
> > > Joshua Maurice wrote:
> > >> Geoff Clare <ge...@clare.See-My-Signature.invalid> wrote:
> > >>> Joshua Maurice wrote:
> > >>> > Also, I was somewhat dismayed at finding that there is no easy way to
> > >>> > "detach" a child. After a fork, the parent must die or wait/waitpid on
> > >>> > the child, or you will have a resource leak aka a zombie process.
> > >>> > Wouldn't it be a sensible addition to the kernel to add some function
> > >>> > which says "I don't care if my child with this pid is running anymore,
> > >>> > and I don't care about its exit code. Once it finishes, remove it from
> > >>> > the process table as if I had called wait/waitpid. Equivalently: re-
> > >>> > parent it to init right now."
> > >>> Take a look at the SA_NOCLDWAIT flag for sigaction().
> > > [...]
> > >> Let's see... If you use sigaction and set SA_NOCLDWAIT for SIGCHLD,
> > >> can you still call waitpid on children and get exit codes? I assume
> > >> not.
> > > Correct, but you did say "... and I don't care about its exit code" in
> > > the part of your post this response was to.
> > The point AIUI is to be able to retroactively "detach" a subprocess when
> > events have rendered its eventual exit status irrelevant.
> Perhaps you could expand on the double-fork method I described.
> Fork a child. The child forks a grandchild, waits for it, and exits
> with the same status it got.
> When the original parent wishes to detach the subprocess, it kills the
> child and waits for it.
Yeah, I think that or the background thread are the best bets. You can
unconditionally kill the stub without worry. Actually, this means I
wouldn't need to spawn a thread in the stub, so I wouldn't need a
separate executable either. Thanks. I didn't see that before.
> Joshua Maurice wrote:
> > Also, I was somewhat dismayed at finding that there is no easy way to
> > "detach" a child. After a fork, the parent must die or wait/waitpid on
> > the child, or you will have a resource leak aka a zombie process.
> > Wouldn't it be a sensible addition to the kernel to add some function
> > which says "I don't care if my child with this pid is running anymore,
> > and I don't care about its exit code. Once it finishes, remove it from
> > the process table as if I had called wait/waitpid. Equivalently: re-
> > parent it to init right now."
> Take a look at the SA_NOCLDWAIT flag for sigaction().
> > Note that the signal handler approach doesn't work AFAIK. If you have
> > a SIG_CHLD signal handler that just waits on any child, then you
> > cannot (easily) get child exit codes as the signal handler may have
> > "wait"ed on it before you. Right?
> POSIX recommends that applications should never "wait for any child"
> (whether in a signal handler or not). Specifically, it says:
> "Calls to wait() will collect information about any child process.
> This may result in interactions with other interfaces that may be
> waiting for their own children (such as by use of system()).
> For this and other reasons it is recommended that portable
> applications not use wait(), but instead use waitpid(). For these
> same reasons, the use of waitpid() with a pid argument of -1, and
> the use of waitid() with the idtype argument set to P_ALL, are
> also not recommended for portable applications."
Man, I need to reread this several more times. Is "opengroup.org" my
best bet for official POSIX documentation? That's what I usually use.
Somehow, I managed to not notice the rather large rationale section at
the bottom. It doesn't contain your text, but contains a bunch of
related text.
There's this really interesting bit here:
[quote]
Guarantee #4
Although possible to make this guarantee, system() would have to
set the SIGCHLD handler to SIG_DFL so that the SIGCHLD signal
generated by its fork() would be discarded (the SIGCHLD default action
is to be ignored), then restore it to its previous setting. This would
have the undesirable side effect of discarding all SIGCHLD signals
pending to the process.
[/quote]
I don't fully understand this, nor most of the other text. My
knowledge of POSIX signal handling is limited, at best. I'll have to
do a lot more reading. I almost parse this as saying it's possible to
have a signal handler set for SIGCHLD and also for waitpid(child, ...)
to always get the child exit stuff. This seems to be impossible from
my limited understanding - in the case that you call waitpid after the
child has finished, I don't see how that could work. The signal
handler will have already dealt with it.
> On May 8, 5:47 am, Geoff Clare <ge...@clare.See-My-Signature.invalid>
> wrote:
> > POSIX recommends that applications should never "wait for any child"
> > (whether in a signal handler or not). Specifically, it says:
> > "Calls to wait() will collect information about any child process.
> > This may result in interactions with other interfaces that may be
> > waiting for their own children (such as by use of system()).
> > For this and other reasons it is recommended that portable
> > applications not use wait(), but instead use waitpid(). For these
> > same reasons, the use of waitpid() with a pid argument of -1, and
> > the use of waitid() with the idtype argument set to P_ALL, are
> > also not recommended for portable applications."
> Man, I need to reread this several more times. Is "opengroup.org" my
> best bet for official POSIX documentation? That's what I usually use.
> Somehow, I managed to not notice the rather large rationale section at
> the bottom. It doesn't contain your text, but contains a bunch of
> related text.
> There's this really interesting bit here:
> [quote]
> Guarantee #4
> Although possible to make this guarantee, system() would have to
> set the SIGCHLD handler to SIG_DFL so that the SIGCHLD signal
> generated by its fork() would be discarded (the SIGCHLD default action
> is to be ignored), then restore it to its previous setting. This would
> have the undesirable side effect of discarding all SIGCHLD signals
> pending to the process.
> [/quote]
> I don't fully understand this, nor most of the other text. My
> knowledge of POSIX signal handling is limited, at best. I'll have to
> do a lot more reading. I almost parse this as saying it's possible to
> have a signal handler set for SIGCHLD and also for waitpid(child, ...)
> to always get the child exit stuff. This seems to be impossible from
> my limited understanding - in the case that you call waitpid after the
> child has finished, I don't see how that could work. The signal
> handler will have already dealt with it.
> As I said, I have some more reading to do.
Well, I've done some more reading, and I'm still as confused as ever.
Question:
I assume that SIGCHLD is not blocked "by default". That is, in a
"fresh" process just after an exec() call, SIGCHLD will not be blocked
by the sole thread of the process. Is this right? I can't quite seem
to find where it specifies the "default" signal mask, signal handlers,
etc.
Question:
Consider the following program:
    #include <stdlib.h>
    #include <stdio.h>
    #include <errno.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main()
    {
        char cmd[] = "/bin/true";
        char* argv[1];
        argv[0] = 0;
        pid_t child = fork();
        if (child == -1)
            abort();
        if (child == 0)
        {   //child
            if (-1 == execvp(cmd, argv))
                abort();
        }
        else
        {   //parent
            int status;
            for (;;)
            {
                if (-1 == waitpid(child, &status, 0))
                {
                    if (errno == EINTR)
                        continue;
                    abort();
                }
                break;
            }
            if ( ! WIFEXITED(status) && ! WIFSIGNALED(status))
                abort(); //impossible according to spec
            if ( ! WIFEXITED(status))
                abort(); //our child got signaled
            return WEXITSTATUS(status);
        }
    }
This program works, right? That is, it will wait for the termination
of the child, and it will correctly return "0" from main, right?
Furthermore, I know this is silly, but I don't see it explicitly
documented on that page at all: Once I get back a good wait() or
waitpid() call, how do I know if it's still a zombie? Specifically, "
WIFEXITED(status) || WIFSIGNALED(status) " implies that I don't have a
resource leak, right? Specifically, " WIFEXITED(status) ||
WIFSIGNALED(status) " implies the child is now terminated, is not a
zombie, and is gone from the kernel process table, right?
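One way to check this empirically is sketched below, assuming nothing else in the process waits on the child and SIGCHLD has its default disposition. Once waitpid() has returned the status of a terminated child, a second waitpid() on the same PID is expected to fail with ECHILD, since there is no such child left to wait for. The function name is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a second waitpid() on an already collected child fails
   with ECHILD, 0 if it does not, and -1 on unexpected errors. */
int second_wait_fails_with_echild(void)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(0);                /* child terminates right away */

    /* First wait: collects the status and releases the zombie. */
    while (waitpid(child, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    if (!WIFEXITED(status) && !WIFSIGNALED(status))
        return -1;

    /* Second wait: the PID no longer names a child of ours. */
    errno = 0;
    return waitpid(child, &status, WNOHANG) == -1 && errno == ECHILD;
}
```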
Question:
Consider the following:
http://pubs.opengroup.org/onlinepubs/009695399/functions/waitpid.html [quote]
If _POSIX_REALTIME_SIGNALS is defined, and the implementation queues
the SIGCHLD signal, then if wait() or waitpid() returns because the
status of a child process is available, any pending SIGCHLD signal
associated with the process ID of the child process shall be
discarded. Any other pending SIGCHLD signals shall remain pending.
Otherwise, if SIGCHLD is blocked, if wait() or waitpid() return
because the status of a child process is available, any pending
SIGCHLD signal shall be cleared unless the status of another child
process is available.
For all other conditions, it is unspecified whether child status will
be available when a SIGCHLD signal is delivered.
[/quote]
What the hell does that last sentence mean? What are the "other
conditions"? What are the "initial" conditions to which there are
"other" conditions? I've thus far concluded that SIGCHLD is not
blocked "by default", and I've concluded that my above program (which
has SIGCHLD not blocked by my reasoning) is guaranteed to get the
child's status. Yet, the last sentence seems to read that I don't have
that guarantee. -- I must be misreading this somehow. Can anyone help
me here, please?
Question:
Consider the following:
http://pubs.opengroup.org/onlinepubs/009695399/functions/waitpid.html [quote]
In particular, an implementation that does accept (discard) the
SIGCHLD signal can make the following guarantees regardless of the
queuing depth of signals in general (the list of waitable children can
hold the SIGCHLD queue):
[...]
The system() function will not cause a process' SIGCHLD handler to be
called as a result of the fork()/ exec executed within system()
because system() will accept the SIGCHLD signal when it performs a
waitpid() for its child process. This is a desirable behavior of
system() so that it can be used in a library without causing side
effects to the application linked with the library.
[/quote]
This is a bug in the spec, right? I can see how the internals of
system() may be something like:
    pid_t child = fork();
    // ...
    waitpid(child, & status, 0);
However, there's a small window of time where the child may finish up
before the parent enters waitpid(), which means AFAIK that the signal
handler will be called, quite contrary to the quoted text above. So I
say again, this is a bug / mistake in the POSIX standard, right?
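For what it's worth, POSIX requires system() to block SIGCHLD while it waits for the command, which closes that window: a handler cannot run between fork() and waitpid(). Below is a simplified sketch of that idea only. It is not the code of any real system() implementation, it omits the SIGINT/SIGQUIT handling that system() also does, and the function name is made up. Whether a SIGCHLD left pending is discarded when the mask is restored depends on the implementation following the text quoted above; the sketch does not check that.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "sh -c command" with SIGCHLD blocked from before fork() until
   after waitpid(). Returns the wait status, or -1 on failure. */
int run_command_blocking_sigchld(const char *command)
{
    sigset_t block, old;
    int status = -1;
    pid_t child;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    child = fork();
    if (child == 0) {
        /* The command itself runs with the caller's original mask. */
        sigprocmask(SIG_SETMASK, &old, NULL);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);
    }
    if (child > 0) {
        while (waitpid(child, &status, 0) == -1) {
            if (errno != EINTR) {
                status = -1;
                break;
            }
        }
    }

    /* Restore the caller's mask only after the child was collected. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return status;
}
```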
On Tue, 08 May 2012 15:24:46 -0700, Joshua Maurice wrote:
> I'm just trying to write a general purpose process creation API, and
> I'm wondering what I should put as the Process object's destructor, or
> equivalently what function(s) should I provide for error cleanup? I'd
> like to allow the user of the library to make the choice whether the
> child process needs to die, whether we need to wait on it, or whether
> we should "detach" and forget about it. I still guess those are all
> valid strategies - I think - and I'd like to offer them all.
You can't package something as complex as the Unix process API inside a
simple interface without making a lot of trade-offs.
As for clean-up, I'd suggest doing what Python's subprocess module does:
it maintains a list of "orphan" processes. The destructor for the
Python object checks whether the process is still alive (waitpid),
and if so the PID is added to the list. Every time a new process is
created, the _cleanup() function is run to prune any zombies from the list.
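In C, the same idea might look roughly like the sketch below. The fixed-size list and the names process_release, cleanup_orphans and orphan_count are made up for illustration; this is not a translation of the Python code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_ORPHANS 64           /* arbitrary limit for this sketch */

static pid_t  orphans[MAX_ORPHANS];
static size_t num_orphans;

/* "Destructor": reap the child if it is already dead, otherwise
   remember its PID for later. Returns -1 only if the list is full. */
int process_release(pid_t pid)
{
    pid_t ret = waitpid(pid, NULL, WNOHANG);

    if (ret == pid || (ret == -1 && errno == ECHILD))
        return 0;                /* reaped just now, or not ours anymore */
    if (num_orphans == MAX_ORPHANS)
        return -1;
    orphans[num_orphans++] = pid;
    return 0;
}

/* Call before creating each new process: prunes dead orphans. */
void cleanup_orphans(void)
{
    size_t i = 0;

    while (i < num_orphans) {
        pid_t ret = waitpid(orphans[i], NULL, WNOHANG);

        if (ret == 0) {          /* still running: keep it */
            ++i;
            continue;
        }
        if (ret == -1 && errno == EINTR)
            continue;            /* retry the same entry */
        /* Reaped (or no longer our child): drop it by moving the last
           entry into its slot. */
        orphans[i] = orphans[--num_orphans];
    }
}

size_t orphan_count(void)
{
    return num_orphans;
}
```

Nothing here ever blocks, at the price that a zombie may linger until the next process is created.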