Message from discussion Best practices for avoiding leak zombie processes while retaining child exit code?

From: Joshua Maurice <joshuamaur...@gmail.com>
Newsgroups: comp.unix.programmer
Subject: Re: Best practices for avoiding leak zombie processes while retaining
 child exit code?
Date: Tue, 8 May 2012 14:24:49 -0700 (PDT)

On May 8, 2:00 pm, Ian Collins <ian-n...@hotmail.com> wrote:
> On 05/ 9/12 07:18 AM, Joshua Maurice wrote:
>
> > On May 8, 8:13 am, Barry Margolin <bar...@alum.mit.edu> wrote:
> >> In article
> >> <f9e1b966-cb56-4f26-97fc-bd4a9c49b...@r2g2000pbs.googlegroups.com>,
> >>   Joshua Maurice <joshuamaur...@gmail.com> wrote:
>
> >>> Also, I was somewhat dismayed at finding that there is no easy way to
> >>> "detach" a child. After a fork, the parent must die or wait/waitpid on
> >>> the child, or you will have a resource leak aka a zombie process.
> >>> Wouldn't it be a sensible addition to the kernel to add some function
> >>> which says "I don't care if my child with this pid is running anymore,
> >>> and I don't care about its exit code. Once it finishes, remove it from
> >>> the process table as if I had called wait/waitpid. Equivalently:
> >>> reparent it to init right now."
>
> >> The standard idiom for this is to fork twice. The original parent forks
> >> a child, the child forks a grandchild, and then the child exits. The
> >> parent waits for the child (no need for a SIGCHLD handler, since this
> >> should be almost instantaneous), and then the grandchild is inherited by
> >> init.
>
> > Yes, but then you can't get the exit code and the other exit
> > information. I would like to know if the child reported that it failed
> > horribly or something. I could maintain a stub intermediary whose
> > correctness I can guarantee to accomplish the same effect and get the
> > exit code et al, but that seems .. inelegant.
>
> The best you can probably do is have the child check the grandchild is
> running before exiting. Still messy (how long to wait for example), but
> better than nothing.
>
> If you want to know the status of the grandchild, you could use some
> form of shared semaphore.
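
For reference, the fork-twice idiom Barry describes above can be sketched
roughly as follows. This is only a minimal illustration, not anyone's
production code: the name spawn_detached is made up here, and the sketch
assumes a single-threaded caller. Note that it shows exactly the limitation
discussed in this thread: the grandchild's exit status is lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] detached via the double-fork idiom.
 * Returns 0 if the intermediate child was reaped and reported success,
 * -1 otherwise. The grandchild's exit status is NOT available. */
int spawn_detached(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Intermediate child: fork the real job, then exit at once. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            execvp(argv[0], argv);
            _exit(127);             /* only reached if exec failed */
        }
        _exit(0);                   /* orphans the grandchild */
    }

    /* Parent: reap the short-lived intermediate child, so no zombie
     * is left behind. The orphaned grandchild is reaped by init. */
    int status;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A return value of 0 only means the second fork succeeded; it says nothing
about whether the exec worked or how the job ended.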

Yes, but if I understand you correctly, this shared semaphore requires
cooperation from the child executable. Suppose I want to exec g++ or
the Visual Studio command-line compiler, or some other executable
whose code I can't change. Your idea wouldn't work.

I think I would like a "detachprocess" call, aka "reparent my child to
init now", but I'm still not sure what I want to do by default in the
error code path for child processes. Suppose we have something like
make, which spawns processes to do jobs, and it doesn't know at all
what these jobs do. Suppose further we encounter an error in the
parent, or we want to cancel the "job", or something, but leave other
jobs running. In the cleanup code path, I think the theoretical
options are:
1- ignore the child and have a zombie process leak
2- call a system call to "detach" or "reparent to init now"
3- wait for the child to finish
4- kill child, possibly with SIGTERM before a SIGKILL to be nice
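
Option 4 could look something like the sketch below. The function name
terminate_child, the grace period parameter, and the 10 ms polling interval
are all assumptions made for illustration. It only deals with the direct
child, not with anything the child itself spawned.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: send SIGTERM to `pid`, poll for up to roughly
 * `grace_ms` milliseconds, then fall back to SIGKILL. The child is
 * always reaped on success. Returns 0 and stores the raw wait status
 * in *status, or returns -1 on error. */
int terminate_child(pid_t pid, int grace_ms, int *status)
{
    if (kill(pid, SIGTERM) < 0)
        return -1;

    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    for (int waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;               /* child exited within the grace period */
        if (r < 0 && errno != EINTR)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* Grace period is over: stop being nice. */
    if (kill(pid, SIGKILL) < 0)
        return -1;
    while (waitpid(pid, status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

The caller can inspect *status with WIFSIGNALED/WTERMSIG to tell whether the
child went away on SIGTERM or had to be killed.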

For a generic job server, if you want to cancel a job, should
you ignore the child processes of that job, try to kill them with
SIGTERM, actually kill them with SIGKILL, or wait possibly without end
for them to finish? I think any approach would require user
intervention because of the lack of an effective "kill child and all
children". (Hell, never mind, I don't want to open that can of worms.
You can't nest process groups, which makes them far less useful. For
example, a similar situation exists on Win32, and I cannot create a
process group for a job that includes the Visual Studio compiler
because IIRC the compiler itself tries to create a process group, and
you can't nest process groups.)

I don't like any of those options. The more I think about it, the more it
seems heavily dependent on the particular application. Still, option 2 is
unavailable to me now short of some rather annoying code I'd have to
write. To get option 2 with the current POSIX API, I think I'd have to
write a separate executable for the stub process because I would need
2 threads: one to read on a pipe from the parent, and a second
to waitpid on the child. IIRC you can't safely create threads between fork
and exec, and thus you need a separate executable. (Or reuse the
current executable, but that seems excessively hacky.)
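
A simpler one-way variant of the stub idea can be sketched without threads.
This is not the two-thread design described above: here the stub never
listens to the parent, it only waits for the job and writes the raw wait
status to a pipe. The names spawn_with_status_pipe and read_job_status are
invented for this sketch, and it assumes the calling process is
single-threaded, so that running plain C code between fork and exec is
acceptable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] under a detached stub process.
 * Returns the read end of a pipe on success, -1 on error. The stub
 * writes the job's raw wait status (one int) to the pipe when the job
 * ends. The caller may close the descriptor early to stop caring. */
int spawn_with_status_pipe(char *const argv[])
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);
        pid_t stub = fork();
        if (stub != 0)
            _exit(stub < 0 ? 1 : 0);    /* intermediate child exits now */

        /* Stub: orphaned, so init reaps it. It runs and reaps the job. */
        pid_t job = fork();
        if (job < 0)
            _exit(1);
        if (job == 0) {
            close(fds[1]);              /* the job must not hold the pipe */
            execvp(argv[0], argv);
            _exit(127);                 /* only reached if exec failed */
        }
        int status = 0;
        while (waitpid(job, &status, 0) < 0) {
            if (errno != EINTR)
                _exit(1);
        }
        /* If the parent closed its end, this fails or raises SIGPIPE;
         * either way the stub simply goes away. */
        if (write(fds[1], &status, sizeof status) != (ssize_t)sizeof status)
            _exit(1);
        _exit(0);
    }

    /* Parent: reap the intermediate child right away. */
    close(fds[1]);
    int st;
    while (waitpid(child, &st, 0) < 0) {
        if (errno != EINTR) {
            close(fds[0]);
            return -1;
        }
    }
    if (!WIFEXITED(st) || WEXITSTATUS(st) != 0) {
        close(fds[0]);
        return -1;
    }
    return fds[0];
}

/* Blocks until the stub reports, then closes fd. Returns 0 and fills
 * *status, or -1 if the stub died without reporting. */
int read_job_status(int fd, int *status)
{
    ssize_t n;
    do {
        n = read(fd, status, sizeof *status);
    } while (n < 0 && errno == EINTR);
    close(fd);
    return n == (ssize_t)sizeof *status ? 0 : -1;
}
```

With this, "detaching" is just close() on the returned descriptor, and the
exit code is still there if I ask for it. It does not give the parent any
way to cancel the job through the stub, which is the part that would need
the second pipe and the extra thread (or a poll loop).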