Best way to prevent zombie processes

Cecil Westerhof

May 31, 2015, 5:54:20 PM

At the moment I have the following code:
    os.chdir(directory)
    for document in documents:
        subprocess.Popen(['evince', document])

With this I can open several documents at once. But there is no way to
know when those documents are going to be closed. This could/will lead
to zombie processes. (I run it on Linux.) What is the best solution to
circumvent this?

I was thinking about putting all Popen instances in a list, and then
every five minutes walking through the list and checking with poll() whether
each process has terminated. If it has, it can be removed from the list.
Of course I need to synchronise those events. Is that a good way to do
it?
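A minimal Python 3 sketch of the list-plus-poll() idea, for illustration only: the class name ViewerTracker and its open()/prune() methods are hypothetical, and the lock is there because the periodic check and the code that opens documents may run in different threads.

```python
import subprocess
import threading


class ViewerTracker:
    """Hypothetical helper: remember Popen objects and prune finished ones."""

    def __init__(self):
        self._procs = []
        # open() and prune() may be called from different threads.
        self._lock = threading.Lock()

    def open(self, command):
        proc = subprocess.Popen(command)
        with self._lock:
            self._procs.append(proc)
        return proc

    def prune(self):
        """Reap exited children via poll() and return how many still run."""
        with self._lock:
            # poll() calls waitpid() with WNOHANG, so an exited child is
            # collected here and stops being a zombie.
            self._procs = [p for p in self._procs if p.poll() is None]
            return len(self._procs)
```

prune() would be called every five minutes, for example from a threading.Timer that reschedules itself; until then an exited viewer stays a zombie.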

--
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof

Marko Rauhamaa

May 31, 2015, 6:22:42 PM

Cecil Westerhof <Ce...@decebal.nl>:

> At the moment I have the following code:
>     os.chdir(directory)
>     for document in documents:
>         subprocess.Popen(['evince', document])
>
> With this I can open several documents at once. But there is no way to
> know when those documents are going to be closed. This could/will lead
> to zombie processes. (I run it on Linux.) What is the best solution to
> circumvent this?
>
> I was thinking about putting all Popen instances in a list. And then
> every five minutes walk through the list and check with poll if the
> process has terminated. If it has it can be released from the list.
> Of-course I need to synchronise those events. Is that a good way to do
> it?

If you don't care to know when child processes exit, you can simply
ignore the SIGCHLD signal:

import signal
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

That will prevent zombies from appearing.
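Applied to the loop from the original post, this might look as follows. This is a sketch assuming Python 3 on Linux; the function name open_documents and the viewer parameter are hypothetical. signal.signal() must be called from the main thread, and the setting is process-wide.

```python
import os
import signal
import subprocess


def open_documents(directory, documents, viewer="evince"):
    """Start one viewer per document; the kernel reaps them when they exit.

    Setting SIGCHLD to SIG_IGN is process-wide: it affects every child this
    process creates, not just the viewers started here.
    """
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # main thread only
    os.chdir(directory)
    return [subprocess.Popen([viewer, document]) for document in documents]
```

One side effect worth knowing: a signal set to SIG_IGN stays ignored across exec, so each viewer also starts with SIGCHLD ignored.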

On the other hand, if you want to know when a process exits, you have
several options:

* What you propose would work, but is all but elegant. You want to
react as soon as child processes die.

* You could actually trap the SIGCHLD signal by setting a signal
handler. You should not do any actual processing in the signal
handler itself but rather convert the signal into a file descriptor
event (with a pipe; Python doesn't seem to support signal file
descriptors or event file descriptors).

* Instead of using a signal handler, you could capture the output of
the child process. At its simplest, you capture the standard output,
but you could also open an extra pipe for the purpose. You keep
reading the standard output, and as soon as you read an EOF, you wait
the process out.
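A sketch of the second option (a SIGCHLD handler that only turns the signal into a pipe event), assuming Python 3.5+ for os.set_blocking; run_and_reap is a hypothetical name. It keeps the Popen objects and reaps them with poll() rather than os.waitpid(-1, ...), so it collects only its own children.

```python
import os
import select
import signal
import subprocess


def run_and_reap(commands, timeout=10.0):
    """Start commands and reap each child as soon as SIGCHLD arrives.

    Returns {pid: returncode}. The handler does no real work: it only
    writes a byte to a pipe, and the main loop does the reaping.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)

    def on_sigchld(signum, frame):
        try:
            os.write(write_fd, b"\0")  # turn the signal into an fd event
        except BlockingIOError:
            pass  # pipe full: a wake-up is already pending

    old_handler = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        # Keep the Popen objects alive so only this loop reaps them.
        pending = {}
        for command in commands:
            proc = subprocess.Popen(command)
            pending[proc.pid] = proc
        codes = {}
        while pending:
            ready, _, _ = select.select([read_fd], [], [], timeout)
            if not ready:
                break  # timed out: leave the rest running
            os.read(read_fd, 4096)  # drain the wake-up bytes
            # Several exits may be merged into one SIGCHLD, so check all.
            for pid, proc in list(pending.items()):
                if proc.poll() is not None:  # poll() reaps the zombie
                    codes[pid] = proc.returncode
                    del pending[pid]
        return codes
    finally:
        signal.signal(signal.SIGCHLD, old_handler)
        os.close(read_fd)
        os.close(write_fd)
```

The standard library's signal.set_wakeup_fd() offers a built-in variant of this pipe trick.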

Marko

Cameron Simpson

May 31, 2015, 10:26:10 PM

to pytho...@python.org

On 31May2015 23:33, Cecil Westerhof <Ce...@decebal.nl> wrote:
>At the moment I have the following code:
>    os.chdir(directory)
>    for document in documents:
>        subprocess.Popen(['evince', document])
>
>With this I can open several documents at once. But there is no way to
>know when those documents are going to be closed. This could/will lead
>to zombie processes. (I run it on Linux.) What is the best solution to
>circumvent this?

The standard trick is to make the process a grandchild instead of a child.
Fork, kick off subprocess, exit (the forked child).
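A sketch of the fork-then-exit trick in Python 3 (spawn_detached is a hypothetical name). It assumes the calling program has not started other threads yet, because fork() in a multi-threaded process is risky.

```python
import os
import subprocess


def spawn_detached(command):
    """Run command as a grandchild so this process never has to reap it.

    Returns True if the intermediate child managed to start the command.
    """
    pid = os.fork()
    if pid == 0:
        # Intermediate child: start the real program, then exit at once.
        # The grandchild is re-parented to init (or a subreaper), which
        # reaps it when it exits.
        code = 1
        try:
            subprocess.Popen(command)
            code = 0
        finally:
            os._exit(code)  # skip normal interpreter cleanup in the child
    # Parent: the intermediate child exits almost immediately; reap it now.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```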

But provided you collect the children eventually, zombies are only
untidy, not very wasteful of resources. They are essentially just slots in the
process table left around so that the exit status can be collected; the resources
associated with the full process (memory, open files, etc.) have already been
freed.

>I was thinking about putting all Popen instances in a list. And then
>every five minutes walk through the list and check with poll if the
>process has terminated. If it has it can be released from the list.
>Of-course I need to synchronise those events. Is that a good way to do
>it?

It is reasonable.

Alternatively, and more responsively, you could have a process collection
Thread. It would loop indefinitely, calling os.wait(). That will block until a
child process exits (any of them), so it is not a busy loop. When wait()
returns, look up the pid in your list of dispatched subprocess children and do
whatever cleanup you intend. (If it is just zombies, the os.wait() cleans that
part for you.)
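A sketch of such a collection thread, assuming Python 3.9+ (for os.waitstatus_to_exitcode); start_reaper, the {pid: Popen} dictionary and the callback signature are hypothetical. Unlike an indefinite loop, this version stops once os.wait() reports that no children remain (ChildProcessError), so a long-running program would start it again after launching new children.

```python
import os
import subprocess
import threading


def start_reaper(launched, on_exit):
    """Start a thread that collects every exiting child with os.wait().

    launched: dict {pid: Popen} of the children we started.
    on_exit:  callback(pid, proc_or_None, returncode); proc is None for
              children that are not in `launched`.
    """
    def loop():
        while True:
            try:
                pid, status = os.wait()  # blocks until any child exits
            except ChildProcessError:
                return  # no children left at all: stop
            code = os.waitstatus_to_exitcode(status)
            proc = launched.pop(pid, None)
            if proc is not None:
                proc.returncode = code  # keep the Popen object consistent
            on_exit(pid, proc, code)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```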

One thing you do need to keep in mind with such a task is that it will
os.wait() _all_ child processes. If other children exit that were not
spawned by your specific calls (earlier children, or children made by some
library function), you must be prepared (a) to receive pids which are not in
your list of "evince" etc. tasks and (b) to collect pids which other
subsystems might have wanted to collect themselves. The latter is uncommon (at
least, it is uncommon for such things to occur without you, the programmer,
being aware of them).

Anyway, that is simple and effective and immediate.

Cheers,
Cameron Simpson <c...@zip.com.au>

If everyone is thinking alike, then someone isn't thinking. - Patton

Ben Finney

Jun 1, 2015, 12:37:19 AM

to pytho...@python.org

Cameron Simpson <c...@zip.com.au> writes:

> The standard trick is to make the process a grandchild instead of a
> child. Fork, kick off subprocess, exit (the forked child).

For Cecil Westerhof's benefit: If you haven't seen it, the
‘python-daemon’ library is designed to get this, and other fiddly
aspects of daemonising the current program, correct in Python code.

<URL:https://pypi.python.org/pypi/python-daemon/>

--
\ “Injustice is relatively easy to bear; what stings is justice.” |
`\ —Henry L. Mencken |
_o__) |
Ben Finney

Cecil Westerhof

Jun 1, 2015, 5:52:14 AM

On Monday 1 Jun 2015 03:03 CEST, Cameron Simpson wrote:

> On 31May2015 23:33, Cecil Westerhof <Ce...@decebal.nl> wrote:
>> At the moment I have the following code:
>>     os.chdir(directory)
>>     for document in documents:
>>         subprocess.Popen(['evince', document])
>>
>> With this I can open several documents at once. But there is no way
>> to know when those documents are going to be closed. This
>> could/will lead to zombie processes. (I run it on Linux.) What is
>> the best solution to circumvent this?
>
> The standard trick is to make the process a grandchild instead of a
> child. Fork, kick off subprocess, exit (the forked child).
>
> But provided you will collect the children eventually then zombies
> are only untidy, not very resource wasteful. They are essentially
> just slots in the process table left around so that exit status can
> be collected; the resources associated with the full process
> (memory, open file, etc) have already been freed.

I do not like untidy. ;-)

But Marko already gave the solution:
import signal
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

Cecil Westerhof

Jun 1, 2015, 5:52:14 AM

On Monday 1 Jun 2015 00:22 CEST, Marko Rauhamaa wrote:

> Cecil Westerhof <Ce...@decebal.nl>:
>
>> At the moment I have the following code:
>>     os.chdir(directory)
>>     for document in documents:
>>         subprocess.Popen(['evince', document])
>>
>> With this I can open several documents at once. But there is no way
>> to know when those documents are going to be closed. This
>> could/will lead to zombie processes. (I run it on Linux.) What is
>> the best solution to circumvent this?
>>
>> I was thinking about putting all Popen instances in a list. And
>> then every five minutes walk through the list and check with poll
>> if the process has terminated. If it has it can be released from
>> the list. Of-course I need to synchronise those events. Is that a
>> good way to do it?
>
> If you don't care to know when child processes exit, you can simply
> ignore the SIGCHLD signal:
>
> import signal
> signal.signal(signal.SIGCHLD, signal.SIG_IGN)
>
> That will prevent zombies from appearing.

In this case I do not care. I just do not want to create zombie
processes.

It works, which I find kind of strange, because
https://docs.python.org/2/library/signal.html
says:
signal.SIG_DFL

This is one of two standard signal handling options; it will
simply perform the default function for the signal. For
example, on most systems the default action for SIGQUIT is to
dump core and exit, while the default action for SIGCHLD is to
simply ignore it.

signal.SIG_IGN

This is another standard signal handler, which will simply
ignore the given signal.

As far as I can see, it does not matter in this case, because when a process
terminates it will still communicate its exit status to Popen. But
what if I want SIG_IGN for some Popen children and SIG_DFL for others?
How should I do that?

Chris Angelico

Jun 1, 2015, 6:05:12 AM

to pytho...@python.org

On Mon, Jun 1, 2015 at 7:20 PM, Cecil Westerhof <Ce...@decebal.nl> wrote:
> But
> what if I want for certain Popen signals SIG_IGN and others SIG_DFL.
> How should I do that?

You can't. A signal is a signal; you can't specify default handling
for some and not others. The only way is to actually handle them, and
then you can decide what to do on a per-process basis.
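A sketch of "actually handling them" in Python 3: a SIGCHLD handler that reaps only children registered as fire-and-forget, so every other child keeps its exit status until its Popen object collects it. The names _fire_and_forget, _reap_registered and popen_forget are hypothetical.

```python
import os
import signal
import subprocess

# Hypothetical registry: PIDs whose exit status nobody will ask for.
_fire_and_forget = set()


def _reap_registered(signum, frame):
    """SIGCHLD handler: reap only registered children, leave the rest alone."""
    for pid in list(_fire_and_forget):
        try:
            done, _ = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            done = pid  # already collected elsewhere
        if done:
            _fire_and_forget.discard(pid)


def popen_forget(command):
    """Start command; its zombie is reaped by the SIGCHLD handler."""
    proc = subprocess.Popen(command)
    _fire_and_forget.add(proc.pid)
    # The child may have exited before it was registered; check once now.
    _reap_registered(signal.SIGCHLD, None)


signal.signal(signal.SIGCHLD, _reap_registered)  # main thread only
```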

ChrisA

Marko Rauhamaa

Jun 1, 2015, 6:22:09 AM

Chris Angelico <ros...@gmail.com>:

Signals are a very crude, old-school UNIX system programming concept.

One of their worst problems is their global scope. You can't
compartmentalize signals through ordinary means of encapsulation
(libraries, classes).

Linux is addressing this particular issue "as we speak:"

This patch series introduces a new clone flag, CLONE_FD, which lets
the caller handle child process exit notification via a file
descriptor rather than SIGCHLD.

<URL: https://lwn.net/Articles/636646/>

I guess it will be a while before this facility will be available to
Python programmers. For example, signalfd and eventfd were introduced in
2007 and 2010, respectively, but still aren't supported by Python.

Marko

Marko Rauhamaa

Jun 1, 2015, 6:25:31 AM

Cecil Westerhof <Ce...@decebal.nl>:

> It works. What I find kind of strange because
> https://docs.python.org/2/library/signal.html

You should not rely on Python documentation on Linux system specifics.
The wait(2) manual page states:

POSIX.1-2001 specifies that if the disposition of SIGCHLD is set to
SIG_IGN or the SA_NOCLDWAIT flag is set for SIGCHLD (see
sigaction(2)), then children that terminate do not become zombies
and a call to wait() or waitpid() will block until all children have
terminated, and then fail with errno set to ECHILD. (The original
POSIX standard left the behavior of setting SIGCHLD to SIG_IGN
unspecified. Note that even though the default disposition of
SIGCHLD is "ignore", explicitly setting the disposition to SIG_IGN
results in different treatment of zombie process children.)
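One practical consequence of the ECHILD behaviour quoted above: CPython 3's subprocess module treats ECHILD from waitpid() as exit status 0, so Popen.wait() reports 0 for a child the kernel reaped automatically, whatever its real exit status. A small experiment (exit_code is a hypothetical helper):

```python
import signal
import subprocess


def exit_code(command, ignore_sigchld):
    """Run command and return what Popen.wait() reports.

    With SIGCHLD set to SIG_IGN the kernel reaps the child itself,
    waitpid() fails with ECHILD, and subprocess then reports 0.
    """
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    old = signal.signal(signal.SIGCHLD, disposition)
    try:
        return subprocess.Popen(command).wait()
    finally:
        signal.signal(signal.SIGCHLD, old)
```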

Marko

Cecil Westerhof

Jun 1, 2015, 8:48:26 AM

On Sunday 31 May 2015 23:33 CEST, Cecil Westerhof wrote:

> At the moment I have the following code:
>     os.chdir(directory)
>     for document in documents:
>         subprocess.Popen(['evince', document])
>
> With this I can open several documents at once. But there is no way
> to know when those documents are going to be closed. This could/will
> lead to zombie processes. (I run it on Linux.) What is the best
> solution to circumvent this?
>
> I was thinking about putting all Popen instances in a list. And then
> every five minutes walk through the list and check with poll if the
> process has terminated. If it has it can be released from the list.
> Of-course I need to synchronise those events. Is that a good way to
> do it?

With some investigation I decided on something completely different.

I made a class:
    import os
    import subprocess

    from os.path import expanduser
    from threading import Thread

    class DocumentsToShow:
        def show_documents(self):
            if self._desktop != -1:
                subprocess.check_call(['wmctrl', '-s', str(self._desktop - 1)])
            os.chdir(self._directory)
            for document in self._documents:
                Popen_without_zombie(['evince', document])

        def __init__(self, name, desktop, directory, documents):
            '''Initialise the class'''

            self._name = name
            self._desktop = desktop
            self._directory = expanduser(directory)
            self._documents = documents

And this class uses the following function:
    def Popen_without_zombie(command):
        p = subprocess.Popen(command)
        # Wait for the child in a background thread, so it is reaped on exit.
        Thread(target = p.wait).start()

The class takes a name for the collection, which desktop to display it
on, the directory that contains the documents and a list of documents.
Calling show_documents displays them.
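A variation on the thread-per-process approach, as a Python 3 sketch: popen_without_zombie here is a hypothetical extension with an optional on_exit callback, which also answers the original question of knowing when a document was closed. The design choice is daemon=True: Python does not wait for daemon threads at exit, so a viewer left open does not keep the script alive, whereas non-daemon threads would.

```python
import subprocess
import threading


def popen_without_zombie(command, on_exit=None):
    """Start command and reap it from a background thread.

    on_exit is an optional, hypothetical callback that receives the exit
    code. Returns the Popen object and the reaper thread.
    """
    proc = subprocess.Popen(command)

    def reap():
        code = proc.wait()  # blocks only this thread; removes the zombie
        if on_exit is not None:
            on_exit(code)

    # daemon=True: if the script exits first, the orphaned viewer is
    # reaped by init (or a subreaper) instead.
    thread = threading.Thread(target=reap, daemon=True)
    thread.start()
    return proc, thread
```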

How about this way of solving it?

As for improving the class: at the moment evince is always
used to open a document, but it would be nice if it were a little less
rigid. What is the preferred way to solve this?
- Defining a set the user can choose from.
- Making it a string parameter and expecting the user to know what he
is doing.

Marko Rauhamaa

Jun 1, 2015, 9:32:21 AM

Cecil Westerhof <Ce...@decebal.nl>:

> Thread(target = p.wait).start()
>
> [...]

>
> How about this way of solving it?

It works.

Marko

Cecil Westerhof

Jun 1, 2015, 9:46:44 AM

On Monday 1 Jun 2015 14:16 CEST, Cecil Westerhof wrote:

I let the user give the command to open the documents, but check if
the command exists:

    class DocumentsToShow:
        def show_documents(self):
            if self._desktop != -1:
                subprocess.check_call(['wmctrl', '-s', str(self._desktop - 1)])
            os.chdir(self._directory)
            for document in self._documents:
                Popen_without_zombie([self._command, document])

        def __init__(self, name, desktop, command, directory, documents):
            '''Initialise the class'''

            try:
                subprocess.check_call(['which', command])
            except subprocess.CalledProcessError:
                raise ValueError('Command \'{0}\' is not known'.format(command))

            self._name = name
            self._desktop = desktop
            self._command = command
            self._directory = expanduser(directory)
            self._documents = documents
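An alternative to running the external which command, as a Python 3 sketch: shutil.which (Python 3.3+) does the PATH lookup in-process, so nothing is printed to stdout. require_command is a hypothetical name; the error message mirrors the one above.

```python
import shutil


def require_command(command):
    """Return the full path of command, or raise ValueError if not on PATH."""
    path = shutil.which(command)
    if path is None:
        raise ValueError("Command '{0}' is not known".format(command))
    return path
```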

Cecil Westerhof

Jun 1, 2015, 10:57:39 AM

On Monday 1 Jun 2015 15:32 CEST, Marko Rauhamaa wrote:

That I knew: I tested it before I posted it. What I meant was: is this
better, worse, or the same as working with the signal? In my eyes it is
much better, but maybe there is a reason that would make my line of
thinking wrong.

Grant Edwards

Jun 1, 2015, 10:59:40 AM

On 2015-05-31, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Cecil Westerhof <Ce...@decebal.nl>:
>
>> At the moment I have the following code:
>>     os.chdir(directory)
>>     for document in documents:
>>         subprocess.Popen(['evince', document])
>>
>> With this I can open several documents at once. But there is no way to
>> know when those documents are going to be closed. This could/will lead
>> to zombie processes. (I run it on Linux.) What is the best solution to
>> circumvent this?
>>
>> I was thinking about putting all Popen instances in a list. And then
>> every five minutes walk through the list and check with poll if the
>> process has terminated. If it has it can be released from the list.
>> Of-course I need to synchronise those events. Is that a good way to do
>> it?
>
> If you don't care to know when child processes exit, you can simply
> ignore the SIGCHLD signal:
>
> import signal
> signal.signal(signal.SIGCHLD, signal.SIG_IGN)
>
> That will prevent zombies from appearing.

Bravo! I've been writing Unix apps for 30 years, and I did not know
that. Is this something recent[1], or have I somehow managed to avoid
this useful bit of info for that long?

[1] "Recent" of course being rather subjective and highly
age-dependent.

--
Grant Edwards grant.b.edwards Yow! How's it going in
at those MODULAR LOVE UNITS??
gmail.com

Marko Rauhamaa

Jun 1, 2015, 11:39:16 AM

Grant Edwards <inv...@invalid.invalid>:

> On 2015-05-31, Marko Rauhamaa <ma...@pacujo.net> wrote:
>> If you don't care to know when child processes exit, you can simply
>> ignore the SIGCHLD signal:
>>
>> import signal
>> signal.signal(signal.SIGCHLD, signal.SIG_IGN)
>>
>> That will prevent zombies from appearing.
>
> Bravo! I've been writing Unix apps for 30 years, and I did not know
> that. Is this something recent[1], or have I somehow managed to avoid
> this useful bit of info for that long?

I wasn't aware of it ever having been any other way. However:

POSIX.1-1990 disallowed setting the action for SIGCHLD to SIG_IGN.
POSIX.1-2001 allows this possibility, so that ignoring SIGCHLD can be
used to prevent the creation of zombies (see wait(2)). Nevertheless,
the historical BSD and System V behaviors for ignoring SIGCHLD
differ, so that the only completely portable method of ensuring that
terminated children do not become zombies is to catch the SIGCHLD
signal and perform a wait(2) or similar.
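The portable method described in the man page (catch SIGCHLD and wait), sketched in Python 3; reap_all_children and install_reaper are hypothetical names. Like the os.wait() thread discussed earlier in the thread, it collects every child, including ones a Popen object might later want to wait for.

```python
import os
import signal


def reap_all_children(signum, frame):
    """SIGCHLD handler: collect every child that has exited, without blocking.

    Several exits can be merged into one SIGCHLD, hence the loop.
    """
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children at all
        if pid == 0:
            return  # children exist, but none has exited yet


def install_reaper():
    """Catch SIGCHLD instead of ignoring it (call from the main thread)."""
    signal.signal(signal.SIGCHLD, reap_all_children)
```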

<URL: http://man7.org/linux/man-pages/man2/sigaction.2.html>

If you have the Stevens book, you'd probably find out how SIGC(H)LD
was treated in BSD and System V. I can't remember, despite having programmed
for both environments.

Marko