[boost] [process] Process.Spawn leaves the child in zombie state on posix


Benedek Tass via Boost

Jul 6, 2022, 10:37:07 AM
to bo...@lists.boost.org, Benedek Tass
Hi, everyone!

This bug was observed on 2022-07-05, using Boost version 1.79.0.
Similar behaviour has been reported before on online message boards.


Bug description

When starting a new, detached process with Process.Spawn
(boost::process::spawn) on a POSIX system, if the parent process outlives
the child, the child process remains in the zombie state for the rest of the
parent process' lifetime.
The bug described above is demonstrated in the CMake project attached to
this message.


Analysis

The spawn function injects the call "signal(SIGCHLD, SIG_IGN)" via a
functor of type boost::process::detail::posix::sig_init_. The call is made
in the forked child process, to no avail: whether a terminated child lingers
as a zombie depends on the SIGCHLD disposition of its parent, not of the
child itself. It is not made (and should not be made) in the parent process,
as that would be too intrusive.
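
A minimal sketch of the point above, using plain POSIX calls rather than Boost.Process. The helper name child_left_zombie and its two flags are hypothetical, chosen only for illustration. It forks a child that exits at once and reports whether the parent could still collect it afterwards, which is only possible if the child lingered as a zombie.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, for illustration only.
// Returns true if the exited child was kept around (as a zombie) until the
// parent collected it, false if it was reaped automatically or fork() failed.
bool child_left_zombie(bool ignore_in_parent, bool ignore_in_child)
{
    // Set the parent's SIGCHLD disposition for the duration of the test.
    struct sigaction old_action{};
    struct sigaction new_action{};
    new_action.sa_handler = ignore_in_parent ? SIG_IGN : SIG_DFL;
    sigemptyset(&new_action.sa_mask);
    ::sigaction(SIGCHLD, &new_action, &old_action);

    pid_t pid = ::fork();
    if (pid == 0) {
        // Child: this only affects the child's own (future) children.
        if (ignore_in_child)
            ::signal(SIGCHLD, SIG_IGN);
        ::_exit(0);
    }

    bool zombie = false;
    if (pid > 0) {
        int status = 0;
        // If the parent ignores SIGCHLD, waitpid() waits for the child to
        // terminate and then fails with ECHILD, because nothing is left to reap.
        zombie = (::waitpid(pid, &status, 0) == pid);
    }

    // Restore the caller's original disposition.
    ::sigaction(SIGCHLD, &old_action, nullptr);
    return zombie;
}
```

Ignoring SIGCHLD in the child leaves the outcome unchanged; only the parent's disposition decides whether the zombie appears.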


Possible mitigation

Introducing an in-between forked process that serves as the parent of the
to-be-spawned process, with SIGCHLD set to SIG_IGN, would prevent the spawned
process from becoming a zombie without disturbing the parent process' signal
handlers.
Introducing another class, as an alternative to sig_init_, with the
functionality described above would be a reasonable approach. An
implementation sketch of the double fork method can be found in the attached
project.
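
For readers without the attachment, the double fork idea might look roughly like the following. This is not the code from the attached project, only a sketch under assumptions: the function name spawn_detached is hypothetical, error reporting is reduced to a 0 / -1 return value, and resetting SIGCHLD to SIG_DFL before exec is my own choice, made because ignored signals stay ignored across exec.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: start argv[0] detached from the caller.
// Returns 0 if the intermediate process forked the grandchild, -1 otherwise.
int spawn_detached(char* const argv[])
{
    pid_t intermediate = ::fork();
    if (intermediate < 0)
        return -1;

    if (intermediate == 0) {
        // Intermediate process: SIGCHLD is ignored here, not in the caller.
        ::signal(SIGCHLD, SIG_IGN);

        pid_t grandchild = ::fork();
        if (grandchild == 0) {
            // Grandchild: restore the default so the new program does not
            // inherit an ignored SIGCHLD (assumption, see lead-in).
            ::signal(SIGCHLD, SIG_DFL);
            ::execvp(argv[0], argv);
            ::_exit(127);  // exec failed
        }
        // Exit at once; the grandchild is re-parented away from the caller.
        ::_exit(grandchild < 0 ? 1 : 0);
    }

    // Caller: the only child it ever has to reap is the short-lived
    // intermediate process, so no zombie is left behind.
    int status = 0;
    pid_t result;
    do {
        result = ::waitpid(intermediate, &status, 0);
    } while (result < 0 && errno == EINTR);

    if (result != intermediate)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The caller's signal handlers are never touched, and its waitpid() returns as soon as the intermediate process has forked and exited.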


System used

OS: Ubuntu 18.04.6 LTS
arch: x86_64
compiler: gcc-7.5.0
libc: libc-2.27
boost: 1.79.0 (built from source)


Best regards,
Benedek Tass
boost_process_spawn_bug_example.zip

Andrey Semashev via Boost

Jul 6, 2022, 11:27:59 AM
to bo...@lists.boost.org, Andrey Semashev
You should probably report bugs on GitHub:

https://github.com/boostorg/process/issues

In a bug report, it is always desirable to post a small compilable code
sample that reproduces the issue.

Regarding the proposed fix, it's not clear how introducing an
intermediate process would fix the parent not calling waitpid() or
equivalent. You'd just get a different zombie process.


Benedek Tass via Boost

Jul 7, 2022, 7:08:51 AM
to bo...@lists.boost.org, Benedek Tass
I was planning to send an example project; I'm sending it now as an
attachment.
As the demo sketch shows, I would absolutely call waitpid() in the parent
process; the on_success() method of the sig_init_-like class would be a
reasonable place for it. The current implementation doesn't lend itself to
an easy fix for this bug, and I don't have a fully fledged idea of how to
do it.


Andrey Semashev via Boost <bo...@lists.boost.org> wrote (on Wed, Jul 6,
2022, 17:28):
boost_process_spawn_bug_example.zip

Klemens Morgenstern via Boost

Jul 7, 2022, 7:21:57 AM
to bo...@lists.boost.org, Klemens Morgenstern
I don't think this is a solvable problem.
The reason is that you need to set SIGCHLD to SIG_IGN for the whole
application in order to prevent zombies. That doesn't really work as a
general solution, which is why I'll deprecate that function at some point.

You can probably call `waitpid(-1, &status, 0);` from time to time to reap
the zombie processes, or you can put them in a process group; but I couldn't
come up with a satisfying solution, which is why the only way will be to
just hold a handle to the child process.
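
A rough sketch of the periodic reaping mentioned above. The call quoted with flags 0 blocks until some child exits, so this version swaps in WNOHANG to return immediately when nothing has exited yet. The function name reap_exited_children is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: reap every child that has already exited, without
// blocking. Returns the number of children reaped by this call.
int reap_exited_children()
{
    int reaped = 0;
    for (;;) {
        int status = 0;
        pid_t pid = ::waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            ++reaped;      // one zombie collected, look for more
            continue;
        }
        if (pid < 0 && errno == EINTR)
            continue;      // interrupted by a signal, retry
        break;             // 0: children are still running; -1: no children
    }
    return reaped;
}
```

Note that waitpid(-1, ...) collects any child of the calling process, including ones that another part of the program may still intend to wait for itself.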

On Thu, Jul 7, 2022 at 7:08 PM Benedek Tass via Boost <bo...@lists.boost.org>
wrote:

Benedek Tass via Boost

Jul 7, 2022, 7:45:15 AM
to bo...@lists.boost.org, Benedek Tass
Have you checked out the sketch I provided? It seems to me that it solves
the problem reasonably well.

Klemens Morgenstern via Boost <bo...@lists.boost.org> wrote (on Thu,
Jul 7, 2022, 13:22):

Andrey Semashev via Boost

Jul 7, 2022, 7:47:58 AM
to bo...@lists.boost.org, Andrey Semashev
On 7/7/22 14:21, Klemens Morgenstern via Boost wrote:
> I don't think this is a solvable problem.
> The reason is that you need to set SIGCHLD to SIG_IGN for the whole
> application in order to prevent zombies. Doesn't really work as a general
> solution, which is why I'll deprecate that function at some point.
>
> You can probably `waitpid(-1, &status, 0);` from time to time to reap the
> zombie processes or you can put them in a process group; but I couldn't
> come up with a satisfying solution, which is why the only way will be to
> just hold a handle to the child process.

Isn't the main purpose of SIGCHLD exactly to let you call waitpid? If
anything, you should be recommending either plugging a Boost.Process API
call into the user's SIGCHLD handler (that will call waitpid internally) or
allowing the user to set Boost.Process' own SIGCHLD handler that will do
this. Ignoring SIGCHLD does not seem like the right thing to do.

If Boost.Process doesn't allow joining terminated processes as they
terminate, this seems like a major design flaw to me. Asking users to
call waitpid periodically sounds like a kludge.
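
To illustrate what a SIGCHLD handler that joins children as they terminate could look like, here is a plain POSIX sketch. It is not a Boost.Process API; the names reap_on_sigchld, install_sigchld_reaper and g_reaped_children are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Number of children reaped by the handler (hypothetical, for illustration).
volatile sig_atomic_t g_reaped_children = 0;

// SIGCHLD handler: collect every child that has exited so far.
extern "C" void reap_on_sigchld(int)
{
    const int saved_errno = errno;  // waitpid() may overwrite errno
    int status = 0;
    // Several exits can be merged into one SIGCHLD, so loop until
    // no exited child is left.
    while (::waitpid(-1, &status, WNOHANG) > 0)
        g_reaped_children = g_reaped_children + 1;
    errno = saved_errno;
}

// Installs the handler for the whole process. Returns true on success.
bool install_sigchld_reaper()
{
    struct sigaction action{};
    action.sa_handler = reap_on_sigchld;
    sigemptyset(&action.sa_mask);
    // Restart interrupted system calls; do not notify for stopped children.
    action.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return ::sigaction(SIGCHLD, &action, nullptr) == 0;
}
```

Like any SIGCHLD handler, this is process-wide, so it would compete with other code in the same application that waits for its own children.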

Benedek Tass via Boost

Jul 8, 2022, 4:00:56 AM
to bo...@lists.boost.org, Benedek Tass
The following figure may clarify the idea I propose:

PARENT -->fork()----------------------------------waitpid()--->
            \                                     /
             \   signal(SIGCHLD,SIG_IGN)         /
            CHILD fork()----signal(SIGHUP,SIG_IGN)----exit()   # this process takes the burden of ignoring SIGCHLD from parent
                    \   setsid()
                     \
                  GRANDCHILD execve()------>

Benedek Tass <tass.b...@gmail.com> wrote (on Thu, Jul 7, 2022, 13:44):
