I am only using subprocess as an example to demonstrate the SIGCHLD problem.
import gevent, subprocess

def run():
    proc = subprocess.Popen(['sleep', '1'])
    while proc.poll() is None:
        gevent.sleep()
    return proc.poll()

gevent.joinall([gevent.spawn(run)])
This will hang because os.waitpid cannot receive SIGCHLD.
strace shows that:
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, {0x2b7f0f1c01c0, ~[RTMIN RT_1], SA_RESTORER|SA_RESTART, 0x333040eb10}, NULL, 8) = 0
Changing the code to the following makes it work:
import gevent, subprocess

def run():
    import signal
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    proc = subprocess.Popen(['sleep', '1'])
    while proc.poll() is None:
        gevent.sleep()
    return proc.poll()

gevent.joinall([gevent.spawn(run)])
In fact, import signal; signal.signal(signal.SIGCHLD, signal.SIG_DFL)
only needs to run once, after the event loop has started.
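For anyone who wants to check this from inside Python rather than with strace, here is a minimal diagnostic sketch. The helper name describe_sigchld_handler is hypothetical and not part of gevent. It relies on the documented behaviour of signal.getsignal(), which returns None when the current handler was not installed from Python (which is what one would expect for a handler set by a C library).

```python
import signal

def describe_sigchld_handler():
    """Return a short label for the current SIGCHLD disposition."""
    handler = signal.getsignal(signal.SIGCHLD)
    if handler is None:
        # Set by non-Python code, e.g. a C extension or C library.
        return "installed outside Python"
    if handler == signal.SIG_DFL:
        return "default"
    if handler == signal.SIG_IGN:
        return "ignored"
    return "python handler"
```

Calling it before and after the event loop starts should show whether something replaced the default disposition.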
On Fri, 2012-04-27 at 14:56 -0400, Jim Fulton wrote:
> So we have to set an environment variable to not have implicit monkey
> patching of the standard library?
It's not a monkey patch, it's a save/restore of the SIGCHLD handler.
Apparently (looking at the code) it's libev which installs its own
signal handler for SIGCHLD, and setting that envvar prevents that
(by saving/restoring the 'real' signal handler around it).
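To illustrate what "save/restore" means here, a rough Python analogue follows. This is only a sketch of the idea: the real code lives in gevent's C/Cython layer and is not shown in this thread, and the name preserve_sigchld is made up for the example.

```python
import signal
from contextlib import contextmanager

@contextmanager
def preserve_sigchld():
    """Put the previous SIGCHLD handler back after the wrapped block runs."""
    saved = signal.getsignal(signal.SIGCHLD)
    try:
        yield
    finally:
        # getsignal() returns None for handlers not installed from Python;
        # those cannot be re-installed through signal.signal(), so skip them.
        if saved is not None:
            signal.signal(signal.SIGCHLD, saved)
```

Anything that changes the SIGCHLD handler inside the with block is undone when the block exits, so the rest of the program keeps seeing the handler it had before.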
Unfortunately, there is a distinct lack of comments in gevent/core.ppyx.
*Sigh*.
Other than Matthias rightfully disagreeing with my characterization of
the signal handler as a monkey patch, I didn't see any other responses
to my questions and points.
Am I misunderstanding something? This looks like a pretty serious
issue with 1.0b2. I'm not using it because of this. The last thing I
saw from Denis was "It's not a bug", but I can't agree; either it's a bug or
it's a serious misfeature.
On Fri, Apr 27, 2012 at 2:56 PM, Jim Fulton <j...@zope.com> wrote:
> On Fri, Apr 27, 2012 at 2:24 PM, Denis Bilenko <denis.bile...@gmail.com> wrote:
>> It's not a bug, the child watchers
> What are child watchers?
>> were added in 1.0b2 and when those
>> are enabled
> Are they enabled by default?
>> makes libev reap all children.
>> It unfortunately breaks os.waitpid.
> That's really really bad.
>> We'll have a replacement for it
>> that works.
> A replacement for what? Surely not os.waitpid.
>> This behavior can be disabled with GEVENT_BACKEND=nochild environment setting.
> So we have to set an environment variable to not have implicit monkey
> patching of the standard library?
On Thu, May 10, 2012 at 2:01 AM, Jim Fulton <j...@zope.com> wrote:
> Other than Matthias rightfully disagreeing with my characterization of
> the signal handler as a monkey patch, I didn't see any other responses
> to my questions and points.
> Am I misunderstanding something? This looks like a pretty serious
> issue with 1.0b2. I'm not using it because of this. The last thing I
> saw from Denis was "It's not a bug", but I can't agree; either it's a bug or
> it's a serious misfeature.
It makes no sense to avoid 1.0b2 since you can disable this feature.
It does break various subprocess implementations, but the trunk already
has a gevent.subprocess module, so it's not a problem in practice.
We could make it disabled by default, but why? People will create
their own SIGCHLD handlers, all doing the same thing.
On Thu, May 10, 2012 at 1:09 PM, Denis Bilenko <denis.bile...@gmail.com> wrote:
> On Thu, May 10, 2012 at 2:01 AM, Jim Fulton <j...@zope.com> wrote:
>> Other than Matthias rightfully disagreeing with my characterization of
>> the signal handler as a monkey patch, I didn't see any other responses
>> to my questions and points.
>> Am I misunderstanding something? This looks like a pretty serious
>> issue with 1.0b2. I'm not using it because of this. The last thing I
>> saw from Denis was "It's not a bug", but I can't agree; either it's a bug or
>> it's a serious misfeature.
> It makes no sense to avoid 1.0b2 since you can disable this feature.
My library, zc.resumelb, uses gevent. It is also used with other
applications, including some I may have no control over. It is at
best unseemly and at worst a source of hard-to-debug bugs to require
use of an environment variable to prevent breakage.
> It does break various subprocess implementations, but the trunk already
> has a gevent.subprocess module, so it's not a problem in practice.
Gevent will often be part of some larger applications. Asking people
to change unrelated parts of their code to use something (say
zc.resumelb) that happens to use gevent is a non-starter (at least for
zc.resumelb).
> We could make it disabled by default, but why?
So as not to cause mysterious breakage for a feature that people may
not want. Heck, I don't want it, I'm not even sure what it is. :)
I'm guessing that this is in support of gevent.subprocess. I think
it would be fine to install the signal handler at the point that
gevent.subprocess is first used (not imported, but actually used).
(I'm hoping that you'll tell me that the signal handler isn't
installed unless someone actually creates a subprocess with
gevent.subprocess and that I'm just confused about the nature of the
problem. :)
> People will create
> their own SIGCHLD handlers, all doing the same thing.
I won't. I can understand why someone would want to use gevent to
manage subprocesses, but I don't and I want my code to be useable with
other people's code that doesn't use gevent.
Gevent isn't my world. It's just a tool I'm using. I should be able
to use it independently of other libraries I'm using. If I can't, then
I won't use gevent.
On Fri, May 11, 2012 at 6:45 PM, Denis Bilenko <denis.bile...@gmail.com> wrote:
> I patched libev to only install SIGCHLD handler when a child watcher
> is started for the first time.
> So as long as gevent.subprocess and loop.child were not used, the
> subprocess libraries that rely on waitpid() should continue working.
Awesome. Thanks.
> Please try the trunk with your subprocess library and let me know if it works.
OK, I'll have to construct a case that is broken in b2 first. My concern
was hypothetical, based on the discussion.
For all intents and purposes, zc.resumelb is a WSGI server. People
can plug anything into it.
On Fri, May 11, 2012 at 7:05 PM, Jim Fulton <j...@zope.com> wrote:
> On Fri, May 11, 2012 at 6:45 PM, Denis Bilenko <denis.bile...@gmail.com> wrote:
>> I patched libev to only install SIGCHLD handler when a child watcher
>> is started for the first time.
>> So as long as gevent.subprocess and loop.child were not used, the
>> subprocess libraries that rely on waitpid() should continue working.
> Awesome. Thanks.
>> Please try the trunk with your subprocess library and let me know if it works.
> OK, I'll have to construct a case that is broken in b2 first. My concern
> was hypothetical, based on the discussion.
... and somewhat confused. It appears that the breakage only occurred
when gevent.sleep was used to avoid blocking when polling for a
sub-process to die. If waiting via blocking calls
(e.g. subprocess.call), there wasn't a problem. Of course, blocking
calls are uncool in the gevent context, but my concern was for calls
made in (non-gevent) worker threads.
So this turns out not to have been a big deal. I do still think the
change you made is an improvement.
I am still really, really troubled by the behavior of having libev reap child processes behind the scenes. In my module, I use gevent's signal class to register a SIGCHLD handler (which works great!). When gevent calls my SIGCHLD handler, my module then calls a method in another module (which I cannot modify) that manages processes and calls Popen.poll() to check whether any of its processes completed. So, the other module now always fails to detect completion of the subprocesses that it manages.
Behind-the-scenes reaping of child processes sounds good in theory, but is bad in practice. Production-quality code will, in many cases, need to reap its subprocesses explicitly in order to detect completion of specific subprocesses and get their completion status codes.
Also, having to disable the reaping via special environment settings kludges the interface and makes testing and deployment of programs more error-prone.
Thank you, Vitaly
P.S., Love gevent, but am very uncomfortable with this unfortunate side-effect.
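The failure mode described above can be reproduced without gevent at all. The sketch below uses only os.fork() and os.waitpid(); the function name is made up, and the "background reaper" is just a stand-in for whatever collects children behind the owner's back. Once some other code has reaped a child, the code that started it gets ECHILD (ChildProcessError in Python 3) and the exit status is no longer available to it.

```python
import os

def reaped_elsewhere_demo():
    """Show that a child reaped by someone else is lost to its owner."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with status 7

    # Stand-in for a background reaper: collect whichever child exited.
    reaped_pid, status = os.waitpid(-1, 0)

    # The code that started the child now tries to collect it itself.
    try:
        os.waitpid(pid, 0)
        outcome = "collected"
    except ChildProcessError:
        outcome = "ECHILD"
    return reaped_pid == pid, os.WEXITSTATUS(status), outcome
```

Only the reaper ever sees the exit status of 7; the owner's waitpid() call fails because the kernel has already discarded the child's entry.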
On Friday, July 6, 2012 5:57:56 PM UTC-7, vitaly wrote:
> I am still really, really troubled by the behavior of having libev reap
> child processes behind the scenes. In my module, I use gevent's signal
> class to register a SIGCHLD handler (which works great!). When gevent
> calls my SIGCHLD handler, my module then calls a method in another module
> (which I cannot modify) that manages processes and calls Popen.poll() to
> check whether any of its processes completed. So, the other module now
> always fails to detect completion of the subprocesses that it manages.
> Behind-the-scenes reaping of child processes sounds good in theory, but is
> bad in practice. Production-quality code will, in many cases, need to reap
> its subprocesses explicitly in order to detect completion of specific
> subprocesses and get their completion status codes.
> Also, having to disable the reaping via special environment settings
> kludges the interface and makes testing and deployment of programs more
> error-prone.
> Thank you,
> Vitaly
> P.S., Love gevent, but am very uncomfortable with this unfortunate
> side-effect.
Also, when I set GEVENT_BACKEND=nochild for my module, gevent.signal stops working -- I no longer get SIGCHLD callbacks. This is with gevent 1.0b2. I thought that GEVENT_BACKEND=nochild would only disable the behind-the-scenes reaping of subprocesses, but it also broke gevent.signal(SIGCHLD,...).
On Sat, Jul 7, 2012 at 5:59 AM, vitaly <vitaly.krugl.nume...@gmail.com> wrote:
> Also, when I set GEVENT_BACKEND=nochild for my module, gevent.signal stops
> working -- I no longer get SIGCHLD callbacks. This is with gevent 1.0b2.
> I thought that GEVENT_BACKEND=nochild would only disable the
> behind-the-scenes reaping of subprocesses, but it also broke
> gevent.signal(SIGCHLD,...).
Try the trunk. Here libev's child reaping is not enabled until a child
watcher or gevent.subprocess is used.
On Friday, July 6, 2012 11:31:28 PM UTC-7, Denis Bilenko wrote:
> On Sat, Jul 7, 2012 at 5:59 AM, vitaly <> wrote:
> > Also, when I set GEVENT_BACKEND=nochild for my module, gevent.signal stops
> > working -- I no longer get SIGCHLD callbacks. This is with gevent 1.0b2.
> > I thought that GEVENT_BACKEND=nochild would only disable the
> > behind-the-scenes reaping of subprocesses, but it also broke
> > gevent.signal(SIGCHLD,...).
> Try the trunk. Here libev's child reaping is not enabled until a child
> watcher or gevent.subprocess is used.
This all kept me from sleeping well last night (really :)). As I see it, the problem is that features like this inadvertently break other modules. If a program/process lives entirely in a 100% gevent-enabled world, then all works great. Unfortunately, far too many useful/necessary modules are not gevent-enabled and won't be (ever or anytime soon). gevent's SIGCHLD watcher offers very handy integration with a gevent-enabled module; however, as soon as my program makes use of such a module, it breaks other non-gevent-enabled modules that start subprocesses and wait for their completion via os.waitpid() or similar (e.g., from a separate thread).
I understand that gevent/libev need to reap child processes in order to implement gevent-compatible blocking semantics for process wait and similar methods. Conceptually, it would be ideal if gevent/libev would only reap processes that were created via gevent.subprocess, and left other processes alone. This would imply that gevent/libev would need to poll all outstanding gevent.subprocess-created pid's via waitpid (or similar) on every SIGCHLD dispatch instead of reaping whatever child/children actually completed via wait3 (or similar). This might incur some performance penalty (I don't know the extent), but would reduce breakage of other modules in the non-100%-gevent-enabled universe.
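To make the proposal concrete, here is a minimal sketch of per-pid polling. It is not gevent or libev code; tracked_pids and reap_tracked are hypothetical names for a registry of children the framework started and the routine it would run on each SIGCHLD.

```python
import os

tracked_pids = set()  # pids of children started by the (hypothetical) framework

def reap_tracked():
    """Poll only tracked pids; return {pid: wait_status} for finished ones."""
    finished = {}
    for pid in list(tracked_pids):
        try:
            done_pid, status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Already reaped by someone else, or not our child: stop tracking.
            tracked_pids.discard(pid)
            continue
        if done_pid == pid:  # (0, 0) means the child is still running
            tracked_pids.discard(pid)
            finished[pid] = status
    return finished
```

Children that are not in tracked_pids are never touched, so other modules can still collect them with os.waitpid(). The cost is one waitpid() system call per tracked pid on every dispatch, which is where the performance concern comes from.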
On Sat, Jul 7, 2012 at 7:00 PM, vitaly <vitaly.krugl.nume...@gmail.com> wrote:
> This all kept me from sleeping well last night (really :)). As I see it,
> the problem is that features like this inadvertently break other modules.
> If a program/process lives entirely in a 100% gevent-enabled world, then
> all works great. Unfortunately, far too many useful/necessary modules are
> not gevent-enabled and won't be (ever or anytime soon). gevent's SIGCHLD
> watcher offers very handy integration with a gevent-enabled module; however,
> as soon as my program makes use of such a module, it breaks other
> non-gevent-enabled modules that start subprocesses and wait for their
> completion via os.waitpid() or similar (e.g., from a separate thread).
You have to make a choice, either use child watchers and/or
gevent.subprocess or some other subprocess module.
> Conceptually, it would be ideal if gevent/libev would only reap
> processes that were created via gevent.subprocess, and left other processes
> alone. This would imply that gevent/libev would need to poll all outstanding
> gevent.subprocess-created pid's via waitpid (or similar) on every SIGCHLD
> dispatch instead of reaping whatever child/children actually completed via
> wait3 (or similar). This might incur some performance penalty (I don't know
> the extent), but would reduce breakage of other modules in the
> non-100%-gevent-enabled universe.
It's not ideal, because the performance impact can be quite severe.
Here's Twisted's experience, which does handle each pid individually:
http://twistedmatrix.com/trac/ticket/2967
On Saturday, July 7, 2012 8:22:41 AM UTC-7, Denis Bilenko wrote:
> On Sat, Jul 7, 2012 at 7:00 PM, vitaly <> wrote:
> > This all kept me from sleeping well last night (really :)). As I see it,
> > the problem is that features like this inadvertently break other modules.
> > If a program/process lives entirely in a 100% gevent-enabled world, then
> > all works great. Unfortunately, far too many useful/necessary modules are
> > not gevent-enabled and won't be (ever or anytime soon). gevent's SIGCHLD
> > watcher offers very handy integration with a gevent-enabled module; however,
> > as soon as my program makes use of such a module, it breaks other
> > non-gevent-enabled modules that start subprocesses and wait for their
> > completion via os.waitpid() or similar (e.g., from a separate thread).
> You have to make a choice, either use child watchers and/or
> gevent.subprocess or some other subprocess module.
> > Conceptually, it would be ideal if gevent/libev would only reap
> > processes that were created via gevent.subprocess, and left other processes
> > alone. This would imply that gevent/libev would need to poll all outstanding
> > gevent.subprocess-created pid's via waitpid (or similar) on every SIGCHLD
> > dispatch instead of reaping whatever child/children actually completed via
> > wait3 (or similar). This might incur some performance penalty (I don't know
> > the extent), but would reduce breakage of other modules in the
> > non-100%-gevent-enabled universe.
> It's not ideal, because the performance impact can be quite severe.
> Here's Twisted's experience, which does handle each pid individually:
> http://twistedmatrix.com/trac/ticket/2967
I was afraid of that (re. performance impact), but not entirely surprised - it must be a bunch of expensive kernel calls. However, the Twisted link points to an alternative that should solve this problem efficiently: "SA_SIGINFO flag with sigaction". This way, gevent/libev SIGCHLD handler would discover the pid of the dead child process (from the siginfo_t arg), look it up in a map of outstanding gevent.subprocess-created processes, and reap it only if it was found in the map.
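A rough sketch of that idea in Python follows. Handlers installed with signal.signal() do not receive a siginfo_t, so this uses signal.sigtimedwait() (Unix only, Python 3.3+) as a stand-in for a C handler registered with sigaction() and SA_SIGINFO. The function name and the tracked set are assumptions made for the example.

```python
import os
import signal

def wait_for_child_signal(tracked, timeout=5.0):
    """Wait for one SIGCHLD and reap the sender only if its pid is tracked.

    Returns (pid, wait_status) for a reaped tracked child, otherwise None.
    SIGCHLD must already be blocked in the calling thread.
    """
    info = signal.sigtimedwait({signal.SIGCHLD}, timeout)
    if info is None or info.si_pid not in tracked:
        return None  # timed out, or the child belongs to someone else
    pid, status = os.waitpid(info.si_pid, os.WNOHANG)
    if pid == 0:
        return None  # child was stopped or continued, not terminated
    tracked.discard(pid)
    return pid, status
```

The caller would block SIGCHLD first, for example with signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD}). One caveat: standard signals are not queued, so several children exiting close together can be reported as a single SIGCHLD carrying only one pid, and a real implementation would need some fallback for the pids it missed.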
On Friday, July 6, 2012 11:31:28 PM UTC-7, Denis Bilenko wrote:
> On Sat, Jul 7, 2012 at 5:59 AM, vitaly <> wrote:
> > Also, when I set GEVENT_BACKEND=nochild for my module, gevent.signal stops
> > working -- I no longer get SIGCHLD callbacks. This is with gevent 1.0b2.
> > I thought that GEVENT_BACKEND=nochild would only disable the
> > behind-the-scenes reaping of subprocesses, but it also broke
> > gevent.signal(SIGCHLD,...).
> Try the trunk. Here libev's child reaping is not enabled until a child
> watcher or gevent.subprocess is used.
If the process was spawned with GEVENT_BACKEND=nochild, I think that gevent.signal() should throw an exception instead of silently failing to work. It takes time to debug it, especially if gevent.signal() was being used by a 3rd party module.
On Sat, Jul 7, 2012 at 7:53 PM, vitaly <vitaly.krugl.nume...@gmail.com> wrote:
>> Try the trunk. Here libev's child reaping is not enabled until a child
>> watcher or gevent.subprocess is used.
> If the process was spawned with GEVENT_BACKEND=nochild, I think that
> gevent.signal() should throw an exception instead of silently failing to
> work. It takes time to debug it, especially if gevent.signal() was being
> used by a 3rd party module.
First, the "nochild" option has been removed from the trunk. Second, when does
gevent.signal() silently fail to work?
On Saturday, July 7, 2012 1:22:56 PM UTC-7, Denis Bilenko wrote:
> On Sat, Jul 7, 2012 at 7:53 PM, vitaly <> wrote:
> > If the process was spawned with GEVENT_BACKEND=nochild, I think that
> > gevent.signal() should throw an exception instead of silently failing to
> > work. It takes time to debug it, especially if gevent.signal() was being
> > used by a 3rd party module.
> First, the "nochild" option has been removed from the trunk. Second, when does
> gevent.signal() silently fail to work?
When I turned on GEVENT_BACKEND=nochild for my program that uses gevent 1.0b2, my program stopped getting SIGCHLD callbacks from gevent.signal().
On Sun, Jul 8, 2012 at 6:23 PM, vitaly <vitaly.krugl.nume...@gmail.com> wrote:
> When I turned on GEVENT_BACKEND=nochild for my program that uses gevent
> 1.0b2, my program stopped getting SIGCHLD callbacks from gevent.signal().
Try it with the trunk. If you still have the issue (and are sure
nothing else sets SIGCHLD in your program), please send me a
reproducing script.
On Saturday, July 7, 2012 8:44:52 AM UTC-7, vitaly wrote:
> On Saturday, July 7, 2012 8:22:41 AM UTC-7, Denis Bilenko wrote:
>> On Sat, Jul 7, 2012 at 7:00 PM, vitaly <> wrote:
>> > Conceptually, it would be ideal if gevent/libev would only reap
>> > processes that were created via gevent.subprocess, and left other processes
>> > alone. This would imply that gevent/libev would need to poll all outstanding
>> > gevent.subprocess-created pid's via waitpid (or similar) on every SIGCHLD
>> > dispatch instead of reaping whatever child/children actually completed via
>> > wait3 (or similar). This might incur some performance penalty (I don't know
>> > the extent), but would reduce breakage of other modules in the
>> > non-100%-gevent-enabled universe.
>> It's not ideal, because the performance impact can be quite severe.
>> Here's Twisted's experience, which does handle each pid individually:
>> http://twistedmatrix.com/trac/ticket/2967
> I was afraid of that (re. performance impact), but not entirely surprised
> - it must be a bunch of expensive kernel calls. However, the Twisted link
> points to an alternative that should solve this problem efficiently: "SA_SIGINFO
> flag with sigaction". This way, gevent/libev SIGCHLD handler would
> discover the pid of the dead child process (from the siginfo_t arg), look
> it up in a map of outstanding gevent.subprocess-created processes, and reap
> it only if it was found in the map.
Denis -- what are your thoughts on "SA_SIGINFO flag with sigaction" as a solution? It seems like this would be just as efficient as the current solution, while at the same time avoiding interference with external, non-gevent-managed subprocesses in other modules used by the app.