If both child programs exit with 0, then the script runs to
completion. But if prog2 exits with non-0, prog1 does not exit and the
script hangs (i.e. prog1.poll() always returns None) -- unless I
uncomment the 2 lines marked by XXX to close prog1.stdout.
I was expecting that I wouldn't have to explicitly close prog1.stdout,
whether prog2 succeeds or fails. Is the current behavior a bug in the
subprocess module, or is it expected? Or am I doing something wrong?
Thanks.
import subprocess
import time
# prog1: a program that writes lots of data to the pipe
cmd = ['zcat', '--force', 'a_large_file']
prog1 = subprocess.Popen(cmd, bufsize=-1, stdout=subprocess.PIPE)
# prog2: a program that fails without reading much data from the pipe
cmd = ['python', '-c', 'import time; time.sleep(10); asdf']
prog2 = subprocess.Popen(cmd, bufsize=-1, stdin=prog1.stdout,
                         stdout=open('popen.out', 'w'))
print 'waiting for a while'
retCodeProg2 = prog2.wait()
print 'prog2 returns', retCodeProg2
# XXX
# if retCodeProg2 != 0:
#     prog1.stdout.close()
while prog1.poll() is None:
    print 'sleep a bit'
    time.sleep(1)
retCodeProg1 = prog1.poll()
print 'prog1 returns', retCodeProg1
> Below, I have a Python script that launches 2 child programs, prog1
> and prog2, with prog1's stdout connected to prog2's stdin via a pipe.
> (It's like executing "prog1 | prog2" in the shell.)
>
> If both child programs exit with 0, then the script runs to
> completion. But if prog2 exits with non-0, prog1 does not exit and the
> script hangs (i.e. prog1.poll() always returns None) -- unless I
> uncomment the 2 lines marked by XXX to close prog1.stdout.
>
> I was expecting that I don't have to explicitly close prog1.stdout,
> whether prog2 succeeds or fails. Is the current behavior a bug in the
> subprocess module or is it expected? Or am I doing something wrong?
>
> Thanks.
>
> import subprocess
> import time
>
> # prog1: a program that writes lots of data to the pipe
> cmd = ['zcat', '--force', 'a_large_file']
> prog1 = subprocess.Popen(cmd, bufsize=-1, stdout=subprocess.PIPE)
>
> # prog2: a program that fails without reading much data from the pipe
> cmd = ['python', '-c', 'import time; time.sleep(10); asdf']
> prog2 = subprocess.Popen(cmd, bufsize=-1, stdin=prog1.stdout,
>                          stdout=open('popen.out', 'w'))
I think that you should close prog1.stdout here. Otherwise, there will
be two readers on the pipe (the calling process and prog2). Even if one of
them dies, there's always the possibility that the caller might eventually
decide to read prog1.stdout itself. If you close it in the caller, when
prog2 terminates there will be no readers, and prog1 will get SIGPIPE (or
write() will fail with EPIPE if SIGPIPE is handled).
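A minimal, self-contained sketch of that fix (Python 3; the two child
programs here are hypothetical stand-ins built on sys.executable, so it
doesn't depend on zcat or a large file):

```python
import subprocess
import sys

# Stand-in for prog1: floods the pipe with far more data than the
# pipe buffer can hold.  stderr is discarded so the eventual
# BrokenPipeError traceback doesn't clutter the output.
writer = subprocess.Popen(
    [sys.executable, '-c',
     "import sys\n"
     "while True: sys.stdout.write('x' * 8192)"],
    stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

# Stand-in for prog2: fails without reading anything from the pipe.
reader = subprocess.Popen(
    [sys.executable, '-c', 'import sys; sys.exit(1)'],
    stdin=writer.stdout)

# Close the parent's copy of the read end.  Once the reader exits,
# the pipe has no readers left, so the writer's blocked write fails
# with EPIPE and writer.wait() returns instead of hanging forever.
writer.stdout.close()

print('reader returned', reader.wait())
print('writer returned', writer.wait())
```

Without the writer.stdout.close() line, the final wait() blocks forever,
exactly as in the original post.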
Thanks for raising a great point: prog1.stdout is also readable by the
calling process, not just by prog2. So I agree it makes sense to
explicitly call prog1.stdout.close() in the given code (say, right
after the creation of prog2).
Suppose now all the prog1.poll() calls/loop are replaced by a single
prog1.wait(). Without the explicit prog1.stdout.close(), prog1.wait()
will not return, so the calling process still hangs. Because calling
prog1.wait() means that the calling process will naturally never read
prog1.stdout, I would argue that prog1.wait() should close the pipe
before actually waiting for prog1 to exit. Makes sense?
Well, the example code at
http://www.python.org/ ... /subprocess.html#replacing-shell-pipeline
has the same issue:
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
After communicate() returns, if you wait for p1 to finish (by calling
p1.poll() repeatedly or p1.wait()), the calling process can hang under
the conditions described in the original post, i.e. p1 wrote lots of
data to the pipe and p2 failed without reading much of it.
Perhaps the doc can be improved to remind folks to close p1.stdout
when the calling process doesn't need it, unless wait() is changed to
close it automatically before waiting.
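To make that concrete, here is the doc example with the one extra
close() call added; a sketch, with "dmesg | grep hda" swapped for a
portable stand-in (assuming a Unix-like system with grep on PATH):

```python
import sys
from subprocess import PIPE, Popen

# Portable stand-in for "dmesg": a child that prints a few lines.
p1 = Popen([sys.executable, '-c', "print('hda1'); print('sda1')"],
           stdout=PIPE)
p2 = Popen(['grep', 'hda'], stdin=p1.stdout, stdout=PIPE)

# The extra line: drop the parent's reference to the read end so
# that p1 can receive SIGPIPE/EPIPE if p2 exits early.
p1.stdout.close()

output = p2.communicate()[0]
rc1 = p1.wait()     # cannot hang now, even if p2 dies prematurely
```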
Am I making any sense here?
> Suppose now all the prog1.poll() calls/loop are replaced by a single
> prog1.wait(). Without the explicit prog1.stdout.close(), prog1.wait()
> will not return, so the calling process still hangs. Because calling
> prog1.wait() means that the calling process will naturally never read
> prog1.stdout, I would argue that prog1.wait() should close the pipe
> before actually waiting for prog1 to exit. Makes sense?
prog1.stdout might be being read by a different thread.
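For example, in a (hypothetical, but common) pattern like this, a
worker thread drains prog1.stdout while the main thread calls wait();
an automatic close inside wait() would break it:

```python
import subprocess
import sys
import threading

prog1 = subprocess.Popen(
    [sys.executable, '-c', "print('line1'); print('line2')"],
    stdout=subprocess.PIPE)

# A worker thread consumes the pipe concurrently with wait().
chunks = []
t = threading.Thread(target=lambda: chunks.append(prog1.stdout.read()))
t.start()

rc = prog1.wait()   # must not close prog1.stdout: the thread is reading it
t.join()
```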
> Well, the example code at
> http://www.python.org/ ... /subprocess.html#replacing-shell-pipeline
> has the same issue:
> Perhaps the doc can be improved to remind folks to close p1.stdout if
> the calling process doesn't need it, unless wait() is changed to close
> it and p1.wait() is called.
>
> Am I making any sense here?
The docs should include the p1.stdout.close().
It isn't needed in the typical case, where p2 runs until EOF on stdin, but
(as you have noticed) it matters if p2 terminates prematurely.
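A sketch of that premature-termination case with the close in place
(both children are stand-ins built on sys.executable): p2 exits after
reading a single line while p1 still has megabytes to write, yet
p1.wait() returns.

```python
import subprocess
import sys

# p1: writes much more than the pipe can buffer; stderr discarded to
# hide the BrokenPipeError traceback it will die with.
p1 = subprocess.Popen(
    [sys.executable, '-c',
     "import sys\n"
     "for _ in range(200000): sys.stdout.write('x' * 64 + '\\n')"],
    stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

# p2: terminates prematurely, after echoing just one line.
p2 = subprocess.Popen(
    [sys.executable, '-c',
     "import sys; sys.stdout.write(sys.stdin.readline())"],
    stdin=p1.stdout, stdout=subprocess.PIPE)

p1.stdout.close()               # the close the docs should show

output = p2.communicate()[0]
rc1 = p1.wait()                 # non-zero (write failed with EPIPE), not a hang
```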