Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

multiprocessing module and os.close(sys.stdin.fileno())

2 views
Skip to first unread message

Graham Dumpleton

unread,
Feb 17, 2009, 10:34:39 PM2/17/09
to
Why is the multiprocessing module, ie., multiprocessing/process.py, in
_bootstrap() doing:

os.close(sys.stdin.fileno())

rather than:

sys.stdin.close()

Technically it is feasible that stdin could have been replaced with
something other than a file object, where the replacement doesn't have
a fileno() method.

In that sort of situation an AttributeError would be raised, which
isn't going to be caught as either OSError or ValueError, which is all
the code watches out for.

Graham

Jesse Noller

unread,
Feb 18, 2009, 9:16:11 PM2/18/09
to Graham Dumpleton, pytho...@python.org
> --
> http://mail.python.org/mailman/listinfo/python-list
>

I don't know why it was implemented that way. File an issue on the
tracker and assign it to me (jnoller) please.

Graham Dumpleton

unread,
Feb 19, 2009, 12:23:10 AM2/19/09
to

Created as:

http://bugs.python.org/issue5313

I don't see option to assign, so you are on nosy list to start with.

Graham

Joshua Judson Rosen

unread,
Feb 21, 2009, 12:20:20 AM2/21/09
to
Jesse Noller <jno...@gmail.com> writes:
>
> On Tue, Feb 17, 2009 at 10:34 PM, Graham Dumpleton
> <Graham.D...@gmail.com> wrote:
> > Why is the multiprocessing module, ie., multiprocessing/process.py, in
> > _bootstrap() doing:
> >
> > os.close(sys.stdin.fileno())
> >
> > rather than:
> >
> > sys.stdin.close()
> >
> > Technically it is feasible that stdin could have been replaced with
> > something other than a file object, where the replacement doesn't have
> > a fileno() method.
> >
> > In that sort of situation an AttributeError would be raised, which
> > isn't going to be caught as either OSError or ValueError, which is all
> > the code watches out for.
>
> I don't know why it was implemented that way. File an issue on the
> tracker and assign it to me (jnoller) please.

My guess would be: because it's also possible for sys.stdin to be a
file that's open in read+*write* mode, and for that file to have
pending output buffered (for example, in the case of a socketfile).

There's a general guideline, inherited from C, that one should ensure
that the higher-level close() routine is invoked on a given
file-descriptor in at most *one* process after that descriptor has
passed through a fork(); in the other (probably child) processes, the
lower-level close() routine should be called to avoid a
double-flush--whereby buffered data is flushed out of one process, and
then the *same* buffered data is flushed out of the (other)
child-/parent-process' copy of the file-object.

So, if you call sys.stdin.close() in the child-process in
_bootstrap(), then it could lead to a double-flush corrupting output
somewhere in the application that uses the multiprocessing module.

You can expect similar issues with just about /any/ `file-like objects'
that might have `file-like semantics' of buffering data and flushing
it on close, also--because you end up with multiple copies of the same
object in `pre-flush' state, and each copy tries to flush at some point.

As such, I'd recommend against just using .close(); you might use
something like `if hasattr(sys.stdin, "fileno"): ...'; but, if your
`else' clause unconditionally calls sys.stdin.close(), then you still
have double-flush problems if someone's set sys.stdin to a file-like
object with output-buffering.

I guess you could try calling that an `edge-case' and seeing if anyone
screams. It'd be sort-of nice if there was finer granularity in the
file API--maybe if file.close() took a boolean `flush' argument....

--
Don't be afraid to ask (Lf.((Lx.xx) (Lr.f(rr)))).

Graham Dumpleton

unread,
Feb 21, 2009, 6:58:45 PM2/21/09
to
On Feb 21, 4:20 pm, Joshua Judson Rosen <roz...@geekspace.com> wrote:

> Jesse Noller <jnol...@gmail.com> writes:
>
> > On Tue, Feb 17, 2009 at 10:34 PM, Graham Dumpleton
> > <Graham.Dumple...@gmail.com> wrote:
> > > Why is the multiprocessing module, ie., multiprocessing/process.py, in
> > > _bootstrap() doing:
>
> > >  os.close(sys.stdin.fileno())
>
> > > rather than:
>
> > >  sys.stdin.close()
>
> > > Technically it is feasible that stdin could have been replaced with
> > > something other than a file object, where the replacement doesn't have
> > > a fileno() method.
>
> > > In that sort of situation an AttributeError would be raised, which
> > > isn't going to be caught as either OSError or ValueError, which is all
> > > the code watches out for.
>
> > I don't know why it was implemented that way. File an issue on the
> > tracker and assign it to me (jnoller) please.
>
> My guess would be: because it's also possible for sys.stdin to be a
> file that's open in read+*write* mode, and for that file to have
> pending output buffered (for example, in the case of a socketfile).

If you are going to have a file that is writable as well as readable,
such as a socket, then likely that sys.stdout/sys.stderr are going to
be bound to it at the same time. If that is the case then one should
not be using close() at all as it will then also close the write side
of the pipe and cause errors when code subsequently attempts to write
to sys.stdout/sys.stderr.

In the case of socket you would actually want to use shutdown() to
close just the input side of the socket.

What this all means is that what is the appropriate thing to do is
going to depend on the environment in which the code is used. Thus,
having the behaviour hard wired a certain way is really bad. There
perhaps instead should be a way of a user providing a hook function to
be called to perform any case specific cleanup of stdin, stdout and
stderr, or otherwise reassign them.

That this is currently in the _bootstrap() function, which does other
important stuff, doesn't exactly make it look like it is easily
overridden to work for a specific execution environment which is
different to the norm.

> There's a general guideline, inherited from C, that one should ensure
> that the higher-level close() routine is invoked on a given
> file-descriptor in at most *one* process after that descriptor has
> passed through a fork(); in the other (probably child) processes, the
> lower-level close() routine should be called to avoid a
> double-flush--whereby buffered data is flushed out of one process, and
> then the *same* buffered data is flushed out of the (other)
> child-/parent-process' copy of the file-object.
>
> So, if you call sys.stdin.close() in the child-process in
> _bootstrap(), then it could lead to a double-flush corrupting output
> somewhere in the application that uses the multiprocessing module.
>
> You can expect similar issues with just about /any/ `file-like objects'
> that might have `file-like semantics' of buffering data and flushing
> it on close, also--because you end up with multiple copies of the same
> object in `pre-flush' state, and each copy tries to flush at some point.
>
> As such, I'd recommend against just using .close(); you might use
> something like `if hasattr(sys.stdin, "fileno"): ...'; but, if your
> `else' clause unconditionally calls sys.stdin.close(), then you still
> have double-flush problems if someone's set sys.stdin to a file-like
> object with output-buffering.
>
> I guess you could try calling that an `edge-case' and seeing if anyone
> screams. It'd be sort-of nice if there was finer granularity in the
> file API--maybe if file.close() took a boolean `flush' argument....

Graham

Joshua Judson Rosen

unread,
Feb 21, 2009, 8:52:19 PM2/21/09
to
Graham Dumpleton <Graham.D...@gmail.com> writes:
>
> On Feb 21, 4:20 pm, Joshua Judson Rosen <roz...@geekspace.com> wrote:
> > Jesse Noller <jnol...@gmail.com> writes:
> >
> > > On Tue, Feb 17, 2009 at 10:34 PM, Graham Dumpleton
> > > <Graham.Dumple...@gmail.com> wrote:
> > > > Why is the multiprocessing module, ie., multiprocessing/process.py, in
> > > > _bootstrap() doing:
> >
> > > >  os.close(sys.stdin.fileno())
> >
> > > > rather than:
> >
> > > >  sys.stdin.close()
> >
> > > > Technically it is feasible that stdin could have been replaced with
> > > > something other than a file object, where the replacement doesn't have
> > > > a fileno() method.
> >
> > > > In that sort of situation an AttributeError would be raised, which
> > > > isn't going to be caught as either OSError or ValueError, which is all
> > > > the code watches out for.
> >
> > > I don't know why it was implemented that way. File an issue on the
> > > tracker and assign it to me (jnoller) please.
> >
> > My guess would be: because it's also possible for sys.stdin to be a
> > file that's open in read+*write* mode, and for that file to have
> > pending output buffered (for example, in the case of a socketfile).
>
> If you are going to have a file that is writable as well as readable,
> such as a socket, then likely that sys.stdout/sys.stderr are going to
> be bound to it at the same time.

Yes.

> If that is the case then one should not be using close() at all

If you mean stdin.close(), then that's what I said :)

> as it will then also close the write side of the pipe and cause
> errors when code subsequently attempts to write to
> sys.stdout/sys.stderr.
>
>
> In the case of socket you would actually want to use shutdown() to
> close just the input side of the socket.

Sure--but isn't this "you" the /calling/ code that set the whole thing
up? What the /caller/ does with its stdio is up to /him/, and beyond
the scope of the present discourse. I can appreciate a library forking
and then using os.close() on stdio (it protects my files from any I/O
the subprocess might think it wants to do with them), but I think I
might be even more annoyed if it *shutdown my sockets* than if it
caused double-flushes (there's at least a possibility that I could
cope with the double-flushes by just ensuring that *I* flushed before
the fork--not so with socket.shutdown()!)

> What this all means is that what is the appropriate thing to do is
> going to depend on the environment in which the code is used. Thus,
> having the behaviour hard wired a certain way is really bad. There
> perhaps instead should be a way of a user providing a hook function to
> be called to perform any case specific cleanup of stdin, stdout and
> stderr, or otherwise reassign them.

Usually, I'd say that that's what the methods on the passed-in object
are for. Though, as I said--the file-object API is lacking, here :(

> > As such, I'd recommend against just using .close(); you might use
> > something like `if hasattr(sys.stdin, "fileno"): ...'; but, if your
> > `else' clause unconditionally calls sys.stdin.close(), then you still
> > have double-flush problems if someone's set sys.stdin to a file-like
> > object with output-buffering.
> >
> > I guess you could try calling that an `edge-case' and seeing if anyone
> > screams. It'd be sort-of nice if there was finer granularity in the
> > file API--maybe if file.close() took a boolean `flush' argument....

--

Graham Dumpleton

unread,
Feb 22, 2009, 12:41:00 AM2/22/09
to
On Feb 22, 12:52 pm, Joshua Judson Rosen <roz...@geekspace.com> wrote:

Either. The problem is that same, it close for both read and write and
if was expecting to still be able to write because used for stdout or
stderr, then will not work.

> > as it will then also close the write side of the pipe and cause
> > errors when code subsequently attempts to write to
> > sys.stdout/sys.stderr.
>
> > In the case of socket you would actually want to use shutdown() to
> > close just the input side of the socket.
>
> Sure--but isn't this "you" the /calling/ code that set the whole thing
> up? What the /caller/ does with its stdio is up to /him/, and beyond
> the scope of the present discourse. I can appreciate a library forking
> and then using os.close() on stdio (it protects my files from any I/O
> the subprocess might think it wants to do with them), but I think I
> might be even more annoyed if it *shutdown my sockets*

Ah, yeah, forgot that shutdown does end to end shutdown rather than
just that file object reference. :-)

Graham

0 new messages