clojure.java.shell enhancements

David Powell

unread,

Jun 5, 2010, 5:49:39 PM6/5/10

to cloju...@googlegroups.com

Hi,

For discussion, I've got a few patches to clojure.io.shell.

I think this is a really useful API to have, especially as
java.lang.Process is so tricky to use (and ProcessBuilder doesn't
really solve any of the problems that Process has).

Patch #1 fixes the problem that because the process's stdout and
stderr aren't buffered, if you don't read from them in separate
threads, then one can fill its buffer and block the process from
making any further progress.

Without this patch, something like the following can hang, because
stderr can get blocked and wont be drained until stdout gets closed,
which will never happen:

(sh "cmd" "/c dir 1>&2")

Patch #6 enhances this fix by ensuring that the input to the process
is also done in a separate thread.

Patch #2 just removes a stray debug println that was in sh

Patch #3 uses clojure.java.io to copy the process's streams into the
output. This simplifies the code, and ensures that we aren't copying
the data a byte at a time, which would have very high overheads.

Patch #4 adds an :incenc parameter for specifying the encoding of the
input string. Also renames :out to :outenc to be consistent, and to
avoid confusion with the :out return value.

Patch #5 changes the default encoding to be the platform default
encoding, rather than 'utf-8'. I think that this is appropriate
because it matches how we handle *out* *err* and *in* from clojure.

Whilst utf-8 is a good default for reading/writing textual data to
files, I don't think it is appropriate for piping process streams. On
Windows, utf-8 isn't supported well as a code-page by the C runtime,
so is almost always wrong.

Also, I wonder whether there could be some enhancements for streaming
support for sh. Currently input and output have to be done via
in-memory strings or arrays. Simple for some apps, but it wouldn't be
appropriate if you wanted to write something like a Unix-style
stdin/stdout filter. Think svnfilter. But perhaps we could add an
enhancement for that at a later date.

Also, should we support passing :in as a byte array, and ignoring the
:incenc parameter?

--
Dave

0001-read-stdout-and-stderr-simultanously-from-separate-t.patch

0002-removed-stray-println.patch

0003-use-clojure.java.io-to-copy-streams-avoiding-byte-at.patch

0004-added-inenc-option-specifying-the-input-character-se.patch

0005-changed-default-encoding-to-platform-default-encodin.patch

0006-write-to-stdin-from-a-separate-thread-to-prevent-std.patch

B Smith-Mannschott

unread,

Jun 6, 2010, 1:02:31 PM6/6/10

to cloju...@googlegroups.com

Hi David,

I've applied your patch series atop master (3353eeafb) and am
reviewing git diff 3353eeafb..HEAD since early patches (i.e. the
first) contain changes, which later patches modify.

(I still consider myself a clojure beginner, but hope my input will
still be of some use.)

+ :inenc and :outenc are better names than :in and :out
+ Platform encoding is a better default than UTF-8 in this particular
case.

* stream-to-bytes and stream-to-string

On Sat, Jun 5, 2010 at 23:49, David Powell <djpo...@djpowell.net> wrote:
> Also, I wonder whether there could be some enhancements for streaming
> support for sh. Currently input and output have to be done via
> in-memory strings or arrays. Simple for some apps, but it wouldn't be
> appropriate if you wanted to write something like a Unix-style
> stdin/stdout filter. Think svnfilter. But perhaps we could add an
> enhancement for that at a later date.

Yea, I'm concerned about that too, and I'm really not sure how best to
approach it other than allowing the caller to pass in a function to be
applied to each of the out/err streams, whereby the responsibility to
actually consume them would be delegated to these functions.

I wonder if 'Scopes' http://www.assembla.com/spaces/clojure/tickets/2
would be of any help here. I seem to recall that this is to address
the trouble with putting a seq abstraction over a resource that isn't
managed by the garbage collector. (i.e. like an InputStream). But, I
could be totally out in left field. The ticket isn't very descriptive. :-/

* in (defn sh ...)

(if (:in opts)
(future
(with-open [osw (OutputStreamWriter. (.getOutputStream proc)
(:inenc opts))]
(.write osw (:in opts))))
(.close (.getOutputStream proc)))

You're creating a future, but never dereferencing it for its
result. Can we be sure its body will actually run?

The with-open block will close the OutputStreamWriter. This, in turn
closes the underlying OutputStream. (I checked the source.) We can
probably rely on this since OutputStreamWriter is part of rt.jar and
Java's known for its backward compatability. (Though, in fairness, the
docs of Writer.close() don't promise to close the underlying stream.)
This makes the (.close (.getOutputStream proc)) after the future
appear redundant.

More seriously: isn't this a race condition? What happens if the
future which is supposed to feeding the String (:in opts) to our
spawned process hasn't finished writing when (.close (.getOutputStream
proc)) gets called? boom?

(if (= (:outenc opts) :bytes)
(pmap #(stream-to-bytes %) [stdout stderr])
(pmap #(stream-to-string % (:outenc opts)) [stdout stderr]))

As implemented, pmap does run into-array on stdout and stdin
concurrently on two separate threads, but does it promise to? Is it a
good idea to rely on this for the correctness of this patch?

// Ben

David Powell

unread,

Jun 6, 2010, 3:52:47 PM6/6/10

to cloju...@googlegroups.com

Hi,

> * in (defn sh ...)

> (if (:in opts)
> (future
> (with-open [osw (OutputStreamWriter. (.getOutputStream proc)
> (:inenc opts))]
> (.write osw (:in opts))))
> (.close (.getOutputStream proc)))

> You're creating a future, but never dereferencing it for its
> result. Can we be sure its body will actually run?

I believe that futures get started asap, before anyone tries to
dereference them.

If the process needs to read from stdin before it can exit, then the
process will be blocking on stdout/stderr or waitFor, so the future
writing to stdin will have opportunity to run whilst the process is
blocked. Once the process exits, that indicates that it no longer has
any interest in stdin, so it doesn't matter whether it has run or not.

If the process exits before stdin has been written to, then I believe
that the stdin stream will become invalid, and will throw an exception
when it gets written/flushed by the future. This shouldn't really
matter though.

Basically if a child process is asynchronously reading stdin, with
respect to exiting, then there is a race condition, but it is in the
implementation of the child process, not in our sh function?

> The with-open block will close the OutputStreamWriter. This, in turn
> closes the underlying OutputStream. (I checked the source.) We can
> probably rely on this since OutputStreamWriter is part of rt.jar and
> Java's known for its backward compatability. (Though, in fairness, the
> docs of Writer.close() don't promise to close the underlying stream.)
> This makes the (.close (.getOutputStream proc)) after the future
> appear redundant.

> More seriously: isn't this a race condition? What happens if the
> future which is supposed to feeding the String (:in opts) to our
> spawned process hasn't finished writing when (.close (.getOutputStream
> proc)) gets called? boom?

If what you are talking about is what I think, then that code confused
me at first.

However, "(.close (.getOutputStream proc))" isn't called after the
future runs - it is called instead. Note that it is the second branch
of the if statement. It is called to close the input stream if the sh
caller opted not to supply any input?

Is that what you meant?

> (if (= (:outenc opts) :bytes)
> (pmap #(stream-to-bytes %) [stdout stderr])
> (pmap #(stream-to-string % (:outenc opts)) [stdout stderr]))

> As implemented, pmap does run into-array on stdout and stdin
> concurrently on two separate threads, but does it promise to? Is it a
> good idea to rely on this for the correctness of this patch?

Yeah, it doesn't specifically promise it, but it seems cumbersome to
duplicate the work of pmap just in case.

Hmm - not sure? Are we using pmap legitimately there?
And does it matter? After all this API is part of Clojure - if the
implementation of pmap changes, we can always change the
implementation of this function too?

--
Dave

B Smith-Mannschott

unread,

Jun 6, 2010, 4:35:10 PM6/6/10

to cloju...@googlegroups.com

On Sun, Jun 6, 2010 at 21:52, David Powell <djpo...@djpowell.net> wrote:

>> * in (defn sh ...)
>
>> (if (:in opts)
>> (future
>> (with-open [osw (OutputStreamWriter. (.getOutputStream proc)
>> (:inenc opts))]
>> (.write osw (:in opts))))
>> (.close (.getOutputStream proc)))

... snip ...

>> More seriously: isn't this a race condition? What happens if the
>> future which is supposed to feeding the String (:in opts) to our
>> spawned process hasn't finished writing when (.close (.getOutputStream
>> proc)) gets called? boom?
>
> If what you are talking about is what I think, then that code confused
> me at first.
>
> However, "(.close (.getOutputStream proc))" isn't called after the
> future runs - it is called instead. Note that it is the second branch
> of the if statement. It is called to close the input stream if the sh
> caller opted not to supply any input?