
bash process substitution - inconsistent behaviour


Alexis Huxley

Nov 22, 2011, 1:46:52 PM
Hi, I was using process substitution in a script and found some odd behaviour.

Here is a simplified script which demonstrates the problem:


#!/bin/bash

LOCK_DIR=/tmp/$$

spew_and_slurp_with_lock()
{
    local I

    for ((I=0; I<1000; I++)); do
        echo "some junk"
    done > >(mkdir $LOCK_DIR; cat > /dev/null; rmdir $LOCK_DIR)
}

main()
{
    local J

    rm -fr $LOCK_DIR
    for ((J=0; J<1000; J++)); do
        spew_and_slurp_with_lock
    done
}

main

The actual process in my real script's process substitution list was
sqlite3, which was randomly complaining that the database was locked;
for the purposes of demonstrating the problem mkdir+cat+rmdir is a
reasonable simulation of sqlite3 (both sqlite3 and mkdir+cat+rmdir
slurp stdin and use locking).

(I use a main() function here simply so that, below, I can unambiguously
reference bits of the code above.)

The expected output was nothing. The actual output was:

mkdir: cannot create directory `/tmp/2076': File exists
rmdir: failed to remove `/tmp/2076': No such file or directory
mkdir: cannot create directory `/tmp/2076': File exists
rmdir: failed to remove `/tmp/2076': No such file or directory
...

The number of failing mkdir/rmdir pairs is not consistent:

fiori$ ./demo 2>&1 | wc -l
468
fiori$ ./demo 2>&1 | wc -l
470
fiori$ ./demo 2>&1 | wc -l
458
fiori$

I.e. roughly 230 failing mkdir/rmdir pairs per run, or somewhere between
20-25% of the 1000 iterations.

It seems to me that the process-substituted list has not finished
before bash moves on to executing the next command (in this case:
looping back round in main() to call spew_and_slurp_with_lock()
again). I.e. the N+1'th loop's mkdir is running before the N'th
loop's rmdir, and that results in the 'File exists' message.

The bash man page does not mention that the substituted process runs
asynchronously, and, indeed, an added call to 'wait' immediately after
the 'for' loop in spew_and_slurp_with_lock() reaps nothing.

Is this a code or documentation bug or did I misunderstand something?
Any ideas please? If the community opinion is that this is a bug then
I'll bashbug it.

I'm using version

GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu)

Thanks!

Alexis

Kaz Kylheku

Nov 22, 2011, 2:19:31 PM
On 2011-11-22, Alexis Huxley <ahu...@gmx.net> wrote:
> Any ideas please? If the community opinion is that this is a bug then
> I'll bashbug it.

You should trust your own opinion and bashbug it anyway.

To me, it looks broken. I continue to get a few "failed to remove" messages
even after I kill the script with Ctrl-C.

Bash should wait for and reap all process substitution jobs before moving on to
the next command.

Barry Margolin

Nov 22, 2011, 2:19:50 PM
In article <newscache$36s2vl$t78$1...@farfalle.pasta.net>,
Alexis Huxley <ahu...@gmx.net> wrote:

> The bash man page does not mention that the substituted process runs
> asynchronously, and, indeed, an added call to 'wait' immediately after
> the 'for' loop in spew_and_slurp_with_lock() reaps nothing.
>
> Is this a code or documentation bug or did I misunderstand something?
> Any ideas please? If the community opinion is that this is a bug then
> I'll bashbug it.
>
> I'm using version
>
> GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu)

It's been like this for a while. I'm running 3.2.48(1)-release:

imac:barmar $ echo foo > >(cat ; sleep 5; echo bar)
foo
imac:barmar $ bar

The second prompt appears immediately, and "bar" appears 5 seconds later.

If you want to write stdout to a process and wait for it to finish, why
not just use an ordinary pipe?
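
For instance, your function could be recast with a plain pipeline. This is
only a minimal, untested sketch of that suggestion, reusing the LOCK_DIR
variable from your script:

spew_and_slurp_with_lock()
{
    local I

    # The consumer is now an ordinary pipeline element, so bash waits
    # for it to exit before the function returns; consecutive calls can
    # no longer overlap.
    for ((I=0; I<1000; I++)); do
        echo "some junk"
    done | { mkdir "$LOCK_DIR"; cat > /dev/null; rmdir "$LOCK_DIR"; }
}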

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Ian Fitchet

Nov 25, 2011, 10:02:29 AM
On Nov 22, 7:19 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
> imac:barmar $ echo foo > >(cat ; sleep 5; echo bar)
> foo
> imac:barmar $ bar

Hi,

It's not a bug, it's exactly the right behaviour, the problem is that
your substituted process is outputting to the same tty and is
confusing you.

Process substitution has the effect of substituting a file name for
those commands that require a file. If we remove the IO redirection
from Barry's example we can see that substitution:

% echo foo >(cat ; sleep 5; echo bar)
foo /dev/fd/63
% bar

The ">(cat ; sleep 5; echo bar) " has been replaced by "/dev/fd/63".
Note here that I've not given the substituted process any input, so
its "cat" immediately gets EOF, the substituted process can then
"sleep 5" and finally "echo bar" to its stdout (the tty).

Barry's original example was, in effect:

% [...magic...]
% echo foo > /dev/fd/63
foo
% bar

You would hope that the shell can "echo foo" into a file quite
quickly and then move on to the next command (or wait for you to type
something). The substituted process is churning away, initially
running cat which blocks/reads its stdin until EOF, sleeping for 5
seconds then running "echo bar" to its stdout (the tty).

Suppose instead you'd written:

% echo foo > >(exec > /tmp/foo; cat ; sleep 5; echo bar)

There's no output! That's because the stdout of the substituted
process has been sent to /tmp/foo. If you run it again and/or are
quick:

% cat /tmp/foo
foo
% cat /tmp/foo
foo
bar

Cheers,

Ian

Barry Margolin

Nov 26, 2011, 1:30:48 AM
In article
<025fac44-7865-4f62...@l24g2000yqm.googlegroups.com>,
Ian Fitchet <ian.f...@gmail.com> wrote:

> On Nov 22, 7:19 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
> > imac:barmar $ echo foo > >(cat ; sleep 5; echo bar)
> > foo
> > imac:barmar $ bar
>
> Hi,
>
> It's not a bug, it's exactly the right behaviour, the problem is that
> your substituted process is outputting to the same tty and is
> confusing you.

His issue had nothing to do with the output ordering. His complaint is
that the shell doesn't wait for the commands in the process substitution
to exit.

>
> Process substitution has the effect of substituting a file name for
> those commands that require a file. If we remove the IO redirection
> from Barry's example we can see that substitution:
>
> % echo foo >(cat ; sleep 5; echo bar)
> foo /dev/fd/63
> % bar
>
> The ">(cat ; sleep 5; echo bar) " has been replaced by "/dev/fd/63".
> Note here that I've not given the substituted process any input, so
> its "cat" immediately gets EOF, the substituted process can then
> "sleep 5" and finally "echo bar" to its stdout (the tty).
>
> Barry's original example was, in effect:
>
> % [...magic...]
> % echo foo > /dev/fd/63
> foo
> % bar
>
> You would hope that the shell can "echo foo" into a file quite
> quickly and then move on to the next command (or wait for you to type
> something). The substituted process is churning away, initially
> running cat which blocks/reads its stdin until EOF, sleeping for 5
> seconds then running "echo bar" to its stdout (the tty).

I think what he's expecting is that the shell should do something
analogous to:

mkfifo /tmp/foo
(cat ; sleep 5; echo bar) </tmp/foo &
proc=$!
echo foo > /tmp/foo
rm /tmp/foo
wait $proc

It's doing all this EXCEPT for the "wait" at the end.

Alan Curry

Nov 26, 2011, 3:32:47 AM
>On Nov 22, 7:19 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
>> imac:barmar $ echo foo > >(cat ; sleep 5; echo bar)
>> foo
>> imac:barmar $ bar
>
>Hi,
>
> It's not a bug, it's exactly the right behaviour, the problem is that
>your substituted process is outputting to the same tty and is
>confusing you.

zsh, which originated the >(...) syntax, waits for the process in the above
example. bash is imitating it incorrectly.

Why call the newer, more-confusing behavior "right"?

--
Alan Curry

Kaz Kylheku

Nov 26, 2011, 12:02:56 PM
On 2011-11-25, Ian Fitchet <ian.f...@gmail.com> wrote:
> On Nov 22, 7:19 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
>> imac:barmar $ echo foo > >(cat ; sleep 5; echo bar)
>> foo
>> imac:barmar $ bar
>
> Hi,
>
> It's not a bug, it's exactly the right behaviour, the problem is that
> your substituted process is outputting to the same tty and is
> confusing you.

I seriously doubt Barry is confused by tty output, like some newbie.

> Process substitution has the effect of substituting a file name for
> those commands that require a file.

It's not a file name, but the name of a pipe-like object connected to a process
on the other side. (Bash documentation says it may even be a FIFO, on
systems that don't have special /dev or /proc files). The process
on the other side of that pipe-thing is being leaked out of the command. That
is a bug.

This is the same as if ordinary redirection kept running in the background

$ echo foo | (cat ; sleep 5; echo bar)
foo
$ # we got our prompt back immediately!
bar # 5 seconds later, WTF?

These two commands must behave in the same way for process substitution
to be a sound concept:

$ echo foo | (cat ; sleep 5; echo bar)
$ echo foo > >(cat ; sleep 5; echo bar)

The shell reaps the processes arranged in a pipeline before moving on to the
next pipeline, and this behavior should work in all kinds of substitutions
involving processes. The exceptions to that occur when & is used, or when
interactive job control is issued to background a task.

In effect the syntax > >(cmd) is the same as | cmd, and the shell
should be free to optimize it that way at the abstract syntax level.

I.e. if the /dev/whatever hidden pathname is the operand of a redirection, then
it is pointless and can be turned into a pipe going directly to the target
process.

Ian Fitchet

Dec 1, 2011, 12:20:33 PM
On Nov 26, 5:02 pm, Kaz Kylheku <k...@kylheku.com> wrote:
> I seriously doubt Barry is confused by tty output, like some newbie.

Indeed so and my apologies to Barry. I wanted to use his concise
example but should have clearly directed my subsequent answer to
Alexis.

Perhaps I have missed a subtle semantic.

> It's not a file name, but the name of a pipe-like object

Indeed, hence we have

$ echo foo > *word*
$ echo foo | *cmd*

The semantics of the pipeline are that the commands individually run
to completion before the pipeline is complete.

The semantics of IO redirection is that the command completes when
its output to *word* completes (ie *word* has consumed all of the
command's output). Should echo block until the reader of an arbitrary
fifo completes?

> The process
> on the other side of that pipe-thing is being leaked out of the command. That
> is a bug.

I don't understand what you mean by the process being leaked.

> In effect the syntax > >(cmd) is the same as | cmd, and the shell
> should be free to optimize it that way at the abstract syntax level.

I'm happy for "> >(cmd)" to be a synonym for "| cmd" if that's how
it's meant to be although it seems superfluous to have two distinct
syntaxes.

> I.e. if the /dev/whatever hidden pathname is the operand of a redirection, then
> it is pointless and can be turned into a pipe going directly to the target
> process.

On the other hand, having some mechanism to substitute a name for a
cmd has proven quite useful where other commands only accept files:

$ diff <(cmd1) <(cmd2)

However unportable (unpalatable?) that may be.

Cheers,

Ian

Barry Margolin

Dec 1, 2011, 1:00:32 PM
In article
<2fb51ee1-e3e0-4518...@g21g2000yqc.googlegroups.com>,
Ian Fitchet <ian.f...@gmail.com> wrote:

> On Nov 26, 5:02 pm, Kaz Kylheku <k...@kylheku.com> wrote:
> > In effect the syntax > >(cmd) is the same as | cmd, and the shell
> > should be free to optimize it that way at the abstract syntax level.
>
> I'm happy for "> >(cmd)" to be a synonym for "| cmd" if that's how
> it's meant to be although it seems superfluous to have two distinct
> syntaxes.

Of course, because Unix never has multiple ways to express the same
thing. :)

But this issue isn't really specific to output redirection, it's any
time process substitution is used for an output file:

dd of=>(cmd)

curl -o >(cmd) <url>

Shouldn't these also wait for cmd to complete? If you didn't want it to
wait, you could use >(cmd&).

>
> > I.e. if the /dev/whatever hidden pathname is the operand of a redirection,
> > then
> > it is pointless and can be turned into a pipe going directly to the target
> > process.
>
> On the other hand, having some mechanism to substitute a name for a
> cmd has proven quite useful where other commands only accept files:
>
> $ diff <(cmd1) <(cmd2)
>
> However unportable (unpalatable?) that may be.

That's been my most common use of process substitution.

Kaz Kylheku

Dec 1, 2011, 1:46:33 PM
On 2011-12-01, Ian Fitchet <ian.f...@gmail.com> wrote:
> On Nov 26, 5:02 pm, Kaz Kylheku <k...@kylheku.com> wrote:
>> I seriously doubt Barry is confused by tty output, like some newbie.
>
> Indeed so and my apologies to Barry. I wanted to use his concise
> example but should have clearly directed my subsequent answer to
> Alexis.
>
> Perhaps I have missed a subtle semantic.
>
>> It's not a file name, but the name of a pipe-like object
>
> Indeed, hence we have
>
> $ echo foo > *word*
> $ echo foo | *cmd*
>
> The semantics of the pipeline are that the commands individually run
> to completion before the pipeline is complete.

Therefore, it would be desirable if this continued to hold
when we instead express it as:

echo foo > >(*cmd*)

All commands run to completion: echo and *cmd*.

> The semantics of IO redirection is that the command completes when
> its output to *word* completes (ie *word* has consumed all of the
> command's output).

This is the same as "commands individually run to completion before the
pipeline is complete". Only, you just have one command (*word* is not a
command).

> Should echo block until the reader of an arbitrary
> fifo completes?

No because in echo foo | *cmd*, it does not do so.

The standard input pipe of *cmd* satisfies the definition of "arbitrary fifo".

>> The process
>> on the other side of that pipe-thing is being leaked out of the command. That
>> is a bug.
>
> I don't understand what you mean by the process being leaked.

Leaked means that something has escaped beyond the scope of its
proper duration.

This is the issue exhibited by the Bash implementation of the concept.

The process continues to execute and all of its visible side effects
happen beyond the "fence" that should divide consecutive commands.

>> In effect the syntax > >(cmd) is the same as | cmd, and the shell
>> should be free to optimize it that way at the abstract syntax level.
>
> I'm happy for "> >(cmd)" to be a synonym for "| cmd" if that's how
> it's meant to be although it seems superfluous to have two distinct
> syntaxes.

Sure, and likewise since we can do both x + x and 2 * x, let's
throw away one of them.

We must combat redundancy by fighting against it! :)

>> I.e. if the /dev/whatever hidden pathname is the operand of a redirection, then
>> it is pointless and can be turned into a pipe going directly to the target
>> process.
>
> On the other hand, having some mechanism to substitute a name for a
> cmd has proven quite useful where other commands only accept files:
>
> $ diff <(cmd1) <(cmd2)

Well, no kidding; the pipe-like object isn't the operand of a redirection here.

You're just re-iterating that this syntax does something that you can't
easily do otherwise, which is obvious.

Ian Fitchet

Dec 1, 2011, 3:26:01 PM
On Dec 1, 6:00 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
> But this issue isn't really specific to output redirection, it's any
> time process substitution is used for an output file:

That's sort of my point. I see ">(cmd) " as a form of co-processing
albeit one where communication is through an object in the filesystem
which turns out to be quite handy for situations where other commands
only operate on files.

The use of filesystem objects is different to commands. Does

$ echo foo | /etc/group

have any useful semantic? Any more than

$ echo foo > ls

(where "ls" here represents the invocation of the command, ls).

It's this difference between filesystem objects and commands that
drives my distinction.


I wanted to present an example along the lines of:

>(while read line ; do sleep 5 ; echo processed $line ; done)

where you could repeatedly "echo foo" into it, ie. a form of co-
processing where you wouldn't expect >(cmd) to finish. Here, the loop
only runs for the first command to write to the construct.

As an aside, the co-process idea can be glimpsed here:

$ exec 3>&1 > >(read line; echo read $line; sleep 5; echo processed $line) 4>&1 >&3
[...pause...]
$ echo foo >& 4
$ read foo
processed foo


> dd of=>(cmd)
>
> curl -o >(cmd) <url>
>
> Shouldn't these also wait for cmd to complete?

From my co-processing viewpoint, no :)

Cheers,

Ian

Kaz Kylheku

Dec 1, 2011, 4:08:08 PM
On 2011-12-01, Ian Fitchet <ian.f...@gmail.com> wrote:
> On Dec 1, 6:00 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
>> But this issue isn't really specific to output redirection, it's any
>> time process substitution is used for an output file:
>
> That's sort of my point. I see ">(cmd) " as a form of co-processing
> albeit one where communication is through an object in the filesystem
> which turns out to be quite handy for situations where other commands
> only operate on files.

The filesystem is only used to do a lookup for the purpose of constructing the
pipe. After that, it's not different from | cmd.

A pipe created by open() on a FIFO and one created by pipe() substitute for
each other once the file descriptors are obtained.
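
A quick illustration (using the input form for brevity; it's the same
lookup-versus-pipe point, not the disputed output case):

$ cat <(printf 'hello\n')
hello
$ printf 'hello\n' | cat
hello

In the first command cat is handed a /dev/fd/NN (or FIFO) name and open()s
it; in the second it inherits a pipe descriptor directly. Once the
descriptor exists, the bytes flow the same way.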

> The use of filesystem objects is different to commands. Does
>
> $ echo foo | /etc/group
>
> have any useful semantic? Any more than

I don't see where you are going with this. Yes, some combinations of symbols
have no assigned semantics.

For instance:

echo foo | >(command)


> As an aside, the co-process idea can be glimpsed here:
>
> $ exec 3>&1 > >(read line; echo read $line; sleep 5; echo processed $line) 4>&1 >&3
> [...pause...]
> $ echo foo >& 4
> $ read foo
> processed foo

But now you have background processing that you did not ask for anywhere
(no ampersand).

It's not that this behavior isn't useful, it's that it's there by default,
with no way to disable it.

>> dd of=>(cmd)
>>
>> curl -o >(cmd) <url>
>>
>> Shouldn't these also wait for cmd to complete?
>
> From my co-processing viewpoint, no :)

That should be arranged by

curl -o >(cmd) <url> &
# now curl and cmd are running as a "background process group"

Alan Curry

Dec 1, 2011, 4:34:10 PM
In article <afefa2fe-0c5d-4603...@v29g2000yqv.googlegroups.com>,
Ian Fitchet <ian.f...@gmail.com> wrote:
>On Dec 1, 6:00 pm, Barry Margolin <bar...@alum.mit.edu> wrote:
>> But this issue isn't really specific to output redirection, it's any
>> time process substitution is used for an output file:
>
> That's sort of my point. I see ">(cmd) " as a form of co-processing
>albeit one where communication is through an object in the filesystem
>which turns out to be quite handy for situations where other commands
>only operate on files.
>

That's an interesting perspective. I see it instead as an advanced form of
pipeline construction, where you can hook process inputs and outputs up to
each other as pretty much an arbitrary directed acyclic graph, instead of the
straight line that you get with the standard pipeline.

Of course this is only useful for processes that can have more than one input
or more than one output. diff and cmp are good examples for input. For
output, tee:

expensive_computation | tee >(grep thing1 > list1) | grep thing2 > list2

There's no reason that the greps shouldn't be considered equal. The 2 pipes
were constructed using different syntax, but once the job is running, they're
the same. It only makes sense to consider the command finished when all of
the processes have exited. And there's no way other than >(...) syntax to
request that arrangement.

--
Alan Curry

Barry Margolin

Dec 1, 2011, 5:10:45 PM
In article
<afefa2fe-0c5d-4603...@v29g2000yqv.googlegroups.com>,
Ian Fitchet <ian.f...@gmail.com> wrote:

> > dd of=>(cmd)
> >
> > curl -o >(cmd) <url>
> >
> > Shouldn't these also wait for cmd to complete?
>
> From my co-processing viewpoint, no :)

If you want a coprocess that can persist, just add an ampersand:

curl -o >(cmd&) <url>

The shell already provides a way to put things into the background, what
it doesn't have is a way to force things into the foreground in this
case.

Ian Fitchet

Dec 1, 2011, 6:25:21 PM
On Dec 1, 9:08 pm, Kaz Kylheku <k...@kylheku.com> wrote:
> >  That's sort of my point.  I see ">(cmd) " as a form of co-processing
> > albeit one where communication is through an object in the filesystem
> > which turns out to be quite handy for situations where other commands
> > only operate on files.
>
> The filesystem is only used to do a lookup for the purpose of constructing the
> pipe.  After that, it's not different from   | cmd.

I grant you that

> >(cmd)
| cmd

are functionally equivalent but only because of the use of >. As
others have noted the >(cmd) construct is useful as a standalone
entity where a command (or IO redirection) require a filesystem object
as an argument.

> >  The use of filesystem objects is different to commands.  Does
>
> >  $ echo foo | /etc/group
>
> >  have any useful semantic?  Any more than
>
> I don't see where you are going with this. Yes, some combinations of symbols
> have no assigned semantics.

What I wanted to do was distinguish between "> >(cmd)" and
">(cmd)" (ie. without the IO redirection). The latter, which
substitutes a filesystem entity, is a useful construct as others have
shown. Yes, there are other ways to do this (Barry's example of
mkfifo, a backgrounded subshell with stdin redirected, etc.) but the
>(cmd) construct is succinct, which is what we want from the shell.
I'm sure we could replace | with a sequence of subshells and IO
redirection but what's the point when you have a succinct operator
like |?

> >  $ exec 3>&1 > >(read line ;echo read $line ; sleep 5 ; echo processed

> But now you have background processing that you did not ask for anywhere
> (no ampersand).

A poor man's coproc before it was a builtin.
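
For comparison, roughly the same exchange using the bash 4 coproc builtin; a
sketch only (the name SLOW is arbitrary), not a drop-in replacement for the
fd juggling above:

coproc SLOW { read line; echo "read $line"; sleep 5; echo "processed $line"; }
echo foo >&"${SLOW[1]}"         # SLOW[1]: write end, the coprocess's stdin
read -r reply <&"${SLOW[0]}"    # first reply: "read foo"
read -r reply <&"${SLOW[0]}"    # after the sleep: "processed foo"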

Cheers,

Ian

Ian Fitchet

Dec 1, 2011, 6:31:02 PM
On Dec 1, 9:34 pm, pac...@kosh.dhis.org (Alan Curry) wrote:
> That's an interesting perspective. I see it instead as an advanced form of
> pipeline construction, where you can hook process inputs and outputs up to
> each other as pretty much an arbitrary directed acyclic graph, instead of the
> straight line that you get with the standard pipeline.

Yes.

> For output, tee:
>
>   expensive_computation | tee >(grep thing1 > list1) | grep thing2 > list2

tee being another example which requires a filesystem object for its
first argument.

Cheers,

Ian

Janis Papanagnou

Dec 1, 2011, 7:25:52 PM
...for all its arguments, I'd say

e_c | tee >(grep one >o1) >(grep two >o2) >(grep three >o3) | grep four >o4


Janis

>
> Cheers,
>
> Ian
