Strange behaviour on 'read' from a pipe


Lluís Batlle i Rossell

Mar 31, 2012, 9:19:38 AM
to bug-...@gnu.org
Hello,

I have this script, which never prints "DONE" on my systems with bash 4.0, 4.1
and 4.2 (up to 4.2-p20, the last version I tested).

However, some people on IRC told me it prints DONE for them. If I run the script
with bash under 'strace -f', it also prints DONE.

So there seems to be some kind of race in 'read'. The current program output on my computer is:
reading
debug:done


If 'read' were unblocked by an error (SIGCHLD?), the script would print 'reading'
again. If 'read' unblocked having read nothing, it would print an empty line (it
does not). And of course it does not print 'DONE' either.

Maybe I am misunderstanding something? I suspect that 'read' reacts to EINTR and
tries to read again, but mistakenly *reopens the pipe* (<$PIPE), which makes it
block forever.

Here is the script:
-----------
#!/var/run/current-system/sw/bin/bash

PIPE=/tmp/pipe

rm -f $PIPE
mkfifo $PIPE

function spawn {
    "$@"
    echo debug:done
    echo DONE > $PIPE
}

spawn sleep 1 &

while true; do
    echo reading
    while read LINE < $PIPE; do
        echo $LINE
        exit 0
    done
done
-----------------

Regards,
Lluís.

Chet Ramey

Mar 31, 2012, 9:16:20 PM
to Lluís Batlle i Rossell, bug-...@gnu.org, chet....@case.edu
On 3/31/12 9:19 AM, Lluís Batlle i Rossell wrote:
> Hello,
>
> I have this script, that I've found to never write "DONE" in my systems, with
> bash 4.0, 4.1, 4.2.. until 4.2-p20, my last test.
>
> However, in irc some people told me it prints DONE for them. If I run the script with
> bash under 'strace -f', it also prints DONE.

It looks like a simple race condition. I suspect that the scheduler
arranges things so that the child process ends up exiting between the
open and the read, but I don't have any real evidence to back it up.
(Like you, my Mac OS X system prints `DONE'.)

You might want to try using exec to open the FIFO in the parent process
rather than trying to open it on each read.
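
A minimal sketch of what this suggestion might look like, assuming a throwaway
FIFO created under a mktemp directory and file descriptor 3 (both choices are
illustrative, not taken from the original script). The FIFO is opened once with
exec, and 'read' then uses the already-open descriptor. Whether this avoids the
hang reported above is not established in this thread.

```shell
#!/bin/bash
# Sketch only: open the FIFO once with exec instead of once per 'read'.
# The temporary directory and fd number 3 are assumptions for illustration.
dir=$(mktemp -d)
PIPE="$dir/pipe"
mkfifo "$PIPE"

spawn() {
    "$@"
    echo DONE > "$PIPE"
}

spawn sleep 1 &

# Open the read end once on fd 3; this blocks until the child opens
# the write end.
exec 3< "$PIPE"

# Every 'read' now uses the same open descriptor.
read -r LINE <&3
echo "$LINE"

# Close fd 3 and clean up.
exec 3<&-
wait
rm -rf "$dir"
```

Because the descriptor stays open, any later 'read <&3' would reuse it rather
than reopening the FIFO.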

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/

Lluís Batlle i Rossell

Apr 1, 2012, 4:52:30 AM
to Chet Ramey, bug-...@gnu.org
On Sat, Mar 31, 2012 at 09:16:20PM -0400, Chet Ramey wrote:
> On 3/31/12 9:19 AM, Lluís Batlle i Rossell wrote:
> > Hello,
> >
> > I have this script, that I've found to never write "DONE" in my systems, with
> > bash 4.0, 4.1, 4.2.. until 4.2-p20, my last test.
> >
> > However, in irc some people told me it prints DONE for them. If I run the script with
> > bash under 'strace -f', it also prints DONE.
>
> It looks like a simple race condition. I suspect that the scheduler
> arranges things so that the child process ends up exiting between the
> open and the read, but I don't have any real evidence to back it up.
> (Like you, my Mac OS X system prints `DONE'.)
>
> You might want to try using exec to open the FIFO in the parent process
> rather than trying to open it on each read.

Yes, I know I can work around it, but my point was that the race, with the parent
getting SIGCHLD during the read call, does not lead to any useful outcome.

For me, useful outcomes would be:
- read nothing, and unblock with an error
- keep on reading the same fd

But it does neither of those.

Thank you,
Lluís.

Andreas Schwab

Apr 1, 2012, 5:53:12 AM
to chet....@case.edu, Lluís Batlle i Rossell, bug-...@gnu.org
Chet Ramey <chet....@case.edu> writes:

> On 3/31/12 9:19 AM, Lluís Batlle i Rossell wrote:
>> Hello,
>>
>> I have this script, that I've found to never write "DONE" in my systems, with
>> bash 4.0, 4.1, 4.2.. until 4.2-p20, my last test.
>>
>> However, in irc some people told me it prints DONE for them. If I run the script with
>> bash under 'strace -f', it also prints DONE.
>
> It looks like a simple race condition. I suspect that the scheduler
> arranges things so that the child process ends up exiting between the
> open and the read, but I don't have any real evidence to back it up.

Note that the opening of the pipe as part of the redirection in the
parent blocks until there is a writer, ie. until the child opens the
pipe. Can this open call return EINTR?
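
The blocking open can be observed with a small sketch. The temporary directory
and the status file are illustrative helpers, not part of the original script.
A background reader cannot get past its redirection until something opens the
FIFO for writing.

```shell
#!/bin/bash
# Sketch only: show that opening a FIFO for reading blocks until a writer
# opens it. All paths below are assumptions for illustration.
dir=$(mktemp -d)
PIPE="$dir/pipe"
mkfifo "$PIPE"

# Background reader: the open of "$PIPE" must complete before 'cat' runs,
# and the status file is only written after 'cat' has finished.
( cat < "$PIPE" > "$dir/out"; echo finished > "$dir/status" ) &
reader=$!

sleep 1
# No writer has appeared yet, so the reader cannot have finished.
if [ -e "$dir/status" ]; then before=finished; else before=blocked; fi

# Opening the write end lets the reader's open return.
echo DONE > "$PIPE"
wait "$reader"

after=$(cat "$dir/status")
received=$(cat "$dir/out")
echo "before=$before after=$after received=$received"
rm -rf "$dir"
```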

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Lluís Batlle i Rossell

Apr 1, 2012, 5:58:16 AM
to Andreas Schwab, bug-...@gnu.org, chet....@case.edu
On Sun, Apr 01, 2012 at 11:53:12AM +0200, Andreas Schwab wrote:
> Chet Ramey <chet....@case.edu> writes:
>
> > On 3/31/12 9:19 AM, Lluís Batlle i Rossell wrote:
> >> Hello,
> >>
> >> I have this script, that I've found to never write "DONE" in my systems, with
> >> bash 4.0, 4.1, 4.2.. until 4.2-p20, my last test.
> >>
> >> However, in irc some people told me it prints DONE for them. If I run the script with
> >> bash under 'strace -f', it also prints DONE.
> >
> > It looks like a simple race condition. I suspect that the scheduler
> > arranges things so that the child process ends up exiting between the
> > open and the read, but I don't have any real evidence to back it up.
>
> Note that the opening of the pipe as part of the redirection in the
> parent blocks until there is a writer, ie. until the child opens the
> pipe. Can this open call return EINTR?

Ah, maybe this is the source of the trouble: EINTR in open, instead of in read.

But in this case, even a C program could not be protected against this race
other than blocking signals before opening the descriptor.

Maybe someone else knows better.

Chet Ramey

Apr 1, 2012, 11:06:22 AM
to Andreas Schwab, Lluís Batlle i Rossell, bug-...@gnu.org, chet....@case.edu
On 4/1/12 5:53 AM, Andreas Schwab wrote:

>> It looks like a simple race condition. I suspect that the scheduler
>> arranges things so that the child process ends up exiting between the
>> open and the read, but I don't have any real evidence to back it up.
>
> Note that the opening of the pipe as part of the redirection in the
> parent blocks until there is a writer, ie. until the child opens the
> pipe. Can this open call return EINTR?

open() is supposed to return EINTR only if interrupted by a signal. The
only signal I can see occurring is SIGCHLD, and bash installs the SIGCHLD
handler with SA_RESTART.

Chet

Lluís Batlle i Rossell

Apr 1, 2012, 1:02:35 PM
to Chet Ramey, Andreas Schwab, bug-...@gnu.org
On Sun, Apr 01, 2012 at 11:06:22AM -0400, Chet Ramey wrote:
> On 4/1/12 5:53 AM, Andreas Schwab wrote:
>
> >> It looks like a simple race condition. I suspect that the scheduler
> >> arranges things so that the child process ends up exiting between the
> >> open and the read, but I don't have any real evidence to back it up.
> >
> > Note that the opening of the pipe as part of the redirection in the
> > parent blocks until there is a writer, ie. until the child opens the
> > pipe. Can this open call return EINTR?
>
> open() is supposed to return EINTR only if interrupted by a signal. The
> only signal I can see occurring is SIGCHLD, and bash installs the SIGCHLD
> handler with SA_RESTART.

Then, any idea of what can be happening?

Chet Ramey

Apr 1, 2012, 6:27:46 PM
to Lluís Batlle i Rossell, Andreas Schwab, bug-...@gnu.org, Chet Ramey
It looks like a race condition, like I said. I can't reproduce it on my
system, so I don't have anything to troubleshoot.

Lluís Batlle i Rossell

Apr 2, 2012, 10:24:18 AM
to Chet Ramey, Andreas Schwab, bug-...@gnu.org
On Sun, Apr 01, 2012 at 06:27:46PM -0400, Chet Ramey wrote:
> On 4/1/12 1:02 PM, Lluís Batlle i Rossell wrote:
> > On Sun, Apr 01, 2012 at 11:06:22AM -0400, Chet Ramey wrote:
> >> On 4/1/12 5:53 AM, Andreas Schwab wrote:
> >>
> >>>> It looks like a simple race condition. I suspect that the scheduler
> >>>> arranges things so that the child process ends up exiting between the
> >>>> open and the read, but I don't have any real evidence to back it up.
> >>>
> >>> Note that the opening of the pipe as part of the redirection in the
> >>> parent blocks until there is a writer, ie. until the child opens the
> >>> pipe. Can this open call return EINTR?
> >>
> >> open() is supposed to return EINTR only if interrupted by a signal. The
> >> only signal I can see occurring is SIGCHLD, and bash installs the SIGCHLD
> >> handler with SA_RESTART.
> >
> > Then, any idea of what can be happening?
>
> It looks like a race condition, like I said. I can't reproduce it on my
> system, so I don't have anything to troubleshoot.

Running "strace ./script" shows the following. Notice that it hangs at open(). If
I use "strace -f" to follow the child, everything works.

On the other hand, I tried on another computer with the same bash+libc (hashes
compared), and it worked there. But on the problematic computer (linux 3.2.11
x86_64) other bash/libc versions also hung.

...
read(255, "\nfunction spawn {\n \"$@\"\n e"..., 281) = 201
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
lseek(255, -113, SEEK_CUR) = 168
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7fd40fbbc9d0) = 29446
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
read(255, "\nwhile true; do\n echo reading"..., 281) = 113
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
brk(0x21b2000) = 0x21b2000
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
write(1, "reading\n", 8reading
) = 8
open("/tmp/pipe", O_RDONLYdebug:done
) = ? ERESTARTSYS (To be restarted)
--- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=29446, si_status=0,
si_utime=0, si_stime=0} (Child exited) ---
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 29446
wait4(-1, 0x7fff2c3c0798, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff) = 2
open("/tmp/pipe", O_RDONLY



Lluís Batlle i Rossell

Apr 2, 2012, 10:39:19 AM
to Chet Ramey, Andreas Schwab, bug-...@gnu.org
On Sun, Apr 01, 2012 at 06:27:46PM -0400, Chet Ramey wrote:
> On 4/1/12 1:02 PM, Lluís Batlle i Rossell wrote:
> > On Sun, Apr 01, 2012 at 11:06:22AM -0400, Chet Ramey wrote:
> >> On 4/1/12 5:53 AM, Andreas Schwab wrote:
> >>
> >>>> It looks like a simple race condition. I suspect that the scheduler
> >>>> arranges things so that the child process ends up exiting between the
> >>>> open and the read, but I don't have any real evidence to back it up.
> >>>
> >>> Note that the opening of the pipe as part of the redirection in the
> >>> parent blocks until there is a writer, ie. until the child opens the
> >>> pipe. Can this open call return EINTR?
> >>
> >> open() is supposed to return EINTR only if interrupted by a signal. The
> >> only signal I can see occurring is SIGCHLD, and bash installs the SIGCHLD
> >> handler with SA_RESTART.
> >
> > Then, any idea of what can be happening?
>
> It looks like a race condition, like I said. I can't reproduce it on my
> system, so I don't have anything to troubleshoot.

Trying to reproduce the race, I got rid of 'sleep', and expected this to never
hang. But it hangs wherever I try it. Should I maybe submit this to LKML?

I think it should never hang, but maybe I am overlooking something.
-------------
#!/var/run/current-system/sw/bin/bash

PIPE=/tmp/pipe

rm -f $PIPE
mkfifo $PIPE

function spawn {
    echo DONE > $PIPE
}

spawn sleep 1 &

while true; do
    echo reading
    while read LINE < $PIPE; do
        echo $LINE
        spawn &
    done
done
---------------

Lluís Batlle i Rossell

Apr 2, 2012, 2:46:12 PM
to Chet Ramey, Andreas Schwab, bug-...@gnu.org
On Mon, Apr 02, 2012 at 04:39:19PM +0200, Lluís Batlle i Rossell wrote:
> Trying to reproduce the race, I got rid of 'sleep', and expected this to never
> hang. But it hangs where I try. Should I submit this to LKML maybe?
>
> I think it should not hang ever, but maybe I forecast something bad.
> -------------
> #!/var/run/current-system/sw/bin/bash
>
> PIPE=/tmp/pipe
>
> rm -f $PIPE
> mkfifo $PIPE
>
> function spawn {
> echo DONE > $PIPE
> }
>
> spawn sleep 1 &
>
> while true; do
> echo reading
> while read LINE < $PIPE; do
> echo $LINE
> spawn &
> done
> done
> ---------------

Adding a 'sleep 0.1' before 'echo DONE' makes it hang very early on the three
Linux machines I tried. Let me know if this helps you reproduce the problem.
Here it is again:

----------
#!/bin/sh

PIPE=/tmp/pipe

rm -f $PIPE
mkfifo $PIPE
set -x

spawn() {
    sleep 0.1
    echo DONE > $PIPE
}

spawn &

while true; do
    while read LINE < $PIPE; do
        echo $LINE
        spawn &
    done
done
----------

Greg Wooledge

Apr 2, 2012, 3:29:33 PM
to Lluís Batlle i Rossell, bug-...@gnu.org
On Mon, Apr 02, 2012 at 08:46:12PM +0200, Lluís Batlle i Rossell wrote:
> #!/bin/sh

You're running this in sh? But reporting it as a bug in bash?

> PIPE=/tmp/pipe
>
> rm -f $PIPE
> mkfifo $PIPE
> set -x
>
> spawn() {
> sleep 0.1
> echo DONE > $PIPE
> }
>
> spawn &
>
> while true; do
> while read LINE < $PIPE; do
> echo $LINE
> spawn &
> done
> done

I ran this, with #!/bin/bash and with sleep 1 instead of sleep 0.1,
on an HP-UX 10.20 system and a Debian 6.0 (Linux) system. For me,
it printed "DONE" once per second until I pressed ctrl-C, on both
systems.

Changing the sleep 1 back to sleep 0.1 and re-running on the Linux
system produced the same result, just 10 times as fast.

Changing the #!/bin/bash to #!/bin/sh and re-re-running on Linux
still produced the same result.

Lluís Batlle i Rossell

Apr 2, 2012, 4:17:26 PM
to Greg Wooledge, bug-...@gnu.org
On Mon, Apr 02, 2012 at 03:29:33PM -0400, Greg Wooledge wrote:
> On Mon, Apr 02, 2012 at 08:46:12PM +0200, Lluís Batlle i Rossell wrote:
> > #!/bin/sh
>
> You're running this in sh? But reporting it as a bug in bash?

On that system, sh points to bash. Sorry for any confusion I may have caused.

> > while true; do
> > while read LINE < $PIPE; do
> > echo $LINE
> > spawn &
> > done
> > done
>
> I ran this, with #!/bin/bash and with sleep 1 instead of sleep 0.1,
> on an HP-UX 10.20 system and a Debian 6.0 (Linux) system. For me,
> it printed "DONE" once per second until I pressed ctrl-C, on both
> systems.
>
> Changing the sleep 1 back to sleep 0.1 and re-running on the Linux
> system produced the same result, just 10 times as fast.
>
> Changing the #!/bin/bash to #!/bin/sh and re-re-running on Linux
> still produced the same result.

I found the race condition after reimplementing the script in C. It is present
in the script too.

The child may have been spawned but not yet have reached the open-for-writing
call of 'echo DONE' when the loop goes around again, runs 'read LINE' again, and
reads *EOF*, exiting the inner while loop.
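
The EOF case can be sketched in isolation, assuming a writer that opens the FIFO
and closes it again without writing anything. That writer is only a stand-in for
one that goes away before the reader gets any data, and the temporary path is
illustrative.

```shell
#!/bin/bash
# Sketch only: 'read' on a FIFO fails with EOF when the only writer closes
# without writing. The temporary directory is an assumption for illustration.
dir=$(mktemp -d)
PIPE="$dir/pipe"
mkfifo "$PIPE"

# Writer: open the FIFO for writing and close it at once, writing nothing.
# The sleep only delays the child's exit until after the parent's read.
( : > "$PIPE"; sleep 1 ) &

# The open succeeds once the writer has appeared, but there is no data and
# no writer left, so 'read' returns a non-zero status.
if read -r LINE < "$PIPE"; then
    outcome="got line: $LINE"
else
    outcome="EOF"
fi
echo "$outcome"

wait
rm -rf "$dir"
```

In a loop of the form 'while read LINE < $PIPE', that non-zero status is what
ends the loop.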

Sorry for the headache.

Here is the proper version, without the race condition, which uses the writer's
close (seen by the reader as read() = 0) for synchronisation:

---------------
#!/bin/bash

PIPE=/tmp/pipe

rm -f $PIPE
mkfifo $PIPE

spawn() {
    echo DONE > $PIPE
}

spawn &

# read A -- to read EOF from the child, to ensure it closed the pipe.
while (read LINE ; echo $LINE; read A; echo child closed) < $PIPE; do
    spawn &
done
----------------
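
A bounded variant of the same idea, as a sketch: it stops after three rounds,
quotes the variables, and uses a temporary directory and a log file. All of
these are illustrative additions rather than part of the script above.

```shell
#!/bin/bash
# Sketch only: the close/EOF synchronisation, limited to three rounds.
# The temporary directory, log file and round counter are assumptions.
dir=$(mktemp -d)
PIPE="$dir/pipe"
mkfifo "$PIPE"

spawn() {
    echo DONE > "$PIPE"
}

rounds=0
spawn &

# In each round the FIFO is opened once for the whole subshell. The second
# 'read' returns only at EOF, i.e. after the child has closed its end.
while [ "$rounds" -lt 3 ] &&
      ( read -r LINE; echo "$LINE"; read -r A; echo "child closed" ) < "$PIPE" >> "$dir/log"
do
    rounds=$((rounds + 1))
    # Start the next writer only after the previous one is known to be gone.
    if [ "$rounds" -lt 3 ]; then
        spawn &
    fi
done

wait
log=$(cat "$dir/log")
echo "$log"
rm -rf "$dir"
```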

Best regards,
Lluís.
