Bash Version: 3.1
Patch Level: 17
Release Status: release
Description:
With Unix-98 ptys, the builtin echo command gets executed
even when writing to stdout or redirecting to stderr fails, and
the output gets written to the wrong file descriptor if any other
redirection is used in the script.
Repeat-By:
For example, with the following script:
while [ 1 ]; do
    echo Test1
    echo Test2 >> file.txt
    sleep 1
done
As expected, when this script is run in the background (&), the
console slowly fills with "Test1" lines, and file.txt slowly
fills with "Test2" lines.
Now exit the shell, leaving the script running (don't simply
close the xterm, as that would kill the script; type "exit").
Since the terminal has closed, stdout is closed, so "echo Test1"
should fail. It doesn't; instead it writes "Test1" lines into
whatever open file descriptor it can find. In this case,
file.txt starts filling up with:
Test2
Test1
Test2
Test1
...
This does not happen with BSD-style ptys, because apparently
when the terminal is closed, the tty seen by the detached bash
script stays intact, and whatever is written to the now-closed
terminal is simply discarded by the kernel, so the script keeps
seeing open stdout and stderr file descriptors. In the case of
Unix-98 ptys, this bug happens because the tty file descriptors
the bash script uses are really closed.
This also does not happen with an external echo command: with
/bin/echo, the redirection fails and the command is not
executed, as expected.
Hello Pierre-Philippe,
this can be reproduced with 3.2.25 and with:
bash -c 'trap "" PIPE; sleep 1; echo a; echo b > a' | :
It seems to be down to the usage of stdio.
According to ltrace, echo seems to be doing printf("Test1\n")
followed by fflush(stdout). When the write(2) underneath
fflush() fails, "Test1\n" remains in the stdio buffer.
Then bash does a dup2(open("file.txt"), fileno(stdout)) instead
of doing a stdio freopen(3), so the next fflush(3) flushes both
"Test1\n" and "Test2\n" to the now-working stdout.
Maybe bash can't use freopen(3) as that would mean closing the
original fd. Best is probably not to use stdio at all.
Note that zsh has the same problem, and AT&T ksh seems to have
an even worse problem (in the example above, it outputs "b\n"
twice). ash and pdksh are OK.
Best regards,
Stéphane
> Hello Pierre-Philippe,
>
> can be reproduced with 3.2.25 and with:
>
> bash -c 'trap "" PIPE; sleep 1; echo a; echo b > a' | :
I get this:
$ bash -c 'trap "" PIPE; sleep 1; echo a; echo b > a' | :
bash: line 0: echo: write error: Broken pipe
and the file contains only one line.
Andreas.
--
Andreas Schwab, SuSE Labs, sch...@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Hi Andreas,
What OS and version of glibc? I do get the error message but I
get both a and b in the file.
That was on Linux, glibc 2.6.1.
--
Stéphane
Actually,
bash -c 'echo a; echo b > a' >&-
is enough for me to reproduce the problem.
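Expanded into a step-by-step sketch (my addition; the resulting contents of file "a" depend on the libc, as the rest of the thread shows):

```shell
# Reproducer for the buffered-output leak, expanded.
# With an affected libc, the failed flush of "a\n" leaves it in the
# stdio buffer, and the next flush writes it into file "a" next to "b".
rm -f a
bash -c 'echo a; echo b > a' >&-   # run with stdout closed
cat a    # affected libc: "a" then "b"; otherwise just "b"
```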
And that program below shows the same behavior when run as
./a.out >&-
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>     /* dup2 */

int main(void)
{
    printf("a\n");
    fflush(stdout);     /* fails when run as ./a.out >&- */
    /* redirect fd 1 to file "a" behind stdio's back, as bash does */
    dup2(open("a", O_WRONLY|O_CREAT, 0644), 1);
    printf("b\n");
    fflush(stdout);     /* may flush both "a\n" and "b\n" into the file */
    return 0;
}
--
Stéphane
- akula, my bleeding-edge box, is a Debian-unstable box upgraded
yesterday, Sept 8, 2007. It runs linux-2.6.17.7, libc6-2.6.1.
- kilo, my most up-to-date box, where bash still seems to behave properly
with regard to that problem, is also a Debian-unstable box, upgraded on
May 1st, 2007. It runs linux-2.6.18, libc6-2.5.
On both boxes, I tried Stephane's test line with bash-3.1.17 and bash-2.05b:
On both boxes, with bash-3.1.17, I get the "bash: line 0: echo: write
error: Broken pipe" message, and no error message with bash-2.05b.
On akula, both versions of bash generate a file with 2 lines.
On kilo, both versions of bash generate a correct file with 1 line.
I hope this helps. I'll keep the kilo box in "working bash state" if
anybody wants ssh access to it to test further.
This would seem to come down to the version of glibc, where the
behavior of fflush() must have changed. But I can't explain why
Andreas doesn't get the same behavior as me with the same
version of glibc.
I tried on a glibc 2.3.4, and with the C file I see only "b" in
the "a" file. Same thing on Solaris and HPUX with the system's
libc. So that would confirm that the behavior changed in glibc
(probably somewhere after 2.5). And the problem may be more of a
glibc problem than a bash problem.
I've checked SUSv3 and it doesn't say whether the output buffer
should be emptied after an unsuccessful fflush() (as older
glibc seemed to do but newer ones seem no longer to be
doing).
--
Stéphane
> On Sun, Sep 09, 2007 at 07:10:59PM +0100, Stephane Chazelas wrote:
> [...]
>> What OS and version of glibc? I do get the error message but I
>> get both a and b in the file.
>>
>> That was on Linux, glibc 2.6.1.
> [...]
>
> Actually,
>
> bash -c 'echo a; echo b > a' >&-
>
> is enough for me to reproduce the problem.
Guess you have a buggy libc, then.
I wouldn't be surprised if it has to do with the fix to debian
bug #429021. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=429021
(I'm CCing Dmitry who is the author of that change according to
bugs.debian.org)
I was testing with debian package 2.6.1-2, which includes
Dmitry's fix for that bug. I don't know if that fix is planned
to be included in the GNU tree; it doesn't seem to be in the
glibc CVS repository yet.
Now, I'm not sure if we can say that the new glibc behavior
observed is bogus (other than that it's different from the
behavior observed in all the libcs I tried). It is certainly
not a harmless change, as it seems to have broken at least
bash, zsh and possibly ksh93.
Dmitry, you may find that whole thread at:
http://groups.google.com/group/gnu.bash.bug/browse_thread/thread/e311bdd4f945a21e/621b7189217760f1
Best regards,
Stéphane
Anyway, thanks a lot Stéphane and Andreas for testing this!
According to Stephane Chazelas on 9/9/2007 11:17 AM:
> can be reproduced with 3.2.25 and with:
>
> bash -c 'trap "" PIPE; sleep 1; echo a; echo b > a' | :
>
> It seems to be down to the usage of stdio.
Indeed. I raised this very bug several months ago:
http://lists.gnu.org/archive/html/bug-bash/2007-04/msg00070.html
where the cd builtin has the same issue. I also proposed several
approaches for fixing the issue.
--
Don't work too hard, make some time for fun as well!
Eric Blake eb...@byu.net
I can reproduce the "bug" with glibc from etch, or even from sarge, so I
really doubt that it comes from this change.
--
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aur...@debian.org | aure...@aurel32.net
`- people.debian.org/~aurel32 | www.aurel32.net
[both "a" and "b" seen in file "a".]
> >> Guess you have a buggy libc, then.
> > [...]
> >
> > I wouldn't be surprised if it has to do with the fix to debian
> > bug #429021. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=429021
> > (I'm CCing Dmitry who is the author of that change according to
> > bugs.debian.org)
> >
>
> I can reproduce the "bug" with glibc from etch, or even from sarge, so I
> really doubt that it comes from this change.
[...]
Hi Aurelien.
The reason I suspected that is that Andreas, with a glibc-2.6.1,
was not seeing the problem, so it could be a debian-specific
issue. Also, Pierre-Philippe says it is not in debian unstable
from the 1st of May 2007 (glibc-2.5 based). And the only diff on
libio/fileops.c in glibc-2.6.1-2 is that fix for 429021, and the
log for that bug talks of something very related.
I could not reproduce the problem with a glibc-2.3.4 on an old
RedHat system. That version of glibc was in between sarge's
(2.3.2) and etch's (2.3.6).
Andreas, could you please confirm which distribution of Linux
you have and which version of the libc package?
All in all, it would suggest that the change was introduced by
debian if not in the fix for 429021. To sum up:
glibc's fflush seems to empty its buffer upon an unsuccessful
fflush() (an fflush(3) where the write(2) fails) on:
- debian unstable glibc 2.5 (according to Pierre-Philippe)
- Andreas' glibc 2.6.1
- Some RedHat glibc 2.3.4 (according to me)
- Solaris 7 system libc (not glibc)
- HPUX 11.11 system libc (not glibc)
And it seems not to empty it in
- debian unstable 2.6.1-2 (according to me and
Pierre-Philippe)
- debian etch (2.3.6?) according to Aurelien
- debian sarge (2.3.2?) according to Aurelien
Best regards,
Stéphane
Hi Dmitry,
thanks for replying; I gave a list in another email. I tried on
Solaris 7 and HPUX, and both seem to flush the buffer upon an
unsuccessful fflush().
> > It is not a harmless
> > change, for sure as it seems to have broken at least bash, zsh
> > and possibly ksh93.
>
> Unfortunately, you are right. I did not foresee that some shells may use
> "dup2(open("file.txt"), fileno(stdout))". It is a dirty hack, which may
> cause some other problems. Frankly, I am a bit surprised that bash uses
> printf instead of write(2). BTW, you cannot use 'printf' in signal
> handlers, so it seems that you cannot use 'echo' in trap commands too.
>
> Perhaps, we should rollback my patch and give some time for developers
> to fix their broken shells, but, in this case, what is actually broken
> are those shells, not libc!
[...]
On the other hand, how would you force the flush of the buffer?
And how would you redirect stdout? We can use freopen() instead
of the hack above for files, but not for pipes or arbitrary fds
(as in >&3). Eric Blake was suggesting using freopen(NULL) (not
to fix that very problem, but because if you reassign stdout to
some resource of a different nature, you need to tell stdio, as
stdio may need to operate differently), but that's not very
portable according to POSIX. Would freopen(NULL) flush the
output buffer?
You cannot simply assign stdout to some value returned by
fdopen() as that's not portable either...
--
Stéphane
I'll investigate this evening (BTW, it wasn't Solaris 7, but
Solaris 8).
> > On the other hand, how would you force the flush of the buffer?
>
> The flush means to _deliver_ data, which is impossible in this case.
Sorry, I meant flush() as in emptying the buffer (whether
flushing it to the fd or down the drain, i.e. discarding it).
BTW, does anybody know why our emails don't seem to make it to
the bash mailing list anymore?
--
Stéphane
I can NOT reproduce the problem with glibc from etch, and I do
believe that my patch caused the aforementioned problem, though
I do not think that the patch was incorrect, as the real bug
lies inside those shells.
Dmitry
On Mon, Sep 10, 2007 at 09:08:33AM +0100, Stephane Chazelas wrote:
> thanks for replying, I gave a list in another email. I tried on
> Solaris 7 and HPUX and both seem to flush the buffer upon an
> unsuccessful fflush()
I see... I wonder how they behave with regard to my original
problem described in Bug#429021, because it is possible not to
discard data when a write fails, but still to clear the buffer
in fflush(). So functions like fwrite and printf will not lose
previously written data on error, but fflush() will always
leave a clean output buffer on return, so it will not break
existing software which uses the dup2 trick.
> On the other hand, how would you force the flush of the buffer?
The flush means to _deliver_ data, which is impossible in this case.
> And how would you redirect stdout? We can use freopen() instead
> of the hack above for files, but not for pipes or arbitrary fds
> (as in >&3).
I see... POSIX has fdopen() to create a stream based on an
existing file descriptor, but there is no function to change an
existing stream like stdout. So I don't know any other portable
solution except avoiding stdout. On some implementations, you
can just assign any FILE pointer to stdout like this:

    FILE *out = fdopen(fd, mode);
    if (out != NULL)
    {
        fclose(stdout);
        stdout = out;
    }
    else
        report_error();

but in general it does not work, because stdout need not be an
assignable lvalue.
> Erik Blake was suggesting to use freopen(NULL) (not
> to fix that very problem but because of the fact that if you
> reassign stdout to some resource of a different nature, you need
> to tell stdio as stdio may need to operate differently), but
> that's not very portable according to POSIX. Would freopen(NULL)
> flush the output buffer?
In Glibc, freopen:
  if (filename == NULL && _IO_fileno (fp) >= 0)
    {
      fd = __dup (_IO_fileno (fp));
      if (fd != -1)
        filename = fd_to_filename (fd);
    }
Then it closes the original stream and opens a new one in its
place. So I believe it should work with glibc, provided you
call it after the dup2 and your system has /proc, because
fd_to_filename relies on it.
freopen in newlib does not do anything special about NULL,
so I believe it does not work with NULL.
Perhaps, freopen("/dev/stdout") is a more portable way to
do what you want.
Regards,
Dmitry
>
> Unfortunately, you are right. I did not foresee that some shells may use
> "dup2(open("file.txt"), fileno(stdout))". It is a dirty hack, which may
> cause some other problems. Frankly, I am a bit surprised that bash uses
> printf instead of write(2). BTW, you cannot use 'printf' in signal
> handlers, so it seems that you cannot use 'echo' in trap commands too.
Luckily, neither of these things is true.
What's needed is a portable interface like BSD's fpurge(3).
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
Live Strong. No day but today.
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/
I was wrong about suggesting freopen("/dev/stdout") in my previous mail.
It cannot be used to redirect stdout.
Regards,
Dmitry
> What's needed is a portable interface like BSD's fpurge(3).
This is also available from glibc as __fpurge (likewise on Solaris).
Gnulib provides this[1]. Maybe you should consider using
gnulib to enhance the portability of future versions of bash.
[1] http://www.gnu.org/software/gnulib/MODULES.html#module=fpurge
--
Eric Blake
Yes, though I have an aversion to calling functions with a `__' prefix
from user application code.
However:
"These functions are nonstandard and not portable."
It would be nice to have something standardized. I can certainly add
yet another configure test for this -- I just wish I didn't have to.
Note that zsh seems to have the same problem as bash here
(except that it uses fwrite + fputc instead of printf).
The problem I saw with ksh93 seems to be unrelated as ksh93
doesn't seem to be using stdio.
Dmitry, your t.c in the debian report gives:
On Solaris 8:
$ ./t
signal handler called, sig=2
error at num_bytes=15352
fputs: Interrupted system call
writer: num_bytes=80000 num_lines=10000
reader: num_bytes=74888 num_lines=9361
reader: number of missing bytes: 5112
On HPUX 11.11:
$ ./t
signal handler called, sig=2
error at num_bytes=16376
fputs: Interrupted system call
fclose: Interrupted system call
reader: num_bytes=71816 num_lines=8977
reader: number of missing bytes: 8184
So they don't seem to bother to retry sending the data either,
if the first write() fails.
With dietlibc:
$ ./t
signal handler called, sig=2
writer: num_bytes=80008 num_lines=10001
writer: expected num_bytes=80000 but was 80008
reader: num_bytes=80007 num_lines=10000
reader: number of missing bytes: -7
And dietlibc behaves the same as glibc patched with your
(Dmitry's) change upon the fflush. That is, bash would misbehave
the same if linked against dietlibc.
I've also verified that if I revert your change and recompile
the glibc, bash's (and zsh's) problem goes away, so that would
confirm if needed be that it was that fix that introduced the
change in behavior.
--
Stéphane
Yes, it seems they purge all data in the IO buffer on error.
> With dietlibc:
>
> $ ./t
> signal handler called, sig=2
> writer: num_bytes=80008 num_lines=10001
> writer: expected num_bytes=80000 but was 80008
> reader: num_bytes=80007 num_lines=10000
> reader: number of missing bytes: -7
>
> And dietlibc behaves the same as glibc patched with your
> (Dmitry's) change upon the fflush.
No, glibc with my patch gives:
$ ./t
signal handler called, sig=2
error at num_bytes=69632
fputs: Interrupted system call
writer: num_bytes=80000 num_lines=10000
reader: num_bytes=80000 num_lines=10000
-7 indicates an error in dietlibc. Somehow, dietlibc does not
take into account that write(2) can write only part of the
data, which should not be considered an error. But this bug in
dietlibc is irrelevant to our problem. Newlib should behave
like glibc with my patch, but I have not tested it.
Dmitry
Sorry for the misunderstanding. By "upon the fflush" I meant
with regard to the issue at stake, i.e. the fact that dietlibc
doesn't seem to empty the output buffer upon an unsuccessful
fflush() either, which confirms what you suspected earlier from
reading the dietlibc code. I did not mean that "t" behaves the
same with glibc and dietlibc. With glibc, I obtain:
$ ~/t
signal handler called, sig=2
error at num_bytes=66560
fputs: Interrupted system call
reader: num_bytes=80000 num_lines=10000
writer: num_bytes=80000 num_lines=10000
And with your fix reverted:
.../glibc-2.6.1/build-tree/i386-libc$ LD_LIBRARY_PATH=$PWD ~/t
signal handler called, sig=2
error at num_bytes=66560
fputs: Interrupted system call
writer: num_bytes=80000 num_lines=10000
reader: num_bytes=78976 num_lines=9872
reader: number of missing bytes: 1024
as expected.
Best regards,
Stéphane