
Lost process output in pipe between Emacs and CVS

Stefan Monnier

Jul 7, 2002, 7:09:37 PM

Over the last few years, I've had people complain several times about
PCL-CVS's diff output being incorrect (with parts missing).

Since the FSF's Kerberos server was down, I had to switch to SSH
access, and now for the first time I've experienced this lossage
first hand.

Assuming that the problem is unrelated to CVS or to SSH,
it seems that the only difference between when it worked and now
is that access via Kerberos is significantly faster (at least
in terms of latency, but maybe also bandwidth). But from CVS'
point of view, the difference is more significant since Kerberos
is directly supported (I use the :gserver: access method)
whereas SSH goes through an external process.

So it looks a bit like a race condition, but the problem is that
it is surprisingly reproducible.

When I open PCL-CVS on the emacs/src directory and hit `=' on
the `minibuf.c' file, I get a *vc-diff* buffer that contains
the first 4096 chars of the diff output and the last 898,
for a total size of 4994.

By contrast

% cvs diff src/minibuf.c |(sleep 120; wc)
768 3146 21472

so 16478 chars were lost somewhere on the way. I tried to investigate
a bit more by placing a breakpoint on emacs_read and then doing

p read (fildes, buf, nbyte)

and I see the exact same lossage: the first four calls return
1024 chars each (the first 4096 of the output) and the next one
returns the last 898 chars of the output (and the next one returns 0).

Another notable thing is that this whole problem disappears if I
change PCL-CVS to use a pty rather than a pipe for the process' output.
(I noticed it because the problem doesn't appear with VC which is
not careful to use a pipe).

Now my direct `read' calls from GDB make me believe that maybe the
problem is not in Emacs, but in CVS instead (I use cvs-1.11.1p1
from the Redhat distribution, with a mix of Redhat-7.2
and Redhat-7.3 GNU/Linux systems).

But the fact that `cvs diff src/minibuf.c |(sleep 120; wc)' works
correctly makes me think that maybe it is a bug in Emacs.

Could anybody help me out with insight/hints/patches/chocolates ?


Stefan

Ian Lance Taylor

Jul 8, 2002, 12:13:38 PM
"Stefan Monnier" <monnier+gnu/emacs/pre...@rum.cs.yale.edu> writes:

> Over the last few years, I've had people complain several times about
> PCL-CVS's diff output being incorrect (with parts missing).
>

> Now my direct `read' calls from GDB make me believe that maybe the
> problem is not in Emacs, but in CVS instead (I use cvs-1.11.1p1
> from the Redhat distribution, with a mix of Redhat-7.2
> and Redhat-7.3 GNU/Linux systems).
>
> But the fact that `cvs diff src/minibuf.c |(sleep 120; wc)' works
> correctly makes me think that maybe it is a bug in Emacs.
>
> Could anybody help me out with insight/hints/patches/chocolates ?

I use CVS over SSH. I see the same problem when I pipe the output of
cvs diff to less. I never see it when I redirect cvs output to a
file. There is no emacs involved, so it's not an emacs bug. I'm
running an old CVS client though--pre 1.11. I haven't tried to track
down the problem.

Ian

Ian Lance Taylor

Jul 8, 2002, 2:15:40 PM
"Stefan Monnier" <monnier+gnu/emacs/pre...@rum.cs.yale.edu> writes:

> Another notable thing is that this whole problem disappears if I
> change PCL-CVS to use a pty rather than a pipe for the process' output.
> (I noticed it because the problem doesn't appear with VC which is
> not careful to use a pipe).
>
> Now my direct `read' calls from GDB make me believe that maybe the
> problem is not in Emacs, but in CVS instead (I use cvs-1.11.1p1
> from the Redhat distribution, with a mix of Redhat-7.2
> and Redhat-7.3 GNU/Linux systems).
>
> But the fact that `cvs diff src/minibuf.c |(sleep 120; wc)' works
> correctly makes me think that maybe it is a bug in Emacs.
>
> Could anybody help me out with insight/hints/patches/chocolates ?

I see the problem. It only happens when you do the equivalent of
cvs COMMAND 2>&1
and you are using ssh.

When CVS execs ssh, it sets up pipes for file descriptors 0 and 1, but
not for file descriptor 2. Thus ssh inherits file descriptor 2 from
CVS. If you have done 2>&1, this is the same as file descriptor 1.

ssh puts file descriptors 0, 1, 2 into non-blocking mode. Since ssh
and CVS are using the same file descriptor for descriptor 2, this has
the effect of putting CVS's file descriptor 2 into non-blocking mode.
Since we're talking about the case of 2>&1, this has the effect of
putting CVS's file descriptor 1 into non-blocking mode.

If you have enough data, the CVS client fills up the output buffer on
stdout (file descriptor 1). The CVS client does not expect this file
descriptor to be in non-blocking mode. The call to fwrite or fflush
fails with EAGAIN, but CVS does not check for an error. Instead, the
data is silently lost.
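
For concreteness, here is a minimal C sketch that reproduces the failure
mode outside CVS (this is illustrative code, not anything from the CVS
source). Run it as `./a.out | (sleep 5; wc -c)' and wc will report fewer
bytes than were printed:

    /* Set O_NONBLOCK on stdout behind stdio's back (as ssh does to the
       shared descriptor), then write more than the pipe can absorb. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int i;
        int flags = fcntl(STDOUT_FILENO, F_GETFL, 0);
        fcntl(STDOUT_FILENO, F_SETFL, flags | O_NONBLOCK);

        /* Like the CVS client, write via stdio without checking for
           errors; once the pipe fills, write() fails with EAGAIN and
           stdio discards the buffered data. */
        for (i = 0; i < 10000; i++)
            printf("line %d: some output that may silently vanish\n", i);

        if (fflush(stdout) == EOF)
            perror("fflush");   /* the only hint that data was lost */
        return 0;
    }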

I don't know what the general fix is. It makes sense for CVS to leave
file descriptor 2 untouched when executing the CVS_RSH program, so
that any errors from that program will appear on stderr.

Now that I understand what is happening, I can easily fix the
particular problem I'm seeing by setting CVS_RSH to a shell script
which does this:
ssh "$@" 2>/dev/null
This disconnects the file descriptor 2 which ssh sees from the one
which CVS is using, and everything works fine.

Of course, the CVS client should be changed to check for errors when
doing output to stdout and stderr. This would be simple changes to
handle_m and handle_e in src/client.c.
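
Something along these lines would at least turn the silent loss into a
visible failure (a sketch only -- handle_m/handle_e are the real names,
but this helper and its surroundings are assumed, not the actual CVS
source):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Write a buffer received from the server to fp, and die loudly
       instead of silently dropping data if the write fails. */
    static void checked_output(FILE *fp, const char *buf, size_t len)
    {
        errno = 0;
        if (fwrite(buf, 1, len, fp) < len || fflush(fp) == EOF)
        {
            if (errno == EAGAIN)
                fprintf(stderr,
                        "cvs: descriptor unexpectedly non-blocking; output lost\n");
            else
                perror("cvs: cannot write to stdout/stderr");
            exit(1);
        }
    }

Note that by the time the error is detected, stdio has already discarded
the buffer, so this can only report the loss, not repair it.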

Ian

Richard Stallman

Jul 8, 2002, 2:20:07 PM
> But the fact that `cvs diff src/minibuf.c |(sleep 120; wc)' works
> correctly makes me think that maybe it is a bug in Emacs.

It might be a bug in Emacs that fails to read everything coming thru
the pipe when the subprocess terminates. Can you see if it is that?
Did you already check and see what happens when Emacs sees that the
subprocess has terminated?

sigchld_handler just sets a few flags; the real work is done
later on by the code that checks the flags. For an async subprocess,
it is done in status_notify. It has code to read any remaining
output, but there could be a bug in it.
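
The invariant that code needs to maintain is simple: a pipe can still
hold data after the writer has exited, so the reader must keep reading
until EOF. A generic sketch of that step (not the actual process.c
code; the handle_output consumer is hypothetical):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern void handle_output(const char *buf, size_t len); /* hypothetical */

    static void drain_after_exit(int fd, pid_t child)
    {
        char buf[1024];
        ssize_t n;

        waitpid(child, NULL, 0);            /* the child is gone...   */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            handle_output(buf, (size_t) n); /* ...but its last output
                                               is still in the pipe  */
    }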

Ian Lance Taylor

Jul 8, 2002, 10:16:19 PM
Richard Stallman <r...@gnu.org> writes:

> I don't know what the general fix is. It makes sense for CVS to leave
> file descriptor 2 untouched when executing the CVS_RSH program, so
> that any errors from that program will appear on stderr.
>

> Can CVS be made to cope with EAGAIN on stdout?
> That would fix the problem, right? Even if it did something stupid
> like sleep for a second and try again, that would still fix the bug.

CVS uses stdio, and stdio doesn't work very well with non-blocking
file descriptors. CVS can detect the error easily enough, but at that
point stdio has already thrown away the buffer full of data, and CVS
hasn't recorded it anywhere. While in principle CVS could switch to
not use stdio, that would be a fairly substantial change.

Perhaps the emacs CVS mode can avoid the problem by not dup-ing
descriptor 1 to descriptor 2. That won't help other users, though.

I suppose that before CVS does a write to stdout or stderr due to data
read from the child process, it could check the blocking status of the
file descriptor, and temporarily block it if necessary. It would be
tedious to check every time, but it would probably be safe to assume
that if the descriptor were ever blocking, it would remain blocking
thereafter.
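
In code, the check might look something like this (a sketch of the idea,
not actual CVS source; the caller is assumed to invoke it before each
batch of output):

    #include <fcntl.h>

    /* If ssh has switched the descriptor to O_NONBLOCK behind our
       back, switch it back to blocking before we write to it. */
    static void ensure_blocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags >= 0 && (flags & O_NONBLOCK))
            fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
    }

One caveat: as comes up later in the thread, O_NONBLOCK lives in the
shared file structure, so clearing it here also clears it for ssh,
which may then block in its own stderr writes.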

Ian

"Jesús M. NAVARRO"

Jul 8, 2002, 9:45:44 PM
Hi, Richard:

Richard Stallman wrote:
> But the fact that `cvs diff src/minibuf.c |(sleep 120; wc)' works
> correctly makes me think that maybe it is a bug in Emacs.
>
> It might be a bug in Emacs that fails to read everything coming thru
> the pipe when the subprocess terminates. Can you see if it is that?
> Did you already check and see what happens when Emacs sees that the
> subprocess has terminated?
>

Could you explain how you manage to seem not to be there, but still
jump in as soon as something falls "under your jurisdiction"?
Really intrigued, since I can't imagine you follow this newsgroup
just waiting for a message to be interesting to you!!!
--
SALUD,
Jesus
***
jmnav...@able.es
***
From Zaragoza, sysadmin seeking employment -
http://www.geocities.com/jesusm_navarro/CV/cv.html
***

Richard Stallman

Jul 9, 2002, 2:51:48 PM
> CVS uses stdio, and stdio doesn't work very well with non-blocking
> file descriptors. CVS can detect the error easily enough, but at that
> point stdio has already thrown away the buffer full of data, and CVS
> hasn't recorded it anywhere. While in principle CVS could switch to
> not use stdio, that would be a fairly substantial change.

Many programs use stdio, and these programs ought to work when run
under ssh. This suggests that ssh and stdio are responsible for the
problem, and one or the other of them should be fixed.

It would be best to fix stdio, I think. What platform is CVS running
on when it fails? Does stdio in GNU libc handle non-blocking file
descriptors properly? If not, we can fix it.

Ian Lance Taylor

Jul 9, 2002, 3:09:06 PM
Richard Stallman <r...@gnu.org> writes:

> CVS uses stdio, and stdio doesn't work very well with non-blocking
> file descriptors. CVS can detect the error easily enough, but at that
> point stdio has already thrown away the buffer full of data, and CVS
> hasn't recorded it anywhere. While in principle CVS could switch to
> not use stdio, that would be a fairly substantial change.
>
> Many programs use stdio, and these programs ought to work when run
> under ssh. This suggests that ssh and stdio are responsible for the
> problem, and one or the other of them should be fixed.

When programs are run under ssh, they will work correctly.

This is a different case. CVS is invoking ssh in a child process in a
way which causes them to share file descriptor 2. ssh is then
unblocking file descriptor 2 in a way which CVS does not expect. This
type of problem can only happen with programs which invoke ssh.

> It would be best to fix stdio, I think. What platform is CVS running
> on when it fails? Does stdio in GNU libc handle non-blocking file
> descriptors properly? If not, we can fix it.

The failure which I see occurs on GNU/Linux using GNU libc 2.2.5.
Specifically, in new_do_write() in libio/fileops.c, if the call to
_IO_SYSWRITE() (i.e., write()) fails for any reason, including EAGAIN,
the contents of the output buffer are discarded.

Ian

Stefan Monnier

Jul 9, 2002, 4:39:24 PM
Thanks, Ian, for your help. I'm rather glad to hear that it can be
reproduced outside of Emacs, since I can now lay the blame on someone
else (people used to say "it doesn't work in PCL-CVS but it works with
VC, so it must be a problem with PCL-CVS" ;-).

> > CVS uses stdio, and stdio doesn't work very well with non-blocking
> > file descriptors. CVS can detect the error easily enough, but at that
> > point stdio has already thrown away the buffer full of data, and CVS
> > hasn't recorded it anywhere. While in principle CVS could switch to
> > not use stdio, that would be a fairly substantial change.

My Unix programming is a bit rusty, so could someone explain to me
why SSH changing the blocking status of its file-descriptor would
have any impact on CVS' file-descriptor ? I understand the notion
of duplicating/sharing file-descriptors, but I would expect the
"blocking" status of a file-descriptor to be per-process rather than
global to all processes sharing that file-descriptor.


Stefan

Ian Lance Taylor

Jul 9, 2002, 5:47:35 PM
"Stefan Monnier" <monnier+gnu/emacs/pre...@rum.cs.yale.edu> writes:

> > > CVS uses stdio, and stdio doesn't work very well with non-blocking
> > > file descriptors. CVS can detect the error easily enough, but at that
> > > point stdio has already thrown away the buffer full of data, and CVS
> > > hasn't recorded it anywhere. While in principle CVS could switch to
> > > not use stdio, that would be a fairly substantial change.
>
> My Unix programming is a bit rusty, so could someone explain to me
> why SSH changing the blocking status of its file-descriptor would
> have any impact on CVS' file-descriptor ? I understand the notion
> of duplicating/sharing file-descriptors, but I would expect the
> "blocking" status of a file-descriptor to be per-process rather than
> global to all processes sharing that file-descriptor.

You might expect that, but you would be wrong.

First, don't get confused by process boundaries. In Unix, file
descriptors point to file structures. The table of file descriptors
is process specific; the file structures are not.

There are flags associated with a specific file descriptor, which do
not apply to duped file descriptors; these are the ones returned by
fcntl(F_GETFD), and the only standard one is FD_CLOEXEC.

There are flags associated with a specific file structure, which do
apply to duped file descriptors; these are the ones returned by
fcntl(F_GETFL), and include O_APPEND and O_NONBLOCK.

This is all more or less so that >> works correctly in /bin/sh.
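
A ten-line experiment makes the sharing visible (a sketch assuming only
POSIX semantics):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2], dup_fd, flags;

        pipe(fds);              /* error checks omitted for brevity */
        dup_fd = dup(fds[1]);   /* same underlying file structure   */

        flags = fcntl(dup_fd, F_GETFL, 0);
        fcntl(dup_fd, F_SETFL, flags | O_NONBLOCK); /* set on the dup */

        flags = fcntl(fds[1], F_GETFL, 0);          /* read the original */
        printf("original is %sblocking\n",
               (flags & O_NONBLOCK) ? "non-" : "");
        return 0;
    }

This prints "original is non-blocking": the O_NONBLOCK set through one
descriptor is seen through the other, exactly as happens between ssh's
and CVS's descriptor 2.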

Ian

Richard Stallman

Jul 11, 2002, 8:01:53 AM
> This is a different case. CVS is invoking ssh in a child process in a
> way which causes them to share file descriptor 2. ssh is then
> unblocking file descriptor 2 in a way which CVS does not expect. This
> type of problem can only happen with programs which invoke ssh.

In that case, maybe CVS should have code to cope with this case. It
must be quite common. Perhaps CVS should avoid using stdio. If that
is too hard, how about making CVS pipe the output through `cat' when
it invokes a subprocess? That should be easy.

Please don't dismiss that simple idea because it is "inefficient"; the
inefficiency of this will usually be insignificant compared with the
slowness of the network, and this needs to be solved somehow. If you
want to implement a more efficient and better solution, by all means
do; if you won't do that, please at least do this!


I will also forward the message to the Glibc maintainers.

Derek Robert Price

Jul 19, 2002, 3:58:23 PM
Richard Stallman wrote:

> This is a different case. CVS is invoking ssh in a child process in a
> way which causes them to share file descriptor 2. ssh is then
> unblocking file descriptor 2 in a way which CVS does not expect. This
> type of problem can only happen with programs which invoke ssh.
>
> In that case, maybe CVS should have code to cope with this case. It
> must be quite common. Perhaps CVS should avoid using stdio. If that
> is too hard, how about making CVS pipe the output through `cat' when
> it invokes a subprocess? That should be easy.

CVS'd have to pipe the child's STDERR through `cat'. Otherwise the
child would still share CVS's STDERR, and when that got dup'ed to CVS's
STDOUT, it would share CVS's STDOUT too and could set it to non-blocking,
and we'd be back to the original lost-data case.

I'm partway to implementing this, but it's segfaulting and I don't have
the time. If nothing else comes up, I'll try to debug it and have
something out in the next few days.

Derek

--
*8^)

Email: de...@2-wit.com
Public key available from www.keyserver.net - Key ID 5ECF1609
Fingerprint 511D DCD9 04CE 48A9 CC07 A421 BFBF 5CC2 56A6 AB0E

Get CVS support at http://2-wit.com
--
Doesn't expecting the unexpected make the unexpected become the expected?

Larry Jones

Jul 19, 2002, 4:42:45 PM
Derek Robert Price writes:
> CVS'd have to pipe the child's STDERR through `cat'. Otherwise the
> child would still share CVS's STDERR, and when that got dup'ed to CVS's
> STDOUT, it would share CVS's STDOUT too and could set it to non-blocking,
> and we'd be back to the original lost-data case.
>
> I'm partway to implementing this, but it's segfaulting and I don't have
> the time. If nothing else comes up, I'll try to debug it and have
> something out in the next few days.

But that's unnecessary overhead in the vast majority of cases and adds
yet another external dependency to CVS. I think we need to take a step
back and think about the overall problem -- this is one of those
annoying system failures where all the parts seem to work right in
isolation but interfere with each other when combined. It's not at all
clear to me which piece is at fault and should be fixed.

-Larry Jones

ANY idiot can be famous. I figure I'm more the LEGENDARY type! -- Calvin

Stefan Monnier

Jul 19, 2002, 5:10:55 PM

Completely agreed. I feel more and more like it might be a problem
in glibc.


Stefan

Ian Lance Taylor

Jul 19, 2002, 11:44:59 PM
"Stefan Monnier" <monnier+gnu/emacs/pre...@rum.cs.yale.edu> writes:

It can be a bit problematical for glibc. Conceptually, when the stdio
flush routine gets an error calling write(), it could leave the buffer
untouched and return an error indication to the caller.

That should work for fflush() and putc(). The problem comes in
functions like printf(). If a buffer flush fails with EAGAIN during
printf(), what should happen? If printf() returns an error
indication, it can't indicate how many bytes it wrote into the buffer.
In the case of a large printf(), it might already have flushed the
buffer once, so it is not possible in general for printf() to remove
the characters it added to the buffer. The result is an indeterminate
situation.

The only reasonable answer is to not use printf() with a non-blocking
buffer. But then why not apply that logic to stdio in general? Just
don't use stdio with non-blocking buffers.

I don't see a right answer here. Maybe CVS should include a script
which calls ssh piping stderr to cat, and tell people to use that
instead of using ssh.

Ian

Richard Stallman

Jul 21, 2002, 4:15:34 PM
> If a buffer flush fails with EAGAIN during
> printf, what should happen?

printf should retry, perhaps after a short sleep, and thus more or
less emulate the behavior with an ordinary blocking descriptor.

> If printf returns an error
> indication, it can't indicate how many bytes it wrote into the buffer.

That is true, but returning an error indication is the wrong thing to
do anyway. The user doesn't want printf to fail because of some mere
temporary difficulty.

> In the case of a large printf(), it might already have flushed the
> buffer once, so it is not possible in general for printf() to remove
> the characters it added to the buffer.

To remove them would be wrong even if it could.

> The only reasonable answer is to not use printf() with a non-blocking
> buffer. But then why not apply that logic to stdio in general? Just
> don't use stdio with non-blocking buffers.

That is perfectly consistent but not at all helpful. The helpful
thing to do is to make it work in a way that is convenient.

If all stdio output functions handle EAGAIN by sleeping for a short time
and trying again, most user programs will be happy with the results.
The few that are not happy are those that should not use stdio.
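
At the write() level, the retry behavior being proposed would look
roughly like this (a sketch of the idea, not glibc's actual code --
today's stdio does not do this):

    #include <errno.h>
    #include <unistd.h>

    /* Keep writing until everything is out, sleeping briefly on
       EAGAIN so the caller sees ordinary blocking semantics. */
    static ssize_t write_retry(int fd, const char *buf, size_t len)
    {
        size_t done = 0;
        while (done < len)
        {
            ssize_t n = write(fd, buf + done, len - done);
            if (n >= 0)
                done += n;
            else if (errno == EAGAIN || errno == EINTR)
                usleep(10000);      /* back off 10ms, then retry */
            else
                return -1;          /* a real error: report it   */
        }
        return (ssize_t) done;
    }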

kevin wang

Jul 24, 2002, 12:32:47 PM
From Ian Lance Taylor

> I don't see a right answer here. Maybe CVS should include a script
> which calls ssh piping stderr to cat, and tell people to use that
> instead of using ssh.

If you're looking for 'cat' to do "infinite" buffering, i.e. to keep
absorbing input even while its output is blocked, it won't do that.
'less' will, and possibly some (not all) versions of 'more'.

'cat' relies on the stdin and stdout block buffers just like all other
unix i/o programs.

*Some* versions of 'cat' do have a "line buffer mode", i.e. they flush
as soon as a newline is found, rather than the usual block-buffered mode.
Some versions of 'cat' natively enter this mode when stdout is a tty.
gnu-textutils (at least the version in rh73) doesn't have command-line
arguments for 'line buffer mode'.

so if you're looking for a utility to do i/o buffering for you, cat
isn't it.

- Kevin

kevin wang

Jul 24, 2002, 12:44:32 PM
From Richard Stallman

> If a buffer flush fails with EAGAIN during
> printf, what should happen?
>
> printf should retry, perhaps after a short sleep, and thus more or
> less emulate the behavior with an ordinary blocking descriptor.

If you want to emulate blocking behaviour, then why not USE blocking
behaviour?

It doesn't make any sense to make the default behaviour of non-blocking
act like blocking.

Now if you wanted to write a library that emulated 'soft non-blocking',
i.e. retry in a little bit, with a timeout, sure, that would be fine;
but blocking is blocking and non-blocking is non-blocking. Anything in
between should be a separate mode.

If you're worried about printf, then use sprintf, dump it to a buffer,
and then feed it out yourself (or with a library or whatever).

> If all stdio output functions handle EAGAIN by sleeping for a short time
> and trying again, most user programs will be happy with the results.
> The few that are not happy are those that should not use stdio.

I disagree. Sleeps are inherently evil, and stdio is not so 'special'
that it needs different handling characteristics from any other file
descriptor. What if stdio had instead been mapped to a file? A pipe?
The app simply cannot tell the difference, and cannot be told to act
differently just because it's stdio.

- Kevin

Derek Robert Price

Jul 24, 2002, 1:10:34 PM
kevin wang wrote:

> From Ian Lance Taylor


>> I don't see a right answer here. Maybe CVS should include a script
>> which calls ssh piping stderr to cat, and tell people to use that
>> instead of using ssh.

> If you're looking for 'cat' to do "infinite" buffering, i.e. to keep
> absorbing input even while its output is blocked, it won't do that.
> 'less' will, and possibly some (not all) versions of 'more'.
>
> 'cat' relies on the stdin and stdout block buffers just like all other
> unix i/o programs.

This wouldn't be a problem. The problem is that in a standard
configuration:

            --stderr->    ---------------stderr--------------->
           /          \  /                                     \
 CVS Server            ssh              CVS client              tty
           \          /  \             /          \            /
            --stdout->    --stdout--->             --stdout--->

Note that since CVS didn't access the stderr of its child process, ssh,
the child process gets a clone of the parent process' stderr descriptor
and ssh and the CVS client end up sharing the tty's standard error.

Now, when the user redirects stderr to stdout, say, to redirect the
output to a file (e.g. CVS_RSH=ssh cvs diff >tmp.diff 2>&1), you get the
following configuration:

            --stderr->    ---------------stderr---------------
           /          \  /                                    \
 CVS Server            ssh              CVS client              -->tty/file/whatever
           \          /  \             /          \            /
            --stdout->    --stdout--->             --stdout----

Since CVS was using the same file descriptor for stderr and stdout, ssh
is writing to CVS's stdout descriptor as its stderr. When ssh sets its
stderr to non-block, the same happens to CVS's stdout. Since CVS isn't
prepared for this, data gets lost (written to a non-blocking descriptor
without watching for EAGAIN).

> so if you're looking for a utility to do i/o buffering for you, cat
> isn't it.

So, anyway, cat wouldn't need to do line buffering. What has been
proposed is that a script stick cat in between ssh's stderr and cvs's
stderr. I assume by redirecting ssh's stderr to cat's stdin and then
cat's stdout back to CVS's stderr, but I'm going to leave stdin out of
the following picture for convenience:

            --stderr->    --stderr--->cat--------stderr-------
           /          \  /                                    \
 CVS Server            ssh              CVS client              -->tty/file/whatever
           \          /  \             /          \            /
            --stdout->    --stdout--->             --stdout----

Now, when ssh sets its stderr to O_NONBLOCK, only cat's stdin will be
affected. cat's buffering ability will be irrelevant since ssh is the
only PROCESS that needs to be aware of the non-blocking i/o and resend
data and it is already doing that.
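
The plumbing for that scheme is only a few lines (a rough sketch with
error handling and CVS's existing stdin/stdout pipes omitted; the
function name is made up):

    #include <sys/types.h>
    #include <unistd.h>

    /* Splice `cat' between the child's stderr and our own, so that
       ssh's fcntl(O_NONBLOCK) lands on the pipe, not on the
       descriptor CVS itself writes to. */
    static pid_t spawn_rsh_via_cat(char *const argv[])
    {
        int fds[2];
        pid_t pid;

        pipe(fds);
        if (fork() == 0)            /* cat process */
        {
            dup2(fds[0], 0);        /* stdin  = read end of the pipe  */
            dup2(2, 1);             /* stdout = CVS's original stderr */
            close(fds[0]); close(fds[1]);
            execlp("cat", "cat", (char *) 0);
            _exit(127);
        }
        pid = fork();
        if (pid == 0)               /* ssh process */
        {
            dup2(fds[1], 2);        /* its fd 2 is now the pipe */
            close(fds[0]); close(fds[1]);
            execvp(argv[0], argv);
            _exit(127);
        }
        close(fds[0]); close(fds[1]);
        return pid;
    }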

Derek

--
*8^)

Email: de...@2-wit.com
Public key available from www.keyserver.net - Key ID 5ECF1609
Fingerprint 511D DCD9 04CE 48A9 CC07 A421 BFBF 5CC2 56A6 AB0E

Get CVS support at http://2-wit.com
--

The advertisement is the most truthful part of the paper.

- Thomas Jefferson

Derek Robert Price

Jul 24, 2002, 1:23:11 PM
kevin wang wrote:

> From Richard Stallman


>> If a buffer flush fails with EAGAIN during
>> printf, what should happen?
>>
>> printf should retry, perhaps after a short sleep, and thus more or
>> less emulate the behavior with an ordinary blocking descriptor.

> If you want to emulate blocking behaviour, then why not USE blocking
> behaviour?

Please see my previous email. The issue is that a file descriptor
became non-blocking without CVS noticing. CVS expects blocking behavior
and normally, that's what it gets.

> It doesn't make any sense to make the default behaviour of non-blocking
> act like blocking.

I believe this proposal would only affect stdio. Since the stdio
functions currently lose data in a way that makes recovery difficult,
why not write guaranteed delivery of data to non-blocking descriptors
into their charter without rewriting the API?

> Now if you wanted to write a library that emulated 'soft non-blocking',
> i.e. retry in a little bit, with a timeout, sure, that would be fine;
> but blocking is blocking and non-blocking is non-blocking. Anything in
> between should be a separate mode.
>
> If you're worried about printf, then use sprintf, dump it to a buffer,
> and then feed it out yourself (or with a library or whatever).

>> If all stdio output functions handle EAGAIN by sleeping for a short time
>> and trying again, most user programs will be happy with the results.
>> The few that are not happy are those that should not use stdio.

> I disagree. Sleeps are inherently evil, and stdio is not so 'special'
> that it needs different handling characteristics from any other file
> descriptor. What if stdio had instead been mapped to a file? A pipe?
> The app simply cannot tell the difference, and cannot be told to act
> differently just because it's stdio.

And your app doesn't care that the stdio functions don't return proper
status when writing to non-blocking descriptors? The issue is that
printf and its ilk can write partial data. You could check for EAGAIN,
but you still wouldn't know how much of the data had been written to a
file, pipe, tty, or whatever. The app can't know how much to resend.
Do you desire that behavior? Making the stdio routines always behave
as if descriptors are blocking (as far as the calling function is
concerned) seems reasonable.

Derek

--
*8^)

Email: de...@2-wit.com
Public key available from www.keyserver.net - Key ID 5ECF1609
Fingerprint 511D DCD9 04CE 48A9 CC07 A421 BFBF 5CC2 56A6 AB0E

Get CVS support at http://2-wit.com
--

I will not grease the monkey bars.
I will not grease the monkey bars.
I will not grease the monkey bars...

- Bart Simpson on chalkboard, _The Simpsons_

kevin wang

Jul 24, 2002, 7:11:30 PM
From Derek Robert Price

> The problem is that in a standard configuration:
>
>             --stderr->    ---------------stderr--------------->
>            /          \  /                                     \
>  CVS Server            ssh              CVS client              tty
>            \          /  \             /          \            /
>             --stdout->    --stdout--->             --stdout--->
>
> Note that since CVS didn't access the stderr of its child process, ssh,
> the child process gets a clone of the parent process' stderr descriptor
> and ssh and the CVS client end up sharing the tty's standard error.
>
> Now, when the user redirects stderr to stdout, say, to redirect the
> output to a file (e.g. CVS_RSH=ssh cvs diff >tmp.diff 2>&1), you get the
> following configuration:

This may sound silly, but as a temporary workaround, can't you:

CVS_RSH=ssh cvs diff >tmp.diff 2>tmp.diff

that ought to open the file twice as separate file descriptors.
The downside is that I/O is now unsynchronized, and lines may be inserted
in the middle of other lines, but nothing should get lost at least.
Switching to line buffering should eliminate most of that problem.

This assumes, of course, that that is what you're trying to do. There
are other possibilities that you can't control, I know, and it doesn't
solve the real problem.

>             --stderr->    ---------------stderr---------------
>            /          \  /                                    \
>  CVS Server            ssh              CVS client              -->tty/file/whatever
>            \          /  \             /          \            /
>             --stdout->    --stdout--->             --stdout----
>
> Since CVS was using the same file descriptor for stderr and stdout, ssh
> is writing to CVS's stdout descriptor as its stderr. When ssh sets its
> stderr to non-block, the same happens to CVS's stdout. Since CVS isn't
> prepared for this, data gets lost (written to a non-blocking descriptor
> without watching for EAGAIN).
>

> So, anyway, cat wouldn't need to do line buffering. What has been
> proposed is that a script stick cat in between ssh's stderr and cvs's
> stderr. I assume by redirecting ssh's stderr to cat's stdin and then
> cat's stdout back to CVS's stderr, but I'm going to leave stdin out of
> the following picture for convenience:
>
>             --stderr->    --stderr--->cat--------stderr-------
>            /          \  /                                    \
>  CVS Server            ssh              CVS client              -->tty/file/whatever
>            \          /  \             /          \            /
>             --stdout->    --stdout--->             --stdout----
>
> Now, when ssh sets its stderr to O_NONBLOCK, only cat's stdin will be
> affected. cat's buffering ability will be irrelevant since ssh is the
> only PROCESS that needs to be aware of the non-blocking i/o and resend
> data and it is already doing that.

Yup. Perhaps re-opening/re-assigning stderr before forking off ssh? Hm,
no, you still have the fd cloning problem.

you could fork off a client and do the 'cat' thing yourself, but exec'ing
cat would have the same result.

you could set up a pipe and use the parent process to do the read/write
separation, but that's no different.

Hm, it's too bad that the shared internal file info also shares the
block/non-block setting, but I suppose that's unavoidable. You cannot
both have and not have blocking on a given file descriptor.


Thanks for taking the time to detail the issues. I can't seem to find
the first part of the thread; did it come from another mailing list,
or am I just being blind?

- Kevin

Derek Robert Price

Jul 24, 2002, 9:30:31 PM
kevin wang wrote:

> This may sound silly, but as a temporary workaround, can't you:
>
> CVS_RSH=ssh cvs diff >tmp.diff 2>tmp.diff
>
> that ought to open the file twice as separate file descriptors.
> The downside is that I/O is now unsynchronized, and lines may be inserted
> in the middle of other lines, but nothing should get lost at least.
> Switching to line buffering should eliminate most of that problem.
>
> This assumes, of course, that that is what you're trying to do. There
> are other possibilities that you can't control, I know, and it doesn't
> solve the real problem.

I think it's: `rm tmp.diff; CVS_RSH=ssh cvs diff >>tmp.diff
2>>tmp.diff', but that might work.

> Thanks for taking the time to detail the issues. I can't seem to find
> the first part of the thread; did it come from another mailing list,
> or am I just being blind?

I picked it up on bug...@gnu.org and there's at least one more list in
the header of this message. Don't know which list you picked it up on
but I think both bug-cvs and the emacs list have been in there since the
start of this thread.

Derek

--
*8^)

Email: de...@2-wit.com
Public key available from www.keyserver.net - Key ID 5ECF1609
Fingerprint 511D DCD9 04CE 48A9 CC07 A421 BFBF 5CC2 56A6 AB0E

Get CVS support at http://2-wit.com
--

File not found. Should I fake it? (Y/N)

Ian Lance Taylor

Jul 24, 2002, 11:13:07 PM
kevin wang <k...@rightsock.com> writes:

> From Richard Stallman
>
> > If a buffer flush fails with EAGAIN during
> > printf, what should happen?
> >
> > printf should retry, perhaps after a short sleep, and thus more or
> > less emulate the behavior with an ordinary blocking descriptor.
>
> If you want to emulate blocking behaviour, then why not USE blocking
> behaviour?
>
> It doesn't make any sense to make the default behaviour of non-blocking
> act like blocking.

Yes, but stdio can't handle non-blocking descriptors correctly, as I
described in a previous note. I read RMS's suggestion as saying that
since stdio can't handle non-blocking descriptors, it will do least
harm by blocking on them. Clearly using stdio on a non-blocking
descriptor is an error; however, as we see in the CVS case, sometimes
that error is difficult to avoid.

Ian

Ian Lance Taylor

Jul 24, 2002, 11:15:43 PM
kevin wang <k...@rightsock.com> writes:

> > The problem is that in a standard configuration:
> >
> >             --stderr->    ---------------stderr--------------->
> >            /          \  /                                     \
> >  CVS Server            ssh              CVS client              tty
> >            \          /  \             /          \            /
> >             --stdout->    --stdout--->             --stdout--->
> >
> > Note that since CVS didn't access the stderr of its child process, ssh,
> > the child process gets a clone of the parent process' stderr descriptor
> > and ssh and the CVS client end up sharing the tty's standard error.
> >
> > Now, when the user redirects stderr to stdout, say, to redirect the
> > output to a file (e.g. CVS_RSH=ssh cvs diff >tmp.diff 2>&1), you get the
> > following configuration:
>
> This may sound silly, but as a temporary workaround, can't you:
>
> CVS_RSH=ssh cvs diff >tmp.diff 2>tmp.diff

Sure (in append mode), or, even easier, don't use 2>&1 at all. Or do
what I do, and set CVS_RSH to a shell script which invokes ssh with
2>/dev/null. That always works--if ssh fails you can just run it by
hand to see the error message.

The problem is that people use cvs 2>&1 with CVS_RSH set to ssh, and
expect it to work, and think it is a CVS bug when it fails. Educated
users don't have a problem.

Ian

Richard Stallman

Jul 25, 2002, 2:07:14 PM
> The problem is that people use cvs 2>&1 with CVS_RSH set to ssh, and
> expect it to work, and think it is a CVS bug when it fails. Educated
> users don't have a problem.

Does "educated users" mean "users who know they have to avoid 2>&1
when using CVS_RSH=ssh"? If so, I would say that they have been
taught in a work-around, and we really still ought to fix the bug.

kevin wang

Jul 25, 2002, 2:12:01 PM
From Ian Lance Taylor

> The problem is that people use cvs 2>&1 with CVS_RSH set to ssh, and
> expect it to work, and think it is a CVS bug when it fails. Educated
> users don't have a problem.

Perhaps a quick fix, then, is to detect that stdout has been set to
non-blocking and print an error message?

- Kevin

Ian Lance Taylor

Jul 25, 2002, 2:18:50 PM
Richard Stallman <r...@gnu.org> writes:

I agree. I see that I implied otherwise, but that was a mistake on my
part.

Ian
