Protocol error: uncounted data discarded

Chris Sharpe

May 4, 2001, 4:31:45 PM

to info...@gnu.org

I just upgraded my Solaris (sparc) pserver to cvs 1.11.1p1 and from
a WinNT command line cvs 1.11 client, I sometimes get the following
error:

cvs -d $my_cvsroot -f -r -q -z3 checkout -r $my_tag $my_module
Protocol error: uncounted data discarded

The module is about 3000 files in size and I've seen this a couple
of times when running the checkout command as part of a script.
Other modules checked out immediately after the failure work fine
and if I try to checkout this module again, it works fine.

I saw that there was a FIXME comment recently added by Derek in
server.c that might be related. Does anyone have any insight? Is this
error related to 1.11.1p1, or is the timing just a coincidence?
(I never saw this error with the 1.11 pserver.)

+-------------------------------------------------------------------+
| Chris Sharpe KF4WVO Author of "The Unofficial PEZ FAQ" |
| sha...@dg-rtp.dg.com "PEZ - A treat to eat in a toy that's neat"|
+-------------------------------------------------------------------+

_______________________________________________
Info-cvs mailing list
Info...@gnu.org
http://mail.gnu.org/mailman/listinfo/info-cvs

Chuck Rossi

Jul 7, 2001, 3:59:28 AM

to info...@gnu.org, Chris Sharpe

Has there been any update on this? I made the mistake of upgrading
one of my servers and one of my build machines to 1.11.1p1
and now I get this whenever I try a build using a build script:

/usr/bin/cvs -q co -r build-1266 tree3

Protocol error: uncounted data discarded

The client is a Linux RH 6.0 2.2.17 box.
The server is a Linux RH 6.2 2.2.14-5.0smp box.

Doing the same thing by hand seems to always work.

Thanks...

chuckr

John Minnihan

Jul 7, 2001, 12:52:29 PM

to chr...@best.com, info...@gnu.org, sha...@dg-rtp.dg.com

Hi Chris,

I have seen this too, but not consistently. For me, the error occurs only
near the end of a very large checkout using a 'universe' defining module
name. The error does not exhibit during smaller module checkouts, and as I
said - it does not consistently show up even during the large checkouts.

I'm curious what you see as the difference between 'in a build script' and
'by hand'.

My WAG at this point is that the error shows up only when my network is
under heavy load. During periods of relative network 'quiet' (say, 15 -
40% bandwidth utilization), the large checkouts complete as expected.

HTH you in troubleshooting and developing a workaround. I posted my
question (with details) about a month ago. No reply yet, and the only
other mention of this in the archives was your message of 4-MAY-2001. See
my message of 20-JUN-2001 RE: that.

--
_____________________________________
John Minnihan
mailto:jbm...@jbminn.com
http://www.freepository.com

Chuck Rossi

Jul 9, 2001, 3:34:25 PM

to John Minnihan, info...@gnu.org

On Sat, Jul 07, 2001 at 09:09:27AM -0700, John Minnihan wrote:
> Hi Chris,
>
> I have seen this too, but not consistently. For me, the error occurs only
> near the end of a very large checkout using a 'universe' defining module
> name. The error does not exhibit during smaller module checkouts, and as I
> said - it does not consistently show up even during the large checkouts.
>
> I'm curious what you see as the difference between 'in a build script' and
> 'by hand'.

It's strange. If I run the cvs operation from my Perl build script,
I get "Protocol error" towards the end of the checkout. If I execute
the same command, in the same directory, from a shell prompt, the checkout
works fine.

I'm surprised more people aren't running into this problem. I'm not doing
anything special and my repository is not that big.

Chuckr

Chris Sharpe

Jul 9, 2001, 3:48:06 PM

to Chuck Rossi, John Minnihan, info...@gnu.org

FWIW, my workaround was to check whether the checkout really got
anything (by testing for the existence of the directory or file) and,
if it did not, to sleep for ten seconds and try again. The retry has
never failed. Odd.

Checking the exit status of cvs wasn't reliable. It returned '0'.
I would have expected this error to cause a non-zero exit status.
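
The workaround described above could be sketched roughly as follows. This
is only a minimal illustration in Python, not the script actually used:
the function name checkout_with_retry, its parameters, and the choice of
a single retry are assumptions made for the example. The exit status is
ignored on purpose, since it was reported as 0 even when the error
occurred.

```python
import os
import subprocess
import time


def checkout_with_retry(cmd, expected_path, retries=1, delay=10.0,
                        run=subprocess.call, sleep=time.sleep):
    """Run a checkout command and retry if expected_path did not appear.

    cmd is an argument list such as ["cvs", "-q", "co", "-r", tag, module].
    The command's exit status is deliberately not trusted; success is
    judged only by whether expected_path exists afterwards.
    run and sleep are injectable so the logic can be exercised without cvs.
    """
    for attempt in range(retries + 1):
        run(cmd)  # exit status ignored on purpose
        if os.path.exists(expected_path):
            return True
        if attempt < retries:
            sleep(delay)  # give the server a moment before trying again
    return False
```

A build script would call it with the real command line and the directory
the module is expected to create, and abort if it returns False.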

--Chris

Jen Vanderputten

Aug 2, 2001, 12:50:58 PM

I have some information on this problem that may help to locate the root cause.
I recently load-tested branch creation (rtag) and checkout/checkin usage of
the resulting new branches. The load test created 500 new branches from a
separate base branch. It then checked out the new branches one at a time,
added whitespace, and checked in. It did this linearly; i.e., branch1, then
branch2, and so on. I noticed that the very first checkout after creating a
new branch (rtag command) ALWAYS got the protocol error. Any checkouts after
that, though, worked without a hitch; also, if I did a checkout once, slept
for a few seconds, then tried again, the second attempt was always
successful. This strongly suggests a timing issue.

Not knowing much of the details of CVS server internals, and not having looked
thoroughly at that code yet, I am taking an educated guess that there is some
sort of one-time data creation performed upon the very first command against
a new branch. I am further deducing that this is in a separate child process,
which is not finishing with that task before the child associated with the
client's request is finished. This results in a lack of needed data by the
client's child at the moment that it expects to find that clump of data, hence
the protocol error. This is why waiting a moment after this first try, then
trying again, results in success; the first child doing the special one-time
processing is given time to finish. I say one-time processing because this
never happens again to the same branch.

I have taken a look at the cvs codebase for 1.11.1p1 and found that a particular
change made since the last revision is convincingly tied into this problem. A
particular chunk of code was removed from a loop that forced it to wait on a
relevant event -- this was done, apparently, to fix a problem of overextended
waits in particular cases (or possibly infinite loop?). However, it may have
actually opened the door for this problem -- and indeed, this problem may also
be a 'particular' case, as mentioned in previous postings, with very large
code bases/modules. That would certainly explain why some child process is
taking unusually long to finish generating some needed data.

The code to which I refer is to be found in server.c. It was removed from the
main loop (I have this as being lines 2909 - 3130 in function do_cvs_command):

while (stdout_pipe[0] >= 0
       || stderr_pipe[0] >= 0
       || protocol_pipe[0] >= 0
       || count_needed <= 0)
{
    .
    .
    .
}

Here is the code chunk that was removed (I have this as lines 3132 - 3149):

/*
 * OK, we've gotten EOF on all the pipes.  If there is
 * anything left on stdoutbuf or stderrbuf (this could only
 * happen if there was no trailing newline), send it over.
 */
if (! buf_empty_p (stdoutbuf))
{
    buf_append_char (stdoutbuf, '\n');
    buf_copy_lines (buf_to_net, stdoutbuf, 'M');
}
if (! buf_empty_p (stderrbuf))
{
    buf_append_char (stderrbuf, '\n');
    buf_copy_lines (buf_to_net, stderrbuf, 'E');
}
if (! buf_empty_p (protocol_inbuf))
    buf_output0 (buf_to_net,
                 "E Protocol error: uncounted data discarded\n");
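
Since the message is sent to the client as an ordinary "E" line and the
exit status was reported earlier in this thread as 0, a calling script
probably has to look for the text itself. A minimal Python sketch of that
idea follows; the names run_cvs and saw_protocol_error are made up for
illustration, and the code scans both output streams because it only
assumes, rather than knows, where a given client prints the message.

```python
import subprocess

PROTOCOL_ERROR = "Protocol error: uncounted data discarded"


def saw_protocol_error(text):
    """Return True if any line of the captured output has the message."""
    return any(PROTOCOL_ERROR in line for line in text.splitlines())


def run_cvs(cmd):
    """Run cmd and return (exit_status, protocol_error_seen).

    Both stdout and stderr are captured and scanned, so the check does
    not depend on which stream the client uses for the message.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True)
    combined = proc.stdout + "\n" + proc.stderr
    return proc.returncode, saw_protocol_error(combined)
```

A script could treat a True second value as a failed checkout and retry,
regardless of the exit status.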

I hope this can shed some light on the problem. Timing issues are always
very difficult to pinpoint and fix.

--Jen