
Why does cvs wait to the next second?


Urs Thuermann

Dec 25, 2012, 2:57:52 PM
to info...@gnu.org
I have observed that the same CVS checkout command takes different
running times on repeated calls and that a script calling cvs checkout
many times takes very long. Using strace I see that CVS waits for the
start of the next second after its work is done, right before exiting:

Trace of first call to cvs checkout <dir>
...
20:50:17.149775 close(4) = 0
20:50:17.149807 time([1356465017]) = 1356465017
20:50:17.149840 time([1356465017]) = 1356465017
20:50:17.149873 nanosleep({0, 850134000}, NULL) = 0
20:50:18.000114 time([1356465017]) = 1356465017
20:50:18.000154 nanosleep({0, 20000000}, NULL) = 0
20:50:18.020252 fchdir(3) = 0
20:50:18.020289 close(3) = 0
20:50:18.020328 rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
20:50:18.020372 close(1) = 0
20:50:18.020406 munmap(0x7f9d28936000, 4096) = 0
20:50:18.020487 exit_group(0) = ?

Another call to cvs checkout <dir>
...
20:50:31.325934 close(4) = 0
20:50:31.325968 time([1356465031]) = 1356465031
20:50:31.325991 time([1356465031]) = 1356465031
20:50:31.326013 nanosleep({0, 673991000}, NULL) = 0
20:50:32.000130 time([1356465031]) = 1356465031
20:50:32.000186 nanosleep({0, 20000000}, NULL) = 0
20:50:32.020282 fchdir(3) = 0
20:50:32.020320 close(3) = 0
20:50:32.020364 rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
20:50:32.020413 close(1) = 0
20:50:32.020448 munmap(0x7f905e4fb000, 4096) = 0
20:50:32.020564 exit_group(0) = ?

And a third call to check out that same directory:
...
20:50:45.906568 close(4) = 0
20:50:45.906601 time([1356465045]) = 1356465045
20:50:45.906633 time([1356465045]) = 1356465045
20:50:45.906667 nanosleep({0, 93340000}, NULL) = 0
20:50:46.000104 time([1356465045]) = 1356465045
20:50:46.000144 nanosleep({0, 20000000}, NULL) = 0
20:50:46.020238 fchdir(3) = 0
20:50:46.020274 close(3) = 0
20:50:46.020313 rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0
20:50:46.020357 close(1) = 0
20:50:46.020391 munmap(0x7f2c8f6a2000, 4096) = 0
20:50:46.020469 exit_group(0) = ?
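
The pattern in these traces can be sketched in Python. This is a minimal
sketch modeled only on the system calls shown above, not on the CVS source;
the function names and the injectable clock and sleep parameters are
hypothetical. The first requested sleep of 850134000 ns corresponds to a
point 149866 microseconds into the second, a few microseconds before the
strace timestamp of the nanosleep call.

```python
import time

def nanos_until_next_second(usec_into_second):
    """Nanoseconds left in the current second, given the microseconds elapsed in it."""
    return (1_000_000 - usec_into_second) * 1000

def sleep_past_second(start_sec, now=time.time, sleep=time.sleep):
    """Return only after the clock reports a second later than start_sec.

    Hypothetical helper: `now` and `sleep` are injectable so the logic can
    be exercised without actually waiting.
    """
    while True:
        t = now()
        if int(t) > start_sec:
            break
        usec = int((t - int(t)) * 1_000_000)
        sleep(nanos_until_next_second(usec) / 1e9)
    # The traces show one extra 20 ms nanosleep after the boundary;
    # it is mirrored here as a fixed safety margin (an assumption).
    sleep(0.020)
```

Called as sleep_past_second(int(time.time())), this blocks for between
roughly 20 ms and just over one second, which matches the spread of waits
in the three traces.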

What is the reason for CVS to do this? Is it because of time stamps
for history entries? And can I prevent CVS from sleeping?

urs

Paul Sander

Dec 25, 2012, 4:13:27 PM
to Urs Thuermann, info...@gnu.org
The reason for the sleep is so that timestamps in the sandbox can stabilize before subsequent actions. It puts some separation between the stamps on checked-out sources and subsequent modifications and additions by other aspects of the users' process. Removing the sleep will cause non-deterministic failures and you'll find that you'll have to re-introduce it in some of your scripts that invoke cvs.

The specific reason for this is because CVS assumes that it was the last to modify a file if its mod time matches the one recorded in its Entries file. If it's quickly modified by something else, then CVS may still think it's up to date and both "cvs update" and "cvs commit" will produce incorrect results.
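
A minimal sketch of that failure mode, assuming the recorded timestamp has
one-second resolution (which the sleep to the next second suggests). The
helper names are hypothetical and this is not CVS code; it only compares a
file's mod time, truncated to whole seconds, with a recorded value.

```python
import os
import tempfile

def looks_unmodified(path, recorded_mtime):
    """Timestamp-only check: True if the mod time (whole seconds) matches the record."""
    return int(os.stat(path).st_mtime) == recorded_mtime

def demo_same_second_edit():
    """Return (missed, detected) for an edit inside vs. after the recorded second."""
    checkout_time = 1356465017
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.c")
        with open(path, "w") as f:
            f.write("int x = 1;\n")
        os.utime(path, (checkout_time, checkout_time))
        recorded = int(os.stat(path).st_mtime)   # what would be written to the record

        # Another tool rewrites the file within the same second.
        with open(path, "w") as f:
            f.write("int x = 2;\n")
        os.utime(path, (checkout_time + 0.9, checkout_time + 0.9))
        missed = looks_unmodified(path, recorded)

        # The same edit one second later changes the whole-second timestamp.
        os.utime(path, (checkout_time + 1.1, checkout_time + 1.1))
        detected = not looks_unmodified(path, recorded)
    return missed, detected
```

In this sketch the edit made 0.9 s after the checkout goes unnoticed, while
the one made 1.1 s after it is detected, which is the separation the sleep
is meant to guarantee.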

There has been much discussion on this topic, and you can see discussion of the rationale in the info-cvs archives.

Urs Thuermann

Dec 25, 2012, 5:47:22 PM
to info...@gnu.org
Paul Sander <pa...@wakawaka.com> writes:

> The specific reason for this is because CVS assumes that it was the
> last to modify a file if its mod time matches the one recorded in
> its Entries file. If it's quickly modified by something else, then
> CVS may still think it's up to date and both "cvs update" and "cvs
> commit" will produce incorrect results.
>
> There has been much discussion on this topic, and you can see
> discussion of the rationale in the info-cvs archives.

OK, I've looked up the topic in the archives. I assume it has already
been suggested to change the "Entries" file format to use a hash
instead of a time stamp. But I haven't seen this in the info-cvs
archive. So wouldn't this be an option? Otherwise, I'd like a
command line option to disable the sleep, probably with a BIG warning
that it should only be used if you know what you do.

I have a script that calls cvs checkout hundreds to thousands of times
and that causes the script to run for half an hour or so instead of a
few seconds. The info-cvs archive also suggests using RCS tools
instead of CVS. Is it guaranteed that the CVS repository files will
always have RCS format and RCS tools will work on them?


urs

Paul Sander

Dec 25, 2012, 11:57:59 PM
to Urs Thuermann, info...@gnu.org

On Dec 25, 2012, at 2:47 PM, Urs Thuermann wrote:

> Paul Sander <pa...@wakawaka.com> writes:
>
>> The specific reason for this is because CVS assumes that it was the
>> last to modify a file if its mod time matches the one recorded in
>> its Entries file. If it's quickly modified by something else, then
>> CVS may still think it's up to date and both "cvs update" and "cvs
>> commit" will produce incorrect results.
>>
>> There has been much discussion on this topic, and you can see
>> discussion of the rationale in the info-cvs archives.
>
> OK, I've looked up the topic in the archives. I assume it has already
> been suggested to change the "Entries" file format to use a hash
> instead of a time stamp. But I haven't seen this in the info-cvs
> archive. So wouldn't this be an option? Otherwise, I'd like a
> command line option to disable the sleep, probably with a BIG warning
> that it should only be used if you know what you do.

I think that using hashes might have been discussed, but I don't recall
specific conversations. The reliability of hashes, even cryptographic ones,
isn't foolproof, either. Random content of the files is a simplifying
assumption when designing applications around hashes, and source code isn't
random content. The MD5 and SHA-1 hashes have been broken in ways that I
believe match use cases describing the natural evolution of source code.
This weakens the reliability claims of hashes to a degree, but truthfully I
don't know to what extent. (Perhaps the effect is negligible in the real
world, at least in projects for which CVS is used. Reducing the theoretical
probability of collision by several orders of magnitude still leaves a huge
number of files processed without incident.)

Anyway, if the use of hashes is limited strictly to the replacement of
timestamps in the Entries file (i.e., compute a hash at the time CVS writes
the file to the sandbox and write it to the Entries file, then later recompute
the hash and compare it to the Entries file), then the effect of a collision
is the same as we have observed with timestamps: Incorrect behavior of
subsequent operations due to files believed to be up to date when they are
not. The difference is that the breakage will be deterministic and there
will be no simple workaround, plus the overhead of computing the hashes
may become a factor. (Note that it is useless to store hashes in the RCS
files due to keyword expansion, so you can't amortize some of the cost by
storing them at commit time.)
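
A minimal sketch of the hash-for-timestamp substitution described above,
assuming SHA-256 as the hash (the thread does not name one) and hypothetical
helper names. It records a hash when the file is written and recomputes it
later; the whole file must be read on every check, which is the overhead
mentioned above.

```python
import hashlib
import os
import tempfile

def content_hash(path):
    """SHA-256 of the file contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def unmodified_by_hash(path, recorded_hash):
    """Hash-based check: True only if the contents still hash to the recorded value."""
    return content_hash(path) == recorded_hash

def demo_hash_check():
    """Return whether a same-second edit still looks unmodified under a hash check."""
    checkout_time = 1356465017
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file.c")
        with open(path, "w") as f:
            f.write("int x = 1;\n")
        os.utime(path, (checkout_time, checkout_time))
        recorded = content_hash(path)            # stored in place of the timestamp

        # Edit within the same second, leaving the timestamp unchanged.
        with open(path, "w") as f:
            f.write("int x = 2;\n")
        os.utime(path, (checkout_time, checkout_time))
        return unmodified_by_hash(path, recorded)
```

Here the same-second edit is detected even though the timestamp is
identical; a wrong answer would require a hash collision between the two
versions of the file.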

> I have a script that calls cvs checkout hundreds to thousands of times
> and that causes the script to run for half an hour or so instead of a
> few seconds. The info-cvs archive also suggests using RCS tools
> instead of CVS. Is it guaranteed that the CVS repository files will
> always have RCS format and RCS tools will work on them?

What is your use case that requires you to invoke "cvs checkout" so many
times? Over a 30-minute interval, the sleeps begin to dominate execution
time at 900 invocations. At that rate, CVS' locking mechanisms are also
significantly impacting performance. Perhaps you are checking out each
source file individually? If this is true then you should consider
reducing the number of invocations. You can do this by checking out
directory trees or by simply specifying multiple paths on the
"cvs checkout" command line. The use of tags or branch/timestamp pairs
would be useful here. The use of xargs might also be helpful.
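
A minimal sketch of the batching idea, assuming a list of paths is already
known. The function name, batch size, and example paths are hypothetical;
only "cvs checkout" with multiple paths and the "-r" tag option are taken as
given. Each batch pays for the end-of-run sleep once instead of once per
file.

```python
def batched_checkout_commands(paths, batch_size=50, tag=None):
    """Group paths into a few 'cvs checkout' command lines (argv lists).

    Nothing is executed here; the caller decides how to run each command.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    paths = list(paths)
    base = ["cvs", "checkout"]
    if tag is not None:
        base += ["-r", tag]          # check out a tagged revision
    return [base + paths[i:i + batch_size]
            for i in range(0, len(paths), batch_size)]
```

With 900 paths and a batch size of 100 this yields 9 invocations; each
returned list can be handed to subprocess.run, much as xargs would build the
command lines.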

If you have path/version pairs (or path/tag pairs or even
path/branch/timestamp triples) then you can use RCS and conjure the CVS
meta-data yourself. As you have discovered, there has been some discussion
of this method that appears in the archives detailing why it's fast and
reliable. I have used this method successfully myself.
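
A rough sketch of what this could look like for one path/version pair,
under stated assumptions: the RCS "co" command with "-p" (print to standard
output) and "-r" (select a revision), and an Entries line of the form
/name/revision/timestamp/options/tagdate with the timestamp in asctime
format in UTC. The helper names and example paths are hypothetical, and a
real sandbox needs the other CVS administrative files as well.

```python
import time

def rcs_co_argv(rcs_file, revision):
    """argv for printing one revision of an RCS file to standard output."""
    return ["co", "-p", "-r" + revision, rcs_file]

def entries_line(name, revision, mtime):
    """One Entries line with empty options and tag fields (assumed format)."""
    stamp = time.asctime(time.gmtime(mtime))   # e.g. 'Tue Dec 25 19:50:17 2012'
    return "/%s/%s/%s//" % (name, revision, stamp)
```

The mtime passed to entries_line should be the mod time of the file as
written to the sandbox, since that is the value a later timestamp
comparison will be made against.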

To my knowledge, CVS uses the standard RCS file format. RCS produces
warnings if newphrase extensions are used in certain contexts, e.g. in
the initial admin section of the RCS file. My experience in that area is
dated, so I don't know if this would be an issue with current versions of
either tool.


