Copying parts of a repository, excluding revisions post-TAGNAME...

Leander Hasty

Dec 31, 2003, 2:30:20 PM
to info...@gnu.org
Hello,

As part of contracted work on a product, we received a tarball of the
project source and data from another company.

We'd very much like to have the entire revision history of all of these
files; the other company is willing to give us the directory trees for
the project from their CVS repository. From what I understand, I could
normally just accept a tarball of said directories and manually insert
them into a local CVS repository and possibly edit the CVSROOT files a
bit.

However, we're not entitled to any of their source on this project past
a certain release, which is tagged in their repository.

Digging through "man 5 rcsfile" and "doc/RCSFILES", it seems like it
would be possible to write a script to process each ,v file, determine
the version associated with the TAGNAME, and then trim out any entries
in the rcs file that have a version greater than this number. I'm more
than a bit concerned about the difficulty of creating a robust script
which can be used on a large (tens of GB) CVS repository, remotely, by
someone else (who is inexperienced in CVS administration).

So, questions:

- Does there exist a CVS command, client, utility, or script (even
third-party) which already has this capability, or some form thereof?

- If not, given that I'd only have a day or two to roll my own utility,
what sort of hurdles should I expect? (Is it possible?)

- Does the rcsfile format (or the way CVS uses it) guarantee
chronological ordering of the "delta" and "deltatext" entries in the
file?
That is, can I expect any checkins post-TAGNAME to be after the tag
entry in the ,v file, in each of their respective sections?

- What other data in the ,v file would I have to change if I could
delete all of the post-TAGNAME entries? ("head", probably...
"symbols"?)

- Can anyone suggest any more sane way to do this?

- What documentation should I be looking at, aside from that mentioned
above?

Thanks for your time and patience.

--
Leander Hasty


Larry Jones

Dec 31, 2003, 2:47:41 PM
to Leander Hasty, info...@gnu.org
Leander Hasty writes:
>
> - Does there exist a CVS command, client, utility, or script (even
> third-party) which already has this capability, or some form thereof?

"cvs admin -otag::" may be exactly what you're looking for.

-Larry Jones

These pictures will remind us of more than we want to remember.
-- Calvin's Mom


Leander Hasty

Dec 31, 2003, 2:52:17 PM
to Larry Jones, info...@gnu.org
> From: Larry Jones [mailto:lawrenc...@ugsplm.com]
> Sent: Wednesday, December 31, 2003 11:48 AM

> "cvs admin -otag::" may be exactly what you're looking for.

> -Larry Jones

Excellent! Thanks, this is exactly the sort of thing I was looking for.
You've saved me a tremendous amount of time.

:grin:

--
Leander Hasty


Mark D. Baushke

Dec 31, 2003, 3:50:49 PM
to Larry Jones, info...@gnu.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Larry Jones <lawrenc...@ugsplm.com> writes:

> Leander Hasty writes:
> >
> > - Does there exist a CVS command, client, utility, or script (even
> > third-party) which already has this capability, or some form thereof?
>

> "cvs admin -otag::" may be exactly what you're looking for.

Good suggestion.

Yes, that will prune versions after the TAG. However, if the ,v file
contains other branches, Leander may still need to prune those as
well.

-- Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/8zap3x41pRYZE/gRAoeeAKCmLATHrsbfkr5oAyhYa/0mHY+oMACfTGQt
Pq9aLUqvrYugbSMsNt2btFw=
=4D4u
-----END PGP SIGNATURE-----


Mark D. Baushke

Dec 31, 2003, 3:47:52 PM
to Leander Hasty, info...@gnu.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Leander Hasty <l...@taldren.com> writes:

> As part of contracted work on a product, we received a tarball of the
> project source and data from another company.
>
> We'd very much like to have the entire revision history of all of these
> files; the other company is willing to give us the directory trees for
> the project from their CVS repository. From what I understand, I could
> normally just accept a tarball of said directories and manually insert
> them into a local CVS repository and possibly edit the CVSROOT files a
> bit.

Yes.

> However, we're not entitled to any of their source on this project past
> a certain release, which is tagged in their repository.

Yes, that starts getting tricky... :-)

> Digging through "man 5 rcsfile" and "doc/RCSFILES", it seems like it
> would be possible to write a script to process each ,v file, determine
> the version associated with the TAGNAME, and then trim out any entries
> in the rcs file that have a version greater than this number.

Yes, that should be possible.

> I'm more than a bit concerned about the difficulty of creating a
> robust script which can be used on a large (tens of GB) CVS
> repository, remotely, by someone else (who is inexperienced in CVS
> administration).
>
> So, questions:
>

> - Does there exist a CVS command, client, utility, or script (even
> third-party) which already has this capability, or some form thereof?

I am not aware of a program or utility that does this exactly... most
such programs are not interested in losing recent changes. :-)

There is a script that examines every version in your repository written
by Donald Sharp called contrib/check_cvs.in intended to check the
integrity of the Repository. You might be able to start with this
utility to hack something to do the job for you.

> - If not, given that I'd only have a day or two to roll my own utility,
> what sort of hurdles should I expect? (Is it possible?)

Sure, it is possible. Going to the latest version on a given branch and
doing an 'rcs -ox.y file,v' will remove version x.y from the repository.

> - Does the rcsfile format (or the way CVS uses it) guarantee
> chronological ordering of the "delta" and "deltatext" entries in the
> file?

No, timestamps may be spoofed in a generic RCS file format. In a typical
case, you would only see a 1.1.1.10 having an older timestamp than a
1.1.1.9 version if an import was done using the 'import -d' flag and
the timestamp of the file was older than the version imported as 1.1.1.9.

The other oddity you need to be aware of is the 'dead' state which is
how versions are marked as removed from the tree even if they are later
resurrected.

You should not remove version 1.1 of a file if there are any branches
that exist even if version 1.1 is dead.

However, version numbers will always be increasing for new commits even
if there are discontinuities in the numbers.

1.1 1.3 1.4 1.4.0.2 (magic branch tag) 1.4.2.1 1.4.2.2
1.4.2.2.0.2 (magic branch tag) 1.4.2.2.2.1 1.4.2.2.2.2 ...
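
For reference, a minimal sketch of how the "magic branch tag" numbers
above relate to the revisions on a branch: CVS stores a branch tag as
w.x.0.y, and the revisions committed on that branch are numbered
w.x.y.N. The helper name branch_prefix is hypothetical, not part of CVS
or RCS.

```shell
#!/bin/sh
# branch_prefix MAGIC   (hypothetical helper)
# Turn a CVS magic branch number (w.x.0.y) into the prefix shared by
# the revisions on that branch (w.x.y). Any other number is printed
# unchanged.
branch_prefix() {
    echo "$1" | awk -F. '{
        # A magic number has an even field count (at least 4) and a
        # zero in the second-to-last field.
        if (NF >= 4 && NF % 2 == 0 && $(NF - 1) == 0) {
            out = $1
            for (i = 2; i <= NF - 2; i++) out = out "." $i
            print out "." $NF
        } else {
            print $0
        }
    }'
}
```

With the numbers from the listing, "branch_prefix 1.4.0.2" prints 1.4.2,
the prefix of 1.4.2.1 and 1.4.2.2.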

> That is, can I expect any checkins post-TAGNAME to be after the tag
> entry in the ,v file, in each of their respective sections?

This clarification is more confusing than the original question. I am
not sure if I have actually answered your question or not.

>
> - What other data in the ,v file would I have to change if I could
> delete all of the post-TAGNAME entries? ("head", probably...
> "symbols"?)

Removing tags for removed branches and versions would likely be a good
idea.

> - Can anyone suggest any more sane way to do this?

Given an RCS file (file,v) and a TAG:

- do an 'rlog' command on the file,v and collect all of the <TAG
version> associations and the list of all of the versions.

- determine the version number of file,v that matches TAG

- if the version is of the form w.x.y.z, start by looking for all
w.x.y.N versions. If N is less than or equal to z, put w.x.y.N into
the 'keep' list of versions; otherwise put it into the 'discard' list
of versions. When you have exhausted all w.x.y.N versions, move to the
w.N versions and follow the same rule: keep w.N if N is less than or
equal to x and discard it if N is larger.
If w is not 1, then the next step is to look for w-1.N versions
and you will probably need to keep all of them until you reach
the first version 1.1 for the file.

- now that you have your discard list of versions, sort them
numerically.

- prune any version tag that references a version to be deleted.

- locate any branch tags of the form w.x.0.even and if all w.x.even.N
versions are being removed, you should remove the branch tag for
that branch too.

- now begin pruning the versions, starting with those that have the
largest number of elements: prune w.x.y.27.2.1 before w.x.y.27, which
should be pruned before w.x.y.26, and so on until all you have left are
the versions that were direct ancestors of the TAG you were given.
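
The keep/discard rule in the steps above can be sketched as a small
shell helper. This is only an illustration under stated assumptions:
is_ancestor and classify are hypothetical names, the revision list is
assumed to have been collected already (for example from rlog), and
'dead' revisions and tag cleanup are not handled. It only prints a
classification and never touches a ,v file.

```shell
#!/bin/sh
# is_ancestor REV TAGREV   (hypothetical helper)
# Exit status 0 if REV is TAGREV itself or lies on the direct line of
# descent leading to it, 1 otherwise.
is_ancestor() {
    awk -v rev="$1" -v tag="$2" 'BEGIN {
        n = split(rev, r, ".")
        m = split(tag, t, ".")
        # Real revisions have an even number of fields, and a revision
        # nested deeper than the tag cannot be one of its ancestors.
        if (n < 2 || n % 2 || m % 2 || n > m) exit 1
        # Trunk revisions under an earlier major number are all kept.
        if (n == 2 && r[1] + 0 < t[1] + 0) exit 0
        # Otherwise every field but the last must match the tag...
        for (i = 1; i < n; i++)
            if (r[i] + 0 != t[i] + 0) exit 1
        # ...and the last field must not be past the tag.
        if (r[n] + 0 <= t[n] + 0) exit 0
        exit 1
    }' </dev/null
}

# classify TAGREV   (hypothetical helper)
# Read revision numbers on stdin, one per line, and print
# "keep REV" or "discard REV" for each.
classify() {
    while read -r rev; do
        if is_ancestor "$rev" "$1"; then
            echo "keep $rev"
        else
            echo "discard $rev"
        fi
    done
}
```

For a tag at 1.4.2.2, feeding 1.3, 1.5, 1.4.2.1 and 1.4.2.3 to
"classify 1.4.2.2" reports 1.3 and 1.4.2.1 as keep and 1.5 and 1.4.2.3
as discard. The discard list would still need to be pruned deepest
first, as described above.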

In theory, if you have run into a 'dead' version while moving backward,
any predecessor of the dead version should also be marked for
removal... This assumes that the version of the file that was dead was
not really related to the current version in the repository. In some
development models this may be true, in others, it may not be true and
the versions before the 'dead' version may contain much needed history
information. So, I would suggest you not actually prune those dead
versions unless forced.

Note: If they have a recent enough version of CVS and all of the
versions of interest are on the mainline of the repository, then getting
the list of versions of interest may be as easy as 'cvs log -r:TAG' but
be advised that I believe some older versions of cvs did not report all
versions correctly in some cases, and if TAG is on a branch, the log
will stop at the branch point.

> - What documentation should I be looking at, aside from that mentioned
> above?

The doc/RCSFILES that comes with the cvs distribution is a good source.

Good luck,


-- Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/8zX43x41pRYZE/gRAu25AKDMxCmCdiAGHHw2XBYwxQYRweOqoQCbB+av
bAEtBWrRv9ypMeKuYOBIIqY=
=Yze/
-----END PGP SIGNATURE-----


Greg A. Woods

Dec 31, 2003, 4:17:19 PM
to Leander Hasty, CVS-II Discussion Mailing List
[ On Wednesday, December 31, 2003 at 11:30:20 (-0800), Leander Hasty wrote: ]
> Subject: Copying parts of a repository, excluding revisions post-TAGNAME...

>
> However, we're not entitled to any of their source on this project past
> a certain release, which is tagged in their repository.

Are you proposing that they do this to a copy of their repository before
they provide you with that copy?

> Digging through "man 5 rcsfile" and "doc/RCSFILES", it seems like it
> would be possible to write a script to process each ,v file, determine
> the version associated with the TAGNAME, and then trim out any entries
> in the rcs file that have a version greater than this number.

It depends on how much they use branches but potentially doing what you
desire could be a lot simpler than messing with RCS file internals.

"man 1 rlog"
"man 1 rcs"

If there are no branches then it's trivial (well other than the minor
complication that there's no way to say "after" or "not" in the way
ranges are specified to RCS commands):

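    cd cvsroot-copy-dir
    find . -type f -name '*,v' -print |
    while read -r fn ; do
        # first trunk revision after the tagged one
        aftertagrev=$(rlog -N -r"${TAGNAME}" "${fn}" | awk '
            $1 == "revision" {
                split($2, a, ".");
                print a[1] "." (a[2] + 1);
                exit;
            }')
        # no such tag in this file: report it and leave it for review
        [ -n "${aftertagrev}" ] || { echo "no ${TAGNAME} in ${fn}" >&2; continue; }
        # delete from that revision to the end of the trunk
        rcs -o"${aftertagrev}": "${fn}"
    done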
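
The loop above relies on "rlog -r" to resolve the tag. Where branch tags
also have to be inspected, one option is to read the "symbolic names"
list that "rlog -h" prints in the file header. A rough sketch, assuming
the usual header layout (an unindented "symbolic names:" line followed
by indented "NAME: revision" lines); tag_rev is a hypothetical helper
name:

```shell
#!/bin/sh
# tag_rev TAG   (hypothetical helper)
# Read an rlog header on stdin and print the revision number recorded
# for TAG in the "symbolic names" list. Prints nothing if TAG is absent.
tag_rev() {
    awk -v tag="$1" '
        /^symbolic names:/ { in_syms = 1; next }
        in_syms && /^[ \t]/ {
            name = $1
            sub(/:$/, "", name)          # strip the trailing colon
            if (name == tag) { print $2; exit }
            next
        }
        in_syms { exit }                 # first unindented line ends the list
    '
}
```

Typical use would be "rlog -h file,v | tag_rev TAGNAME". A result of the
form w.x.0.y is a branch tag rather than a plain revision.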

--
Greg A. Woods

+1 416 218-0098 VE3TCP RoboHack <wo...@robohack.ca>
Planix, Inc. <wo...@planix.com> Secrets of the Weird <wo...@weird.com>


Leander Hasty

Dec 31, 2003, 5:48:48 PM
to CVS-II Discussion Mailing List
> From: Greg A. Woods [mailto:wo...@weird.com]
> Sent: Wednesday, December 31, 2003 1:17 PM

> > However, we're not entitled to any of their source on this
> > project past a certain release, which is tagged in their
> > repository.

> Are you proposing that they do this to a copy of their
> repository before they provide you with that copy?

Yes, that was the idea.

> "man 1 rlog"
> "man 1 rcs"

Great.

> If there are no branches then it's trivial (well other than the minor
> complication that there's no way to say "after" or "not" in the way
> ranges are specified to RCS commands):

> [...]

Hmm. This does avoid having to accumulate the tags post-TAGNAME to pass
to "cvs tag -d". "cvs admin -oTAGNAME::" dies on discovering any tags
for later revisions. It probably makes sense to make sure the dangling
tags are deleted (or redirected to head, or something else safe).

I've passed along an inquiry about whether the cvs tree contains any
branches post-TAGNAME to the other company. If they do, it seems I will
probably have to script something slightly more sophisticated, along the
lines of what Mark D. Baushke suggested.

Thanks again to everyone who has provided feedback. I'm going to go
dive into "man 1 rcs" a bit more.

--
Leander Hasty


Ed Avis

Jan 1, 2004, 5:40:18 AM
to info...@gnu.org
"Mark D. Baushke" <m...@cvshome.org> writes:

>There is a script that examines every version in your repository written
>by Donald Sharp called contrib/check_cvs.in

Sounds useful, I've often worried about how much code Donald Sharp
might have checked in to my repository when I wasn't looking.

--
Ed Avis <e...@membled.com>

Ed Avis

Jan 1, 2004, 5:46:32 AM
to info...@gnu.org
Doesn't the other company have archived backups of their CVS
repository at earlier dates?

--
Ed Avis <e...@membled.com>


Greg A. Woods

Jan 1, 2004, 1:32:09 PM
to Leander Hasty, CVS-II Discussion Mailing List
[ On Wednesday, December 31, 2003 at 14:48:48 (-0800), Leander Hasty wrote: ]
> Subject: RE: Copying parts of a repository, excluding revisions post-TAGNAME...

>
> Hmm. This does avoid having to accumulate the tags post-TAGNAME to pass
> to "cvs tag -d". "cvs admin -oTAGNAME::" dies on discovering any tags
> for later revisions.

Wow. That's not something I would have expected, even had I been fully
aware the '-o' option for "cvs admin" had that enhanced "::" feature.

Although I didn't know off the top of my head that "cvs admin" had an
enhanced '-o' option, I didn't even bother looking because of your note
about the inexperience of your vendor's CVS administrator(s). I thought
it would be better/safer if they modified a copy of their repository
directly with RCS commands than risk getting confused and do damage to
their real CVS repository with a mis-typed "cvs" command.

(someone really should port the "-o::" feature back to RCS and get a new
RCS release made too! ;-)

> It probably makes sense to make sure the dangling
> tags are deleted (or redirected to head, or something else safe).

I am assuming you'd only be using the repository copy in a read-only
manner and that any extra "future" tags will simply be ignored. Surely
they don't give away any information your vendor doesn't want you to see
(yet).

It would be my strong opinion that if any CVS command breaks in the
presence of tags for non-existent revisions then that would indicate a
serious bug in CVS. A warning message might be appropriate for some
commands if '-q' is not given, but that's all.

> I've passed along an inquiry about whether the cvs tree contains any
> branches post-TAGNAME to the other company.

If there are any branches at all then your vendor may not want you to
see what changes are on some or all of those branches.

They may also consider commits to any branches more recent than the time
they tagged the release to be proprietary.

I.e. the real question is why they don't want you to see future code
that they've not yet officially released to you.

> If they do, it seems I will
> probably have to script something slightly more sophisticated, along the
> lines of what Mark D. Baushke suggested.

If it gets that complicated then I'd suggest getting the "future" code
included in your mutual agreement so that you and your colleagues can
view it all without concern.

Alternately maybe you could simply ask them to make a backup copy of
their repository immediately after tagging the next release they give
you and give you that backup copy. Even if they have very active
developers and the module(s) they're giving you are very large it's not
likely there'll be much, if anything, leaked to you.

Otherwise just ask them to run "rcs2log" and ship you the result with
the exported code and be satisfied with that. While all of the pruning
of branches is possible, it seems it would be vastly more work than it's
worth, even once it's turned into a safe and reliable script. Sure it's
nice to be able to see a diff to understand a change, but if you're
really having trouble grasping their past changes to their code then
presumably you can ask them directly for help.

Leander Hasty

Jan 2, 2004, 2:47:09 PM
to info...@gnu.org, Ed Avis
> From: info-cvs-bounces+lan=taldr...@gnu.org On Behalf Of Ed Avis
> Sent: Thursday, January 01, 2004 2:47 AM

> Doesn't the other company have archived backups of their CVS
> repository at earlier dates?

Unfortunately, no. That was the first thing they communicated to us
when this bit of negotiation was started.

It also appears that they do have branches after the desired TAGNAME.

--
Leander Hasty


Leander Hasty

Jan 2, 2004, 4:25:25 PM
to CVS-II Discussion Mailing List
> From: Greg A. Woods [mailto:wo...@weird.com]
> Sent: Thursday, January 01, 2004 10:32 AM

> I am assuming you'd only be using the repository copy in a read-only
> manner and that any extra "future" tags will simply be
> ignored.

Yes and yes, I believe.

> > If they do, it seems I will probably have to script something
> > slightly more sophisticated, along the lines of what Mark D.
> > Baushke suggested.

> If it gets that complicated then I'd suggest getting the "future" code
> included in your mutual agreement so that you and your colleagues can
> view it all without concern.

I think we're looking into getting the post-TAGNAME code, hopefully
something will come of it...

> Alternately maybe you could simply ask them to make a backup copy of
> their repository immediately after tagging the next release they give
> you and give you that backup copy. Even if they have very active
> developers and the module(s) they're giving you are very
> large it's not likely there'll be much, if anything, leaked to you.

Unfortunately I don't know if they'll be tagging any more releases any
time soon. =(

> Otherwise just ask them to run "rcs2log" and ship you the result with
> the exported code and be satisfied with that.

Ah, interesting. Thank you -- another utility to go tinker with.

> While all of the pruning of branches is possible, it seems it would be
> vastly more work than it's worth, even once it's turned into a safe and
> reliable script.

"Safe" and "reliable" are the key words here, of course. :sigh: I
think I'm inclined to agree, overall.

Even if we had to just delete any entries in the RCS-file that were
timestamped after the TAGNAME checkin and have a non-working RCS file,
this would probably be better than nothing. It probably would still
take a reasonably robust script not to delete something it shouldn't,
however.

> Sure it's nice to be able to see a diff to understand a change, but if
> you're really having trouble grasping their past changes to their code
> then presumably you can ask them directly for help.

For the same reason that they won't be tagging any more releases, asking
for help is a bit difficult. We're working on this, too, but as with
all things, this takes time.

Most of the reason we need the history is to track feature
changes -- a lot of code-paths in this project end very abruptly and
unexpectedly, and several hours into investigating a feature we'll find
that it dead-ends and has been replaced by a completely different
system. Some also induce crashes for unknown reasons, and the
changelogs would help in tracking these down too.

--
Leander Hasty


Mark D. Baushke

Jan 2, 2004, 5:44:24 PM
to Leander Hasty, CVS-II Discussion Mailing List
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Leander Hasty <l...@taldren.com> writes:

> > From: Greg A. Woods [mailto:wo...@weird.com]
> > Sent: Thursday, January 01, 2004 10:32 AM
>

> > Otherwise just ask them to run "rcs2log" and ship you the result with
> > the exported code and be satisfied with that.
>
> Ah, interesting. Thank you -- another utility to go tinker with.

Another interesting tool to consider is cvs2cl
(http://www.red-bean.com/cvs2cl/).

I think you will find it has some very nice features.

Enjoy!
-- Mark

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQE/9fRI3x41pRYZE/gRAr2nAJ9uZzIBy5pVB0a1jlmQoP2sjYaz4QCfQ+rJ
x1+P42owQki83N5ZSwz4mrE=
=7RXp
-----END PGP SIGNATURE-----


Leander Hasty

Jan 2, 2004, 6:17:54 PM
to CVS-II Discussion Mailing List, Mark D. Baushke
> -----Original Message-----
> From: m...@juniper.net On Behalf Of Mark D. Baushke
> Sent: Friday, January 02, 2004 2:44 PM

> Another interesting tool to consider is cvs2cl
> (http://www.red-bean.com/cvs2cl/).
>
> I think you will find it has some very nice features.

Ah. This is good, too...

And from this, I've found cvs2ps, which may do something reasonably
close to what we're looking for; we can restrict the patchset
descriptions to a certain date, and have the diff ignore binaries. It
certainly won't be as user-friendly as a working CVS module, but it may
serve the needed purpose.

Thank you again.

--
Leander Hasty

