I've "verified" it in the sense that I've done a "git-whatchanged -p" at
various stages of the import, and it looked sane. I also compared doing a
tar-tree-export of the 2.6.12-rc2 release, which exists both in my current
git tree _and_ in the old bkcvs tree, and they compared identically apart
from the fact that the bkcvs tree has the BitKeeper/ directory and a
ChangeSet file.
It's also pretty aggressively packed - I used "--window=50 --depth=50"
(rather than the default 10 for both) to make the archive smaller, so it's
going to be somewhat more CPU-intensive to use (due to the possibly longer
delta chains), but it got the pack-file down from 204MB to 166MB, which I
think is pretty damn good for three years of history or whatever it is.
Especially considering that a gzip -9'd tar-file of the 2.6.12-rc2 release
is 45MB all on its own, so the whole archive is just 3.6 times the size of
a single tree.
Of course, this _is_ the cvs import, which means that it's basically just
a straight-line linearization of the real BK history, but it's a pretty
good linearization and so it's certainly useful.
If somebody adds some logic to "parse_commit()" to do the "fake parent"
thing, you can stitch the histories together and see the end result as one
big tree. Even without that, you can already do things like
git diff v2.6.10..v2.6.12
(which crosses the BK->git transition) by just copying the 166MB pack-file
over, along with the tags that come with the thing. I've not verified it,
but if that doesn't work, then it's a git bug. It _should_ work.
BIG NOTE! This is definitely one archive you want to "rsync" instead of
cloning with a git repack. The unpacked archive is somewhere in the 2.4GB
region, and since I actually used a higher compression ratio than the
default, you'll transfer a smaller pack that way anyway.
It will probably take a while to mirror out (in fact, as I write this, the
DSL upload just from my local machine out still has fifteen minutes to
go), but it should be visible out there soonish. Please holler if you find
any problems with the conversion, or if you just have suggestions for
improvements.
It actually took something like 16 hours to do the conversion on my
machine (most of it appears to have been due to CVS being slow, the git
parts were quick), so I won't re-convert for any trivial things.
I'm planning on doing the 2.4 tree too some day - either as a separate
branch in the same archive, or as a separate git archive, I haven't quite
decided yet. But I was more interested in the 2.6.x tree (for obvious
reasons), and before I do the 2.4.x one I'd like to give that tree some
time for people to check if the conversion was ok.
One thing that could be verified, for example (but that I have _not_
done), is to do a few random "git diff v2.6.x..v2.6.y" and compare the
result with the standard diffs that are out there. Just to verify that the
archive looks ok. I assume there is some "diff-compare" out there that can
handle the fact that the files are diffed in a different order (and with
different flags) etc.
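Linus only assumes such a "diff-compare" exists. As a rough illustration (not an existing tool; `normalize_diff` and `same_changes` are hypothetical names), here is a minimal Python sketch that reduces a unified diff to the changed lines of each file. It drops file order, path prefixes (a/ versus linux-2.6.11/), hunk headers and context lines, so differences in ordering and context size stop mattering. It does not handle renames, binary files or whitespace-ignoring flags, and two diff programs can still pick different but equivalent alignments inside a file, so a mismatch is a hint to look closer rather than proof of a bad conversion.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@"; a missing length means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def strip_prefix(path):
    """Drop any tab-separated timestamp and the first path component,
    the way "patch -p1" would: "a/Makefile" -> "Makefile"."""
    path = path.split("\t")[0].strip()
    if path == "/dev/null" or "/" not in path:
        return path
    return path.split("/", 1)[1]


def normalize_diff(text):
    """Map each file path to its ordered list of '-'/'+' lines."""
    files = {}
    old_path = current = None
    old_left = new_left = 0
    for line in text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: trust the header's line counts, so a removed
            # line that starts with "-- " is not mistaken for a "--- file"
            # header.
            tag = line[:1]
            if tag in (" ", ""):  # context line (mailers may eat the space)
                old_left -= 1
                new_left -= 1
            elif tag == "-":
                old_left -= 1
                files[current].append(line)
            elif tag == "+":
                new_left -= 1
                files[current].append(line)
            continue  # "\ No newline at end of file" changes no counts
        if line.startswith("--- "):
            old_path = strip_prefix(line[4:])
        elif line.startswith("+++ "):
            new_path = strip_prefix(line[4:])
            # For a deleted file the new side is /dev/null; key by old path.
            current = old_path if new_path == "/dev/null" else new_path
            files.setdefault(current, [])
        else:
            m = HUNK_RE.match(line)
            if m and current is not None:
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
    return files


def same_changes(diff_a, diff_b):
    """True if both diffs make the same line changes to the same files."""
    return normalize_diff(diff_a) == normalize_diff(diff_b)
```

Usage would be along the lines of `same_changes(git_output, patch_text)`, with the first string captured from "git diff v2.6.x..v2.6.y" and the second read from the released patch.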
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
> I'm planning on doing the 2.4 tree too some day - either as a separate
> branch in the same archive, or as a separate git archive, I haven't quite
It'd be great to have the same thing but for the 1.0 - 2.2 tree. Of course
there are no "changelogs" for that, but incremental patches are still
available, and it'd be very interesting (for "historical reasons") to see how
things were added/removed.
That's a bit of a hack which really doesn't belong in the git tools.
It's not particularly hard to reparent the tree for real -- I'd much
rather see a tool added to git which can _actually_ change the
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 commit to have a parent of
0bcc493c633d78373d3fcf9efc29d6a710637519, and ripple the corresponding
SHA1 changes up to the current HEAD.
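Reparenting "for real" means rewriting every descendant, because a commit's id is the SHA-1 of its own text and that text includes its "parent" lines. A minimal Python sketch of the ripple, assuming a purely linear chain: only the object-hashing rule ("<type> <size>\0" followed by the body, hashed with SHA-1) mirrors git; the tree ids, author line and timestamp are made-up placeholders, and nothing here reads or writes a real repository.

```python
import hashlib


def git_object_id(kind, body):
    """Object id as git computes it: SHA-1 over "<kind> <size>\\0" + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, message):
    """Serialize a commit roughly the way git does (header lines, blank
    line, message). The identity and timestamp are fixed placeholders."""
    ident = "A U Thor <author@example.com> 1120000000 -0700"
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()


def rewrite_chain(chain, root_parent):
    """Recompute ids for a linear history given oldest-first as
    (tree, message) pairs. root_parent is the parent to give the oldest
    commit, or None to leave it a root. Each new id feeds into the next
    commit's "parent" line, which is why the change ripples up to HEAD."""
    ids = []
    parent = root_parent
    for tree, message in chain:
        parents = [parent] if parent else []
        cid = git_object_id("commit", commit_body(tree, parents, message))
        ids.append(cid)
        parent = cid
    return ids


if __name__ == "__main__":
    # Hypothetical tree ids; real ones would come from the repository.
    chain = [("a" * 40, "Linux-2.6.12-rc2"), ("b" * 40, "second"), ("c" * 40, "third")]
    before = rewrite_chain(chain, None)
    after = rewrite_chain(chain, "0bcc493c633d78373d3fcf9efc29d6a710637519")
    for old, new in zip(before, after):
        print(old, "->", new)
```

Every id in the chain changes, not just the one whose parent was edited, which is the cost a graft avoids.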
Note that the latter commit ID I gave there was actually the 2.6.12-rc2
commit in Thomas' history import, not your own. Thomas has done a lot of
work on it, and it has the full names extracted from the shortlog
script, full timestamps, branch/merge history and consistent character
sets in the commit logs. I'd definitely suggest that you use that
instead of the import from bkcvs.
http://www.kernel.org/git/?p=linux/kernel/git/tglx/history.git;a=summary
--
dwmw2
On Wed, 27 Jul 2005, David Woodhouse wrote:
> On Tue, 2005-07-26 at 11:57 -0700, Linus Torvalds wrote:
> > If somebody adds some logic to "parse_commit()" to do the "fake parent"
> > thing, you can stitch the histories together and see the end result as one
> > big tree. Even without that, you can already do things like
> >
> > git diff v2.6.10..v2.6.12
>
> That's a bit of a hack which really doesn't belong in the git tools.
Actually, it's not a hack at all. It's very fundamentally how git works:
you give it two trees that it knows about, and it will show the
differences between them - regardless of whether they share any common
ancestry or not.
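The point is that a diff only needs the two trees at its endpoints. A toy Python sketch makes that explicit; trees are modeled here as plain {path: contents} dicts, which is an assumption for illustration and not git's object format. The function never looks at parents, so it cannot care whether the endpoints share any ancestry.

```python
def diff_trees(old, new):
    """Compare two snapshots given as {path: contents}. Only the two trees
    are consulted; neither side's history plays any part."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes.append(("deleted", path))
        elif path not in old:
            changes.append(("added", path))
        elif old[path] != new[path]:
            changes.append(("modified", path))
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots: v2.6.10 from the bkcvs import, v2.6.12 from
    # the native git history; the two share no common ancestor.
    v2_6_10 = {"Makefile": "SUBLEVEL = 10\n", "README": "Linux\n"}
    v2_6_12 = {"Makefile": "SUBLEVEL = 12\n", "README": "Linux\n",
               "kernel/new.c": "/* hypothetical new file */\n"}
    for kind, path in diff_trees(v2_6_10, v2_6_12):
        print(kind, path)
```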
> It's not particularly hard to reparent the tree for real -- I'd much
> rather see a tool added to git which can _actually_ change the
> 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 commit to have a parent of
> 0bcc493c633d78373d3fcf9efc29d6a710637519, and ripple the corresponding
> SHA1 changes up to the current HEAD.
I used to think I wanted to, but these days I really don't. One of the
reasons is that I expect to try to pretty up the old bkcvs conversion some
time: use the name translation from the old "shortlog" scripts etc, and
see if I can do some other improvements on the conversion (I think I'll
remove the BK files - "ChangeSet" etc).
And it's really much easier and more general to have a "graft" facility.
It's something that git can do trivially (literally a hook in
"parse_commit" to add a special parent), and it's actually a generic
mechanism exactly for issues like this ("project had old history in some
other format").
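A sketch of the kind of hook described here, written in Python rather than git's C. `parse_commit` below is a simplified, hypothetical stand-in that returns only the parent list, and the grafts text uses one line per commit: the commit id followed by zero or more replacement parent ids (the format git's info/grafts file later used). The two ids are the ones from David's message; the raw commit text is made up.

```python
def load_grafts(text):
    """Parse grafts text: "<commit> [<parent> ...]" per line. A line with
    only a commit id gives that commit no parents at all."""
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


def parse_commit(commit_id, raw, grafts):
    """Return the parents of a commit, letting a graft override them."""
    parents = []
    for line in raw.splitlines():
        if not line:  # a blank line ends the commit header
            break
        if line.startswith("parent "):
            parents.append(line[len("parent "):])
    # The hook: a graft entry replaces whatever the object itself records.
    return grafts.get(commit_id, parents)


if __name__ == "__main__":
    grafts = load_grafts(
        "# join the bkcvs import onto the git root\n"
        "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 "
        "0bcc493c633d78373d3fcf9efc29d6a710637519\n")
    # Hypothetical raw text for the parentless root commit.
    raw = ("tree " + "a" * 40 + "\n"
           "author X <x@example.com> 1120000000 -0700\n"
           "committer X <x@example.com> 1120000000 -0700\n\n"
           "Linux-2.6.12-rc2\n")
    print(parse_commit("1da177e4c3f41524e886b7f1b8a0c1fc7321cac2", raw, grafts))
```

The commit object itself is never modified, so no ids change; whoever has the grafts entry sees the joined history, and everyone else sees the original.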
Somebody already asked for having the import history for old historic
patches - which we _do_ actually have as patches, but which obviously
don't have any changelogs except for the version information. Most people
may not want that, but the thing is, with a "graft" facility, the people
who _do_ want that can easily see it all, and it is totally seamless.
So it's not even a one-time hack - it's a real feature that just in the
kernel would have several cases we'd be able to use it for, and the same
is likely true for almost any other project that wasn't started purely
from git.
Linus
On Wed, 27 Jul 2005, David Woodhouse wrote:
>
> Hm, OK. That works and can also be used for the "fake _absence_ of
> parent" thing -- if I'm space-constrained and want only the history back
> to some relatively recent point like 2.6.0, I can do that by turning the
> 2.6.0 commit into an orphan instead of also using all the rest of the
> history back to 2.4.0.
Yes. The grafting really should work pretty well for various things like
this, and at the same time I don't think it's ever going to be a huge
problem: people may have a couple of graft-points (if you want to drop
history, you may well have more than one point you need to "cauterize":
you may not be able to just cut it off at 2.6.0, since there may be merges
further back in history), but I don't think it's going to explode and
become unwieldy.
I just don't see people having more than a few trees that they might want
to graft together, and while the "drop history" thing might cause more
issues, even that is bounded by the amount of development parallelism, so
while it probably causes more graft-points than the "join trees" usage, it
should still be just a small handful of points.
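The caveat about merges is easy to see on a toy graph. In this Python sketch the commit names and the shape of the history are invented: a side branch forks before v2.6.0 and is merged after it, so cauterizing only v2.6.0 (a graft with an empty parent list) still leaves the old history reachable through the merge, and the side branch needs its own cut point.

```python
from collections import deque


def reachable(head, parents_of, grafts):
    """Commits reachable from head; a graft entry replaces a commit's
    parents, and an empty entry cuts history at that commit."""
    seen = {head}
    todo = deque([head])
    while todo:
        commit = todo.popleft()
        for parent in grafts.get(commit, parents_of[commit]):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


# Hypothetical history: "side" forks from "old" before v2.6.0 and is
# merged back in by "merge".
history = {
    "v2.4.0": [],
    "old": ["v2.4.0"],
    "v2.6.0": ["old"],
    "side": ["old"],
    "merge": ["v2.6.0", "side"],
    "head": ["merge"],
}

if __name__ == "__main__":
    one_cut = reachable("head", history, {"v2.6.0": []})
    two_cuts = reachable("head", history, {"v2.6.0": [], "side": []})
    print(sorted(one_cut))   # still includes "old" and "v2.4.0"
    print(sorted(two_cuts))  # history stops at v2.6.0 and side
```

The number of cut points grows with the number of branches that cross the chosen boundary, which is the "bounded by development parallelism" argument above.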
Thomas has done all that; it's on kernel.org already.
> And it's really much easier and more general to have a "graft" facility.
> It's something that git can do trivially (literally a hook in
> "parse_commit" to add a special parent), and it's actually a generic
> mechanism exactly for issues like this ("project had old history in some
> other format").
Hm, OK. That works and can also be used for the "fake _absence_ of
parent" thing -- if I'm space-constrained and want only the history back
to some relatively recent point like 2.6.0, I can do that by turning the
2.6.0 commit into an orphan instead of also using all the rest of the
history back to 2.4.0.
--
dwmw2