---------- Forwarded message ----------
From: Matt Mackall <m...@selenic.com>
Date: Thu, Mar 27, 2008 at 12:09 PM
Subject: Re: mercurial --> plain text --> mercurial
To: didier deshommes <dfde...@gmail.com>
Cc: merc...@selenic.com
On Thu, 2008-03-27 at 14:24 +0000, didier deshommes wrote:
> Hi everyone,
> Sage (http://www.sagemath.org/) uses hg for its source control and recently a
> question has come up about the possibility of doing the following:
>
> (1) export everything in the .hg repo to something (perhaps a ton of
> stuff) in plain text format,
> (2) delete .hg/ directory
> (3) do something that recovers the .hg/ directory from the output of (1).
This will work for the export side:
#!/usr/bin/env python
import sys
from mercurial import revlog, node
for f in sys.argv[1:]:
    r = revlog.revlog(open, f)  # open the revlog index file f
    print "file:", f
    for i in xrange(r.count()):
        n = r.node(i)
        p = r.parents(n)  # parents() takes a node, not a revision number
        d = r.revision(n)  # full, uncompressed text of this revision
        print "node:", node.hex(n)
        print "linkrev:", r.linkrev(n)
        print "parents:", node.hex(p[0]), node.hex(p[1])
        print "length:", len(d)
        print "-start-"
        print d
        print "-end-"
Then you can do something like:
find .hg/store -name "*.i" | xargs ./dumprevlog > repo.dump
This will make a nice flat, uncompressed file with everything you need
to reconstruct a repo. But it'll be huge. The mercurial repo goes from
11MB to 435MB. Other projects will get -much- bigger; I've seen large
revlogs with compression ratios of > 1000:1.
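For readers without `find`/`xargs` (say, on Windows), the file-selection step can be sketched in Python 3 instead. `revlog_index_files` is a hypothetical helper, not part of Mercurial; it assumes the `.hg/store` layout used by the command above and returns the same paths `find .hg/store -name "*.i"` would print from the repository root, sorted and with forward slashes.

```python
from pathlib import Path

def revlog_index_files(repo_root):
    """Return revlog index files (*.i) under <repo_root>/.hg/store.

    Paths are relative to repo_root and use forward slashes, like the
    output of `find .hg/store -name "*.i"` run from the repository root.
    """
    root = Path(repo_root)
    store = root / ".hg" / "store"
    # rglob descends into every subdirectory, including data/.
    found = store.rglob("*.i")
    return sorted(p.relative_to(root).as_posix() for p in found)
```

Data files (`*.d`) are deliberately skipped: the revlog is opened through its index, as in the script above.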
I'm too busy to write the import side of this today, but it'll be about
as long. And you shouldn't actually need that piece if you only need to
scan the dump.
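As noted, scanning the dump doesn't require the import side at all. Here is a minimal Python 3 sketch of such a scanner, assuming exactly the record layout the script above prints (in particular, `print d` appends one newline after the data, before `-end-`). `iter_dump` is a hypothetical helper, not part of Mercurial.

```python
def iter_dump(stream):
    """Yield one dict per revision from a dumprevlog-style stream.

    `stream` must be opened in binary mode so that `length:` counts bytes.
    Assumes the record layout printed by the dump script.
    """
    current_file = None
    rec = {}
    while True:
        line = stream.readline()
        if not line:
            break  # end of stream
        if line.startswith(b"file: "):
            current_file = line[6:].rstrip(b"\n").decode("utf-8")
        elif line.startswith(b"node: "):
            # A new revision starts; remember which revlog it belongs to.
            rec = {"file": current_file, "node": line[6:].strip().decode("ascii")}
        elif line.startswith(b"linkrev: "):
            rec["linkrev"] = int(line[9:].strip())
        elif line.startswith(b"parents: "):
            rec["parents"] = line[9:].decode("ascii").split()
        elif line.startswith(b"length: "):
            length = int(line[8:].strip())
            stream.readline()                  # the "-start-" line
            rec["data"] = stream.read(length)  # exactly `length` bytes of text
            stream.readline()                  # newline `print d` added after the data
            yield rec
        # Anything else (the "-end-" line) is skipped.
```

On a real dump you would pass `sys.stdin.buffer` (or a file opened with `"rb"`), never the text-mode `sys.stdin`.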
--
Mathematics is the supreme nostalgia of our time.
Here's what I did:
$ pwd # old repo
old-hg/
$ find .hg/store/ -name "*.i" | xargs dumprevlog > repo.dump
$ cd ~/new-hg # new repo
$ hg init
$ undumprevlog < ~/old-repo/repo.dump
[stuff happens...]
$ hg tip
changeset: 8962:211b127eab5d
tag: tip
user: William Stein <wstein <at> gmail.com>
date: Mon Mar 17 16:03:46 2008 -0700
files: sage/rings/polynomial/multi_polynomial_element.py sage/version.py
description:
2.10.4
Running hg co then repopulates the working directory. It also looks like
these scripts will be incorporated into the next version of hg (in
contrib/, I guess). Thanks to Matt for his quick response!
didier
---------- Forwarded message ----------
From: Matt Mackall <m...@selenic.com>
Date: Thu, Mar 27, 2008 at 1:49 PM
Subject: Re: mercurial --> plain text --> mercurial
To: didier deshommes <dfde...@gmail.com>
Cc: merc...@selenic.com
Alright, here's a pair of scripts that will do end-to-end:
diff -r bc142ee1522c contrib/dumprevlog
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/dumprevlog Thu Mar 27 12:40:17 2008 -0500
@@ -0,0 +1,21 @@
+#!/usr/bin/env python
+# Dump revlogs as raw data stream
+# $ find .hg/store/ -name "*.i" | xargs dumprevlog > repo.dump
+
+import sys
+from mercurial import revlog, node
+
+for f in sys.argv[1:]:
+    r = revlog.revlog(open, f)
+    print "file:", f
+    for i in xrange(r.count()):
+        n = r.node(i)
+        p = r.parents(n)
+        d = r.revision(n)
+        print "node:", node.hex(n)
+        print "linkrev:", r.linkrev(n)
+        print "parents:", node.hex(p[0]), node.hex(p[1])
+        print "length:", len(d)
+        print "-start-"
+        print d
+        print "-end-"
diff -r bc142ee1522c contrib/undumprevlog
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/undumprevlog Thu Mar 27 12:40:17 2008 -0500
@@ -0,0 +1,34 @@
+#!/usr/bin/env python
+# Undump a dump from dumprevlog
+# $ hg init
+# $ undumprevlog < repo.dump
+
+import sys
+from mercurial import revlog, node, util, transaction
+
+opener = util.opener('.', False)
+tr = transaction.transaction(sys.stderr.write, opener, "undump.journal")
+while 1:
+    l = sys.stdin.readline()
+    if not l:
+        break
+    if l.startswith("file:"):
+        f = l[6:-1]
+        r = revlog.revlog(opener, f)
+        print f
+    elif l.startswith("node:"):
+        n = node.bin(l[6:-1])
+    elif l.startswith("linkrev:"):
+        lr = int(l[9:-1])
+    elif l.startswith("parents:"):
+        p = l[9:-1].split()
+        p1 = node.bin(p[0])
+        p2 = node.bin(p[1])
+    elif l.startswith("length:"):
+        length = int(l[8:-1])
+        sys.stdin.readline() # start marker
+        d = sys.stdin.read(length)
+        sys.stdin.readline() # newline after data; the loop skips "-end-"
+        r.addrevision(d, tr, lr, p1, p2)
+
+tr.close()
Tested on the Mercurial repo.
PS: making this work on systems that have braindead notions about text
vs. binary files is an exercise left to the reader.
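One way to take up the PS in Python 3 is to treat the dump as bytes end to end, so that neither newline translation nor text encoding can change the payload whose size `length:` records. The sketch below is a hypothetical Python 3 writer for the same record layout, not Matt's script. It assumes node ids arrive as 40-character hex strings and that file names can be encoded as UTF-8.

```python
def write_file_header(out, path):
    """Start a new revlog section; `path` is encoded as UTF-8 (an assumption)."""
    out.write(b"file: " + path.encode("utf-8") + b"\n")

def write_record(out, node_hex, linkrev, p1_hex, p2_hex, data):
    """Write one revision to a binary stream in the dumprevlog layout.

    Everything is bytes, so Python's text layer never touches the data and
    cannot translate newlines or re-encode the payload counted by "length:".
    """
    out.write(b"node: " + node_hex.encode("ascii") + b"\n")
    out.write(b"linkrev: %d\n" % linkrev)
    out.write(b"parents: " + p1_hex.encode("ascii") + b" " + p2_hex.encode("ascii") + b"\n")
    out.write(b"length: %d\n" % len(data))
    out.write(b"-start-\n")
    out.write(data)
    # Same trailing newline that Python 2's `print d` emits, then the end marker.
    out.write(b"\n-end-\n")
```

When dumping to standard output, pass `sys.stdout.buffer` rather than `sys.stdout`; the reading side would likewise use `sys.stdin.buffer`.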
http://sage.math.washington.edu/home/robertwb/hg/
I've successfully exported and re-created simple repositories (with
branching) with these scripts; it works great and preserves all the
history. The only issue is that I can't get it to work with any
repository older than a certain date. I think the issue is that
Mercurial changed the way nodeids are calculated: I keep getting the
error "abort: patch is damaged or loses information", which is raised
when the newly computed nodeid does not match the one in the patch
(commands.py:1632 in 0.9.5). Matt Mackall, any suggestions on how to
cleanly work around this, or to get the old nodeids instead?
- Robert
FYI, the sage-main repo has 244MB of patches, compressing down to
33MB under bzip2.
Robert,
Did you ever get a response to this question? Any updates on this?
William
--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org
Yes, they changed the way they do hashing, so if you commit the same
sequence of patches you will get different nodeids than you would
have in the past. Since nodeids are how parents are identified, this
results in missing-parent errors.
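To see why changed ids propagate, recall how a revlog computes a nodeid: SHA-1 over the two parent nodeids, in sorted order, followed by the revision text. The sketch below reproduces that formula in Python 3; the names `nodeid` and `chain_ids` are illustrative, not Mercurial's. Whatever alters the hash input for one changeset changes that changeset's id, and because the id is then hashed into each child, every descendant's id changes too.

```python
import hashlib

NULLID = b"\0" * 20  # the all-zero id used when a parent is missing

def nodeid(text, p1=NULLID, p2=NULLID):
    """SHA-1 of the two parent ids, smaller first, followed by the text."""
    a, b = sorted([p1, p2])
    h = hashlib.sha1(a)
    h.update(b)
    h.update(text)
    return h.digest()

def chain_ids(texts):
    """Ids for a linear history in which each revision's parent is the previous one."""
    ids = []
    parent = NULLID
    for text in texts:
        parent = nodeid(text, parent)
        ids.append(parent)
    return ids
```

For example, `chain_ids([b"a", b"b", b"c"])` and `chain_ids([b"a", b"B", b"c"])` agree on the first id but differ on the second and third, even though the third text is identical in both.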
I have a hacked version of Mercurial that keeps a correspondence
between old and new hashes, which lets you recreate a repository
from the list of patches alone (this is how I rebased Cython:
exporting the patches, running them through a diff to change the
paths, and recreating the repo from those). One drawback is that the
nodeids will be different, so the repositories aren't compatible
anymore. (I also know how to change it the other way, so that the
"computed" hash will just be the old one if available.)
If people are interested, I could try to release a less hackish patch.
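Robert's patch isn't included in the thread, so the following is only a hypothetical Python 3 sketch of the bookkeeping that a correspondence between old and new hashes implies: replay revisions parents first, translate each old parent id through an old-to-new map, and hash with the new parents. It is a toy model: each revision's text is treated as opaque, whereas in a real repository a changeset's text embeds its manifest id (and manifests embed file ids), which would need the same translation. `nodeid` uses the revlog formula (SHA-1 over the sorted parent ids, then the text); `renumber` is an illustrative name.

```python
import hashlib

NULLID = b"\0" * 20  # "no parent"

def nodeid(text, p1, p2):
    """Revlog-style id: SHA-1 over the sorted parent ids, then the text."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()

def renumber(revisions):
    """Map each old id to a freshly computed id.

    `revisions` is an iterable of (old_id, old_p1, old_p2, text) tuples in
    topological order (parents before children). Old parent ids are
    translated through the map before hashing, so the ancestry is kept.
    """
    old_to_new = {NULLID: NULLID}
    for old_id, old_p1, old_p2, text in revisions:
        p1 = old_to_new[old_p1]  # KeyError here means the input wasn't topological
        p2 = old_to_new[old_p2]
        old_to_new[old_id] = nodeid(text, p1, p2)
    return old_to_new
```

Keeping the whole map, rather than only the latest id, is what lets a later patch or bundle that names an old parent be attached to the right new changeset.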
- Robert
Could you "rebase" the entire Sage repo so it has the new hashes, then
include it in Sage-3.0? :-) If we're going to make a massive change like
that, 3.0 would be the time to do it. Or does that request make no sense?
William
I'll let Robert answer, but he said "Yes, they changed the way they do
hashing," and I'm proposing that we somehow update our repo so that it
uses their new way of hashing throughout. I'm not proposing something
that would happen more than once.
William
Yes, I could. This would mean that no pre-3.0 bundles would apply to
post-3.0 (short of rebasing the bundles; the big one (coercion) I
could rebase myself). Patches should be just fine, and most things
aren't big enough to warrant bundles. Does anyone know if Mercurial
1.0 changed how hashing is done (yet again), or is it finally stable?
If so, I think this would be a good thing to do.
- Robert
Well this is definitely the right *time* to do it.
william
I'll do that then. It's probably best to do it right before the
release, so as not to disrupt the development cycle. Since the actual
code won't change (we can check that with a diff), we won't need to be
concerned about breaking Sage. Perhaps the other packages should be
changed as well.
- Robert
> On Apr 21, 8:24 pm, Robert Bradshaw <rober...@math.washington.edu>
> wrote:
>> On Apr 21, 2008, at 11:14 AM, William Stein wrote:
>
> Hi,
>
>>>> Yes, I could. This would mean that no pre-3.0 bundles would apply to
>>>> post-3.0 (short of re-basing the bundles--and the big one (coercion)
>>>> I could rebase myself). Patches should be just fine, and most things
>>>> aren't big enough to warrant bundles.
>
> The number of bundles in trac is rather small and most of those
> bundles either have review issues or shouldn't be bundles in the first
> place [as you stated above], so applying them to a pre-3.0 tree,
> extracting the patch and so on should be doable.
Sure. The other concern is people with as-yet-unsubmitted code on
their own computers: they will no longer be able to pull or push.
(Does the current upgrade try to do that?)
Maybe I could schedule it for a time when you're sleeping (does that
ever happen? :-), because it can't be done in parallel with merging
very well.
>>>> Does anyone know if mercurial 1.0 changed how hashing is done (yet
>>>> again) or is it finally stable? If so I think this would be a good
>>>> thing to do.
>>
>>> Well this is definitely the right *time* to do it.
>>
>> I'll do that then. Probably best to do right before the release, to
>> not disrupt the development cycle (and as the actual code won't
>> change (check with a diff) we won't need to be concerned about
>> breaking Sage). Perhaps the other packages should be changed as well.
>
> The main ones, i.e. extcode and scripts, should be done too, and I guess
> it would be nice to get all the hg repos in the spkgs fixed as well.
Certainly.
> Does this
> require that we upgrade to hg 1.0 or is it fine with the release we
> ship? Upgrading to 1.0 should be quick and I think I will get it done
> during the 3.0.1 cycle.
It requires a hacked version of hg that I have on my computer, and it's
not the kind of patch that would ever get merged upstream (at least not
without cleanup). I just asked the Mercurial developer who answered my
original question whether the hashes changed (again) in 1.0, but I got
the impression last time that they have been stable for a while (just
not as long as Sage has been around).
- Robert
2008/4/21 Robert Bradshaw <robe...@math.washington.edu>:
> Sorry if this is a stupid question: if you are going to make a new
> complete repository with a patched version of mercurial, does that
> mean that native mercurial installations will not work with Sage from
> now on, only Sage's "own" version?
>
> John
No, it just changes the internal names (the hex strings called
nodeids) of the patches. Mercurial identifies the "parent" of a given
changeset by its nodeid, but sometime in the last two years or so they
changed how they compute this hash, which makes it difficult to export
and re-import an entire repository as a string of patches. My hacked
Mercurial essentially allows one to rename all the old changesets
according to the new method without changing the content, history, or
timestamps of a repository.
- Robert
Did you still want me to do this, or is this a non-issue now that 3.0
is officially released?
- Robert
Hi,
I've made a trac ticket for this, since it seems to have got stalled:
http://trac.sagemath.org/sage_trac/ticket/3052
William
Yep, that is one of the things I've noticed. The patch comment is
insufficient in this case.
- Robert
Can the git-style diffs somehow help?
hg diff --git
Jason