Fwd: mercurial --> plain text --> mercurial


didier deshommes

Mar 27, 2008, 12:15:09 PM
to sage-...@googlegroups.com
FYI, this is a proposed solution...


---------- Forwarded message ----------
From: Matt Mackall <m...@selenic.com>
Date: Thu, Mar 27, 2008 at 12:09 PM
Subject: Re: mercurial --> plain text --> mercurial
To: didier deshommes <dfde...@gmail.com>
Cc: merc...@selenic.com

On Thu, 2008-03-27 at 14:24 +0000, didier deshommes wrote:
> Hi everyone,
> Sage (http://www.sagemath.org/) uses hg for its source control and recently a
> question has come up about the possibility of doing the following:
>
> (1) export everything in the .hg repo to something (perhaps a ton of
> stuff) in plain text format,
> (2) delete .hg/ directory
> (3) do something that recovers the .hg/ directory from the output of (1).

This will work for the export side:

#!/usr/bin/env python
import sys
from mercurial import revlog, node

for f in sys.argv[1:]:
    r = revlog.revlog(open, f)
    print "file:", f
    for i in xrange(r.count()):
        n = r.node(i)
        p = r.parents(n)  # parents() takes a node, not a revision number
        d = r.revision(n)
        print "node:", node.hex(n)
        print "linkrev:", r.linkrev(n)
        print "parents:", node.hex(p[0]), node.hex(p[1])
        print "length:", len(d)
        print "-start-"
        print d
        print "-end-"

Then you can do something like:

find .hg/store -name "*.i" | xargs ./dumprevlog > repo.dump

This will make a nice flat, uncompressed file with everything you need
to reconstruct a repo. But it'll be huge. The mercurial repo goes from
11MB to 435MB. Other projects will get -much- bigger; I've seen large
revlogs with compression ratios of > 1000:1.

I'm too busy to write the import side of this today, but it'll be about
as long. And you shouldn't actually need that piece if you only need to
scan the dump.
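
For anyone who only needs to scan the dump: the record layout written by the script above (file:, node:, linkrev:, parents:, length:, then the raw data between -start- and -end-) can be read back without Mercurial. Below is a minimal sketch in Python 3. It is not part of Matt's scripts, and the name scan_dump and the sample bytes are made up for illustration.

```python
import io


def scan_dump(stream):
    """Yield one dict per revision from a dumprevlog-style binary stream."""
    current_file = None
    rec = {}
    while True:
        line = stream.readline()
        if not line:
            break
        if line.startswith(b"file:"):
            current_file = line[6:-1].decode("utf-8", "replace")
        elif line.startswith(b"node:"):
            rec = {"file": current_file, "node": line[6:-1].decode("ascii")}
        elif line.startswith(b"linkrev:"):
            rec["linkrev"] = int(line[9:-1])
        elif line.startswith(b"parents:"):
            rec["parents"] = line[9:-1].decode("ascii").split()
        elif line.startswith(b"length:"):
            length = int(line[8:-1])
            stream.readline()              # the "-start-" marker
            rec["data"] = stream.read(length)
            stream.readline()              # newline that print added after the data
            yield rec
        # any other line (such as "-end-") is skipped


# Hand-written sample record, assumed to match the dump layout.
sample = (
    b"file: .hg/store/data/a.i\n"
    b"node: " + b"ab" * 20 + b"\n"
    b"linkrev: 0\n"
    b"parents: " + b"0" * 40 + b" " + b"0" * 40 + b"\n"
    b"length: 12\n"
    b"-start-\n"
    b"hello\nworld\n"   # 12 bytes of revision data
    b"\n-end-\n"        # newline added by print, then the end marker
)

records = list(scan_dump(io.BytesIO(sample)))
```

The data is read by byte count rather than by looking for -end-, so revision contents that happen to contain a line like "node:" cannot confuse the scan.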

--
Mathematics is the supreme nostalgia of our time.

didier deshommes

Mar 27, 2008, 5:04:07 PM
to sage-...@googlegroups.com, m...@selenic.com
From the scripts below I was able to dump a text version of the SAGE
repo and recover it to make another hg repo out of it. This requires:
- mercurial 1.0
- a change in the layout of the sage repo. It's not too hard: hg
  clone --pull $SAGE_REPO will create an exact copy of the repository in
  the new format.

Here's what I did:
$ pwd # old repo
old-hg/
$ find .hg/store/ -name "*.i" | xargs dumprevlog > repo.dump

$ cd ~/new-hg # new repo
$ hg init
$ undumprevlog < ~/old-hg/repo.dump
[stuff happens...]

$ hg tip
changeset: 8962:211b127eab5d
tag: tip
user: William Stein <wstein <at> gmail.com>
date: Mon Mar 17 16:03:46 2008 -0700
files: sage/rings/polynomial/multi_polynomial_element.py sage/version.py
description:
2.10.4

Running hg co then re-populates this directory. It looks like
these scripts will be incorporated in the next version of hg (in
contrib/, I guess). Thanks to Matt for his quick response!

didier

---------- Forwarded message ----------
From: Matt Mackall <m...@selenic.com>
Date: Thu, Mar 27, 2008 at 1:49 PM
Subject: Re: mercurial --> plain text --> mercurial
To: didier deshommes <dfde...@gmail.com>
Cc: merc...@selenic.com


Alright, here's a pair of scripts that will do end-to-end:

diff -r bc142ee1522c contrib/dumprevlog
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/dumprevlog Thu Mar 27 12:40:17 2008 -0500
@@ -0,0 +1,21 @@
+#!/usr/bin/env python
+# Dump revlogs as raw data stream
+# $ find .hg/store/ -name "*.i" | xargs dumprevlog > repo.dump
+
+import sys
+from mercurial import revlog, node
+
+for f in sys.argv[1:]:
+    r = revlog.revlog(open, f)
+    print "file:", f
+    for i in xrange(r.count()):
+        n = r.node(i)
+        p = r.parents(n)
+        d = r.revision(n)
+        print "node:", node.hex(n)
+        print "linkrev:", r.linkrev(n)
+        print "parents:", node.hex(p[0]), node.hex(p[1])
+        print "length:", len(d)
+        print "-start-"
+        print d
+        print "-end-"
diff -r bc142ee1522c contrib/undumprevlog
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/contrib/undumprevlog Thu Mar 27 12:40:17 2008 -0500
@@ -0,0 +1,34 @@
+#!/usr/bin/env python
+# Undump a dump from dumprevlog
+# $ hg init
+# $ undumprevlog < repo.dump
+
+import sys
+from mercurial import revlog, node, util, transaction
+
+opener = util.opener('.', False)
+tr = transaction.transaction(sys.stderr.write, opener, "undump.journal")
+while 1:
+    l = sys.stdin.readline()
+    if not l:
+        break
+    if l.startswith("file:"):
+        f = l[6:-1]
+        r = revlog.revlog(opener, f)
+        print f
+    elif l.startswith("node:"):
+        n = node.bin(l[6:-1])
+    elif l.startswith("linkrev:"):
+        lr = int(l[9:-1])
+    elif l.startswith("parents:"):
+        p = l[9:-1].split()
+        p1 = node.bin(p[0])
+        p2 = node.bin(p[1])
+    elif l.startswith("length:"):
+        length = int(l[8:-1])
+        sys.stdin.readline() # start marker
+        d = sys.stdin.read(length)
+        sys.stdin.readline() # end marker
+        r.addrevision(d, tr, lr, p1, p2)
+
+tr.close()

Tested on the Mercurial repo.

ps: making this work on systems that have braindead notions about text
vs binary files is an exercise left to the reader
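
The text-vs-binary caveat matters because undumprevlog trusts the length: field to be an exact byte count. Here is a small Python 3 illustration of how newline translation in text mode breaks that count. It is only a sketch with a made-up payload, not a port of the scripts above.

```python
import io

# Revision data that happens to contain CRLF line endings.
payload = b"line one\r\nline two\r\n"

# Binary read: the bytes come back exactly as stored, so a recorded
# length stays valid.
binary_copy = io.BytesIO(payload).read(len(payload))

# Text-mode read with universal newlines: each "\r\n" collapses to "\n",
# so the content is now shorter than the recorded length.
text_copy = io.TextIOWrapper(
    io.BytesIO(payload), encoding="latin-1", newline=None
).read()

print(len(payload), len(binary_copy), len(text_copy))
```

In Python 3 the usual way to avoid this is to read and write through sys.stdin.buffer and sys.stdout.buffer, which are binary streams.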

Robert Bradshaw

Mar 27, 2008, 7:49:58 PM
to sage-...@googlegroups.com, m...@selenic.com
I've looked into this some more and it looks like we can completely
reconstruct a repository from the export of all its changesets. The
trick is to use the --exact option when importing. This forces it to
apply the given patch to the correct parent (sometimes creating a new
head) and will also correctly import merge patches (removing heads).
Some scripts to do this are up at

http://sage.math.washington.edu/home/robertwb/hg/
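
For readers without access to those scripts, the idea can be sketched as follows. This is not the code at the URL above, only an illustration in Python 3 built on `hg export -o` and `hg import --exact`. The helper names and the patch directory are hypothetical, and the directory is assumed to exist and to be given as an absolute path.

```python
import subprocess


def export_cmd(rev, outdir):
    # %R in the -o template expands to the changeset's revision number.
    return ["hg", "export", "-o", f"{outdir}/%R.patch", str(rev)]


def import_cmd(patch_path):
    # --exact applies the patch on the parent recorded in its header and
    # aborts if the resulting changeset id differs from the recorded one.
    return ["hg", "import", "--exact", patch_path]


def replay(old_repo, new_repo, revs, outdir, run=subprocess.check_call):
    """Export each revision number in revs from old_repo, then import in order."""
    for rev in revs:
        run(export_cmd(rev, outdir), cwd=old_repo)
    for rev in revs:
        run(import_cmd(f"{outdir}/{rev}.patch"), cwd=new_repo)
```

The run parameter exists only so that the commands can be inspected or logged without invoking hg, for example by passing a function that prints its arguments.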

I've successfully exported and re-created simple repositories (with
branching) with these scripts, and it works great and preserves all
the history. The only issue is that I can't seem to get it to work
with any repositories older than a certain date. I think the issue is
that mercurial changed the way nodeids are calculated (and I keep
getting an error "abort: patch is damaged or loses information" which
is thrown when the newly computed nodeid does not match the one in
the patch (commands.py:1632 in 0.9.5)). Matt Mackall, any suggestions
on how to cleanly get around this/get the old node-id numbers instead?

- Robert

FYI, the sage-main repo has 244MB of patches, compressing down to
33MB under bzip2.

William Stein

Apr 21, 2008, 10:12:18 AM
to sage-...@googlegroups.com
On Thu, Mar 27, 2008 at 4:49 PM, Robert Bradshaw
<robe...@math.washington.edu> wrote:
>
> I've looked into this some more and it looks like we can completely
> reconstruct a repository from the export of all its keywords. The
> trick is to use the --exact keyword when importing. This forces it to
> apply the given patch to the correct parent (sometimes creating a new
> head) and will also correctly import merge patches (removing heads).
> Some scripts to do this are up at
>
> http://sage.math.washington.edu/home/robertwb/hg/
>
> I've successfully exported and re-created simple repositories (with
> branching) with these scripts, and it works great and preserves all
> the history. The only issue is that I can't seem to get it to work
> with any repositories older than a certain date. I think the issue is
> that mercurial changed the way nodeid's are calculated (and I keep
> getting an error "abort: patch is damaged or loses information" which
> is thrown when the newly computed nodeid does not match the one in
> the patch (command.py:1632 in 0.9.5)). Matt Mackall, any suggestions
> on how to cleanly get around this/get the old node-id numbers instead

Robert,

Did you ever get a response to this question? Any updates on this?


William

--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

Robert Bradshaw

Apr 21, 2008, 1:32:49 PM
to sage-...@googlegroups.com

Yes, they changed the way they do hashing, so if you commit the same
sequence of patches you will get different nodeids than you would
have in the past. This is how parents are identified, so will result
in missing parent errors.

I have a hacked version of mercurial that will keep a correspondence
between old and new hashes that allows you to recreate a repository
from the list of patches alone (this is how I rebased Cython,
exporting the patches, running them through a diff to change the
paths, and recreating the repo from those). One drawback is that the
nodeids will be different so the repositories aren't compatible
anymore. (I also know how to change it the other way, so the
"computed" hash will just be the old one if available.)

If people are interested, I could try and release a less hackish patch.
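
To make the "correspondence between old and new hashes" idea concrete, here is a rough Python 3 sketch. It is not the hacked Mercurial described above. It assumes that a node id is the SHA-1 of the two parent ids (sorted) followed by the revision text, which is how Mercurial's revlog derives node ids as far as I know. It also treats the revision text as fixed, which glosses over the fact that changelog entries embed other node ids. The names node_hash and remap are made up.

```python
import hashlib

NULLID = b"\0" * 20  # the "no parent" node id


def node_hash(text, p1, p2):
    """SHA-1 over the sorted parent ids followed by the revision text."""
    a, b = sorted((p1, p2))
    s = hashlib.sha1(a)
    s.update(b)
    s.update(text)
    return s.digest()


def remap(revisions):
    """Replay revisions and build an old-node -> new-node table.

    revisions is an iterable of (old_node, old_p1, old_p2, text) tuples,
    ordered so that parents come before children.
    """
    old_to_new = {NULLID: NULLID}
    for old_node, old_p1, old_p2, text in revisions:
        # Translate the recorded parents to their new ids before hashing.
        new_p1 = old_to_new[old_p1]
        new_p2 = old_to_new[old_p2]
        old_to_new[old_node] = node_hash(text, new_p1, new_p2)
    return old_to_new
```

With such a table, a patch that names an old parent can still be attached to the right changeset in the recreated repository, even though every id has changed.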

- Robert

William Stein

Apr 21, 2008, 1:46:47 PM
to sage-...@googlegroups.com

Could you "rebase" the entire Sage repo so it has the new hashes, then include
it in Sage-3.0 :-) If we're going to make a massive change like that, 3.0 would
be the time to do it. Or does that request make no sense?

William

mabshoff

Apr 21, 2008, 1:51:04 PM
to sage-devel
Hi,

> Could you "rebase" the entire Sage repo so it has the new hashes, then include
> it in Sage-3.0 :-) If we're going to make a massive change like that, 3.0 would
> be the time to do it. Or does that request make no sense?

We will likely have the same problem every time we merge heads, if I
understand the problem correctly. We also have about 40 patches
outstanding [at any given time, it seems; the number seems to
oscillate around 40] and all of those would need to be rebased. Since
repo-as-text is a very specific use case, and since it would have to
be redone after each branch merge, I see little benefit in doing it.

> William

Cheers,

Michael

William Stein

Apr 21, 2008, 1:53:12 PM
to sage-...@googlegroups.com

I'll let Robert answer, but he said "Yes, they changed the way they do
hashing," and I'm proposing somehow updating our repo so that it uses
their new way of doing hashing throughout. I'm not proposing something
that would happen more than once.

William

Robert Bradshaw

Apr 21, 2008, 2:06:15 PM
to sage-...@googlegroups.com

Yes, I could. This would mean that no pre-3.0 bundles would apply to
post-3.0 (short of re-basing the bundles--and the big one (coercion)
I could rebase myself). Patches should be just fine, and most things
aren't big enough to warrant bundles. Does anyone know if mercurial
1.0 changed how hashing is done (yet again) or is it finally stable?
If so I think this would be a good thing to do.

- Robert

William Stein

Apr 21, 2008, 2:14:28 PM
to sage-...@googlegroups.com

Well this is definitely the right *time* to do it.

william

Robert Bradshaw

Apr 21, 2008, 2:24:06 PM
to sage-...@googlegroups.com

I'll do that then. Probably best to do right before the release, to
not disrupt the development cycle (and as the actual code won't
change (check with a diff) we won't need to be concerned about
breaking Sage). Perhaps the other packages should be changed as well.

- Robert

mabshoff

Apr 21, 2008, 2:34:25 PM
to sage-devel


On Apr 21, 8:24 pm, Robert Bradshaw <rober...@math.washington.edu>
wrote:
> On Apr 21, 2008, at 11:14 AM, William Stein wrote:

Hi,

> >>  Yes, I could. This would mean that no pre-3.0 bundles would apply to
> >>  post-3.0 (short of re-basing the bundles--and the big one (coercion)
> >>  I could rebase myself). Patches should be just fine, and most things
> >>  aren't big enough to warrant bundles.

The number of bundles in trac is rather small and most of those
bundles either have review issues or shouldn't be bundles in the first
place [as you stated above], so applying them to a pre-3.0 tree,
extracting the patch and so on should be doable.

> >> Does anyone know if mercurial
> >>  1.0 changed how hashing is done (yet again) or is it finally stable?
> >>  If so I think this would be a good thing to do.
>
> > Well this is definitely the right *time* to do it.
>
> I'll do that then. Probably best to do right before the release, to  
> not disrupt the development cycle (and as the actual code won't  
> change (check with a diff) we won't need to be concerned about  
> breaking Sage). Perhaps the other packages should be changed as well.

The main ones, i.e. extcode and scripts, should be changed too, and I
guess it would be nice to get all the hg repos in the spkgs fixed as
well. Does this require that we upgrade to hg 1.0, or is it fine with
the release we ship? Upgrading to 1.0 should be quick and I think I
will get it done during the 3.0.1 cycle.

> - Robert

Cheers,

Michael

Robert Bradshaw

Apr 21, 2008, 2:43:13 PM
to sage-...@googlegroups.com
On Apr 21, 2008, at 11:34 AM, mabshoff wrote:

> On Apr 21, 8:24 pm, Robert Bradshaw <rober...@math.washington.edu>
> wrote:
>> On Apr 21, 2008, at 11:14 AM, William Stein wrote:
>
> Hi,
>
>>>> Yes, I could. This would mean that no pre-3.0 bundles would
>>>> apply to
>>>> post-3.0 (short of re-basing the bundles--and the big one
>>>> (coercion)
>>>> I could rebase myself). Patches should be just fine, and most
>>>> things
>>>> aren't big enough to warrant bundles.
>
> The number of bundles in trac is rather small and most of those
> bundles either have review issues or shouldn't be bundles in the first
> place [as you stated above], so applying them to a pre-3.0 tree,
> extracting the patch and so on should be doable.

Sure. The other concern is people with as-yet unsubmitted code on
their own computers. One will no longer be able to pull/push. (Does
the current upgrade try and do that?)

Maybe I could schedule doing it sometime when you're sleeping (does
that ever happen? :-) 'cause it can't be done in parallel to merging
very well.

>>>> Does anyone know if mercurial
>>>> 1.0 changed how hashing is done (yet again) or is it finally
>>>> stable?
>>>> If so I think this would be a good thing to do.
>>
>>> Well this is definitely the right *time* to do it.
>>
>> I'll do that then. Probably best to do right before the release, to
>> not disrupt the development cycle (and as the actual code won't
>> change (check with a diff) we won't need to be concerned about
>> breaking Sage). Perhaps the other packages should be changed as well.
>
> The main ones, i.e. extcode and scripts, too and I guess it would be
> nice to get all the hg repos in the spkgs fixed, too.

Certainly.

> Does this
> require that we upgrade to hg 1.0 or is it fine with the release we
> ship? Upgrading to 1.0 should be quick and I think I will get it done
> during the 3.0.1 cycle.

It requires a hacked version of hg I have on my computer, and not the
kind of patch that would ever get merged upstream (without cleanup).
I just asked the Mercurial guy who answered my original question if
the hashes changed (again) in 1.0, but I got the impression last time
that they have been stable for a while (just not as long as Sage has
been around).

- Robert

John Cremona

Apr 21, 2008, 2:47:54 PM
to sage-...@googlegroups.com
Sorry if this is a stupid question: if you are going to make a new
complete repository with a patched version of mercurial, does that
mean that native mercurial installations will not work with Sage from
now on, only Sage's "own" version?

John


Robert Bradshaw

Apr 21, 2008, 2:58:14 PM
to sage-...@googlegroups.com
On Apr 21, 2008, at 11:47 AM, John Cremona wrote:

> Sorry if this is a stupid question: if you are going to make a new
> complete repository with a patched version of mercurial, does that
> mean that native mercurial installations will not work with Sage from
> now on, only Sage's "own" version?
>
> John

No, it's just changing the internal names (the hex string called
nodeid) of the patches. Mercurial keeps track of what the "parent" of
a given changeset is, but sometime in the last two years or so they
changed how they compute this hash, which makes it difficult to export
and re-import an entire repository as a string of patches. My hacked
mercurial essentially allows one to rename all the old changesets
according to the new method without changing the content, history, or
timestamps of a repository.

- Robert

Robert Bradshaw

Apr 22, 2008, 3:41:47 AM
to sage-...@googlegroups.com
On Apr 21, 2008, at 11:34 AM, mabshoff wrote:

> On Apr 21, 8:24 pm, Robert Bradshaw <rober...@math.washington.edu>
> wrote:
>> On Apr 21, 2008, at 11:14 AM, William Stein wrote:
>
> Hi,
>
>>>> Yes, I could. This would mean that no pre-3.0 bundles would
>>>> apply to
>>>> post-3.0 (short of re-basing the bundles--and the big one
>>>> (coercion)
>>>> I could rebase myself). Patches should be just fine, and most
>>>> things
>>>> aren't big enough to warrant bundles.
>
> The number of bundles in trac is rather small and most of those
> bundles either have review issues or shouldn't be bundles in the first
> place [as you stated above], so applying them to a pre-3.0 tree,
> extracting the patch and so on should be doable.

Did you still want me to do this, or is this a non-issue now that 3.0
is officially released?

- Robert

mabshoff

Apr 22, 2008, 3:53:00 AM
to sage-devel
On Apr 22, 9:41 am, Robert Bradshaw <rober...@math.washington.edu>
wrote:
> On Apr 21, 2008, at 11:34 AM, mabshoff wrote:
<SNIP>
> > The number of bundles in trac is rather small and most of those
> > bundles either have review issues or shouldn't be bundles in the first
> > place [as you stated above], so applying them to a pre-3.0 tree,
> > extracting the patch and so on should be doable.
>
> Did you still want me to do this, or is this a non-issue now that 3.0  
> is officially released?

Sure, let's do it. So far only an informal announcement to sage-devel
is out [aside from the big, fat reference on sagemath.org], but since
we have to update HISTORY.txt as well as fix some small issue in the
scripts repo it is something that can be done now. It won't make a
difference for users, the inconvenience for developers will be small,
i.e. we can keep an old style 3.0 around and rebase on that tree.
Since 3.0.1 will see "bug fixes only" it seems like the perfect time
to do so.

> - Robert

Cheers,

Michael

William Stein

Apr 29, 2008, 1:14:10 AM
to sage-...@googlegroups.com

Hi,

I've made a trac ticket for this, since it seems to have got stalled:

http://trac.sagemath.org/sage_trac/ticket/3052

William

mabshoff

May 2, 2008, 5:39:57 AM
to sage-devel


On Apr 29, 7:14 am, "William Stein" <wst...@gmail.com> wrote:
> On Tue, Apr 22, 2008 at 12:53 AM, mabshoff
<SNIP>

> Hi,
>
> I've made a trac ticket for this, since it seems to have got stalled:
>
>    http://trac.sagemath.org/sage_trac/ticket/3052
>
> William

Robert,

I have come across a case that might cause some trouble: if you change
the permissions on a file, you need to make a Mercurial check-in, since
hg claims the repo has been changed, which is true. But export that
changeset and it is empty. Git handles renames and permission changes
and also prints those status changes in the log. So, does hg do
anything about those changes internally, and is it just the log that is
insufficient? In the end it will not matter much, since we can just keep
a list of files whose permissions have to be changed and restore them
if it causes trouble. In the case of spkg-install & friends, for example,
that is automatically taken care of by first making the scripts
executable before they are run by sage-$FOO.

Cheers,

Michael

Robert Bradshaw

May 2, 2008, 4:48:04 PM
to sage-...@googlegroups.com

Yep, that is one of the things I've noticed. The patch comment is
insufficient in this case.

- Robert

Jason Grout

May 2, 2008, 5:33:45 PM
to sage-...@googlegroups.com


Can the git-style diffs somehow help?

hg diff --git
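
To illustrate what the git-style format would add: a permission change shows up there as "old mode" / "new mode" header lines, so a list of files whose modes need restoring could be pulled out of the patches. Below is a hedged Python 3 sketch. The sample diff is hand-written and only assumed to be representative of the --git output, and the function name mode_changes is made up.

```python
import re


def mode_changes(git_diff_text):
    """Return [(path, old_mode, new_mode)] found in a git-style diff."""
    changes = []
    path = old = None
    for line in git_diff_text.splitlines():
        m = re.match(r"diff --git a/(.+) b/(.+)$", line)
        if m:
            # New file section: remember its path, forget any pending mode.
            path, old = m.group(2), None
        elif line.startswith("old mode "):
            old = line[len("old mode "):]
        elif line.startswith("new mode ") and path is not None and old is not None:
            changes.append((path, old, line[len("new mode "):]))
    return changes


sample_diff = """diff --git a/spkg-install b/spkg-install
old mode 100644
new mode 100755
diff --git a/README b/README
--- a/README
+++ b/README
@@ -1,1 +1,1 @@
-old
+new
"""

print(mode_changes(sample_diff))
```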

Jason

Keshav Kini

Aug 11, 2011, 10:56:45 AM
to sage-...@googlegroups.com
Sorry for the necropost. After talking to William I posted a patch at #3052, if someone wants to take a look. It doesn't export the repository as diffs, but as the simpler patches Mercurial actually uses internally (consisting, per changeset and per file, of a series of ranges to delete and blocks of data to replace them with), collated in JSON.
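
As a rough picture of what "ranges to delete and blocks of data to replace them with" means, here is a Python 3 sketch that applies such a patch to a base text. The JSON layout (start, end, data) is hypothetical and is not the schema used by the patch at #3052.

```python
import json


def apply_ops(base, ops):
    """Apply replace-range operations to base.

    Each op replaces base[start:end] with data. Offsets refer to the
    original base text, and ranges are assumed not to overlap.
    """
    out = []
    pos = 0
    for op in sorted(ops, key=lambda o: o["start"]):
        out.append(base[pos:op["start"]])  # untouched text before the range
        out.append(op["data"])             # replacement block
        pos = op["end"]                    # skip over the deleted range
    out.append(base[pos:])                 # untouched tail
    return "".join(out)


patch_json = (
    '[{"start": 0, "end": 5, "data": "HELLO"},'
    ' {"start": 11, "end": 11, "data": "!"}]'
)
result = apply_ops("hello world", json.loads(patch_json))
print(result)
```

An op whose start equals its end is a pure insertion, and an op with empty data is a pure deletion.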

-Keshav

----
Join us in #sagemath on irc.freenode.net !

leif

Aug 11, 2011, 1:57:13 PM
to sage-devel
Sorry, OT:

On 11 Aug., 16:56, Keshav Kini <keshav.k...@gmail.com> wrote:
> Sorry for the necropost.

How did you manage that?

At least Google's web interface refused a mailing list reply (to a
thread of Feb/March 2011!) recently; it only gave me the option to
directly reply to the author (without any explanation btw., there was
just the "Reply to author" link at the bottom, and "More options"
didn't show more either.)


-leif

Keshav Kini

Aug 11, 2011, 9:26:44 PM
to sage-...@googlegroups.com
Google Groups has a new interface (which looks like Google Reader). They seem to have lifted that restriction along with the unveiling of the new interface. It's been around for a few months. I guess they're distancing themselves from their Usenet roots, i.e. no longer keeping their own internal distinction between active threads and archived threads to match with "extant" and "already purged" messages on Usenet servers (?).