mercurial --> plain text --> mercurial

William Stein

Mar 26, 2008, 2:53:17 PM
to sage-...@googlegroups.com
Hi Jason (or anybody),

Does anybody have a clue if it is possible to take a directory (e.g.,
devel/sage/) with an .hg repo directory
in it, and do the following:

(1) export everything in the .hg repo to something (perhaps a ton of
stuff) in plain text format,
(2) delete .hg
(3) do something that recovers the .hg directory from the output of (1).

Note that just doing hg_sage.export([0..10000]), where say 10000 is
the tip, doesn't work, because
that loses all information about branching, etc., hence fails completely.

If mercurial can't do the above, that is a _very_ serious problem for
the longterm viability of
Mercurial at least for Sage. So any ideas how to do the above?

The reason for doing (1) -- (3) is that it is possible to scan an .hg
directory with antivirus tools.
Thus something silly like "base64-encode a tarball of .hg" won't work.

-- william

--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

mabshoff

Mar 26, 2008, 3:04:12 PM
to sage-devel


On Mar 26, 7:53 pm, "William Stein" <wst...@gmail.com> wrote:
> Hi Jason (or anybody),
>
> Does anybody have a clue if it is possible to take a directory (e.g.,
> devel/sage/) with an .hg repo directory
> in it, and do the following:
>
>   (1) export everything in the .hg repo to something (perhaps a ton of
> stuff) in plain text format,
>   (2) delete .hg
>   (3) do something that recovers the .hg directory from the output of (1).

There is nothing that does that currently that I am aware of. I did
play around with this and wrote some dummy scripts that

a) export every commit from a tree
b) create an empty hg repo
c) reimport each exported changeset

The md5sum of the files inside the .hg repo is different afterwards,
but parents and changesets and all that fun stuff remains the same.
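Steps a) to c) can be sketched roughly as follows. This is a minimal sketch, not the scripts mentioned above: the helper names, the repository paths and the NNNNN.patch naming are all assumptions made for illustration. It only relies on the standard `hg init`, `hg export -o FILE REV` and `hg import FILE` commands and the global `-R` option, and it applies the patches in plain revision order, so it inherits the merge problems discussed elsewhere in this thread.

```python
import subprocess


def export_commands(src_repo, patch_dir, tip):
    """Build one `hg export` command per revision 0..tip (hypothetical layout)."""
    return [
        ["hg", "-R", src_repo, "export", "-o",
         f"{patch_dir}/{rev:05d}.patch", str(rev)]
        for rev in range(tip + 1)
    ]


def reimport_commands(dst_repo, patch_dir, tip):
    """Build the commands that create an empty repo and import every patch in order."""
    cmds = [["hg", "init", dst_repo]]
    cmds += [
        ["hg", "-R", dst_repo, "import", f"{patch_dir}/{rev:05d}.patch"]
        for rev in range(tip + 1)
    ]
    return cmds


def run_all(commands):
    """Run the commands one by one, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

To try it, something like `run_all(export_commands("sage", "patches", tip))` followed by `run_all(reimport_commands("sage-copy", "patches", tip))` should do, assuming the `patches` directory already exists and `tip` is the highest revision number.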

> Note that just doing hg_sage.export([0..10000]), where say 10000 is
> the tip, doesn't work, because
> that loses all information about branching, etc., hence fails completely.

I need to see what happens with branches, but I am not sure what
happens in case of multi head merges.

> If mercurial can't do the above, that is a _very_ serious problem for
> the longterm viability of
> Mercurial at least for Sage. So any ideas how to do the above?
>
> The reason for doing (1) -- (3) is that it is possible to scan an .hg
> directory with antivirus tools.
> Thus something silly like "base64-encode a tarball of .hg" won't work.
>
>  -- william

Cheers,

Michael

root

Mar 26, 2008, 4:18:44 PM
to sage-...@googlegroups.com, sage-...@googlegroups.com
William,

git can do this. Since git uses a hash it will always regenerate the
same hash from the same file.

In fact, git uses hashes all the way down the tree so you can just
look at the hash code of the root of the tree to see if anything
changes. Equal hash codes, even across the net, imply exact copies
of the source tree.

Axiom uses arch, cvs, svn, and git. I have used several other systems
in the past. Now all of the primary work is in git and is export-only
to the other systems (git can work with them transparently). git has
fundamentally changed the way I work and the way Axiom is maintained,
all for the better.

I know it is a challenge to change source code systems but the gain
is well worth the pain in this case. The fact that git works with
legacy humans is a huge plus in minimizing the pain.

Tim Daly
Axiom

William Stein

Mar 26, 2008, 3:12:13 PM
to sage-...@googlegroups.com

We are indeed considering changing to git. The
repo --> plain text --> original repo
problem is a show stopper -- i.e., if Mercurial absolutely can't
do that, then we have no choice but to dump mercurial for git.
I hope Mercurial can do that though, since we've spent a lot
of time getting going with Mercurial, and it works fairly well.

-- William

Mike Hansen

Mar 26, 2008, 3:21:50 PM
to sage-...@googlegroups.com
It seems like the mercurial mailing list would be the best place to go for this.

Using queues has made me quite a bit more productive, and I'd like to
avoid switching to a version control system without them. Also, the
git documentation leaves something to be desired compared to the
Mercurial book.

--Mike

mabshoff

Mar 26, 2008, 3:28:45 PM
to sage-devel


On Mar 26, 8:12 pm, "William Stein" <wst...@gmail.com> wrote:
I did play around with a small repo that included a bundle that I did
merge without conflict and in that case reimporting the commits one by
one yields the same md5sums for all the files in the repo. When I
imported a "merge" changeset I got the following:

[mabshoff@localhost c]$ hg import ../b/5.patch
applying ../b/5.patch
abort: 00changelog.i: no node 1711dd4455804471c46598ec4f92f672de73916e!

But it didn't make a difference, except that in the end the repo had
one fewer commit. I am still curious what happens if you resolve a
merge conflict in that changeset. I am not too positive about the
outcome since "hg help export" states:

NOTE: export may generate unexpected diff output for merge changesets,
as it will compare the merge changeset against its first parent only.

I guess I need to take on the Sage repo and see what happens when I do
the same thing with 9,000+ commits.

>  -- William

Cheers,

Michael

root

Mar 26, 2008, 4:46:40 PM
to sage-...@googlegroups.com, sage-...@googlegroups.com
Mike,

>Using queues has made me quite a bit more productive, and I'd like to
>avoid switching to a version control system without them. Also, the
>git documentation leaves something to be desired compared to the
>Mercurial book.

The queues feature in Mercurial is available independently in the
quilt system. Mercurial makes this point:
<http://hgbook.red-bean.com/hgbookch12.html>

Tim Daly
Axiom

Mike Hansen

Mar 26, 2008, 3:41:37 PM
to sage-...@googlegroups.com
> The queues feature in Mercurial is available independently in the
> quilt system. Mercurial makes this point:
> <http://hgbook.red-bean.com/hgbookch12.html>

There are things with queues that you don't get with quilt. Quoting
from the book,

"As an example, the integration of patches with revision control makes
understanding patches and debugging their effects—and their interplay
with the code they're based on—enormously easier. Since every applied
patch has an associated changeset, you can use "hg log filename" to
see which changesets and patches affected a file. You can use the
bisect extension to binary-search through all changesets and applied
patches to see where a bug got introduced or fixed. You can use the
"hg annotate" command to see which changeset or patch modified a
particular line of a source file. And so on."

--Mike

root

Mar 26, 2008, 5:19:16 PM
to sage-...@googlegroups.com, sage-...@googlegroups.com

Sorry, I didn't mean to start a debate about the relative feature
set of the various systems. The git system can do what you want
(e.g. bisect, branch, etc.). Having gone thru the series of changes
cvs->arch->svn->git I understand why you would be reluctant to change.

The original point was the question of regenerating the information.
Since git uses hashing to guarantee uniqueness you know that any
root (or subtree) with equal hashes has the same code. This makes it
impossible to inject a virus. It also makes it very convenient to
recreate the exact sources used by a user reporting a bug since you
simply "undo the changes until the root is equal" and you have the
exact sources reporting the bug. Given the high change rate of Sage
this could be a major feature.

Another point worth mentioning is that git only stores one copy of a
file if they hash to the same value. Since the GMP library, BLAS,
LAPACK or other common libraries might show up in several spkgs there
is the potential for a significant reduction in storage space. I also
noticed that arch and svn seem to keep a second copy of the system
somewhere (not sure what hg does) but moving to git immediately
reduced the required disk space by a factor of 2. For a system as
large as Sage this might prove interesting. If I get the time I
can try to import Sage into git to quantify the gain.
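The "one copy per hash" idea can be sketched as a tiny content-addressed store. This is an illustration only: the class name and the file paths are invented, and real git additionally compresses objects and packs them, which this sketch ignores.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}  # hex digest -> content
        self.paths = {}    # path -> hex digest

    def add(self, path: str, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        self.objects.setdefault(key, content)  # no second copy if already stored
        self.paths[path] = key
        return key


store = ContentStore()
# Hypothetical paths: the same library source shipped in two places.
store.add("pkg_a/blas/dgemm.f", b"      SUBROUTINE DGEMM\n")
store.add("pkg_b/blas/dgemm.f", b"      SUBROUTINE DGEMM\n")
print(len(store.paths), len(store.objects))  # 2 1
```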

Tim Daly
Axiom

Carl Witty

Mar 26, 2008, 4:12:40 PM
to sage-devel
On Mar 26, 11:53 am, "William Stein" <wst...@gmail.com> wrote:
> Hi Jason (or anybody),
>
> Does anybody have a clue if it is possible to take a directory (e.g.,
> devel/sage/) with an .hg repo directory
> in it, and do the following:
>
> (1) export everything in the .hg repo to something (perhaps a ton of
> stuff) in plain text format,
> (2) delete .hg
> (3) do something that recovers the .hg directory from the output of (1).
>
> Note that just doing hg_sage.export([0..10000]), where say 10000 is
> the tip, doesn't work, because
> that loses all information about branching, etc., hence fails completely.
>
> If mercurial can't do the above, that is a _very_ serious problem for
> the longterm viability of
> Mercurial at least for Sage. So any ideas how to do the above?
>
> The reason for doing (1) -- (3) is that it is possible to scan an .hg
> directory with antivirus tools.
> Thus something silly like "base64-encode a tarball of .hg" won't work.

I still don't understand the requirements. First, that last paragraph
makes a lot more sense with "it is impossible" than "it is possible".
Did you mean "impossible"?

Second, are you worried about people checking in viruses, or people
concealing a virus in the .hg directory without it being checked in?

For the former concern, it seems that it would be sufficient to check
out the files, and you don't need to recreate the repository. For the
latter concern, perhaps something based on "hg verify" would suffice
to ensure that nothing nasty has been hidden in the repository.

Carl

Robert Bradshaw

Mar 26, 2008, 4:27:05 PM
to sage-...@googlegroups.com
On Mar 26, 2008, at 11:53 AM, William Stein wrote:

> Hi Jason (or anybody),
>
> Does anybody have a clue if it is possible to take a directory (e.g.,
> devel/sage/) with an .hg repo directory
> in it, and do the following:
>
> (1) export everything in the .hg repo to something (perhaps a ton of
> stuff) in plain text format,
> (2) delete .hg
> (3) do something that recovers the .hg directory from the output
> of (1).
>
> Note that just doing hg_sage.export([0..10000]), where say 10000 is
> the tip, doesn't work, because
> that loses all information about branching, etc., hence fails
> completely.
>
> If mercurial can't do the above, that is a _very_ serious problem for
> the longterm viability of
> Mercurial at least for Sage. So any ideas how to do the above?
>
> The reason for doing (1) -- (3) is that it is possible to scan an .hg
> directory with antivirus tools.
> Thus something silly like "base64-encode a tarball of .hg" won't work.

I think it should be possible to export a series of patches and a
(python) script that would apply the patches in the right order,
clone, and merge to get back the original repository. It might not be
the most efficient however. I'll look into this more.

Has anyone tried contacting the mercurial developers?

- Robert

mabshoff

Mar 26, 2008, 4:30:37 PM
to sage-devel


On Mar 26, 9:27 pm, Robert Bradshaw <rober...@math.washington.edu>
wrote:
Nope, that doesn't work on the Sage repo. Exporting all 9028
changesets of 2.11.alpha1 took about 40 minutes, but reimporting them
into a fresh repo failed after around 1300 changesets. The problem is
that export of a merge only diffs against one parent. So if you resolve
a merge conflict in a merge changeset things go FUBAR.

That was with 0.9.5, but I haven't tried 1.0 yet.

> - Robert

Cheers,

Michael

Robert Bradshaw

Mar 26, 2008, 4:35:46 PM
to sage-...@googlegroups.com

I was talking about something more sophisticated than export/import,
which won't work the instant one has multiple branches. One needs to
actually create multiple heads, apply patches, then resolve them. Hg
export doesn't have enough information to do this.
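A rough sketch of the planning half of that idea: given the parent relation of the changesets, walk them in topological order and decide, for each one, whether to apply a patch on top of a single parent or to recreate a merge of two heads. The `parents` dictionary and the step names ("import", "update", "merge", "commit-resolved") are hypothetical; how the resolved content of a merge would be stored in plain text is exactly the information plain hg export lacks, and it is left open here.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def replay_plan(parents: dict) -> list:
    """Turn {changeset: [parent, ...]} into an ordered list of replay steps."""
    plan = []
    # static_order() yields every changeset after all of its parents.
    for rev in TopologicalSorter(parents).static_order():
        ps = parents[rev]
        if len(ps) == 0:
            plan.append(("import", rev))            # root changeset
        elif len(ps) == 1:
            plan.append(("update", ps[0]))          # go to the parent head
            plan.append(("import", rev))            # apply the patch there
        else:
            plan.append(("update", ps[0]))          # first parent
            plan.append(("merge", ps[1]))           # second parent
            plan.append(("commit-resolved", rev))   # needs the stored resolution
    return plan


# Hypothetical history: a -> b and a -> c are two heads, d merges them.
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
for step in replay_plan(history):
    print(step)
```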

- Robert


mabshoff

Mar 26, 2008, 4:56:06 PM
to sage-devel
On Mar 26, 9:35 pm, Robert Bradshaw <rober...@math.washington.edu>
wrote:

> I was talking about something more sophisticated than export/import,  
> which won't work the instant one has multiple branches. One needs to  
> actually create multiple heads, apply patches, then resolve them. Hg  
> export doesn't have enough information to do this.

Ok, sounds good. Do you have any pointers or documentation on this?

> - Robert

Cheers,

Michael

Robert Bradshaw

Mar 26, 2008, 4:59:10 PM
to sage-...@googlegroups.com

Not at the moment, but I've mucked around with mercurial more than
most so I don't think it should be too hard once I start looking into
it.

- Robert

Jason Grout

Mar 26, 2008, 6:13:36 PM
to sage-...@googlegroups.com

I use git to manage my personal/professional file repository. To me,
Mercurial is much simpler, but git is more powerful and feels more
stable. I don't have a huge amount of experience with git, though; I
keep forgetting the commands to do things, so I keep putting off
checking things in and working on things in git :). Thank goodness for
the git-gui, gitk, and qgit tools that give graphical interfaces to a
git repository!

As to queues, of course, the concept and original software originated
with the linux development model, as far as I know. Git has a tool
called StGit ("Stacked Git"; http://procode.org/stgit/; it's in python!)
and also has Guilt. The messages at
http://fixunix.com/kernel/368500-announce-stacked-git-0-14-2-a.html seem
to indicate that the two tools overlap (as well as the debian
description http://packages.debian.org/unstable/devel/guilt). I haven't
used either tool.

Git also has some very powerful tools in the way of lightweight
branching and rebasing. One thing in a recent release is git rebase
--interactive, which allows you to basically go back and edit a commit
or change the order of commits, thereby providing queue functionality
that is fully integrated with the versioning system (see
http://blog.madism.org/index.php/2007/09/09/138-git-awsome-ness-git-rebase-interactive).
I don't think, in the end, that there is anything we can do with
queues that we can't do with git (possibly using one of the above tools
on top of git). However, I haven't tried (I've only read about it), so
count that opinion as worth the electrons that conveyed it :).

Personally, after getting over the initial learning hump (which I see as
much greater than the mercurial learning hump), I think git would
provide more power.


William, I presume you're looking for something exactly analogous to
svnadmin dump for SVN (see
http://svnbook.red-bean.com/en/1.1/ch05s03.html ). That command came
in very handy for me when I kept things in SVN for a while.

One option for what William wants to do is to convert a copy of the hg
repository to a git repository and then do the text dump (apparently
that is possible...I don't have first-hand experience with that). I'm
not sure how lossless the conversion would be, but my gut feeling is
that it would be good.


Jason

William Stein

Mar 27, 2008, 1:11:20 AM
to sage-...@googlegroups.com

Several people suggested asking on the Mercurial list, and we should
do that. There might already be an extension or something to do this.

I really don't like the prospect of say Jason Grout's idea to convert
the whole Sage repo to git and back just to do that. Ick.

Carl Witty said:
> I still don't understand the requirements.

To convert the hg repo to a plain text non-obfuscated format from which
one can recreate the original hg repo.

> Second, are you worried about people checking in viruses, or people
> concealing a virus in the .hg directory without it being checked in?

Both. Yes, I'm worried about people checking in viruses.
Yes, I'm also worried about people concealing a virus in the .hg directory
without it being checked in.

> For the former concern, it seems that it would be sufficient to check
> out the files, and you don't need to recreate the repository.

That requires trusting Mercurial, and that there aren't any bugs in
Mercurial that allow one to work around such checks. That isn't a reasonable
hypothesis, unfortunately. Also, the virus could be in an old version
of the repo, so you have to check out the last 9000 or so states of
the repo.

> For the
> latter concern, perhaps something based on "hg verify" would suffice
> to ensure that nothing nasty has been hidden in the repository.

Again, this requires trusting Mercurial, and that nobody found a way
to work around something like this in Mercurial. That's again not
a reasonable assumption to make.

-- William

mabshoff

Mar 27, 2008, 4:00:03 AM
to sage-devel


On Mar 27, 6:11 am, "William Stein" <wst...@gmail.com> wrote:
It isn't even a virus, any kind of malicious code *could* be hidden in
the repo. AFAIK mercurial doesn't prevent you from adding files in the
repo directories. While to you and me a binary .hg directory isn't
really a concern, other people see it differently.

> >  For the
> > latter concern, perhaps something based on "hg verify" would suffice
> > to ensure that nothing nasty has been hidden in the repository.
>
> Again, this requires trusting Mercurial, and that nobody found a way
> to work around something like this in Mercurial. That's again not
> a reasonable assumption to make.

We should all know by now that software is buggy in general, Sage not
being an exception.

Re mercurial vs. git: I don't buy the complexity argument and it isn't
a secret that I prefer git over mercurial. It is unlikely that we will
switch since mercurial works well enough.

Should we ever switch, here are two more arguments for git:

* git handles file permission changes, mercurial doesn't at the moment
* git handles empty files gracefully, mercurial doesn't at the moment

Cheers,

Michael

>   -- William

Jason Grout

Mar 27, 2008, 8:08:00 AM
to sage-...@googlegroups.com
William Stein wrote:
> On Wed, Mar 26, 2008 at 1:18 PM, root <da...@axiom-developer.org> wrote:
>> William,
>>
>> git can do this. Since git uses a hash it will always regenerate the
>> same hash from the same file.
>>
>> In fact, git uses hashes all the way down the tree so you can just
>> look at the hash code of the root of the tree to see if anything
>> changes. Equal hash codes, even across the net, imply exact copies
>> of the source tree.
>>
>> Axiom uses arch, cvs, svn, and git. I have used several other systems
>> in the past. Now all of the primary work is in git and is export-only
>> to the other systems (git can work with them transparently). git has
>> fundamentally changed the way I work and the way Axiom is maintained,
>> all for the better.
>>
>> I know it is a challenge to change source code systems but the gain
>> is well worth the pain in this case. The fact that git works with
>> legacy humans is a huge plus in minimizing the pain.
>
> We are indeed considering changing to git. The
> repo --> plain text --> original repo


For reference, I believe the relevant git commands are:

git fast-export:
http://www.kernel.org/pub/software/scm/git-core/docs/git-fast-export.html

git fast-import:
http://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html

Of course, correct me if I'm wrong. I've never used either of these.
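For what it's worth, a round trip with those two commands might look like the sketch below. It is untested against a real repository here; the function names and paths are made up, and it only assumes `git fast-export --all` (which writes a stream to stdout), `git fast-import` (which reads one from stdin), `git init`, and the `-C` option of reasonably recent git versions. Note that the stream embeds file contents verbatim, so it is only as "plain text" as the files that were committed.

```python
import subprocess


def dump_command(repo):
    """Command that writes the whole history of `repo` to stdout as a stream."""
    return ["git", "-C", repo, "fast-export", "--all"]


def load_command(repo):
    """Command that reads a fast-export stream from stdin into `repo`."""
    return ["git", "-C", repo, "fast-import"]


def roundtrip(src_repo, dump_path, dst_repo):
    """repo --> dump file --> new repo (hypothetical paths supplied by the caller)."""
    with open(dump_path, "wb") as out:
        subprocess.run(dump_command(src_repo), stdout=out, check=True)
    subprocess.run(["git", "init", dst_repo], check=True)
    with open(dump_path, "rb") as dump:
        subprocess.run(load_command(dst_repo), stdin=dump, check=True)
```

Comparing the output of `git rev-parse` for each branch in the old and the new repository would then be one way to check whether the rebuilt history matches.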

Jason

didier deshommes

Mar 27, 2008, 10:36:17 AM
to sage-...@googlegroups.com
On Wed, Mar 26, 2008 at 3:21 PM, Mike Hansen <mha...@gmail.com> wrote:
>
> It seems like the mercurial mailing list would be the best place to go for this.

Done: http://selenic.com/pipermail/mercurial/2008-March/018133.html

didier

Carl Witty

Mar 27, 2008, 11:39:53 AM
to sage-devel
On Mar 26, 10:11 pm, "William Stein" <wst...@gmail.com> wrote:
> That requires trusting Mercurial...
...
> Again, this requires trusting Mercurial...

If you don't trust your version control system, the whole exercise
seems futile to me. Unless you're planning to actually read all the
text during step 2?

Also, does this mean you're planning to enforce a requirement that all
files in the repository be readable text? Wouldn't that mean giving
up on #229?

Carl

didier deshommes

Mar 28, 2008, 10:20:58 AM
to sage-...@googlegroups.com, m...@daimi.au.dk
Thanks Martin,
I think the issue is that we want a version of our repository that has
no binary data in it, for transparency. The virus part is just one
possible scenario that has been blown out of proportion because of the
way I asked the question, since I didn't understand it well enough
myself :)

didier

Forwarded conversation
Subject: [nor...@googlegroups.com] Posting error: sage-devel
------------------------

From: Martin Geisler <m...@daimi.au.dk>
Date: Fri, Mar 28, 2008 at 5:47 AM
To: dfde...@gmail.com


Hi,

I tried to post the following message to the SAGE group to participate
in the discussion about Mercurial. But I apparently have to register
first -- could you instead forward it?

> "William Stein" <wst...@gmail.com> writes:

>
> > Carl Witty said:
> >> Second, are you worried about people checking in viruses, or
> >> people concealing a virus in the .hg directory without it being
> >> checked in?
> >
> > Both. Yes, I'm worried about people checking viruses. Yes, I'm
> > also worried about people concealing a virus in the .hg directory
> > without it being checked in.
>
> No matter what files I put in the .hg directory in my clone, they
> won't be copied to other clones via 'hg push' and 'hg pull'. So I
> don't see why you are afraid that I might put a virus there.
>
> The only way I could inject a virus into someone else's Mercurial
> repository (without having direct write access to it) is to commit
> it and convince the other party to 'hg pull' from me.
>
> I think that checking that people do not commit stupid things (build
> products, viruses, etc) is more of a social problem. And still: if
> they do commit something bad, then (assuming you are using an OS
> that won't randomly execute files on your hard disk...) you can safely
> pull the changes since you can always strip them away again if you
> want.

--
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.
----------
From: Martin Geisler <m...@daimi.au.dk>
Date: Fri, Mar 28, 2008 at 8:14 AM
To: dfde...@gmail.com



The following message is a courtesy copy of an article that has been
posted to gmane.comp.version-control.mercurial.general as well.

didier deshommes <dfde...@gmail.com> writes:

> Hi everyone,
> Sage (http://www.sagemath.org/) uses hg for its source control and
> recently a question has come up about the possibility of doing the
> following:
>
>  (1) export everything in the .hg repo to something (perhaps a ton of
> stuff) in plain text format,
>  (2) delete .hg/ directory
>  (3) do something that recovers the .hg/ directory from the output of (1).

From reading the messages in this thread I gather that you want the
plain text format to be able to inspect the files and make sure that
they have not been changed by a virus?

It is not necessary to have the repository contents in plain text to
do that -- all you need is to sign a trusted revision number with a
GnuPG key. You can then later verify the integrity of the repository.

The gpg Mercurial extension makes this (already easy) step even
easier: http://www.selenic.com/mercurial/wiki/index.cgi/GpgExtension

The point is that the revision number (the hexadecimal string printed
using, say, 'hg id') depends on *everything* in the repository. So it
is impossible for a virus to change any meta-data without also
disturbing the hash value.
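A toy illustration of why that holds: as far as I understand Mercurial's revlog, a node id is the SHA-1 of the two parent ids (sorted) followed by the revision text. In the sketch below the "text" of each changeset is just a placeholder byte string; a real changelog entry also names the manifest, user, date and files, and real histories are not purely linear. The function names are made up.

```python
import hashlib

NULL_ID = b"\0" * 20  # parent id used when a parent is absent


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """SHA-1 over the sorted parent ids followed by the revision text."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()


def tip_of(history: list) -> str:
    """Id of the last changeset in a linear history of placeholder texts."""
    parent = NULL_ID
    for text in history:
        parent = node_id(text, parent)
    return parent.hex()


good = [b"initial import", b"fix bug", b"add feature"]
tampered = [b"initial import plus something nasty", b"fix bug", b"add feature"]

print(tip_of(good))
print(tip_of(good) == tip_of(tampered))  # False: the old change shows up at the tip
```

So a signature over the tip id covers every ancestor of that changeset as well, which is the property the GnuPG approach above relies on.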

You can therefore easily trust a repository given to you by a
stranger, as long as you verify the integrity (with 'hg verify') and
check that the revision of the repository is trusted.

If the tip-most revision is unknown to you, then you can always strip
the unknown revisions away using 'hg strip' and then start from a last
known good revision.

And please note that this property is not unique to Mercurial: all the
other modern revision control systems use the same technique to make
it easy to verify the integrity of a repository.
