Does anybody have a clue if it is possible to take a directory (e.g.,
devel/sage/) with an .hg repo directory
in it, and do the following:
(1) export everything in the .hg repo to something (perhaps a ton of
stuff) in plain text format,
(2) delete .hg
(3) do something that recovers the .hg directory from the output of (1).
Note that just doing hg_sage.export([0..10000]), where say 10000 is
the tip, doesn't work, because
that loses all information about branching, etc., hence fails completely.
If Mercurial can't do the above, that is a _very_ serious problem for
the long-term viability of
Mercurial, at least for Sage. So any ideas how to do the above?
The reason for doing (1) -- (3) is to make it possible to scan the
contents of an .hg directory with antivirus tools.
Thus something silly like "base64-encode a tarball of .hg" won't work.
-- william
--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org
git can do this. Since git uses a hash it will always regenerate the
same hash from the same file.
In fact, git uses hashes all the way down the tree so you can just
look at the hash code of the root of the tree to see if anything
changes. Equal hash codes, even across the net, imply exact copies
of the source tree.
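
To make the "hashes all the way down" point concrete, here is a minimal Python sketch. The blob id follows git's documented rule (SHA-1 over a "blob <size>\0" header plus the file bytes, in a default SHA-1 repository). The tree_id function is a simplified stand-in, not git's real tree-object encoding, and the nested-dict layout and file names in the usage lines are assumptions made up for illustration.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Id git gives to file contents: SHA-1 of a 'blob <size>' header, a NUL byte, then the bytes."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def tree_id(tree: dict) -> str:
    """Simplified root hash over a nested dict {name: bytes or dict}.

    Each directory's id is a hash over the sorted names and ids of its
    children, so a change anywhere below changes the root id.
    NOTE: illustration only; this is not git's actual tree format.
    """
    h = hashlib.sha1()
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            kind, child_id = b"tree", tree_id(child)
        else:
            kind, child_id = b"blob", blob_id(child)
        h.update(kind + b" " + name.encode() + b"\0" + child_id.encode())
    return h.hexdigest()


# Hypothetical source trees: two identical copies and one tampered copy.
original = {"README": b"hello\n", "src": {"main.py": b"print(1)\n"}}
copy = {"src": {"main.py": b"print(1)\n"}, "README": b"hello\n"}
tampered = {"README": b"hello\n", "src": {"main.py": b"print(2)\n"}}

print(tree_id(original) == tree_id(copy))      # same content, same root id
print(tree_id(original) == tree_id(tampered))  # one byte differs, root id differs
```

Comparing two root ids is then enough to decide whether two trees hold the same content, which is the property described above.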
Axiom uses arch, cvs, svn, and git. I have used several other systems
in the past. Now all of the primary work is in git and is export-only
to the other systems (git can work with them transparently). git has
fundamentally changed the way I work and the way Axiom is maintained,
all for the better.
I know it is a challenge to change source code systems but the gain
is well worth the pain in this case. The fact that git works with
legacy humans is a huge plus in minimizing the pain.
Tim Daly
Axiom
We are indeed considering changing to git. The
repo --> plain text --> original repo
problem is a show stopper -- i.e., if Mercurial absolutely can't
do that, then we have no choice but to dump mercurial for git.
I hope Mercurial can do that though, since we've spent a lot
of time getting going with Mercurial, and it works fairly well.
-- William
Using queues has made me quite a bit more productive, and I'd like to
avoid switching to a version control system without them. Also, the
git documentation leaves something to be desired compared to the
Mercurial book.
--Mike
>Using queues has made me quite a bit more productive, and I'd like to
>avoid switching to a version control system without them. Also, the
>git documentation leaves something to be desired compared to the
>Mercurial book.
The queues feature in Mercurial is available independently in the
quilt system. Mercurial makes this point:
<http://hgbook.red-bean.com/hgbookch12.html>
Tim Daly
Axiom
There are things with queues that you don't get with quilt. Quoting
from the book,
"As an example, the integration of patches with revision control makes
understanding patches and debugging their effects—and their interplay
with the code they're based on—enormously easier. Since every applied
patch has an associated changeset, you can use "hg log filename" to
see which changesets and patches affected a file. You can use the
bisect extension to binary-search through all changesets and applied
patches to see where a bug got introduced or fixed. You can use the
"hg annotate" command to see which changeset or patch modified a
particular line of a source file. And so on."
--Mike
Sorry, I didn't mean to start a debate about the relative feature
set of the various systems. The git system can do what you want
(e.g. bisect, branch, etc.). Having gone thru the series of changes
cvs->arch->svn->git I understand why you would be reluctant to change.
The original point was the question of regenerating the information.
Since git uses hashing to guarantee uniqueness you know that any
root (or subtree) with equal hashes has the same code. This makes it
impossible to inject a virus without changing the root hash. It also
makes it very convenient to recreate the exact sources used by a user
reporting a bug since you simply "undo the changes until the root is
equal" and you have the exact sources that produced the bug. Given the
high change rate of Sage this could be a major feature.
Another point worth mentioning is that git stores only one copy of
files that hash to the same value. Since the GMP library, BLAS,
LAPACK or other common libraries might show up in several spkgs there
is the potential for a significant reduction in storage space. I also
noticed that arch and svn seem to keep a second copy of the system
somewhere (not sure what hg does) but moving to git immediately
reduced the required disk space by a factor of 2. For a system as
large as Sage this might prove interesting. If I get the time I
can try to import Sage into git to quantify the gain.
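
The storage argument can be sketched in a few lines of Python. This is a toy content-addressed store, not git's object database: the ContentStore class, its method names, and the spkg paths are all hypothetical, and real git additionally compresses and packs objects.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: file bytes are keyed by their hash."""

    def __init__(self):
        self.objects = {}  # hash -> file bytes, stored once per distinct content
        self.paths = {}    # path -> hash of the content at that path

    def add(self, path: str, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # setdefault keeps the existing copy if this content was seen before.
        self.objects.setdefault(key, data)
        self.paths[path] = key
        return key

    def stored_bytes(self) -> int:
        """Bytes actually kept, counting each distinct content once."""
        return sum(len(d) for d in self.objects.values())


# Hypothetical example: the same header shipped inside two different spkgs.
store = ContentStore()
header = b"/* gmp.h */\n" * 100
store.add("spkg-a/gmp/gmp.h", header)
store.add("spkg-b/gmp/gmp.h", header)
store.add("spkg-b/README", b"notes\n")

print(len(store.paths), "paths,", len(store.objects), "stored objects")
```

How much space this saves for Sage depends on how many files are byte-for-byte identical across spkgs, which would have to be measured.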
Tim Daly
Axiom
> Hi Jason (or anybody),
>
> Does anybody have a clue if it is possible to take a directory (e.g.,
> devel/sage/) with an .hg repo directory
> in it, and do the following:
>
> (1) export everything in the .hg repo to something (perhaps a ton of
> stuff) in plain text format,
> (2) delete .hg
> (3) do something that recovers the .hg directory from the output
> of (1).
>
> Note that just doing hg_sage.export([0..10000]), where say 10000 is
> the tip, doesn't work, because
> that looses all information about branching, etc., hence fails
> completely.
>
> If mercurial can't do the above, that is a _very_ serious problem for
> the longterm viability of
> Mercurial at least for Sage. So any ideas how to do the above?
>
> The reason for doing (1) -- (3) is that it is possible to scan an .hg
> directory with antivirus tools.
> Thus something silly like "base64-encode a tarball of .hg" won't work.
I think it should be possible to export a series of patches and a
(python) script that would apply the patches in the right order,
clone, and merge to get back the original repository. It might not be
the most efficient however. I'll look into this more.
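
A rough sketch of the ordering part of such a script, in Python (3.9 or later for graphlib). It assumes the parent ids of every changeset have already been dumped to plain text alongside the patches. The replay_order and replay_plan helpers and the changeset names are hypothetical, and no Mercurial commands are run here.

```python
from graphlib import TopologicalSorter


def replay_order(parents: dict) -> list:
    """Order changesets so that every parent comes before its children.

    parents maps a changeset id to the list of its parent ids
    (no parents for a root, two for a merge).
    """
    return list(TopologicalSorter(parents).static_order())


def replay_plan(parents: dict) -> list:
    """Turn the ordered history into abstract replay steps."""
    plan = []
    for node in replay_order(parents):
        ps = parents.get(node, [])
        if not ps:
            plan.append(("root", node))              # start a new line of history
        elif len(ps) == 1:
            plan.append(("apply", node, ps[0]))      # apply patch on top of its parent
        else:
            plan.append(("merge", node, tuple(ps)))  # recreate the merge of two heads
    return plan


# Hypothetical history: "a" branches into "b" and "c", which merge into "d".
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
for step in replay_plan(history):
    print(step)
```

Each step would still have to be mapped onto real Mercurial operations, and recreating identical changeset hashes would also require preserving the original metadata (author, date, message) exactly.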
Has anyone tried contacting the mercurial developers?
- Robert
I was talking about something more sophisticated than export/import,
which won't work the instant one has multiple branches. One needs to
actually create multiple heads, apply patches, then resolve them. Hg
export doesn't have enough information to do this.
- Robert
Not at the moment, but I've mucked around with mercurial more than
most so I don't think it should be too hard once I start looking into
it.
- Robert
I use git to manage my personal/professional file repository. To me,
Mercurial is much simpler, but git is more powerful and feels more
stable. I don't have a huge amount of experience with git, though; I
keep forgetting the commands to do things, so I keep putting off
checking things in and working on things in git :). Thank goodness for
the git-gui, gitk, and qgit tools that give graphical interfaces to a
git repository!
As to queues, of course, the concept and original software originated
with the Linux development model, as far as I know. Git has a tool
called StGit ("Stacked Git"; http://procode.org/stgit/; it's in Python!)
and also has Guilt. The messages at
http://fixunix.com/kernel/368500-announce-stacked-git-0-14-2-a.html seem
to indicate that the two tools overlap (as well as the debian
description http://packages.debian.org/unstable/devel/guilt). I haven't
used either tool.
Git also has some very powerful tools in the way of lightweight
branching and rebasing. One thing in a recent release is git rebase
--interactive, which allows you to basically go back and edit a commit
or change the order of commits, thereby providing queue functionality
that is fully integrated with the versioning system (see
http://blog.madism.org/index.php/2007/09/09/138-git-awsome-ness-git-rebase-interactive).
I don't think, in the end, that there is anything we can do with
queues that we can't do with git (possibly using one of the above tools
on top of git). However, I haven't tried (I've only read about it), so
count that opinion as worth the electrons that conveyed it :).
Personally, after getting over the initial learning hump (which I see as
much greater than the mercurial learning hump), I think git would
provide more power.
William, I presume you're looking for something exactly analogous to
svnadmin dump for SVN (see
http://svnbook.red-bean.com/en/1.1/ch05s03.html ). That command came
in very handy for me when I kept things in SVN for a while.
One option for what William wants to do is to convert a copy of the hg
repository to a git repository and then do the text dump (apparently
that is possible...I don't have first-hand experience with that). I'm
not sure how lossless the conversion would be, but my gut feeling is
that it would be good.
Jason
Several people suggested asking on the Mercurial list, and we should
do that. There might already be an extension or something to do this.
I really don't like the prospect of, say, Jason Grout's idea of converting
the whole Sage repo to git and back just to do that. Ick.
Carl Witty said:
> I still don't understand the requirements.
To convert the hg repo to a plain text non-obfuscated format from which
one can recreate the original hg repo.
> Second, are you worried about people checking in viruses, or people
> concealing a virus in the .hg directory without it being checked in?
Both. Yes, I'm worried about people checking in viruses.
Yes, I'm also worried about people concealing a virus in the .hg directory
without it being checked in.
> For the former concern, it seems that it would be sufficient to check
> out the files, and you don't need to recreate the repository.
That requires trusting Mercurial, and that there aren't any bugs in
Mercurial that allow one to work around such checks. That isn't a reasonable
hypothesis, unfortunately. Also, the virus could be in an old version
of the repo, so you have to check out the last 9000 or so states of
the repo.
> For the
> latter concern, perhaps something based on "hg verify" would suffice
> to ensure that nothing nasty has been hidden in the repository.
Again, this requires trusting Mercurial, and that nobody found a way
to work around something like this in Mercurial. That's again not
a reasonable assumption to make.
-- William
For reference, I believe the relevant git commands are:
git fast-export:
http://www.kernel.org/pub/software/scm/git-core/docs/git-fast-export.html
git fast-import:
http://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html
Of course, correct me if I'm wrong. I've never used either of these.
Jason
Done: http://selenic.com/pipermail/mercurial/2008-March/018133.html
didier