A Huge Reason to Reevaluate Mercurial

Sergey Yanovich

Aug 31, 2007, 6:15:13 PM

Taras recently uploaded a patch to bmo to remove the outparam in QI. After
I pulled his repos from hg.mozilla.org, I was not even able to run
'./configure'.

I was using the latest Mercurial, 0.9.4; when I reverted to 0.9.1,
everything worked fine. Upon investigation, I found that Taras had used
hg merges in those repos.

Contrary to what hg advertises, an hg revision turns out *NOT* to be
immutable. Different versions of hg treat the same set of ChangeSets
differently.

This is an inherent flaw rooted in the delta model. With n ChangeSets and
m algorithms to apply them, there are m^n variants (^ means power). That
is perfectly fine as long as there is only *one* algorithm. But when
merges come in, things become really complicated.
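
The delta model described above can be sketched as follows: a revision is stored as a base text plus a chain of deltas, and a reader gets the revision back by applying them in order. As long as every reader applies the same deterministic algorithm, the same chain always yields the same bytes. The (start, end, data) hunk format and the function names are simplified assumptions for illustration, not Mercurial's or git's on-disk format.

```python
from __future__ import annotations


def apply_delta(base: bytes, hunks: list[tuple[int, int, bytes]]) -> bytes:
    """Replace base[start:end] with data for each (start, end, data) hunk.

    Offsets refer to `base`; hunks must be sorted and non-overlapping.
    """
    out = []
    pos = 0
    for start, end, data in hunks:
        out.append(base[pos:start])  # unchanged bytes before the hunk
        out.append(data)             # replacement bytes
        pos = end
    out.append(base[pos:])           # unchanged tail
    return b"".join(out)


def reconstruct(base: bytes, chain: list[list[tuple[int, int, bytes]]]) -> bytes:
    """Rebuild a revision by applying each delta in the chain, in order."""
    text = base
    for hunks in chain:
        text = apply_delta(text, hunks)
    return text


result = reconstruct(b"hello world\n", [[(0, 5, b"HELLO")], [(6, 11, b"there")]])
```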

Of course, there is an easy solution: store intermediate states. And
this is exactly how *git* works. Apart from this stability, git may
bring additional benefits to Mozilla. I will quote Linus Torvalds here,
from http://www.gelato.unsw.edu.au/archives/git/0505/2784.html:

> Note that we discussed this early on, and the issues with full-file
> handling haven't changed. It does actually have real functional
> advantages:
>
> - you can share the objects freely between different trees, never
> worrying about one tree corrupting another trees object by mistake.
> - you can drop old objects.
>
> delta models very fundamentally don't support this.
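
Linus's first point rests on git naming every object by the SHA-1 of its own content. A minimal sketch of how a blob's name is computed: the header layout ("blob <size>", a NUL byte, then the content) is git's object format and is what `git hash-object` computes, while the function name here is hypothetical.

```python
import hashlib


def git_blob_name(content: bytes) -> str:
    """SHA-1 over a 'blob <size>' header, a NUL byte, then the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Identical content gets the identical name in every repository, so objects can
# be shared between trees without one tree corrupting another's copy.
tree_a = git_blob_name(b"int main(void) { return 0; }\n")
tree_b = git_blob_name(b"int main(void) { return 0; }\n")
```

The empty file, for example, always gets the well-known empty-blob id e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.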

The first point in the list is the top priority that Brendan Eich
mentioned for the mozilla2 SCM system in March 2007:

> [sic] we anticipate many experimental branches (for Tamarin work,
> Oink/Elsa-based refactoring, etc.) and related integration branches
> feeding back into the main Mozilla 2 branch.

In fact, that is exactly the process by which the Linux kernel is
developed. Mozilla would only need an equivalent of the '-mm' kernel
branch to complete the picture.

As a disclaimer, I should add that I am not anti-Mercurial. I have svn,
hg and git repositories set up for my project to facilitate cooperation.
But I personally prefer git to manage my working tree, since its
stability gives me a feeling of protection.

--
Sergey Yanovich
- abstract ERP leader

Robert Kaiser

Aug 31, 2007, 9:11:37 PM

Sergey Yanovich wrote:
> Of course, there is a easy solution: to store intermediate states. And
> this is exactly how *git* works. Except from this stability, git may
> bring additional benefits to Mozilla.

But, and this is the major downside, it still has no stable and
well-tested release for Windows, despite good efforts on msysgit etc.,
and this is a killer argument in the Mozilla project, as the majority of
our userbase is on Windows and a lot of our developers are as well.

Note that I'm a big fan of git and would love nothing more in the VCS
area than Mozilla using git, but we need to see an officially endorsed,
performant and reliable git for Windows before even thinking about that.

Also note that I'm just a contributor, nobody who makes decisions about
such a repository, so it's not me who needs to be convinced :)

Robert Kaiser

ynv...@gmail.com

Sep 1, 2007, 4:54:28 AM

> But, and this is the major downside, it still has no stable and
> well-tested release for Windows, despite good efforts on msysgit etc. -
> and this is a killer argument in the Mozilla project, as the majority of
> our userbase is on Windows and a lots of our developers are as well.

Just as hedge funds are called elephants in the financial markets,
Mozilla is an open-source elephant. :) The Linux kernel is definitely
another of the kind. Nothing stops a running herd of elephants in the
savanna.

In my opinion, git will serve Mozilla better than hg in the long run.
Of course, "given enough eyeballs, all bugs are shallow", and Mercurial
will soon fix the specific bug I exposed.

However, this bug reveals an inherent deficiency in Mercurial's design.
Mercurial can be trusted *ONLY* on a linear branch. Each merge is a
critical point at which hg may or may not fail. But merging is the
reason the mozilla2 VCS question was raised in the first place.

> Also note that I'm just a contributor, nobody who makes decisions about
> such a repository, so it's not me who needs to be convinced :)

That is the aim of this post :)

If there is a consensus that git fits Mozilla's needs better, the git
people can be approached. Git has a near-perfect unit test suite (I
know, since I send patches there). It gives unparalleled freedom for
project management. My first idea is to migrate git to nspr.

That will require a good amount of labor on both sides, but serious
mutual benefits will also arise:
* The Mozilla community will be able to *safely* use advanced merging.
* nspr will gain an agile user base outside Mozilla, overlapping with
the core Linux kernel team.
* git will gain a second "national account"-sized client, becoming
an unmatched leader in open-source VCS.

Deals of this kind are not easy, but the potential result is worth the
effort to broker.

Benoit Boissinot

Sep 1, 2007, 6:36:26 AM

On Sep 1, 12:15 am, Sergey Yanovich <ynv...@gmail.com> wrote:
> Contrary to hg advertising, hg revision turns out *NOT* to be immutable.
> Different versions of hg treat the same of ChangeSets differently.
>
Were named branches involved? Did you check that the revisions you
checked out have the same changeset id (using hg id) and no local
changes (no '+' in hg id)?
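
This check can be scripted when comparing two checkouts. A minimal sketch, assuming `hg id` prints the short changeset id first, with a trailing '+' if the working directory has local changes (as described above), followed by any tags. The helper names are hypothetical, the sample ids come from this thread, and the code parses `hg id` output passed in as a string rather than running hg itself.

```python
from __future__ import annotations


def parse_hg_id(output: str) -> tuple[str, bool]:
    """Split `hg id` output into (changeset id, has_local_changes).

    Assumes the first field is the short changeset id, with a trailing '+'
    when the working directory has uncommitted changes.
    """
    fields = output.split()
    if not fields:
        raise ValueError("empty `hg id` output")
    ident = fields[0]
    return ident.rstrip("+"), ident.endswith("+")


def same_clean_checkout(a: str, b: str) -> bool:
    """True if two `hg id` outputs name the same changeset and neither tree is modified."""
    id_a, dirty_a = parse_hg_id(a)
    id_b, dirty_b = parse_hg_id(b)
    return id_a == id_b and not (dirty_a or dirty_b)
```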

> This is an inherent flow rooting in delta model. With n ChangeSets and m
> algorithms to apply them, there is m^n variants (^ means power). It is
> perfectly fine as long as there is only *one* algorithm. But when merges
> come in, things really become complicated.
>
> Of course, there is a easy solution: to store intermediate states. And
> this is exactly how *git* works.

You're just plain wrong about how Mercurial works. Of course it stores
intermediate states; it does even more checking than git, since the hash
depends on both the content and the ancestry information. You can read
http://www.selenic.com/mercurial/wiki/index.cgi/Design
for details.
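
The point about the hash covering both content and ancestry can be sketched in Python, modelled on the revlog hash function quoted later in this thread: the nodeid is a SHA-1 over both parent ids (in sorted order) and then the full text, so the same content with a different history gets a different id. NULLID (20 zero bytes) is Mercurial's null parent; the other names and sample texts are illustrative.

```python
import hashlib

NULLID = b"\0" * 20  # null parent id: 20 zero bytes


def nodeid(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> bytes:
    """SHA-1 over the two parent ids (in sorted order) followed by the full text."""
    a, b = sorted([p1, p2])
    s = hashlib.sha1(a)
    s.update(b)
    s.update(text)
    return s.digest()


root = nodeid(b"v1\n")                   # new file: both parents null
linear = nodeid(b"v2\n", root)           # ordinary commit: p2 stays null
branch = nodeid(b"v2 elsewhere\n")       # unrelated root revision
merge = nodeid(b"v3\n", linear, branch)  # merge: two real parents
```

Because the parents are sorted first, a merge gets the same id regardless of which parent is listed first.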

> Except from this stability, git may
> bring additional benefits to Mozilla. I would quote Linus Torvalds here
> from http://www.gelato.unsw.edu.au/archives/git/0505/2784.html
>
> > Note that we discussed this early on, and the issues with full-file
> > handling haven't changed. It does actually have real functional
> > advantages:
>
> > - you can share the objects freely between different trees, never
> > worrying about one tree corrupting another trees object by mistake.
> > - you can drop old objects.
>
> > delta models very fundamentally don't support this.
>

Funny, doesn't git packing use deltas? You're quoting Linus from a time
when git didn't have any repository compression method.

> But I do personally prefer git to manage my working tree since its
> stability gives me a feeling of protection.

I doubt you'll enjoy using a repository as big as the kernel or Mozilla
without packs (you want to avoid storing deltas, right?), unless you
really have lots of free space.


regards,

Benoit

Sergey Yanovich

Sep 2, 2007, 5:33:23 AM

Benoit Boissinot wrote:
> where named-branches involded ? did you check that the revision you
> checkout have the same changesetid (using hg id) without local changes
> (no '+' in hg id)

Not to my knowledge. A scientific experiment must yield the same
result when repeated, or it is not scientific. Anyone interested is
welcome to repeat it:

wget http://people.mozilla.org/~tglek/checkout.sh
chmod ug+x checkout.sh
./checkout.sh http://hg.mozilla.org
cd oink-stack
./configure

My results:
* hg v0.9.4:
weird error reports about missing files.
* hg v0.9.1:
stack configured successfully.

>> This is an inherent flow rooting in delta model. With n ChangeSets and m
>> algorithms to apply them, there is m^n variants (^ means power). It is
>> perfectly fine as long as there is only *one* algorithm. But when merges
>> come in, things really become complicated.
>>
>> Of course, there is a easy solution: to store intermediate states. And
>> this is exactly how *git* works.
>
> Your just plain wrong as to how mercurial works. Of course it store
> intermediate
> states, it does even more checking than git since the hash depends on
> the
> content and the ancestry information. You can read
> http://www.selenic.com/mercurial/wiki/index.cgi/Design
> for details.

Well, my statement above is about the delta model in general. I don't
know much about how Mercurial works in detail, and that wiki article
doesn't explain it either. My knowledge comes from the thread I quoted
Linus Torvalds from. Mercurial's manifest files seem to hold a SHA1 of
the *delta* *path* used to construct the blob, not a SHA1 of the blob
*content*. And the delta path obviously relies on the correct algorithm
being applied at every step.

> Oh funny, isn't git packing using delta ? You're quoting Linus from
> the time where git didn't have any repo compression method.

Right. Git packs use deltas. But they are deltas of repository objects,
as opposed to source tree files. This way git limits itself to *one*
algorithm for creating/resolving deltas, which is perfectly safe. And
the SHA1 of each object is validated before the object is ever passed
to a client.
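
This description of pack handling can be sketched as follows: a packed object may be stored as a delta against a base object, and the reader rebuilds it and checks the SHA-1 against the object's name before trusting it. The copy/insert operations loosely mirror the idea behind git's delta format (copy a range from the base, or insert literal bytes), but the tuple encoding and function names are hypothetical simplifications.

```python
from __future__ import annotations

import hashlib


def blob_name(content: bytes) -> str:
    """git-style object name: SHA-1 of 'blob <size>', a NUL byte, then the content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def apply_copy_insert(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild a target from ('copy', offset, size) and ('insert', data) ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown delta op {op[0]!r}")
    return bytes(out)


def unpack_delta_object(base: bytes, ops: list[tuple], expected_name: str) -> bytes:
    """Resolve a delta and refuse the result unless its SHA-1 matches the name."""
    content = apply_copy_insert(base, ops)
    if blob_name(content) != expected_name:
        raise ValueError("SHA-1 mismatch: corrupt or misapplied delta")
    return content
```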

--
Sergey Yanovich

Benoit Boissinot

Sep 2, 2007, 7:19:14 AM

On Sep 2, 11:33 am, Sergey Yanovich <ynv...@gmail.com> wrote:
> Benoit Boissinot wrote:
> > where named-branches involded ? did you check that the revision you
> > checkout have the same changesetid (using hg id) without local changes
> > (no '+' in hg id)
>
> No, to my knowledge.

tonfa@minoglio:/tmp/oink-stack.new/elkhound$ hg branches
trunk 988:2e04437ff47b
default 987:5a79ef66de7e (inactive)

So a couple of the repos have a default branch, and hg up will go to
the tip of that branch by default. You should use either hg up tip or
hg up trunk in your script.

(the person who committed 987 managed to commit a cset rooted at nullid,
ignored the warning during push, and created a new branch name)

There is an open ticket in the hg bug tracker to make it possible to
close a dead branch completely.
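
The difference in update behavior can be illustrated with a toy model. This is an assumption built from the explanation in this thread, not Mercurial's actual update code: 0.9.1 updates a fresh checkout to the tip, while 0.9.4 updates to the newest changeset on the default branch. The revision numbers, ids, and branch names come from the `hg branches` output above; the repository is reduced to those two heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    rev: int     # local revision number
    node: str    # short changeset id
    branch: str  # named branch


def update_target_091(changesets):
    """Toy model of 0.9.1 as described: check out the tip (highest revision)."""
    return max(changesets, key=lambda c: c.rev)


def update_target_094(changesets, branch="default"):
    """Toy model of 0.9.4 as described: newest changeset on the given branch."""
    on_branch = [c for c in changesets if c.branch == branch]
    if not on_branch:
        raise LookupError(f"no changesets on branch {branch!r}")
    return max(on_branch, key=lambda c: c.rev)


elkhound = [
    Changeset(987, "5a79ef66de7e", "default"),
    Changeset(988, "2e04437ff47b", "trunk"),
]
```

Passing branch="trunk" to `update_target_094` corresponds to the suggested `hg up trunk`.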

> > Your just plain wrong as to how mercurial works. Of course it store
> > intermediate
> > states, it does even more checking than git since the hash depends on
> > the
> > content and the ancestry information. You can read
> >http://www.selenic.com/mercurial/wiki/index.cgi/Design
> > for details.
>
> Well, my statement above is about delta model in general. I don't know
> much about how mercurial works in details. Neither it is explained in
> this wiki article.

The first sentence of the wiki article above:
> Nodeids are unique ids that represent the contents of a file and its
> position in the project history.

So it *is* the SHA of the blob content (plus ancestry information).

> My knowledge is from the thread I quoted Linus
> Torvalds from. Mercurial's manifest files seem to have a SHA1 of the
> *delta* *path* to construct the blob, not SHA1 of the blob *content*.

Not true, see above.

> > Oh funny, isn't git packing using delta ? You're quoting Linus from
> > the time where git didn't have any repo compression method.
>
> Right. Git pack is using deltas. But they are deltas of repository
> objects, as different from source tree files.

A git object represents the state of a file at a particular point,
so how is that different from the content of a file?

> This way git limits
> itself to *one* algorithm of creating/resolving deltas, which perfectly
> safe.

I don't see your point; anyway, the diff algorithm has not changed since
the first self-hosted version of hg (May '05).

> And SHA1 of each object is validated before the object is ever
> passed to a client.
>

Same with hg: the hash is validated during network transfer and
checkout.

Benoit

Sergey Yanovich

Sep 2, 2007, 8:24:21 AM

Benoit Boissinot wrote:
> tonfa@minoglio:/tmp/oink-stack.new/elkhound$ hg branches
> trunk 988:2e04437ff47b
> default 987:5a79ef66de7e (inactive)
>

From now on, I assume that the experiment is repeatable.

>> Well, my statement above is about delta model in general. I don't know
>> much about how mercurial works in details. Neither it is explained in
>> this wiki article.
>
> First sentence in the above wiki article:
>> Nodeids are unique ids that represent the contents of a file and its
>> position in the project history.
>
> So it *is* the SHA of the blog content (plus ancestry information).
>
>> My knowledge is from the thread I quoted Linus
>> Torvalds from. Mercurial's manifest files seem to have a SHA1 of the
>> *delta* *path* to construct the blob, not SHA1 of the blob *content*.
>
> Not true, see above.

That is a statement of what it is supposed to be, as opposed to how
exactly it is achieved. With a single delta application algorithm, the
delta path is equivalent to the blob content (1^n = 1). The situation is
completely different with branching/merging.

By details I mean relevant code snippets, which can explain why
different versions of hg produce a different working tree for the same
2e04437ff47b commit.

> A git object represent the state of a file at a particular point,
> so what is the difference from the content of a file ?
>> This way git limits
>> itself to *one* algorithm of creating/resolving deltas, which perfectly
>> safe.

Git treats different versions of the same file as different objects, and
git objects are unambiguously identified by their names (SHA1). A git
pack is roughly like a *linear*-branch hg repository in which no file is
ever overwritten or deleted, only copied or added. So there is exactly
one diff per file, and each file is either new or has exactly one
ancestor. Hence there is *one* algorithm for creating/resolving deltas,
and the result of each delta path is unique (1^n = 1).

> I don't see your point, anyway the diff algorithm did not change since
> the first
> self-hosted version of hg (may '05).

That is what I have been talking about all along. Mercurial is safe on a
linear branch, but when more algorithms are available, it is not.

>> And SHA1 of each object is validated before the object is ever
>> passed to a client.
>>
>
> Same as hg, the hash is validated during network transfer and
> checkout.

I would say that hg validates deltas against hashes, from which objects
are later constructed, as different from validating constructed objects
against their hashes (as git does). You can prove me wrong, of course,
with a relevant code snippet.

--
Sergey Yanovich

Benoit Boissinot

Sep 2, 2007, 8:35:00 AM

[I'm a Mercurial dev; I should probably have made that clear in my
first post]

On Sep 2, 2:24 pm, Sergey Yanovich <ynv...@gmail.com> wrote:
> Benoit Boissinot wrote:
> > tonfa@minoglio:/tmp/oink-stack.new/elkhound$ hg branches
> > trunk 988:2e04437ff47b
> > default 987:5a79ef66de7e (inactive)
>
> From now on, I assume that the experiment is repeatable.
>

No, not in the sense you use below.

0.9.1 will check out 2e04437ff47b, whereas 0.9.4 will check out
5a79ef66de7e (the last cset of the default branch).

> >> Well, my statement above is about delta model in general. I don't know
> >> much about how mercurial works in details. Neither it is explained in
> >> this wiki article.
>

> [snip]


> By details I mean relevant code snippets, which can explain why
> different versions of hg produce a different working tree for the same
> 2e04437ff47b commit.
>

The two trees are not at the same commit (just run hg parents in both of
them and you'll see).

[snip wrong assumptions about mercurial]


>
> I would say that hg validates deltas against hashes, from which objects
> are later constructed, as different from validating constructed objects
> against their hashes (as git does). You can prove me wrong, of course,
> with a relevant code snippet.
>

from mercurial/revlog.py:
def hash(text, p1, p2):
    """generate a hash from the given text and its parent hashes

    This hash combines both the current file contents and its history
    in a manner that makes it easy to distinguish nodes with the same
    content in the revision graph.
    """
    l = [p1, p2]
    l.sort()
    s = _sha(l[0])
    s.update(l[1])
    s.update(text)
    return s.digest()

And if you don't believe me, you can check in the code that the text
argument is never a delta (see _addrevision()).
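
The claim that the hash covers the rebuilt full text rather than any delta can be sketched with a much-simplified revlog: only the first revision is stored in full, later ones are stored as deltas, and reading a revision rebuilds the full text and checks it against the stored nodeid. `ToyRevlog` and its (start, end, data) hunk format are hypothetical; the hash step follows the revlog.hash() quoted above, and history is kept linear, so p2 is always the null id.

```python
import hashlib

NULLID = b"\0" * 20  # null parent id


def revlog_hash(text, p1, p2):
    """Same steps as the quoted revlog.hash(): sorted parents, then the full text."""
    a, b = sorted([p1, p2])
    s = hashlib.sha1(a)
    s.update(b)
    s.update(text)
    return s.digest()


def apply_hunks(base, hunks):
    """Replace base[start:end] with data for each sorted (start, end, data) hunk."""
    out, pos = [], 0
    for start, end, data in hunks:
        out += [base[pos:start], data]
        pos = end
    out.append(base[pos:])
    return b"".join(out)


class ToyRevlog:
    """Hypothetical linear revlog: full text for rev 0, deltas afterwards."""

    def __init__(self, text):
        self.base = text
        # each entry: (nodeid, p1, p2, hunks turning the previous rev into this one)
        self.entries = [(revlog_hash(text, NULLID, NULLID), NULLID, NULLID, [])]

    def add(self, hunks):
        """Store a delta against the tip; the nodeid is computed over the FULL new text."""
        tip = len(self.entries) - 1
        text = apply_hunks(self.revision(tip), hunks)
        p1 = self.entries[tip][0]
        node = revlog_hash(text, p1, NULLID)
        self.entries.append((node, p1, NULLID, hunks))
        return node

    def revision(self, rev):
        """Rebuild rev from the base plus deltas, then verify it against its nodeid."""
        text = self.base
        for _, _, _, hunks in self.entries[1:rev + 1]:
            text = apply_hunks(text, hunks)
        node, p1, p2, _ = self.entries[rev]
        if revlog_hash(text, p1, p2) != node:
            raise ValueError(f"integrity check failed on rev {rev}")
        return text
```

A corrupted or misapplied delta produces a full text whose hash no longer matches the stored nodeid, so it is rejected on read.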

Benoit

Sergey Yanovich

Sep 2, 2007, 10:18:26 AM

Benoit Boissinot wrote:
>> From now on, I assume that the experiment is repeatable.
>>
> no, not in the sense you use below.
>
> 0.9.1 will checkout 2e04437ff47b whereas 0.9.4 will checkout
> 5a79ef66de7e
> (the last cset of the default branch)
>
> Both tree are not in the same commit (just run hg parents in both of
> them, you'll see).

That explains everything. I was using hg tip to determine the current
revision.

> from mercurial/revlog.py:
> def hash(text, p1, p2):
>     """generate a hash from the given text and its parent hashes
>
>     This hash combines both the current file contents and its history
>     in a manner that makes it easy to distinguish nodes with the same
>     content in the revision graph.
>     """
>     l = [p1, p2]
>     l.sort()
>     s = _sha(l[0])
>     s.update(l[1])
>     s.update(text)
>     return s.digest()
>
> And if you don't believe me you can check in the code that the text
> argument is never a delta
> (see _addrevision())

Well, unless there is another hash function, I would sooner believe that
text here is *always* a delta, since Mercurial needs to hash deltas anyway.

However, unless there is another hash function, Mercurial now seems to
have only *one* algorithm to deal with deltas for both the linear and
merge cases. I assume p2 is null in the linear case, and both p1 and p2
are null for a new blob. That effectively means the Mercurial delta path
is equivalent to the blob content (1^n = 1).

Now, I ADMIT that my assumption about Mercurial's untrustworthiness was
wrong. There was simply a change in default behavior between hg v0.9.1
and v0.9.4.

There is NO huge reason to reevaluate hg. Thanks for your time, Benoit.

--
Sergey Yanovich
