Mercurial can have multiple commits with identical content

Jed Brown

Mar 6, 2013, 9:34:14 PM
to giti...@googlegroups.com
I had a scare today that I thought was caused by gitifyhg, but turned
out to be a consequence of Mercurial's binary diff representation not
being a unique function of the changes. The format allows files to be
named (thus part of the SHA1) even though they are unaffected by the
commit. It's an interesting story, so I thought I'd share.
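
To make the mechanism concrete, here is a minimal sketch of how a changeset ID can depend on the list of named files. It assumes a simplified model of Mercurial's changelog entry (manifest ID, user, date, file list, blank line, description) hashed together with the sorted parent IDs. The manifest ID, user, date, and file names below are hypothetical placeholders, not values from the petsc repository.

```python
import hashlib

NULL_ID = b"\0" * 20  # placeholder id used when a parent is absent


def changelog_text(manifest_hex, user, date, files, description):
    """Build a simplified changelog entry.

    Assumed layout: manifest id, user, date, one line per named file,
    a blank line, then the description.
    """
    lines = [manifest_hex, user, date] + sorted(files) + ["", description]
    return "\n".join(lines).encode("utf-8")


def node_id(text, p1=NULL_ID, p2=NULL_ID):
    """SHA-1 over the two parent ids (sorted) followed by the entry text."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).hexdigest()


# Hypothetical metadata shared by both variants of the "same" commit.
common = dict(
    manifest_hex="ab" * 20,
    user="Example User <user@example.com>",
    date="1362600000 21600",
    description="merge",
)

# Variant 1 names only the file that really changed.
minimal = changelog_text(files=["src/changed.c"], **common)
# Variant 2 also names a file whose content is untouched.
padded = changelog_text(
    files=["src/changed.c", "src/ftn-kernels/untouched.h"], **common
)

print(node_id(minimal))
print(node_id(padded))
print(node_id(minimal) != node_id(padded))
```

Both variants point at the same manifest (the same tree content), yet the extra file name changes the hashed text and therefore the ID.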

If you look at this commit, you'll see that ten files are named, but
have zero modifications (search for 'ftn-kernels' for a few).

https://bitbucket.org/petsc/petsc-dev/commits/de73c9a7d341 (A)

Meanwhile, here is a semantically identical commit:

https://bitbucket.org/petsc/petsc-dev/commits/84df07d03c6e (B)

When imported using gitifyhg, these have the same SHA1, thus Git
considers them to be a single commit. How was the "bad" Hg commit
created?

Fortunately, it has nothing to do with gitifyhg. Barry created a fork,
did some work on it, then merged from upstream to create the bad commit
here:

https://bitbucket.org/BarryFSmith/petsc-dev-simp/commits/de73c9a7d341 (A)

He uses vanilla Hg without extensions and without rebase, etc. I later
did more work on his branch, merged it, and pushed upstream. When I
pushed upstream, gitifyhg made a new SHA1 (B), using the non-corrupt
representation of the diff (so it does not name files that are
unaffected by the commit). Of course the descendants of B also got new
Hg SHA1s because their parent SHA1 was different. At this point the
upstream repository (petsc/petsc-dev) was clean, with no duplicate
commits.

When Barry later pulled, he had two heads that were identical except for
their SHA1s. He merged them without investigating why they existed and
pushed directly to petsc/petsc-dev, leading to the current situation: a
series of commits that are semantically identical but distinct according
to Mercurial. Note that they are _not_ distinct according to Git
(gitifyhg) because the content is identical.
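
For contrast, a sketch of why the Git side collapses A and B into one commit. A Git commit ID is the SHA-1 of a "commit" header plus a body that names the tree, the parents, author, committer, and message; there is no slot for a per-commit file list. The `import_commit` helper and the dictionary fields are hypothetical illustrations, not gitifyhg's actual code.

```python
import hashlib


def git_commit_id(tree_hex, parent_hexes, author, committer, message):
    """Hash a commit the way Git does: 'commit <size>\\0' followed by the body."""
    lines = ["tree " + tree_hex]
    lines += ["parent " + p for p in parent_hexes]
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def import_commit(hg_commit):
    """Hypothetical converter: the Hg-only 'files' list has nowhere to go."""
    who = hg_commit["who"]
    return git_commit_id(
        hg_commit["tree"], hg_commit["parents"], who, who, hg_commit["message"]
    )


# Placeholder values; only 'files' differs between the two Hg commits.
base = {
    "tree": "cd" * 20,
    "parents": ["12" * 20, "34" * 20],
    "who": "Example User <user@example.com> 1362600000 -0600",
    "message": "merge\n",
}
commit_a = dict(base, files=["src/changed.c", "src/ftn-kernels/untouched.h"])
commit_b = dict(base, files=["src/changed.c"])

print(import_commit(commit_a) == import_commit(commit_b))
```

Because the tree already captures the full content, two Hg changesets that differ only in which unchanged files they name map to the same Git object.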


Obviously I think it is totally fucked that Hg can create commits with
multiple representations of the same diff (naming some files with zero
changes), leading to multiple semantically identical commits with
different SHA1s.

Dusty Phillips

Mar 7, 2013, 11:31:37 AM
to giti...@googlegroups.com
Great story, Jed. I think you should name it "The Tragedy Of Errors". Is
there anything we need to (or can) do to account for or work around this
idiocy, or is gitifyhg able to handle it without breakage?

Here's another example of Mercurial's indecency:

$ hg blame .hgtags | grep 0.1.0b27

402: f424dc658073b1f003842ad85c01f4e95e3ae706 0.1.0b27
403: f424dc658073b1f003842ad85c01f4e95e3ae706 0.1.0b27
403: 0000000000000000000000000000000000000000 0.1.0b27
405: 0000000000000000000000000000000000000000 0.1.0b27
405: 10573b83494ae0c697c612a51109b50f62e3fdbd 0.1.0b27

$ hg tags | grep 0.1.0b27
0.1.0b27 404:10573b83494a

I thought gitifyhg was adding "old" tags until I realized these were old
revisions, and somebody else had managed to create this mess before
gitifyhg was written.
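
A rough sketch of how those five .hgtags lines can still resolve to a single tag. It assumes the blame output above lists the lines in file order, and it uses a simplified rule: the last line for a tag name wins, and an all-zero ID marks the tag as removed. Real Mercurial also reconciles .hgtags across heads, which this sketch ignores.

```python
NULL_HEX = "0" * 40  # all-zero id, treated here as "tag removed"


def resolve_tags(hgtags_text):
    """Return {tag name: node} using a simplified last-line-wins rule."""
    tags = {}
    for line in hgtags_text.splitlines():
        line = line.strip()
        if not line:
            continue
        node, name = line.split(" ", 1)
        tags[name] = node  # later lines override earlier ones
    # Drop tags whose final entry is the null id.
    return {name: node for name, node in tags.items() if node != NULL_HEX}


hgtags = """\
f424dc658073b1f003842ad85c01f4e95e3ae706 0.1.0b27
f424dc658073b1f003842ad85c01f4e95e3ae706 0.1.0b27
0000000000000000000000000000000000000000 0.1.0b27
0000000000000000000000000000000000000000 0.1.0b27
10573b83494ae0c697c612a51109b50f62e3fdbd 0.1.0b27
"""

print(resolve_tags(hgtags)["0.1.0b27"][:12])  # prints 10573b83494a
```

The result matches the short ID that 'hg tags' reports, while the earlier lines remain in the file as history.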

Dusty

Jed Brown

Mar 7, 2013, 12:14:43 PM
to Dusty Phillips, giti...@googlegroups.com
Dusty Phillips <du...@archlinux.ca> writes:

> Great story, Jed. I think you should name it "The Tragedy Of Errors". Is
> there anything we need to (or can) do to account for or work around this
> idiocy, or is gitifyhg able to handle it without breakage?

A colleague seems to be arguing that the original commit is not
"corrupt", in that Mercurial does not preclude those files from being
named. 'hg import' has an option '--exact':

If --exact is specified, import will set the working directory to
the parent of each patch before applying it, and will abort if the
resulting changeset has a different ID than the one recorded in
the patch. This may happen due to character set problems or other
deficiencies in the text patch format.

If 'tip' is the merge commit, I can try to round-trip it:

$ hg export -r tip > merge.patch
$ hg strip tip
$ hg up FIRST_PARENT
$ hg import --exact merge.patch
applying merge.patch
no rollback information available
abort: patch is damaged or loses information

There appears to be no way to 'hg export' this patch so that 'hg import
--exact' can be used, and it's not related to character sets. I can 'hg
import merge.patch', but the result no longer names those files that
were unchanged (because that information is not in the 'hg export').

I don't think there is anything gitifyhg could possibly do to have the
same semantics as an hg client, other than insanity like storing hg
bundles in git metadata so that we reuse it when pushing to a different
repository (and so that we can distinguish between patches with
identical content+metadata, but different side-band information).

> Here's another example of Mercurial's indecency:
>
> $ hg blame .hgtags | grep 0.1.0b27
>
> 402: f424dc658073b1f003842ad85c01f4e95e3ae706 0.1.0b27
> 403: f424dc658073b1f003842ad85c01f4e95e3ae706 0.1.0b27
> 403: 0000000000000000000000000000000000000000 0.1.0b27
> 405: 0000000000000000000000000000000000000000 0.1.0b27
> 405: 10573b83494ae0c697c612a51109b50f62e3fdbd 0.1.0b27
>
> $ hg tags | grep 0.1.0b27
> 0.1.0b27 404:10573b83494a
>
> I thought gitifyhg was adding "old" tags until I realized these were old
> revisions, and somebody else had managed to create this mess before
> gitifyhg was written.

Oh the joys of ad-hoc non-normalized data representation.

This simple statement may characterize 90% of why we so strongly prefer
working with Git. Mercurial is a quagmire of ad-hoc non-normalization,
which I think accounts for their perpetual conceptual wandering
regarding branching (mq, branches, bookmarks, evolution), poor
performance (and extreme unsafeness) for some operations (rebase,
strip), and opportunities for inconsistencies like we have observed in
this thread.

Note that a bug in Git could never have produced a commit that behaves
like I observed, nor the tag nonsense that you observed.

Jed Brown

Mar 9, 2013, 2:40:15 PM
to Dusty Phillips, giti...@googlegroups.com
Jed Brown <j...@59A2.org> writes:

> If --exact is specified, import will set the working directory to
> the parent of each patch before applying it, and will abort if the
> resulting changeset has a different ID than the one recorded in
> the patch. This may happen due to character set problems or other
> deficiencies in the text patch format.

Along these lines, it's also good to know that different versions of
Mercurial may not be able to recreate the same commit IDs.

http://selenic.com/pipermail/mercurial/2011-June/038828.html