Adding merge --ancestor option?

50 views
Skip to first unread message

Greg Ward

unread,
Mar 13, 2012, 10:03:17 AM3/13/12
to mercuri...@selenic.com
Hi all --

at work, we regularly get bitten by ambiguous merge ancestors: if two
common ancestors have the same distance from the root and both are the
maximum distance, ancestor() picks the first one it sees. Sometimes it
picks wrong and the user gets stuck resolving a conflict that 1) is
not of his making and 2) has already been resolved by someone else.

One workaround for this annoyance is to add an --ancestor option to
the merge command. Martin posted an implementation of this on
stackoverflow recently, and we're going to steal his code and add it
our local "hacks" extension. (Thanks Martin!)

But I am inclined to send in a post that adds --ancestor for everyone.
Yes, it's a workaround rather than a proper fix. But if other people
are being bitten by this, it could help them too.

As for a proper fix: we're thinking of hacking up
mercurial/ancestor.py so ancestors() returns a list, making the caller
is responsible for picking one -- and possibly telling the user,
"ambiguous merge ancestor: if this merge goes horribly wrong, try
again with --ancestor". This code is not terribly extension-friendly,
so we'd have to carry this as a local patch until we can get it
accepted... so what do you think of that idea?

Greg
--
Greg Ward http://www.gerg.ca/
Yield to temptation; it may not pass your way again.
_______________________________________________
Mercurial-devel mailing list
Mercuri...@selenic.com
http://selenic.com/mailman/listinfo/mercurial-devel

Matt Mackall

unread,
Mar 13, 2012, 2:31:44 PM3/13/12
to Greg Ward, mercuri...@selenic.com
On Tue, 2012-03-13 at 10:03 -0400, Greg Ward wrote:
> Hi all --
>
> at work, we regularly get bitten by ambiguous merge ancestors: if two
> common ancestors have the same distance from the root and both are the
> maximum distance, ancestor() picks the first one it sees. Sometimes it
> picks wrong and the user gets stuck resolving a conflict that 1) is
> not of his making and 2) has already been resolved by someone else.

Hmm, I might want some data from you.. I've been recently formulating
some ideas about how to be smarter (or at least more consistent) when
picking the merge ancestor.

Does your workflow regularly involve backouts or bidirectional merging
between branches?

> One workaround for this annoyance is to add an --ancestor option to
> the merge command. Martin posted an implementation of this on
> stackoverflow recently, and we're going to steal his code and add it
> our local "hacks" extension. (Thanks Martin!)

Link?

> But I am inclined to send in a post that adds --ancestor for everyone.
> Yes, it's a workaround rather than a proper fix. But if other people
> are being bitten by this, it could help them too.

I'm about +0 on this. Upsides: you can address cases like this and
manually do graft-like operations. Downsides: you can do extremely wrong
merges with very little effort that will royally and permanently screw
up your history.

> As for a proper fix: we're thinking of hacking up
> mercurial/ancestor.py so ancestors() returns a list, making the caller
> is responsible for picking one -- and possibly telling the user,
> "ambiguous merge ancestor: if this merge goes horribly wrong, try
> again with --ancestor". This code is not terribly extension-friendly,
> so we'd have to carry this as a local patch until we can get it
> accepted... so what do you think of that idea?

There's way too much code that assumes one ancestor. So we'd need a
second interface.

--
Mathematics is the supreme nostalgia of our time.

Na'Tosha Bard

unread,
Mar 13, 2012, 2:36:36 PM3/13/12
to mercuri...@selenic.com
2012/3/13 Greg Ward <gr...@gerg.ca>
Hi all --

at work, we regularly get bitten by ambiguous merge ancestors: if two
common ancestors have the same distance from the root and both are the
maximum distance, ancestor() picks the first one it sees. Sometimes it
picks wrong and the user gets stuck resolving a conflict that 1) is
not of his making and 2) has already been resolved by someone else.

Interesting.  I've noticed this happening to us sometimes, too.  I suspect it could be from the same cause.

--
Na'Tosha Bard
Build & Infrastructure Developer | Unity Technologies - Copenhagen

Skype: natosha.bard

Martin Geisler

unread,
Mar 13, 2012, 6:43:23 PM3/13/12
to Matt Mackall, mercuri...@selenic.com, Greg Ward
Matt Mackall <m...@selenic.com> writes:

> On Tue, 2012-03-13 at 10:03 -0400, Greg Ward wrote:
>> Hi all --
>>
>> at work, we regularly get bitten by ambiguous merge ancestors: if two
>> common ancestors have the same distance from the root and both are the
>> maximum distance, ancestor() picks the first one it sees. Sometimes it
>> picks wrong and the user gets stuck resolving a conflict that 1) is
>> not of his making and 2) has already been resolved by someone else.
>
> Hmm, I might want some data from you.. I've been recently formulating
> some ideas about how to be smarter (or at least more consistent) when
> picking the merge ancestor.
>
> Does your workflow regularly involve backouts or bidirectional merging
> between branches?
>
>> One workaround for this annoyance is to add an --ancestor option to
>> the merge command. Martin posted an implementation of this on
>> stackoverflow recently, and we're going to steal his code and add it
>> our local "hacks" extension. (Thanks Martin!)
>
> Link?

It's here, along with an example of a criss-cross merge that gives
different output depending on the ancestor you pick:

http://stackoverflow.com/a/9430810/110204

I just let 'hg merge' pass an revision to merge.update, really simple.

--
Martin Geisler

Mercurial links: http://mercurial.ch/

Greg Ward

unread,
Mar 16, 2012, 9:56:46 PM3/16/12
to Matt Mackall, mercuri...@selenic.com
On 13 March 2012, Matt Mackall said:
> Hmm, I might want some data from you.. I've been recently formulating
> some ideas about how to be smarter (or at least more consistent) when
> picking the merge ancestor.

I've been thinking about it lately, but not coming up with any great
ideas. I'd love to see this get better, and the developers at work who
get bitten by it would love it even more. ;-)

The canonical case that has bitten us from the beginning was discussed
here:

http://mercurial.markmail.org/thread/h32ixccixjgntvtx

The summary of that case is:

1) Alice and Bob are working concurrently from the same changeset on
branch 1.0
2) Alice commits on 1.0
3) Alice merges to 1.1
4) Alice merges to default
5) Bob commits on 1.0
6) Bob merges to 1.1, gets a conflict, resolves it
7) Bob merges to default
8) Alice pushes and goes home: she's done her day's work
9) Bob attempts to push and fails: "push creates remote heads"
10) Bob pulls
11) Bob merges with Alice on 1.0, 1.1, and trunk
12) Bob pushes and goes home: he's done his day's work
13) Carl starts work at the tip of branch 1.0 (Bob's merge with Alice)
13) Carl merges 1.0 to 1.1: FAIL: he gets Bob's conflict!

We've mitigated the mess here with a fancy merge script on top of
Mercurial, but the fundamental problem is still there and cannot
always be avoided.

But we've recently discovered another workflow that is also affected
by ambiguous merge ancestors.

> Does your workflow regularly involve backouts or bidirectional merging
> between branches?

Backouts: occasionally. I'm not aware of this being a frequent source
of merge problems though.

Bidirectional merging between branches: I think this is what has
started biting us recently.

Here's the story: we need to create release branch 4.3 to start
getting finished features to customers, and so the next round of
features (aimed at release 4.4) can be pushed to our master repository
and start getting integrated with each other.

But there's another sizeable feature that we need to get to customers
sooner than 4.4, but it isn't ready in time for 4.3. We can't schedule
another release branch because there are no extra digits between "3"
and "4". Ha ha only serious. The overhead of an additional release
branch is too much to contemplate, because it would force customers to
upgrade in order to get new features. Can't have that; we have to just
make the new feature magically appear on an existing release branch
and not call it an upgrade. (Long story.)

So we decided to create a branch 4.3m, where m is short for the
feature. (Yes, the real branch name is slightly longer.) It branched
off the trunk immediately after branching 4.3. The plan is to merge
4.3m onto 4.3 once the 4.3m features are done, QA is happy with them,
and QA is also ready for a big pile of new features to land on 4.3 --
i.e. they have tested the bejeezus out of everything else on 4.3 and
declared it "pretty good".

Problem #1: we want the 4.3m branch to see bug fixes happening on 4.3,
otherwise the 4.3m developers might waste time chasing down bugs that
are already fixed, and QA might similarly waste time when testing
builds from 4.3m. So we plan to regularly merge from 4.3 to 4.3m.

Problem #2: the work happening on 4.3m is feature work, so there are
going to be big changes -- and if changes also happen to that code on
the trunk post-4.3[m], the chance of eventual conflicts goes way up.
We do not want developers to work in isolation on 4.3m and have their
changes invisible to other developers working on trunk. So we plan to
regularly merge 4.3m to trunk. To make things simpler, we decide to
merge 4.3 -> 4.3m -> trunk -- then it's just one round of merging
handled by one developer a couple of times per week.

Luckily I can simulate the whole mess in a toy repo:

"""
#!/bin/sh -ex

rm -rf repo
hg init repo
cd repo

HGMERGE=internal:merge ; export HGMERGE

# metadata.txt is similar to IDE project files, which
# we modify with the branch name on every new branch

echo a > file2
echo "this is trunk" > metadata.txt
hg ci -d"0 0" -u andy -Am"init"

# create release branch
hg branch 4.3
echo "this is 4.3" > metadata.txt
hg ci -d"0 0" -u andy -m"create branch 4.3"
hg update default
hg merge --tool internal:merge 4.3
hg revert -r . metadata.txt
hg ci -d"0 0" -u andy -m"partly merge from 4.3"

# create feature branch
hg branch 4.3m
echo "this is 4.3m" > metadata.txt
hg ci -d"0 0" -u bob -m"create branch 4.3m"
echo b >> file2
hg ci -d"0 0" -u bob -m"change something on 4.3m"

# regular maintenance workflow: go fix a bug on the release branch and
# merge it to trunk
hg update 4.3
echo c >> file2
hg ci -d"0 0" -u carl -m"bug fix on release branch"
hg update default
hg merge 4.3
hg ci -d"0 0" -u carl -m"merge from 4.3"

# back to feature branch and merge from 4.3 (resolving conflict)
hg update 4.3m
hg merge 4.3 || true
sed -i -e '/^<<<</ d; /^====/ d; /^>>>>/ d' file2
hg resolve -m file2
hg ci -d"0 0" -u bob -m"merge from 4.3"
"""

So far so good: one expected conflict which Bob was able to resolve,
level-headed professional that he is. But now Bob has to merge 4.3m
back to trunk. No problem, right? After all, the bug fix on 4.3 has
already been merged, and he hasn't done anything earth-shattering on
4.3m yet. Here's the graph at this point:

o 7:d279f82d2b44 4.3m merge from 4.3
|\
| | @ 6:ece18ae6d801 default merge from 4.3
| |/|
| o | 5:bdac2e3447ea 4.3 bug fix on release branch
| | |
o | | 4:1a606fd8486d 4.3m change something on 4.3m
| | |
o---+ 3:cd200f854e39 4.3m create branch 4.3m
/ /
| o 2:a70f5bcc6e6d default partly merge from 4.3
|/|
o | 1:95c40b0fb1b0 4.3 create branch 4.3
|/
o 0:6f0973ea0a5e default init

That is, Bob is going to merge rev 7 to rev 6. Note the common
ancestors:

$ hg log -q -r 'ancestors(6) & ancestors(7)'
0:6f0973ea0a5e
1:95c40b0fb1b0
2:a70f5bcc6e6d
5:bdac2e3447ea

Revs 2 and 5 are the same distance from root: ambiguous merge
ancestor! Turns out that Mercurial picks 2 as the merge ancestor, so
Bob gets an unexpected conflict in file2:

$ hg update default
$ hg merge 4.3m
merging file2
warning: conflicts during merge.
merging file2 incomplete! (edit conflicts, then use 'hg resolve --mark')
1 files updated, 0 files merged, 0 files removed, 1 files unresolved
use 'hg resolve' to retry unresolved file merges or 'hg update -C .'
to abandon
$ cat file2
a
<<<<<<< local
=======
b
>>>>>>> other
c

Darn. That file should have been mergeable by software. What if I try
Martin's extension and force rev 5 as the ancestor?

$ HGMERGE=internal:merge hg --config extensions.mergebase=../mergebase.py merge --ancestor=5 4.3m
merging metadata.txt
warning: conflicts during merge.
merging metadata.txt incomplete! (edit conflicts, then use 'hg resolve --mark')
1 files updated, 0 files merged, 0 files removed, 1 files unresolved
use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon

Good news: file2, the tricky source code merge that requires actual
brainpower to resolve, was handled perfectly this time:

$ cat file2
a
b
c

But that stupid metadata.txt, which we deliberately keep in a
different state on every branch, was not:

$ cat metadata.txt
<<<<<<< local
this is trunk
=======
this is 4.3m
>>>>>>> other

In reality, we have ~100 of those files (Eclipse project
descriptions). Conflicts there are annoying and alarming, but easily
resolved: "hg revert -r . -I'**.project' .". At least, they are easily
handled with internal:merge. If you use a nice, friendly interactive
merge tool, it is a royal PITA because you have to sit through ~100
invocations of kdiff3 (or whatever). Not fun. (This is a major reason
why people persist in using old-school non-interactive merge tools
that put those ugly conflict markers in their source code.)

We have discovered other conditions that make this even more
interesting, but I'll spare you the gory details. It all seems to boil
down to this: merge ancestors can be ambiguous. Sometimes Mercurial
picks the wrong one. Sometimes there is no right one, but one is less
wrong than another. (E.g. above I would say 5 is less wrong than 2,
because conflicts in metadata.txt can be resolved mechanically.) And
sometimes you want one ancestor for one file and a different ancestor
for a different file.

> > But I am inclined to send in a post that adds --ancestor for everyone.
> > Yes, it's a workaround rather than a proper fix. But if other people
> > are being bitten by this, it could help them too.
>
> I'm about +0 on this. Upsides: you can address cases like this and
> manually do graft-like operations. Downsides: you can do extremely wrong
> merges with very little effort that will royally and permanently screw
> up your history.

I completely agree. I'm about +0.5 because I don't know what else to
do.

> > As for a proper fix: we're thinking of hacking up
> > mercurial/ancestor.py so ancestors() returns a list, making the caller
> > is responsible for picking one -- and possibly telling the user,
> > "ambiguous merge ancestor: if this merge goes horribly wrong, try
> > again with --ancestor". This code is not terribly extension-friendly,
> > so we'd have to carry this as a local patch until we can get it
> > accepted... so what do you think of that idea?
>
> There's way too much code that assumes one ancestor. So we'd need a
> second interface.

I'm thinking this: turn ancestors() into a generator, allancestors(),
that returns all common ancestors at maximum distance from the root.
Add a new ancestors() that just consumes and returns the first output
of allancestors(). Make merge call allancestors() and warn if > 1.

Greg
--
Greg Ward http://www.gerg.ca/

"He's dead, Jim. You get his tricorder and I'll grab his wallet."

Matt Mackall

unread,
Mar 21, 2012, 5:11:34 PM3/21/12
to Greg Ward, mercuri...@selenic.com

This is yet another case where we can't do any meaningful
differentiation between possible ancestors (the commits in (2) and (5)
in this case). We could perhaps walk the graph and notice that (5) has a
descendant merge with a conflict, and thus score it higher, but it'll
still be trivial to create scenarios with ties.

So there are two ways we can go:

- allow manual ancestor selection (restricted to heads(::x and ::y))?
- invent a merge operator that's well-defined for multiple ancestors

It's not too hard to see how the latter might work, if we ignore
renames. Something like this:

actions = {}
for a in allancestors(x, y):
actions[a] = manifestmerge(x, y, a)
combined = actionmerge(actions)
final = resolveprompts(combined)
applyupdates(final)

The magic of course is in actionmerge, which can note that wherever the
action lists aren't unanimous about a conflict existing, and then skip
it. In other words, the principle is:

"a merge conflict in an N-ancestor merge exists only if it
exists in all 1-ancestor merges"

But this raises other questions. For instance, we can have two
1-ancestors merges that are conflict-free but don't agree. Which means
we now have a meta-conflict. This might actually be a good thing, as it
may mean that merge ambiguities introduced by backout-like graph
excursions are now conflicts and can thus be resolved rather than
oscillating.

--
Mathematics is the supreme nostalgia of our time.

Greg Ward

unread,
Mar 21, 2012, 8:48:31 PM3/21/12
to Matt Mackall, mercuri...@selenic.com
On 21 March 2012, Matt Mackall said:
> > 1) Alice and Bob are working concurrently from the same changeset on
> > branch 1.0
> > 2) Alice commits on 1.0
> > 3) Alice merges to 1.1
> > 4) Alice merges to default
> > 5) Bob commits on 1.0
> > 6) Bob merges to 1.1, gets a conflict, resolves it
> > 7) Bob merges to default
> > 8) Alice pushes and goes home: she's done her day's work
> > 9) Bob attempts to push and fails: "push creates remote heads"
> > 10) Bob pulls
> > 11) Bob merges with Alice on 1.0, 1.1, and trunk
> > 12) Bob pushes and goes home: he's done his day's work
> > 13) Carl starts work at the tip of branch 1.0 (Bob's merge with Alice)
> > 13) Carl merges 1.0 to 1.1: FAIL: he gets Bob's conflict!
>
> This is yet another case where we can't do any meaningful
> differentiation between possible ancestors (the commits in (2) and (5)
> in this case). We could perhaps walk the graph and notice that (5) has a
> descendant merge with a conflict, and thus score it higher, but it'll
> still be trivial to create scenarios with ties.

I was confused at first by how you can detect conflict after-the-fact.
Then I realized you meant "Mercurial had to invoke a merge tool" not
"the merge tool had to ask the user for help". (Most people mean the
latter when they say "there was a conflict".)

I.e. if file F is different in R than both p1(R) and p2(R), then F had
a conflict in R. Right? Wait... no... it's not that simple. The user
might have resolved a conflict entirely in favour of p1(R) or of
p2(R). So I guess I'm still confused about how you can detect a
conflict after the fact.

> So there are two ways we can go:
>
> - allow manual ancestor selection (restricted to heads(::x and ::y))?
> - invent a merge operator that's well-defined for multiple ancestors
>
> It's not too hard to see how the latter might work, if we ignore
> renames.

That would indeed be nifty. I'll have to screw on the old thinking cap
and cogitate over this a bit.

Greg
--
Greg Ward http://www.gerg.ca/

I'm on a strict vegetarian diet -- I only eat vegetarians.

Matt Mackall

unread,
Mar 22, 2012, 11:21:29 AM3/22/12
to Angel Ezquerra Moreu, mercuri...@selenic.com, Greg Ward
On Thu, 2012-03-22 at 15:44 +0100, Angel Ezquerra Moreu wrote:
> On Thu, Mar 22, 2012 at 2:37 PM, Matt Mackall <m...@selenic.com> wrote:

> > On Wed, 2012-03-21 at 20:48 -0400, Greg Ward wrote:
> >> On 21 March 2012, Matt Mackall said:
> >> > > 1) Alice and Bob are working concurrently from the same changeset on
> >> > > branch 1.0
> >> > > 2) Alice commits on 1.0
> >> > > 3) Alice merges to 1.1
> >> > > 4) Alice merges to default
> >> > > 5) Bob commits on 1.0
> >> > > 6) Bob merges to 1.1, gets a conflict, resolves it
> >> > > 7) Bob merges to default
> >> > > 8) Alice pushes and goes home: she's done her day's work
> >> > > 9) Bob attempts to push and fails: "push creates remote heads"
> >> > > 10) Bob pulls
> >> > > 11) Bob merges with Alice on 1.0, 1.1, and trunk
> >> > > 12) Bob pushes and goes home: he's done his day's work
> >> > > 13) Carl starts work at the tip of branch 1.0 (Bob's merge with Alice)
> >> > > 13) Carl merges 1.0 to 1.1: FAIL: he gets Bob's conflict!
> >> >
> >> > This is yet another case where we can't do any meaningful
> >> > differentiation between possible ancestors (the commits in (2) and (5)
> >> > in this case). We could perhaps walk the graph and notice that (5) has a
> >> > descendant merge with a conflict, and thus score it higher, but it'll
> >> > still be trivial to create scenarios with ties.
> >>
> >> I was confused at first by how you can detect conflict after-the-fact.
> >
> > Simple. A merge without conflicts will have no files listed in the
> > changeset. In this scheme, we'd try to pick the merge path that had the
> > most conflicts already resolved. So we'd notice that one of the choices
> > of ancestor implied merge 'legs' including Bob's conflict resolution
> > from (6) and choose it over the one with no resolutions in its legs.
> >
> > This tweak is much more work than its worth, though, as it nibbles only
> > a small chunk off the ambiguous domain.

> >
> >> > So there are two ways we can go:
> >> >
> >> > - allow manual ancestor selection (restricted to heads(::x and ::y))?
> >> > - invent a merge operator that's well-defined for multiple ancestors
> >> >
> >> > It's not too hard to see how the latter might work, if we ignore
> >> > renames.
> >>
> >> That would indeed be nifty. I'll have to screw on the old thinking cap
> >> and cogitate over this a bit.
> >
> > I'm starting to write up some design notes for this idea, which I'm
> > calling "concensus merge".
> >
> > A quick measurement on the Mercurial repo shows:
> >
> > 1911 merges
> > 83 with two or more merge ancestors
> > 1 with three
>
> Matt,
>
> is there a simple way (e.g. revset) to repeat that measurement? I
> suspect that mercurial's history is probably more linear than most,
> given the patch based workflow, the excellent review process and the
> high commit quality standards. The fact that there are only 2 named
> branches probably contributes to that as well.
>
> I could repeat those measurements on some of our repos to give you
> another measurement point.

I did this:

hg log --template '{rev}\n' -r 'merge()' > merges
for f in `cat merges`; do echo -n "$f: "; hg log -r "heads(::p1($f) and ::p2($f))" --template "{rev} "; echo; done > merge-ancestors

You can also do something like this:

$ hg dbsh
loaded repo : /home/mpm/hg
using source: /home/mpm/hg/mercurial
>>> d = {}
>>> for m in repo.revs("merge()"):
... d[m] = repo.revs('heads(::p1(%d) and ::p2(%d))', m, m)
...
>>> len(d)
1911
>>> len([x for x in d if len(d[x]) >= 2])
83

It's actually not clear from this measurement that any of these merges
were 'ambiguous' based on the current algorithm, which picks the first
common ancestor furthest from root.

--
Mathematics is the supreme nostalgia of our time.

Reply all
Reply to author
Forward
0 new messages