
Proposal for using a multi-headed tree instead of inbound


Kartikaya Gupta

Apr 3, 2013, 5:31:32 PM
to
(Cross-posted to mozilla.dev.tree-management; please reply to
mozilla.dev.platform)

TL;DR: I propose that instead of using inbound as a "linear commit;
rebase as needed" tree, we use a tree that allows multiple heads (e.g.
try) to land patches on. Each patch queue would be based off a
known-green m-c changeset, and result in a new "head" on the try tree.
If the developer is satisfied with the green-ness of the head, they can
flag it for merging into m-c. I believe this will result in both less
resource usage and less frustration for developers/sheriffs than using
inbound. If this sounds interesting to you, please read on...

Long version:

There have been long closures on inbound recently, mostly due to various
kinds of bustage and having to wait for the builders to catch up. I'm
sure everybody feels the frustration of not being able to
land their work because the tree is closed for something unrelated.
There was a thread on some mobile mailing lists about the possibility of
creating a mobile-central branch similar to fx-team, services-central,
etc. to land mobile patches in isolation, and then merge that into m-c.
There are advantages and disadvantages to this solution, but while
thinking about it I had a different idea that I think might work better.

The developer workflow I'm proposing requires there be a tree that is
allowed to have multiple heads. The "try" tree we have now satisfies
this requirement, so I'm using that in my proposal, but we could just as
easily use inbound or some other tree provided it accepts multiple
heads. The process for landing a patch from the developer's point of
view would look like this:

1. Take the latest green m-c change, commit your patch(es) on top of it,
and push it to try.
2. If your try push is green, flag it for eventual merge to m-c and
you're done.
3. If your try push is not green, update your patch(es) and go back to
step 1.
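
For concreteness, here is a rough sketch of what steps 1-3 might look
like on the command line (the URLs, revision placeholders, and the
trychooser line are illustrative, not a prescribed syntax):

  # 1. Base the work on the latest known-green m-c changeset.
  hg pull https://hg.mozilla.org/mozilla-central/
  hg update -r <green-mc-rev>
  # Apply the patch(es); put trychooser syntax in the tip commit message
  # so automation knows which builds/tests to run.
  hg import -m "Bug NNNNNN - my change. try: -b do -p all -u all -t none" ../my.patch
  # Push the new head to try (-f because try intentionally has many heads).
  hg push -f ssh://hg.mozilla.org/try/
  # 2/3. If the push comes back green, flag the head for merging to m-c;
  # otherwise update the patch(es), rebase onto a newer green m-c, repeat.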

The "eventual merge" step would be resolved by the sheriffs picking up
green flagged heads from try and merging them into m-c. If the resulting
m-c is broken, the tree is closed until fixed (see Scenario D below) but
people can continue landing on try while this happens.

I'm going to go through some common scenarios of patches that currently
cause bustage on inbound and show how they would be different with my
suggested process. Except for the last scenario they all show a clear
win. Even in the last one, I would consider it a win although it's
debatable depending on what you value. If you feel this proposal would
be worse in any non-pathological scenario, please do reply with a
description of said scenario.

Note that in the scenarios below I'm not assuming people test their
patches on try unless I explicitly state that (e.g. in Scenario C). Also
I'm assuming that each named patch is independent of the other named
patches (i.e. is not part of the same "patch queue") unless otherwise
stated.

----------
Scenario A
----------
In this scenario, there are a bunch of patches, let's call them patches
A, B, C, D, and E. Patch C contains a bug that causes bustage on tbpl,
but is easily fixed by the developer once the bustage is visible.

With the current process:

Patch A lands on inbound, tree is green.
Patch B lands on inbound, tree is green.
Patch C lands on inbound, causes bustage.
Patch D lands on inbound, bustage continues.
Patch E lands on inbound, bustage continues.
Patch C is backed out, tree is green.
Patch C' (fixed) lands on inbound, tree is green.
Eventually inbound is merged to m-c, tree is green.

In this process we used 8 units of tbpl resources, but the backout
investigation was straightforward and did not consume much
developer/sheriff time.

With my suggested process, all of these patches would be based on the
latest green m-c patch and land on try first:

Patch A lands on try, tree is green.
Patch B lands on try, tree is green.
Patch C lands on try, causes bustage.
Patch D lands on try, tree is green.
Patch E lands on try, tree is green.
Patch C' (fixed) lands on try, tree is green.
A, B, D, E, and C' are merged to m-c, tree is green.

This process uses 7 units of tbpl resources, and there was no backout
involved which keeps the final set of patches in m-c cleaner (helps with
hg blame). Again no significant developer/sheriff time was used.

So even with one bad patch, my suggested process is winning here. Let's
look at another scenario:

----------
Scenario B - the one with tree closure
----------
This is the same as scenario A, except now both B and C are bad patches
and require one iteration on tbpl to fix.

With the current process:

Patch A lands on inbound, tree is green.
Patch B lands on inbound, causes bustage.
Patch C lands on inbound, bustage continues.
Patch D lands on inbound, bustage continues.
Patch E lands on inbound, bustage continues.
Patch B is backed out, bustage continues.
Tree is closed.
Investigation to determine which of {C,D,E} is causing bustage.
Eventually C is backed out, tree is green.
Tree is reopened.
Patch B' lands on inbound, tree is green.
Patch C' lands on inbound, tree is green.
Inbound is merged to m-c, tree is green.

This process used 10 units of tbpl time, plus had a tree closure
(significant developer frustration) and some amount of developer/sheriff
time was used to figure out what happened.

Suggested process:

Patch A lands on try, tree is green.
Patch B lands on try, causes bustage.
Patch C lands on try, causes bustage.
Patch D lands on try, tree is green.
Patch E lands on try, tree is green.
Patch B' lands on try, tree is green.
Patch C' lands on try, tree is green.
A, D, E, B', and C' are merged to m-c, tree is green.

This process used 8 units of tbpl time, and no tree closure or sheriff
time was involved. The only developer frustration was to the author(s)
of the bad patches. Doesn't get much fairer than that. The hg blame in
m-c is also about as clean as possible.

----------
Scenario C - the one with model citizens
----------
In this scenario, people are model citizens and push to try before
pushing to inbound to ensure their patches are great. Patches A and B
are perfect, but C has a bug that requires one iteration to fix.

Current process:

Patch A lands on try, tree is green.
Patch A lands on inbound, tree is green.
Patch B lands on try, tree is green.
Patch B lands on inbound, tree is green.
Patch C lands on try, causes bustage.
Patch C' lands on try, tree is green.
Patch C' lands on inbound, tree is green.
Inbound is merged to m-c, tree is green.

8 units of tbpl time are used to land these 3 patches, but at least
inbound was never broken or closed. That's not bad at all.

In my suggested process there is no inbound, so it's much simplified:

Patch A lands on try, tree is green.
Patch B lands on try, tree is green.
Patch C lands on try, causes bustage.
Patch C' lands on try, tree is green.
Patches A, B, and C' are merged to m-c, tree is green.

This uses 5 units of tbpl time, also with no tree closure or breakage.
Clear win.

----------
Scenario D - the one with patch interactions
----------
In this scenario we have our 5 patches A-E, but unbeknownst to the
authors, patches B and C have a bad interaction and so cause bustage
when combined. Patch C' is a modified version of C that accounts for the
interaction, and patch F is the diff between C and C' (i.e. it can be
applied on top of C to get C').

Current process:

Patch A lands on inbound, tree is green.
Patch B lands on inbound, tree is green.
Patch C lands on inbound, causes bustage.
Patch D lands on inbound, bustage continues.
Patch E lands on inbound, bustage continues.
Patch C is backed out, tree is green.
Patch C' lands on inbound, tree is green.
Inbound is merged to m-c, tree is green.

8 units of tbpl time are used, assuming the author of patch C can figure
out exactly what happened (I don't know how often this is the case). For
one more unit of tbpl time, you can buy patch C going green on try
*before* the inbound landing, which means the author will know with more
certainty it was a bad interaction with either patch A or B. In more
subtle interactions, though, the author of patch C will spend 2-3 units
of tbpl time on try iterations to figure out what happened before
pushing C' back to inbound.

Suggested process:

Patch A lands on try, tree is green.
Patch B lands on try, tree is green.
Patch C lands on try, tree is green.
Patch D lands on try, tree is green.
Patch E lands on try, tree is green.
Patches A, B, C, D, and E are merged to m-c, causes bustage.
Tree is closed.
Patch F (based on new m-c) lands on try, tree is green.
Patch F is merged to m-c, tree is green.
Tree is opened.

Again here 8 units of tbpl time are used, assuming the author of patch C
can figure out exactly what happened. If the interactions are more
subtle, it won't be obvious and the sheriffs (or somebody) will need to
spend 2-3 units of tbpl time to bisect the merge and figure out what
happened before pushing F. You could argue that this is slightly worse
than the current process since there's no clear "owner" for the tree
closure. Note that this is a problem with our current process in
scenario B as well, and that we can solve this by using policy (e.g.
"the author of the last change in the merge must do the bisection").

Most importantly, the tree closure doesn't prevent people from
continuing to land things on try *without any modification to their
workflow*. This part, IMO, is critical, and outweighs the potential
disadvantage to the person stuck with bisecting the bustage.

--- End of scenarios

My suggested process *requires* a tree which allows multiple heads,
which is why I suggest "try" instead of "inbound". It has been suggested
that hg does not deal well with trees with large numbers of heads, which
is why try is reset every so often. In my proposal, we can still reset
try periodically, as long as the changes have been merged to m-c. M-c
itself will always have only one head.

Another potential problem with this approach is that we will have more
merge changes in m-c, which generally screws with hg bisect. Personally
I already have enough trouble with hg bisect to the point where I don't
use it because I can't trust it. This may be a legitimate problem for
some, but it's not for me.

Cheers,
kats

Gregory Szorc

Apr 3, 2013, 6:24:35 PM
to Kartikaya Gupta, dev-platform
On 4/3/13 2:31 PM, Kartikaya Gupta wrote:

Excellent write-up! I think a re-examination of our tree management is
long overdue, especially with all the recent closures on inbound.

> My suggested process *requires* a tree which allows multiple heads,
> which is why I suggest "try" instead of "inbound". It has been suggested
> that hg does not deal well with tree with large numbers of heads, which
> is why try is reset every so often. In my proposal, we can still reset
> try periodically, as long as the changes have been merged to m-c. M-c
> itself will always have only one head.

We could consider Mercurial bookmarks for this. (I believe bookmarks
weren't ready for prime time when we established our current tree
management policy.)

There are some interesting things you could do with bookmarks. e.g. you
could have the convention that all pushed bookmarks following a specific
naming convention (e.g. autoland/bug123456) are automatically considered
for "cherry-picking" to mainline.

While I concede that Mercurial's UI around bookmarks isn't as good as
say Git branches (e.g. |hg push| by default pushes all bookmarks and
AFAIK there is no config option to disable that - you need to know to
specify -r <bookmark>), one of the things that makes Mercurial special
is that it is highly extensible. We could consider maintaining a
Mercurial extension that forces proper use of our ordained tree policy.
Combine this with some push hooks that reject invalid pushes and we
should be set!

Justin Lebar

Apr 3, 2013, 6:34:03 PM
to Gregory Szorc, dev-platform, Kartikaya Gupta
This is a really interesting idea.

git is smart enough to let you pull only the csets that end up getting
merged into master. I don't know how multiple heads work with hg,
but I wonder if we could make hg smart enough to do the same thing.

Otherwise, maybe it's time to switch to git. <ducks>

On Wed, Apr 3, 2013 at 6:24 PM, Gregory Szorc <g...@mozilla.com> wrote:
> On 4/3/13 2:31 PM, Kartikaya Gupta wrote:
>
> Excellent write-up! I think a re-examination of our tree management is long
> overdue, especially with all the recent closures on inbound.
>
>
>> My suggested process *requires* a tree which allows multiple heads,
>> which is why I suggest "try" instead of "inbound". It has been suggested
>> that hg does not deal well with tree with large numbers of heads, which
>> is why try is reset every so often. In my proposal, we can still reset
>> try periodically, as long as the changes have been merged to m-c. M-c
>> itself will always have only one head.
>
>
> We could consider Mercurial bookmarks for this. (I believe bookmarks weren't
> ready for prime time when we established our current tree management
> policy.)
>
> There are some interesting things you could do with bookmarks. e.g. you
> could have the convention that all pushed bookmarks following a specific
> naming convention (e.g. autoland/bug123456) are automatically considered for
> "cherry-picking" to mainline.
>
> While I concede that Mercurial's UI around bookmarks isn't as good as say
> Git branches (e.g. |hg push| by default pushes all bookmarks and AFAIK there
> is no config option to disable that - you need to know to specify -r
> <bookmark>), one of the things that makes Mercurial special is that it is
> highly extensible. We could consider maintaining a Mercurial extension that
> forces proper use of our ordained tree policy. Combine this with some push
> hooks that reject invalid pushes and we should be set!
>

L. David Baron

Apr 3, 2013, 6:36:54 PM
to dev-pl...@lists.mozilla.org
On Wednesday 2013-04-03 17:31 -0400, Kartikaya Gupta wrote:
> 1. Take the latest green m-c change, commit your patch(es) on top of
> it, and push it to try.
> 2. If your try push is green, flag it for eventual merge to m-c and
> you're done.
> 3. If your try push is not green, update your patch(es) and go back
> to step 1.

This seems like it would lead to a substantial increase in
build/test load -- one that I suspect we don't currently have the
hardware to support. This is because it would require a full
build/test run for every push, which we avoid today because many
builds and tests get merged on inbound when things are behind. (I
also don't feel like we need a full build/test run for every push,
so it feels like unnecessary use of resources to me.)

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla http://www.mozilla.org/ 𝄂

Gary Kwong

Apr 3, 2013, 6:43:23 PM
to Christian Holler, Jesse Ruderman
> Another potential problem with this approach is that we will have more
> merge changes in m-c, which generally screws with hg bisect. Personally
> I already have enough trouble with hg bisect to the point where I don't
> use it because I can't trust it. This may be a legitimate problem for
> some, but it's not for me.

autoBisect relies heavily on `hg bisect` and has been working well with
JSBugMon, especially in SpiderMonkey, for the better part of ~4 years now.

-Gary

Justin Lebar

Apr 3, 2013, 6:58:06 PM
to Gary Kwong, Jesse Ruderman, dev-platform, Christian Holler
If anything this should improve the experience of bisecting, because
you'll be able to bisect known-good csets on m-c and only at the end
step into the merge csets, which may or may not be good.

Right now we say that when people push a patch queue to m-c every
patch should be green, but in practice that's often not the case.
Right now when you bisect you have no way that I'm aware of to say
"bisect only the tips of pushes", because hg doesn't store that
information, so you end up bisecting through many bad csets.

Gregory Szorc

Apr 3, 2013, 7:11:48 PM
to L. David Baron, dev-pl...@lists.mozilla.org
On 4/3/13 3:36 PM, L. David Baron wrote:
> On Wednesday 2013-04-03 17:31 -0400, Kartikaya Gupta wrote:
>> 1. Take the latest green m-c change, commit your patch(es) on top of
>> it, and push it to try.
>> 2. If your try push is green, flag it for eventual merge to m-c and
>> you're done.
>> 3. If your try push is not green, update your patch(es) and go back
>> to step 1.
>
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind. (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)

Let me ask a related question: why does my push to change a .js file in
/services incur 40+ hours of machine time to run the full reftest suite
- something which it is practically guaranteed to not interfere with?

IMO we should have a mechanism in place that audits changesets for
likely impact, chooses a reasonable set of builds/tests to perform, and
runs with it. We can supplement this with "no changeset lands in m-c
until all builds/tests have run on it" (i.e. a later push on that tree
or a manual or timer-triggered full gamut run). This general idea is
being considered in bug 849385.

Sure, regressions will fall through the cracks. But, the cost to not
doing it this way is continued and perpetual run-ins with automation
capacity.
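
To make the idea concrete, here is a minimal sketch of such an audit
step. The directory-to-suite mapping is entirely made up for
illustration; it is not the mechanism being designed in bug 849385:

  #!/bin/sh
  # Derive a trychooser test selection from the paths touched by the
  # tip changeset.
  tests="all"
  only_services=yes
  for f in $(hg status --change . -n); do
      case "$f" in
          services/*) ;;           # assumed not to affect e.g. reftests
          *) only_services=no ;;
      esac
  done
  [ "$only_services" = yes ] && tests="xpcshell"
  echo "try: -b do -p all -u $tests -t none"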

At the same time, we need to make our automation infrastructure more
efficient. http://brasstacks.mozilla.com/gofaster/#/overhead/build says
our win32 builders spend 1 hour (out of their 3-hour total wall time) in
"setup and teardown." That's 33% lost to overhead on every push. Sadness.
Failing to use all available cores for xpcshell tests, reftests, etc.
wastes even more CPU cycles.

Our current approach is not sustainable (unless we are content with
throwing a lot of money and resources at the problem).

Joshua Cranmer 🐧

Apr 3, 2013, 7:28:28 PM
to
On 4/3/2013 4:31 PM, Kartikaya Gupta wrote:
> The developer workflow I'm proposing requires there be a tree that is
> allowed to have multiple heads. The "try" tree we have now satisfies
> this requirement, so I'm using that in my proposal, but we could just
> as easily use inbound or some other tree provided it accepts multiple
> heads. The process for landing a patch from the developer's point of
> view would look like this:
>
> 1. Take the latest green m-c change, commit your patch(es) on top of
> it, and push it to try.
> 2. If your try push is green, flag it for eventual merge to m-c and
> you're done.
> 3. If your try push is not green, update your patch(es) and go back to
> step 1.
>
> The "eventual merge" step would be resolved by the sheriffs picking up
> green flagged heads from try and merging them into m-c. If the
> resulting m-c is broken, the tree is closed until fixed (see Scenario
> D below) but people can continue landing on try while this happens.

I think the relative merits of this approach are easily decidable by
looking at two numbers:
1. How many of our patch landings would have non-trivial merge conflicts?
2. How much patch bustage is caused by interaction failures instead of
patch author/reviewer negligence?

Note that your proposal makes both of these cases strictly worse: in the
first case, you force the tree sheriffs (instead of developers) to spend
more time resolving conflicts; in the second, sheriffs have to spend
more time finding the cause of the bustage. Keep in mind that we have
about three sheriffs and about a hundred developers--so it is better to
optimize for sheriffs' time instead of developers' time (at the very
least, developers represent better parallelization opportunities).

I don't know these numbers. I can estimate the first point by pointing
out that the JS developers stopped doing JS development on a project
branch because merges were too painful, which suggests that this would
be a nontrivial amount. For the second point I have no reference basis
(most of my development is on comm-central), but if you presume that
patches are developed and reviewed in good faith to not break things,
then it would suggest that interaction failures are not altogether rare.

For what it's worth, I do recall there being release engineering talk
about some sort of "autoland" feature (which would automatically land
any patch that passed try or something), and I recall (my memory may
just be playing tricks on me) that it got to the prototyping phase but
was shelved because the "merge" part was too difficult/impossible. A
comment from anyone who worked on this would be helpful.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

jmaher

Apr 3, 2013, 7:34:23 PM
to dev-pl...@lists.mozilla.org
>
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind. (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)
>
>

I would love to see a model where we only run full tests on a schedule as we do for pgo builds. For everything in between we would be able to run a much smaller set of tests that are known to be less noisy and ignoring tests which always pass. While this would be some work up front, it would allow for great scaling of our systems and much faster turnaround times.

To make something like this happen we would need a method in place for us to launch builds and tests which were not run for any given changeset from a simple UI (i.e. TBPL) to hunt down a regression if one is found on the full set of tests when they are finally run.


Joshua Cranmer 🐧

Apr 3, 2013, 7:44:21 PM
to
On 4/3/2013 5:36 PM, L. David Baron wrote:
> On Wednesday 2013-04-03 17:31 -0400, Kartikaya Gupta wrote:
>> 1. Take the latest green m-c change, commit your patch(es) on top of
>> it, and push it to try.
>> 2. If your try push is green, flag it for eventual merge to m-c and
>> you're done.
>> 3. If your try push is not green, update your patch(es) and go back
>> to step 1.
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind. (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)
Continuing in the vein of people posting ideas for "use less
infrastructure", an idea I had is this:

Instead of running {mochitest-*,reftest,crashtest,xpcshell,marionette}
on every single desktop platform on every inbound checkin, run them
round-robin. A given push would trigger mochitest on Linux and reftest on
Mac, and then the next push would run reftest on Linux and mochitest on
Mac. Each push would still end up running the full test suite (modulo
some tests that are only run on some platforms) on some platform, and we
would be using fewer test resources to achieve that coverage. If most
actual test bustage is not platform-specific, this would give us
generally sufficiently good coverage for most checkins. I am not
qualified to say, however, if this is the case.

Jesse Ruderman

Apr 3, 2013, 7:49:20 PM
to
+1.

But can we do this with rebased changesets instead of "trivial" merge
changesets? While the core of hg can handle merges, pretty much none of
the tools we rely on for understanding history (hg {log, grep, diff,
bisect}) handle them well.

Justin Lebar

Apr 3, 2013, 7:58:46 PM
to Jesse Ruderman, dev-platform
> But can we do this with rebased changesets instead of "trivial" merge changesets?

Personally, I think merges are a Very Good idea because they bake into
the tree information that is currently contained only in the pushlog.
This improves bisect from my perspective, although I'll grant it might
not improve other tools. Merges also preserve the differences between
the code I wrote and the code that eventually got merged in after
conflicts were resolved, perhaps by the sheriff.

But perhaps we should table this discussion (and some of the other
ones in this thread) until after we decide that this is a good idea.

Jeff Hammel

Apr 3, 2013, 7:59:31 PM
to dev-pl...@lists.mozilla.org
If there were guidelines for doing this in some sensible way, or even a
cultural norm like "reviewer comments on what tests to run", I'd give it a
+0.5 here, FWIW.

Chris AtLee

Apr 3, 2013, 8:06:43 PM
to Gregory Szorc, L. David Baron, dev-pl...@lists.mozilla.org
On 16:11, Wed, 03 Apr, Gregory Szorc wrote:
>On 4/3/13 3:36 PM, L. David Baron wrote:
>>On Wednesday 2013-04-03 17:31 -0400, Kartikaya Gupta wrote:
>>>1. Take the latest green m-c change, commit your patch(es) on top of
>>>it, and push it to try.
>>>2. If your try push is green, flag it for eventual merge to m-c and
>>>you're done.
>>>3. If your try push is not green, update your patch(es) and go back
>>>to step 1.
>>
>>This seems like it would lead to a substantial increase in
>>build/test load -- one that I suspect we don't currently have the
>>hardware to support. This is because it would require a full
>>build/test run for every push, which we avoid today because many
>>builds and tests get merged on inbound when things are behind. (I
>>also don't feel like we need a full build/test run for every push,
>>so it feels like unnecessary use of resources to me.)
>
>Let me ask a related question: why does my push to change a .js file
>in /services incur 40+ hours of machine time to run the full reftest
>suite - something which it is practically guaranteed to not interfere
>with?
>
>IMO we should have a mechanism in place that audits changesets for
>likely impact, chooses a reasonable set of builds/tests to perform,
>and runs with it. We can supplement this with "no changeset lands in
>m-c until all builds/tests have ran on it" (i.e. a later push on that
>tree or a manual or timer triggered full gamut run). This general idea
>is being considered in bug 849385.
>
>Sure, regressions will fall through the cracks. But, the cost to not
>doing it this way is continued and perpetual run-ins with automation
>capacity.
>
>At the same time, we need to make our automation infrastructure more
>efficient. http://brasstacks.mozilla.com/gofaster/#/overhead/build
>says our win32 builders spend 1 hour (out of their 3 hour total wall
>time) in "setup and teardown." 33% lost to overhead on every push.
>Sadness. Failing to use all available cores for xpcshell tests,
>reftests, etc wastes even more CPU cycles.

This really shocked me, so I took a deeper look. It turns out that we're
counting 'make check' as 'overhead'. So ~45 minutes from the 'setup
and teardown' time is being misrepresented there.

I'll see if I can fix the reporting for this.

Cheers,
Chris

Trevor Saunders

Apr 3, 2013, 8:28:47 PM
to dev-pl...@lists.mozilla.org
On Wed, Apr 03, 2013 at 04:59:31PM -0700, Jeff Hammel wrote:
> On 04/03/2013 04:44 PM, Joshua Cranmer 🐧 wrote:
Another important question is what net effect this has on machine load.
I'd expect this would be a win over the coalescing we do today, but I'm
not sure of that. If we reduce load significantly, making it somewhat
slower to get full test results may well be acceptable, because we would
have many more cycles to spend on retriggers.

> If there was a way/guidelines to do this in some sensible way, or
> even a cultural norm like "reviewer comments on what tests to run",
> I'd give a +0.5 here fwiw

a smarter system than the one jcranmer is proposing certainly has merit,
but it's also significantly harder, since it requires specifying what is
likely to be affected by each part of the tree and then keeping that up
to date.

Trev

Gregory Szorc

Apr 3, 2013, 8:39:21 PM
to L. David Baron, dev-pl...@lists.mozilla.org
On 4/3/13 4:11 PM, Gregory Szorc wrote:

I pulled the raw builder logs from
https://secure.pub.build.mozilla.org/builddata/buildjson/ and assembled
a tab-separated file of all the builds for 2013-03-17 through 2013-03-23:

https://people.mozilla.com/~gszorc/builds-20130317-20130323.txt.bz2

Yes, there were over 290,000 jobs performed that week!

I look forward to interesting posts derived from this data. I'll start.

Here is the percent of total builder time we spent performing jobs
broken down by tree:

inbound 43.98%
try 27.48%
central 5.24%
aurora 3.89%
fx-team 2.67%
...

We have an inbound and try problem.

In case you are wondering, the sum of the times of all jobs this week
was 486014956s. That's 5625 days of machine time.

Ehsan Akhgari

Apr 3, 2013, 8:55:36 PM
to Joshua Cranmer 🐧, dev-pl...@lists.mozilla.org
Wouldn't this make the current problems caused by coalescing test runs
even worse? This will make it much harder to deal with bustage: when
mochitest-1 on Linux goes orange, nobody knows what broke it because the
last time it ran was N pushes ago, N being a much larger number than it
is today as a result of coalescing.

Cheers,
Ehsan

Ehsan Akhgari

Apr 3, 2013, 8:57:40 PM
to L. David Baron, dev-pl...@lists.mozilla.org
On 2013-04-03 6:36 PM, L. David Baron wrote:
> On Wednesday 2013-04-03 17:31 -0400, Kartikaya Gupta wrote:
>> 1. Take the latest green m-c change, commit your patch(es) on top of
>> it, and push it to try.
>> 2. If your try push is green, flag it for eventual merge to m-c and
>> you're done.
>> 3. If your try push is not green, update your patch(es) and go back
>> to step 1.
>
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind. (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)

Yes. In addition to the cost of try pushes, this also incurs the cost
of building and testing those merge commits which would not have existed
otherwise. In a world where 1 out of 2-3 pushes is bustage this might
be a good idea, but that's not currently the case.

Cheers,
Ehsan

Ehsan Akhgari

Apr 3, 2013, 9:00:43 PM
to Gregory Szorc, L. David Baron, dev-pl...@lists.mozilla.org
On 2013-04-03 7:11 PM, Gregory Szorc wrote:
> On 4/3/13 3:36 PM, L. David Baron wrote:
>> On Wednesday 2013-04-03 17:31 -0400, Kartikaya Gupta wrote:
>>> 1. Take the latest green m-c change, commit your patch(es) on top of
>>> it, and push it to try.
>>> 2. If your try push is green, flag it for eventual merge to m-c and
>>> you're done.
>>> 3. If your try push is not green, update your patch(es) and go back
>>> to step 1.
>>
>> This seems like it would lead to a substantial increase in
>> build/test load -- one that I suspect we don't currently have the
>> hardware to support. This is because it would require a full
>> build/test run for every push, which we avoid today because many
>> builds and tests get merged on inbound when things are behind. (I
>> also don't feel like we need a full build/test run for every push,
>> so it feels like unnecessary use of resources to me.)
>
> Let me ask a related question: why does my push to change a .js file in
> /services incur 40+ hours of machine time to run the full reftest suite
> - something which it is practically guaranteed to not interfere with?

Because we don't have a dependency graph between source code files and
unit tests that will be testing something different if you change
something in that source file.

Note that it's very easy to come up with simple examples of unneeded
test runs such as the one you gave above, but this is a very hard
problem to solve. For this specific one, please file a RelEng bug.
We've already limited the full set of builds/tests for things that only
touch browser/, b2g/, mobile/, etc. But fixing this would be a tiny
improvement over the situation today.

> IMO we should have a mechanism in place that audits changesets for
> likely impact, chooses a reasonable set of builds/tests to perform, and
> runs with it. We can supplement this with "no changeset lands in m-c
> until all builds/tests have ran on it" (i.e. a later push on that tree
> or a manual or timer triggered full gamut run). This general idea is
> being considered in bug 849385.

Do you have a concrete proposal on what that mechanism would look like?

Cheers,
Ehsan

Clint Talbert

Apr 3, 2013, 9:10:51 PM
to
Actually, the issues I am aware of were around a security issue and a
person-resource issue to complete the work. The security issue has been
fixed, and this current quarter (Q2), Release Engineering and the
Bugzilla team are planning to team up to finish the work here.

One of the really nice things we can do with autoland is schedule
non-time-sensitive patches for landing when the tree is quieter, like
1AM pacific.

But my point is that autoland is coming back and we hope to have it on a
BMO instance near you soon.

Clint

Ehsan Akhgari

Apr 3, 2013, 9:33:07 PM
to Clint Talbert, dev-pl...@lists.mozilla.org
Do you know who's going to work on this? I'd like us to avoid this work
if it's intended for anything more than pushing to try unless we have a
solution to the two problems I mentioned in my other email.

Thanks!
Ehsan

jmaher

Apr 3, 2013, 9:33:42 PM
to L. David Baron, dev-pl...@lists.mozilla.org
I looked at the data used to calculate the offenders, and I found:

type, total jobs, total duration (seconds), total hours
try builders, 3525, 12239477, 3399.85472222
try testers, 71821, 121294315, 33692.8652778
inbound builders, 7862, 30877533, 8577.0925
inbound testers, 121641, 182883638, 50801.0105556
other builders, 14690, 26990702, 7497.41722222
other testers, 75170, 111729324, 31035.9233333
totals: 294709, 486014989.0, 135004.163611

It looks like our try server usage for builders is not that bad, but our use of testers is much higher than that of all the other branches sans mozilla-inbound.
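
For reference, the durations in that table are in seconds, so the hours
column and each row's share of the total follow directly, e.g.:

  echo "scale=2; 182883638/3600" | bc        # inbound testers -> ~50801 hours
  echo "scale=3; 182883638/486014989" | bc   # -> ~.376 of all machine time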

Trevor Saunders

Apr 3, 2013, 9:36:00 PM
to dev-pl...@lists.mozilla.org
On Wed, Apr 03, 2013 at 08:55:36PM -0400, Ehsan Akhgari wrote:
> On 2013-04-03 7:44 PM, Joshua Cranmer 🐧 wrote:
> Wouldn't this make the current problems caused by coalescing test
> runs even worse? This will make it much harder to deal with
> bustage: when mochitest-1 on Linux goes orange, nobody knows what
> broke it because the last time it ran was N pushes ago, N being a
> much larger number than it is today as a result of coalescing.

Maybe, maybe not; it depends on how frequently Linux 32 debug
mochitest-1 breaks vs. how frequently mochitest-1 breaks as the result
of a patch, and on the number of Linux 32 debug mochitest-1 jobs that we
currently coalesce.

Trev

>
> Cheers,
> Ehsan

Jeff Hammel

Apr 3, 2013, 10:59:15 PM
to dev-pl...@lists.mozilla.org
On 04/03/2013 06:33 PM, Ehsan Akhgari wrote:
> On 2013-04-03 9:10 PM, Clint Talbert wrote:
\o/
>
> Do you know who's going to work on this? I'd like us to avoid this
> work if it's intended for anything more than pushing to try unless we
> have a solution to the two problems I mentioned in my other email.
>
> Thanks!
> Ehsan
>
So I'm not sure I understand:

> 1. This will incur a significant increase in our infra resource usage
since all of these patches have to do a full try run. We simply cannot
afford that in today's world where we're struggling against wait times
and infrastructure overload.

If we push to e.g. inbound, we still run all the tests + builds. My
understanding was the whole point was to avoid the double run (running
all/most tests/builds on try and then landing on e.g. inbound where
they're all run again). Is the concern the overhead pushing to try?
autoland or no, that is a problem indeed. If not, I don't really
understand the concern here.

As for 2., all of our damn intermittents are indeed a concern :( And of course
not just for this reason: tons of human and often machine time is lost
to them. I can imagine this being addressed for autoland, but will
leave ctalbert or others to comment as to plans there since I'm sure
they've thought of everything I could propose right now (and then some).
Obviously the best solution is to kill all the orange, but ... yeah

Jeff

Clint Talbert

Apr 3, 2013, 11:01:40 PM
to
On 4/3/2013 6:33 PM, jmaher wrote:
> I looked at the data used to calculate the offenders, and I found:
>
> total type, total jobs, total duration, total hours
> try builders, 3525, 12239477, 3399.85472222
> try testers, 71821, 121294315, 33692.8652778
> inbound builders, 7862, 30877533, 8577.0925
> inbound testers, 121641, 182883638, 50801.0105556
> other builders, 14690, 26990702, 7497.41722222
> other testers, 75170, 111729324, 31035.9233333
> totals: 294709, 486014989.0, 135004.163611
>

The sheriffs and releng and I have been talking about this problem for
the last month or two, knowing that we were running way in the red. We
have a bunch of solutions, but we hadn't yet crunched the numbers to see
what our best solution is on the way forward. Our best solution is
certainly going to be some combination of process change combined with
some amount of technical optimizations. But what we focus on when is
the million dollar question.

Joel and I did some calculations:
* 200 pushes/day[1]
* 325 test jobs/push
* 25 builds/push
* .41 hours/test (on average, from above numbers)
* 1.1 hours/build (on average, based on try values from above)

Then you can approximate what the load of Kat's suggestion would look
like: 200 pushes/day * ((325 tests/push * 0.41 hrs/test) + (25 builds/push
* 1.1 hrs/build)) = 32150 hrs/day

So we need 32150 compute hours per day to keep up.
From the totals above for the week of data that gps provided us with, you
can see that we are currently running at 135004 hours/week / 7 days =
19286 compute hours/day.
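
As a quick sanity check of those two numbers:

  echo "200 * (325*0.41 + 25*1.1)" | bc    # projected load -> 32150 hrs/day
  echo "135004 / 7" | bc                   # current load   -> 19286 hrs/day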

So, while I really like Kat's proposal from a sheriffing standpoint, my
back of the napkin math here makes me worry that our current
infrastructure can't support it.

The only way I see to do something like this approach would be to batch
patches together in order to reduce the number of pushes, or to run
tests intermittently like both dbaron and jmaher mentioned.

We need to figure out what we can do to make this better--it's a
terrible problem. We're running way in the red, and we do need to see
what we can do to be more efficient with the infrastructure we've got
while at the same time finding ways to expand it. There are expansion
efforts underway but expanding the physical infrastructure is not a
short term project. This is on our goals (both releng and ateam) for Q2,
and any help crunching numbers to understand our most effective path
forward is appreciated.

Clint

Jesse Ruderman

Apr 3, 2013, 11:05:47 PM
to
I suggest adding an Auto branch between Try and Central. Advantages:

* Pulling from Central is safe, because it only gets csets that passed
both Try (as individual developer pushes) and Auto (as a group).

* Infrastructure load will be slightly lower, because everyone's pushes
to Try will have a working Central base.

* We can use jcranmer's "sparse test matrix" trick for some Try pushes,
knowing that Auto will catch platform-specific failures before they
reach Central.

* In Scenario D (when subtle patch interactions cause build or test
failures), automation can move on to another set of Try landings, giving
sheriffs time to react without the pressure of tree closure. (Sheriffs can
then fill in the test matrix, redo the merge in a way that keeps
mozilla-central free of non-working changesets, or just let the group of
patches bounce.)

* We can add expensive tests (ASan, MSan, TSan, UBSan) without requiring
every Try push to run them.


To clarify my proposal, the ideal scenario looks like:

1. Five patches land separately on Try.
2. (They pass quick(?) tests and don't conflict in obvious ways.)

3. Automation lands the combination (merged or rebased) on Auto.
4. (The combination passes full tests.)

5. Automation pushes the resulting changeset to Central.
6. (No additional tests run, because they'd be redundant.)

Ehsan Akhgari

Apr 3, 2013, 11:13:37 PM
to Jeff Hammel, dev-pl...@lists.mozilla.org
On 2013-04-03 10:59 PM, Jeff Hammel wrote:
> So I'm not sure I understand:
>
> > 1. This will incur a significant increase in our infra resource usage
> since all of these patches have to do a full try run. We simply cannot
> afford that in today's world where we're struggling against wait times
> and infrastructure overload.
>
> If we push to e.g. inbound, we still run all the tests + builds. My
> understanding was the whole point was to avoid the double run (running
> all/most tests/builds on try and then landing on e.g. inbound where
> they're all run again). Is the concern the overhead pushing to try?
> autoland or no, that is a problem indeed. If not, I don't really
> understand the concern here.

Autoland needs to first push to try, and then possibly push to
inbound/central. Without autoland, lots of people avoid using the try
server when they make a judgement call on a patch being safe enough.
With Autoland, that will stop, hence there will be an infra load increase.

> 2., all of our damn intermittent, is indeed a concern :( And of course
> not just for this reason: tons of human and often machine time is lost
> to them.

Sure.

> I can imagine this being addressed for autoland, but will
> leave ctalbert or others to comment as to plans there since I'm sure
> they've thought of everything I could propose right now (and then some).
> Obviously the best solution is to kill all the orange, but ... yeah

Back when I was fighting this war with the help of others, we got the
orange factor down to about 1 orange per push, which was fantastic. Our
experience has shown that this is definitely doable; it just needs
somebody to own the effort and work on it.

Cheers,
Ehsan

Kartikaya Gupta

Apr 4, 2013, 1:45:36 AM
to
Thanks for the many comments! I replied to a bunch of stuff below.

On 13-04-03 18:34 , Justin Lebar wrote:
> This is a really interesting idea.
>
> git is smart enough to let you pull only the csets that end up getting
> merged into master. I don't know how multiple heads works with hg,
> but I wonder if we could make hg smart enough to do the same thing.

I'm not sure if this is what you're asking, but you can pull a specific
head and its ancestor changes using "hg pull -r <rev>".
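
For example (URL and revision are placeholders), to fetch just one try
head and its ancestors rather than every head on try:

  hg pull -r <try-head-rev> https://hg.mozilla.org/try/
  hg update -r <try-head-rev>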

On 13-04-03 18:36 , L. David Baron wrote:
> This seems like it would lead to a substantial increase in
> build/test load -- one that I suspect we don't currently have the
> hardware to support. This is because it would require a full
> build/test run for every push, which we avoid today because many
> builds and tests get merged on inbound when things are behind. (I
> also don't feel like we need a full build/test run for every push,
> so it feels like unnecessary use of resources to me.)

This is a valid point. I'll note here that a lot of people find the
test/build coalescing on inbound problematic because it makes finding
regressions much harder, and in general I would like to kill such
coalescing. That being said, if you do believe that you don't need a
full test run for every push, then it's extremely simple to do that in
my proposal as well. When the patches are pushed to try, the developer
can choose to run specific sets of tests (which they should already be
doing using trychooser) rather than all of the tests. The rest of my
proposal remains unchanged.

(Note also that I did exactly this when I pushed 6348af4fe6aa to try
earlier today - I knew it would only touch android, so I only ran the
android tests. And once it came out green, I merged it to m-c.)
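
For instance, such an Android-only push can be expressed with an ordinary
trychooser line in the tip commit message (the exact platform/suite names
below are illustrative):

  # tip commit message: "try: -b do -p android -u all -t none"
  hg push -f ssh://hg.mozilla.org/try/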

Using this modification to the original proposal, developers have more
control over which tests are run on their changes. I think this is
better than having some hard-coded set of rules in moz.build files or
the tbpl database, which are (a) very difficult to get right and (b) can
easily get out of date.

If the sheriffs are not satisfied with the test coverage that the
developer chose when pushing to try, they can choose to not merge the
patch. If the developer omitted some tests that did end up breaking on
the merge to m-c, then the situation is no worse than it is now with
coalesced tests - that is, some amount of effort will be needed to
figure out what actually broke the tests.

On 13-04-03 18:43 , Gary Kwong wrote:
> autoBisect relies heavily on `hg bisect` and has been working well with
> JSBugMon especially in Spidermonkey for the better part of ~ 4 years now.
>

How does it deal with bisecting through merges? Do you think it would be
affected by my proposal? Justin's comments in this thread about how
bisecting might actually be improved also make sense to me.

On 13-04-03 19:28 , Joshua Cranmer 🐧 wrote:
> I think the relative merits of this approach is easily decidable by
> looking at two numbers:
> 1. How many of our patch landings would have non-trivial merge conflicts?
> 2. How much patch bustage is caused by interaction failures instead of
> patch author/reviewer negligence?
>
> Note that your proposal makes both of these cases strictly worse: in the
> first case, you force the tree sheriffs (instead of developers) to spend
> more time resolving conflicts; in the second, sheriffs have to spend
> more time finding the cause of the bustage. Keep in mind that we have
> about three sheriffs and about a hundred developers--so it is better to
> optimize for sheriffs' time instead of developers' time (at the very
> least, developers represent better parallelization opportunities).

You have good points, but you're assuming that the sheriffs have to deal
with these problems. I would be perfectly fine with saying "in case of
non-trivial merge conflicts, the sheriffs should leave the patch on try
and ask the developer to rebase against a newer m-c". That is, throw
merge conflicts back to the developer to resolve. This is effectively
what happens now anyway because developers who lose push races have to
manually resolve their merge conflicts.

In the second case (patch bustage due to interaction), again I think
this could be thrown back to the developer, as I stated in my original
proposal (see comments under Scenario D).

I would also be interested in getting concrete numbers for these two
things, though.

On 13-04-03 19:49 , Jesse Ruderman wrote:
> But can we do this with rebased changesets instead of "trivial" merge
> changesets? While the core of hg can handle merges, pretty much none of
> the tools we rely on for understanding history (hg {log, grep, diff,
> bisect}) handle them well.

Do you have specific examples of how hg log, grep, and diff break down
with merges? I've had issues with bisect, but AFAIK the other commands
should be able to deal with merges well enough.

On 13-04-03 19:59 , Jeff Hammel wrote:
> On 04/03/2013 04:44 PM, Joshua Cranmer 🐧 wrote:
>> Instead of running {mochitest-*,reftest,crashtest,xpcshell,marionette}
>> on every single desktop platform on every inbound checkin, run them
>> round-robin. A given push would trigger mochitest on Linux, reftest on
>> mac, and then the next test would run reftest on Linux and mochitest
>> on Mac. Each push would still end up running the full testsuite
>> (modulo some tests that are only run on some platforms) on some
>> platform, and we would be using fewer test resources to achieve that
>> coverage. If most actual test bustage is not platform-specific, this
>> would give us generally sufficiently good coverage for most checkins.
>> I am not qualified to say, however, if this is the case.
>>
> If there was a way/guidelines to do this in some sensible way, or even a
> cultural norm like "reviewer comments on what tests to run", I'd give a
> +0.5 here fwiw

Just wanted to point out that the modified proposal I suggest earlier in
this email allows this to happen, and without requiring any changes
whatsoever to our existing infrastructure. Just use regular trychooser
syntax to pick your tests when pushing to try.

On 13-04-03 20:39 , Gregory Szorc wrote:
> I pulled the raw builder logs from
> https://secure.pub.build.mozilla.org/builddata/buildjson/ and assembled
> a tab-separated file of all the builds for 2013-03-17 through 2013-03-23:

Awesome! I will try to pull out some useful metrics from this. For
example, how much machine time was spent on backout pushes? This is
something that should be almost completely eliminated in my proposal, so
I'm curious to know as a ballpark figure how much saving we would get
from this alone.

On 13-04-03 20:57 , Ehsan Akhgari wrote:
> Yes. In addition to the cost of try pushes, this also incurs the cost
> of building and testing those merge commits which would not have existed
> otherwise. In a world where 1 out of 2-3 pushes is bustage this might
> be a good idea, but that's not currently the case.
>

Disagree. Consider:

----------
Scenario E - the one with no bugs
----------
In this scenario, all programmers are perfect and write perfect patches
that never have bugs. They also know they are perfect so they always
push directly to inbound and never to try.

Current process:

Patch A lands on inbound, tree is green.
Patch B lands on inbound, tree is green.
Patch C lands on inbound, tree is green.
Patch D lands on inbound, tree is green.
Patch E lands on inbound, tree is green.
Inbound is merged to m-c, tree is green.

TBPL time used: 6 units.

My suggested process:

Patch A lands on try, tree is green.
Patch B lands on try, tree is green.
Patch C lands on try, tree is green.
Patch D lands on try, tree is green.
Patch E lands on try, tree is green.
Patches A, B, C, D, and E are merged to m-c, tree is green.

TBPL time used: 6 units.

The number of merge commits in both cases depends on how frequently the
sheriffs choose to merge to m-c. I see no reason to increase or decrease
the frequency just because of the change in process. Of course, if the
overall machine resource usage goes down then we may be able to increase
the rate at which we land patches. This might increase the frequency
with which merges are done, but the system will reach an equilibrium
point at a higher commit frequency rather than diverging.

On 13-04-03 23:01 , Clint Talbert wrote:
> Joel and I did some calculations:
> * 200 pushes/day[1]
> * 325 test jobs/push
> * 25 builds/push
> * .41 hours/test (on average, from above numbers)
> * 1.1 hours/build (on average, based on try values from above)
[snip]
> So we need 32150 compute hours per day to keep up.
> If you see above our totals for the week of data that gps provided us
> with you can see that we are currently running at: 135004hours/week /
> 7days = 19286 compute hours/day
>

This is interesting, and I will need to spend some more time looking at
the data before I can respond to this properly. However I'm not entirely
sure where you got your 325 test jobs/push and 25 builds/push numbers
from. If we are doing 200 pushes/day that's 1400 pushes/week. gps' data
file has 268632 test jobs and 26077 build jobs, which works out to
191.88 test jobs/push and 18.6 builds/push.
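
The arithmetic behind those per-push figures:

  echo "scale=2; 268632 / (200*7)" | bc    # test jobs per push  -> 191.88
  echo "scale=2; 26077 / (200*7)" | bc     # build jobs per push -> ~18.6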

Cheers,
kats

Jesse Ruderman

Apr 4, 2013, 3:23:54 AM
to
On 4/3/13 10:45 PM, Kartikaya Gupta wrote:
> On 13-04-03 19:49 , Jesse Ruderman wrote:
> > But can we do this with rebased changesets instead of "trivial" merge
> > changesets? While the core of hg can handle merges, pretty much none of
> > the tools we rely on for understanding history (hg {log, grep, diff,
> > bisect}) handle them well.
>
> Do you have specific examples of how hg log, grep, and diff break down
> with merges? I've had issues with bisect, but AFAIK the other commands
> should be able to deal with merges well enough.


Understanding a changeset:

* Try to understand what happened in a changeset. (For a normal
changeset, you'd use "hg diff" or "hg log -p" or "hg export".) For a
merge changeset, your first question might be whether the merge was
trivial. If not, where not, and how was each conflict resolved?

* Try to understand the shape of history around a merge. Which parent
was 'inbound' and which parent was 'central'? From central's
perspective, what changes did this bring in from inbound? (These
questions are possible to answer with pushlog, debugancestor, and log.)


hg log:

* [hg log modules/libpref/src/init/all.js] includes lots of bogus
trivial merges.

* Was revision X quickly followed by a backout? With branchy history,
there's no "the next 10 changesets".


hg bisect:

There are three reasons hg bisect might blame a merge. I've hit them
all many times and they're all unpleasant. (I taught Gary's bisect
wrapper to distinguish between the cases, which helps a little.)

A. From the perspective of mozilla-central, the bug was introduced by a
merge from inbound. My 'start' revision came after the common ancestor.
I have to start over or use --extend.

B. The problem was actually introduced by the merge; both parents are
free of the bug. What patches on each side contributed to the problem?
hg bisect can't answer this question.

C. I marked a changeset as 'bad' when it actually had a different bug
than the one I was bisecting for. hg bisect blames a random merge
changeset whose parents it hasn't even tested.


hg grep and hg blame:

* These seem ok, actually. Except that I lose my sense of the order in
which changes were made.


These are just the problems I've hit with the occasional merges from
long-lived branches, and between inbound and central. I'd expect things
to be much worse if every changeset were accompanied by its own merge.
The "merge, then fix" possibility for scenario D scares me too.

Neil

Apr 4, 2013, 5:54:07 AM
to
Gregory Szorc wrote:

> Here is the percent of total builder time we spent performing jobs
> broken down by tree:
>
> inbound 43.98%
> try 27.48%
> central 5.24%
> ...
>
> We have an inbound and try problem.

After patches have passed try, do people then push them to inbound,
because they don't want to watch the tree? Maybe it should be possible
for sheriffs to merge/rebase a completely green try run to m-c as if it
was an m-i merge.

--
Warning: May contain traces of nuts.

jmaher

Apr 4, 2013, 9:08:45 AM
to
The 325 jobs per push come from manually counting jobs on TBPL (ignoring PGO); remember to use showall=1. The total stats from gps include try, which has far fewer test jobs per push, as well as inbound coalescing.

Kartikaya Gupta

Apr 4, 2013, 10:12:57 AM
to
On 13-04-03 23:05 , Jesse Ruderman wrote:
> I suggest adding an Auto branch between Try and Central. Advantages:
[snip]
> * In Scenario D (when subtle patch interactions cause build or test
> failures), automation can move on to another set of Try landings, giving
> sheriffs time to react without the pressure of tree closure.

This part doesn't make sense to me. When you say "automation can move on
to another set of Try landings", where would these patches go?
Presumably on this "Auto" branch, since all patches go through Auto. But
in this scenario, Auto is broken from the previous set of try landings,
so you wouldn't really want to put them there. Does Auto also allow
multiple heads? That's the only way this makes sense. But then if Auto
has multiple heads, presumably those heads are getting merged onto m-c,
and so you'll need to run tests on m-c. I think this contradicts your
other statement (step 6). Can you clarify?

On 13-04-04 03:23 , Jesse Ruderman wrote:
> * Try to understand what happened in a changeset. (For a normal
> changeset, you'd use "hg diff" or "hg log -p" or "hg export".) For a
> merge changeset, your first question might be whether the merge was
> trivial. If not, where not, and how was each conflict resolved?

Fair point.

> * Try to understand the shape of history around a merge. Which parent
> was 'inbound' and which parent was 'central'? From central's
> perspective, what changes did this bring in from inbound? (These
> questions are possible to answer with pushlog, debugancestor, and log.)

Agreed. Even now I have trouble figuring out which parent was inbound
and which was central. I think having a policy on how to do the merge,
and consistently following it, would help a lot here, even today. By
this I mean dictate that "when merging to m-c, you must always update to
current m-c, and then merge the new changesets in". That way you know
that the first parent of the merge head was m-c, and the other parents
were merged in. Currently with the inbound <-> m-c merges I don't
believe there is a policy on this. For example, cset 475dc5f51bdb (which
was an inbound -> m-c merge) has a first parent that is from inbound and
a second parent from m-c. Cset 686d76b44d9f on the other hand, which was
a fx-team -> m-c merge, has a first parent that is from m-c and a second
parent from fx-team.
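
(A sketch of what that policy would look like on the command line;
"central" and "inbound" here are just placeholders for the relevant
heads:)

  hg update central      # working dir parent = m-c, so m-c becomes the
                         # first parent of the merge changeset
  hg merge inbound
  hg commit -m "Merge inbound to m-c"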

> hg log:
>
> * [hg log modules/libpref/src/init/all.js] includes lots of bogus
> trivial merges.

Yes, it does. You can skip all the merges with -M though. I'm not sure
if there are conditions in which you'd want to list only non-trivial
merges in the log; it seems to me that hg blame is more suited for any
such use cases, because the change description of a non-trivial merge is
pretty useless. (That being said, we could make a policy of annotating
non-trivial merges with something in the change description - e.g.
"Merge inbound to m-c; non-trivial").

> * Was revision X quickly followed by a backout? With branchy history,
> there's no "the next 10 changesets".

In my proposal you would almost never have a backout cset get merged to
m-c. At least not because of tree bustage; maybe for other reasons like
improper review. In that case, though, the backout would likely not be
"quickly following" the original change anyway.

> hg bisect:
>
> There are three reasons hg bisect might blame a merge. I've hit them
> all many times and they're all unpleasant. (I taught Gary's bisect
> wrapper to distinguish between the cases, which helps a little.)
>
> A. From the perspective of mozilla-central, the bug was introduced by a
> merge from inbound. My 'start' revision came after the common ancestor.
> I have to start over or use --extend.
>
> B. The problem was actually introduced by the merge; both parents are
> free of the bug. What patches on each side contributed to the problem?
> hg bisect can't answer this question.
>
> C. I marked a changeset as 'bad' when it actually had a different bug
> than the one I was bisecting for. hg bisect blames a random merge
> changeset whose parents it hasn't even tested.
>

Yeah. TBH I don't have a good solution for hg bisect, as I haven't spent
a lot of time with it. I just want to point out that all these problems
exist now anyway; they might get a little more frequent and/or annoying.

On 13-04-04 05:54 , Neil wrote:
> After patches have passed try, do people then push them to inbound,
> because they don't want to watch the tree? Maybe it should be possible
> for sheriffs to merge/rebase a completely green try run to m-c as if it
> was an m-i merge.

This is pretty much exactly what I'm suggesting.

Cheers,
kats

Kartikaya Gupta

unread,
Apr 4, 2013, 10:59:53 AM4/4/13
to Jeff Hammel
(Moving thread from m.d.tree-management to m.d.platform)

On 13-04-03 19:03 , Jeff Hammel wrote:
> Without wanting to start a controversy, is there a particular reason to
> use multiple trees here vs e.g. a checkin-needed flag on bugzilla or
> similar?

I'm not quite sure what you're suggesting. Are you suggesting using
checkin-needed instead of landing on try, and then having the patches
merged directly to m-c? Patches will often contain bugs that only
manifest on tbpl and not in local testing, so pushing patches to m-c
directly without at least one tbpl iteration means that m-c will be
broken most of the time.

Cheers,
kats

Kartikaya Gupta

unread,
Apr 4, 2013, 11:07:03 AM4/4/13
to
On 13-04-03 19:49 , Jesse Ruderman wrote:
Thinking about it a bit more, I'm actually warming up to the idea of
implementing this with rebased changesets rather than trivial merges. I
had assumed that hg supported merge changesets with >2 parents, but this
does not appear to be the case in my (limited) testing. Git allows you
to merge more than two heads/branches together but AFAICT hg does not.
So attempting to merge more than 1 patch from try at a time would in
fact result in chained merge nodes, which isn't great. Doing a rebase of
the try heads might work better.
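
(Roughly, the two options look like this, assuming the rebase extension
is enabled; "central", TRY_HEAD, and TRY_BASE are placeholders for the
relevant heads and the first cset of the try push:)

  # chained merges: one merge changeset per try head
  hg update central
  hg merge TRY_HEAD && hg commit -m "Merge try head to m-c"

  # rebase: replay the try head's changesets onto central instead,
  # leaving linear history with no merge node
  hg rebase -s TRY_BASE -d central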

(Of course, if we switched to git... :))

Cheers,
kats

Gregory Szorc

unread,
Apr 4, 2013, 4:22:10 PM4/4/13
to Kartikaya Gupta, dev-pl...@lists.mozilla.org
You are correct.

Git supports N>2 parent merges (octopus merges). Mercurial does not.

Interesting trivia point: this is one of the very few incompatibilities
between the way Mercurial and Git structure commits and trees.

hg-git (the tool that converts Git and Mercurial repos to and from each
other) handles octopus merges by inserting extra Mercurial changesets
and then annotating those changesets to denote they came from the same
Git commit (so it can reconstruct the original Git commit).

Jesse Ruderman

unread,
Apr 4, 2013, 6:23:52 PM4/4/13
to
On 4/4/13 7:12 AM, Kartikaya Gupta wrote:
> On 13-04-03 23:05 , Jesse Ruderman wrote:
>> I suggest adding an Auto branch between Try and Central. Advantages:
> [snip]
>> * In Scenario D (when subtle patch interactions cause build or test
>> failures), automation can move on to another set of Try landings, giving
>> sheriffs time to react without the pressure of tree closure.
>
> This part doesn't make sense to me. When you say "automation can move on
> to another set of Try landings", where would these patches go?
> Presumably on this "Auto" branch, since all patches go through Auto. But
> in this scenario, Auto is broken from the previous set of try landings,
> so you wouldn't really want to put them there. Does Auto also allow
> multiple heads? That's the only way this makes sense. But then if Auto
> has multiple heads, presumably those heads are getting merged onto m-c,
> and so you'll need to run tests on m-c. I think this contradicts your
> other statement (step 6). Can you clarify?

Auto would allow multiple heads, but only for the purpose of discarding
a failed head and replacing it with a different one. Only the newest
head would be "active" and a candidate for being pushed to Central.