On 12/12/11 4:26 PM, John Ford wrote:
> 3) following this "bustage is backed out right away" imperative on
> the wiki [1] and update it to clarify that "pgo bustage means backing
> out until last green pgo build"
>
> [1]
https://wiki.mozilla.org/Tree_Rules/Inbound#Sheriff_Duty
Heh. Aren't you a little curious about why that rule has never, ever,
not even once, been applied? Let's flesh it out a bit with some examples.
"""
Bustage is backed out right away. Changes landed on top of bustage will
be backed out to minimize overhead/time to fix the tree. They will be
relanded by the sheriff once the bustage is cleared.
Say for example, that a push with six csets includes one which adds a
new test, which fails on 10.6 opt. That failure will show up after two
to three hours, during which time another ten or fifteen pushes will
have landed on top of it. The sheriff will immediately back out all
eleven or sixteen pushes, and then will immediately reland all but the
single offensive cset, making him wonder why on earth he just doubled
the build and test load when he knew exactly what to back out, and no
harm had been done to the later pushes. The next time, rather than
relanding in eleven or sixteen separate pushes, he relands them all in a
single blob, which then gets the blame from the talos regression finding
script. The sheriff begins drinking.
---
Say, for example, that we do green PGO builds on a push, then there are
three more pushes including a merge from mozilla-central, then we do PGO
builds on the fourth push, then there are two more pushes, then one of
the PGO builds on the fourth push fails. Call this "philor's Tuesday the
6th of December."
The sheriff will not retrigger that build to see if it was some
transitory problem with the slave of the sort we have all the time. The
sheriff will not close the tree while he tries to figure out the
problem. The sheriff will back out all twenty five csets, leaving the
tree open while he does, rebasing his backout as more things land, and
backing them out too. When he finally wins the push race (there were
pushes every ten minutes or so for the rest of that day) and lands his
backout of twenty five or thirty or forty pushes, he will immediately
reland all but the first four pushes, because he had no reason to
suspect them, and no reason to back them out, either, but that is The Rule.
Then, the sheriff will reschedule all his meetings for the day, or if he
is a volunteer, call in sick at his day job, and push the first four to
try, triggering PGO on the affected build. As each one comes back green
from try, he will reland them. Because it was both intermittent and
cumulative, he will find that every single one of them is green on try,
and will reland all of them. If, by chance, something lands while he is
doing that which shrinks libxul slightly, then even when he relands all
of them on inbound, they will still be green. If, by chance, something
lands while he is doing that which expands libxul slightly, then the
ones which were green on try will fail on inbound. The sheriff will
begin drinking again.
"""
That rule solves two problems: it completely removes from the sheriff's
toolbox the option of closing the tree to deal with bustage, and it
makes actually being the inbound sheriff an intolerable job that nobody
would want to do, rather than the current intolerable job that nearly
nobody wants to do.
Oddly enough, those are not problems that we need to solve.
Had we applied it when you wanted us to, on Friday (that being the only
time we closed the tree for this instance of PGO bustage, I dealt with
Tuesday and Wednesday without feeling the need to close), the net effect
would have been to allow us to churn like crazy while we still didn't
know what the problem was, backing out blameless patches along with
things that went into libxul, then letting a new thing that went into
libxul fill up the remaining space before we relanded any of the things
that had made it under the limit but got caught up in the last backout.
We didn't ever follow that rule in the past because it's a sledgehammer
we don't need, a sledgehammer which mostly applies itself to the
sheriff's head; in the future, we won't follow it both because it's a
sledgehammer we don't need, and because we now know that there are times
when backing out to the last green and allowing anything other than what
landed after it to land is not always the solution to a problem.