
Staging tree and automated testing of candidates for m-c


Zack Weinberg

Oct 28, 2008, 12:08:27 AM
to dev-planning
I originally wrote this proposal a couple weeks ago in response to a
long discussion in the platform meeting about how we could avoid
lengthy tree closures while hunting difficult regressions, and sent it
to beltzner. I haven't heard back from him and the topic came up again
in conversation over dinner today, so I thought I'd put this out to a
wider audience now. The proposal is intended primarily to reduce the
chances of regression-causing patches landing on m-c in the first
place, but I think it would also make it easier to deal with them when
they do occur.

I propose to introduce a second mercurial repository, let's call it
mozilla-staging, which takes the primary test burden away from -central.
Developers will normally *pull* from -central but *push* to -staging;
changesets propagate from -staging to -central automatically, after they
pass acceptance tests. There is no total ordering on -staging.
Developers are free to push changesets whose parent is not the tip, and
without worrying about whether someone else has pushed a changeset.
This actually makes all aspects of testing easier.

I need to explain exactly how the automation works before I can talk
about its advantages and disadvantages. If you get bored, feel free to
skip to the end. :) ASCII art follows; please read in a monospaced
font.

Suppose we have some base revision, BASE, and four developers
simultaneously push changes against it. I'll label the new revisions
with their authors' first names: ARTIE, HELEN, DAVE, MELL.

m-c ...-BASE(+)

m-s ...-BASE(+)--ARTIE
               +-HELEN
               +-DAVE
               \-MELL

The (+) suffix on BASE means it is marked as having passed the complete
test suite. Note that nothing has changed on m-c yet. The buildbots
notice that there are new unmarked heads on m-s and fire off test
cycles for all four of them, in parallel (or, in practice, as many of
them as we have the hardware to handle). Each changeset, in isolation,
is run through the full set of tests - everything we have. Let's say
that Mell's change has a regression on something or other, but the other
three are fine. The buildbots label each head accordingly:

m-s ...-BASE(+)--ARTIE(+)
               +-HELEN(+)
               +-DAVE(+)
               \-MELL(-)

Mell's change is marked as busted (-), so that head will be ignored from
now on. The other three are candidates for merging back to m-c. Let's
say that those three get through testing in the vertical order shown.
The merge program notes that the head of m-c is still BASE, so it can
just push ARTIE back to m-c:

m-c ...-BASE(+)--ARTIE(+)

m-s ...-BASE(+)--ARTIE(+)
               +-HELEN(+)
               +-DAVE(+)
               \-MELL(-)

It then attempts to merge HELEN with ARTIE *within m-s*. This
succeeds, so the merged rev (let's call it AH) and both parents go back
to m-c.

m-c ...-BASE(+)--ARTIE(+)--AH
               +-HELEN(+)-/

m-s ...-BASE(+)--ARTIE(+)--AH
               +-HELEN(+)-/
               +-DAVE(+)
               \-MELL(-)

And then it tries to merge DAVE with AH. This fails. DAVE is
downgraded to (-) within m-s and does not go to m-c.

m-c ...-BASE(+)--ARTIE(+)--AH
               +-HELEN(+)-/

m-s ...-BASE(+)--ARTIE(+)--AH
               +-HELEN(+)-/
               +-DAVE(-)
               \-MELL(-)

The bots now idle until new code arrives on m-s. It doesn't matter what
parent revision that new code has. Maybe Mell has fixed her bug, using
her original change as a baseline:

m-c ...-BASE(+)--ARTIE(+)--AH
               +-HELEN(+)-/

m-s ...-BASE(+)--ARTIE(+)--AH
               +-HELEN(+)-/
               +-MELL(-)--MELL2
               \-DAVE(-)

That's fine; MELL2 is tested, found good, merged successfully against
AH, and the result pushed to m-c (not shown; the only difference is
that m-c doesn't have DAVE).

m-s ...-BASE(+)--ARTIE(+)--AH--------AHM
               +-HELEN(+)-/         /
               +-MELL(-)--MELL2(+)-/
               \-DAVE(-)

But a fifth developer, Lupin, came in while that was happening and
pushed another change using AH as a baseline. It'll be tested as is
and then merged against AHM.

                            /-LUPIN(+)---\
m-s ...-BASE(+)--ARTIE(+)--AH--------AHM--AHML
               +-HELEN(+)-/         /
               +-MELL(-)--MELL2(+)-/
               \-DAVE(-)

Dave is slacking off and hasn't noticed that his patch failed to merge
cleanly, but Zeta cares about that bug, so she merges AHML on top of the
DAVE head in m-s, fixes the rejects, and pushes the result back to m-s.

                            /-LUPIN(+)---\
m-s ...-BASE(+)--ARTIE(+)--AH--------AHM--AHML
               +-HELEN(+)-/         /         \
               +-MELL(-)--MELL2(+)-/           \
               \-DAVE(-)------------------------ZETA

That tests out fine, and nothing else went to m-c in the meantime, so
ZETA goes to m-c as is; m-s and m-c are identical again.

                            /-LUPIN(+)---\
m-c ...-BASE(+)--ARTIE(+)--AH--------AHM--AHML--ZETA(+)
               +-HELEN(+)-/         /          /
               +-MELL(-)--MELL2(+)-/          /
               \-DAVE(-)----------------------

Note that if someone pushes a chain of revisions all at once, only the
tip is tested. This allows someone to apply a sequence of patches that
temporarily break things but then fix them again.
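The merge program's behavior in the walkthrough can be sketched as a small simulation. Everything here is illustrative (the function, revision names, and the explicit conflict set are mine, not real tooling): good heads are taken in test-completion order; a head whose parent is the current m-c tip fast-forwards, otherwise an automatic merge is attempted, and a failed merge downgrades the head to (-) within m-s.

```python
# Toy replay of the merge program described above. A real implementation
# would drive hg; here conflicts are simulated by an explicit set.

def propagate(tip, good_heads, parent_of, conflicts):
    """good_heads: heads that passed testing, in completion order.
    conflicts: heads whose automatic merge with the current tip fails.
    Returns the new m-c tip and the heads downgraded to (-)."""
    downgraded = []
    for head in good_heads:
        if parent_of[head] == tip:
            tip = head                          # fast-forward push to m-c
        elif head not in conflicts:
            tip = "merge(%s,%s)" % (tip, head)  # merge node made in m-s, pushed
        else:
            downgraded.append(head)             # stays in m-s marked (-)
    return tip, downgraded

# The ARTIE/HELEN/DAVE example: DAVE fails to auto-merge.
parents = {"ARTIE": "BASE", "HELEN": "BASE", "DAVE": "BASE"}
tip, bad = propagate("BASE", ["ARTIE", "HELEN", "DAVE"], parents, {"DAVE"})
print(tip)  # merge(ARTIE,HELEN)
print(bad)  # ['DAVE']
```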

In practice I imagine that the (+) and (-) markers would be sets of
flags somehow associated with each revision (tags?), one for each test
suite plus one more for the merge attempt, if any, each of which could
be + or -; a single - being enough to stop propagation.
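That bookkeeping might look something like the following sketch. The representation and suite names are my own assumptions (the proposal deliberately leaves them open); the one firm rule is that any '-', or any suite that hasn't reported yet, blocks propagation.

```python
# Sketch of per-revision test flags: one flag per suite plus one for the
# merge attempt; a single '-' stops propagation. Suite names are assumed.

SUITES = ("build", "unit", "mochitest", "talos", "merge")

class RevStatus:
    def __init__(self):
        self.flags = {}                 # suite name -> '+' or '-'

    def mark(self, suite, ok):
        self.flags[suite] = '+' if ok else '-'

    def may_propagate(self):
        # Every suite must have run and passed; any '-' (or gap) blocks.
        return all(self.flags.get(s) == '+' for s in SUITES)

st = RevStatus()
for s in SUITES:
    st.mark(s, True)
print(st.may_propagate())  # True
st.mark("talos", False)
print(st.may_propagate())  # False
```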

So, positives of this approach include:

(1) There's never more than one head on mozilla-central.
(2) Because multiple heads are allowed on mozilla-staging, we can test
them in parallel; this should make it easier for the testing system
to keep up with the commit volume.
(3) All the human-written patches that land on m-c have already been
tested (in isolation) and found not to regress anything. This
decreases the chance of a regression in m-c.
(4) If a patch does regress something when tested in isolation, or
doesn't auto-merge cleanly, it's not kicked out of m-s, it's just
ignored until a human does something about it. This allows the
human to continue developing fixes until it does work, accepting new
code from m-c only when convenient. (Of course, the farther away
from m-c tip it gets, the more likely it will be to need a manual
merge.) For instance, sdwilsh's sqlite upgrade and places perf work
could just simmer on m-s until done.
(5) If m-c is locked down for whatever reason, people can keep pushing
stuff to m-s; the only thing that is suspended is propagation from
m-s to m-c. This makes tree closures less painful.

but some negatives are:

(6) The m-c revision graph does get messier. I am inclined to think
this is only a problem because people aren't used to bushy
revision graphs. They are a more accurate representation of what
really happened, and once you get used to them, make more things
possible.
(7) In order to actually get the promised gain in (2) we may need
a lot more hardware in the build farm, and people will have to change
their habits, e.g. creating a separate head for every patch they
push as part of a checkin-needed sweep.
(8) Merge nodes automatically created within mozilla-staging are not
tested before landing; therefore we still have to run a full set of
tests against m-c in order to catch regressions due to interaction
between patches. (Testing those nodes would kill the performance
win of (2).)
(9) When a regression does appear on m-c, isolating it may be tricky.
I would recommend that rather than continue to do the backout dance
we've been doing, we implement some way to have the automated
testers that normally work from m-c tip instead build and test
either a specified rev in m-c, or the tip of m-c with a specified
rev backed out (in a temporary tree, *not* pushed anywhere). That
makes the process noninvasive and capable of dealing with an
arbitrary history graph. Another thing that would help is to say
that the baseline perf numbers used for testing in m-s are the best
perf numbers recorded for m-c since the last time a human tagged a
perf regression as acceptable. That way, if there is a perf
regression in m-c, it will automatically suspend propagation from
m-s to m-c for any change that includes the regression.
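The baseline rule at the end of (9) amounts to something like this sketch. The numbers and the human-tagging mechanism are hypothetical; the point is just that an accepted regression resets the window over which "best" is computed.

```python
# Sketch of the proposed perf baseline: the best m-c number recorded since
# the last time a human tagged a perf regression as acceptable.

def perf_baseline(history):
    """history: non-empty list of (value, accepted_regression) pairs in
    m-c push order, lower values being better; accepted_regression marks
    a run a human tagged as an acceptable perf regression."""
    window = []
    for value, accepted in history:
        if accepted:
            window = []        # reset: this run sets a new acceptable level
        window.append(value)
    return min(window)

# Illustrative Ts-style numbers (ms); the third run is an accepted regression,
# so the baseline ignores the earlier, better numbers.
print(perf_baseline([(120, False), (118, False), (130, True), (125, False)]))  # 125
```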

zw

Benjamin Smedberg

Oct 28, 2008, 12:38:41 AM
Zack Weinberg wrote:
> I originally wrote this proposal a couple weeks ago in response to a
> long discussion in the platform meeting about how we could avoid
> lengthy tree closures while hunting difficult regressions, and sent it
> to beltzner. I haven't heard back from him and the topic came up again

This all seems like a very complicated solution to the problem. It seems to
me that the core problem here is that the tryserver doesn't do testing. If
the tryserver did testing, we could tell it to pull arbitrary revisions
(older revisions if necessary) of any repo, or a repo+patch, and build+test
it. AFAIK this request has been filed.

When the tryserver gains this important capability, it then becomes pretty
easy to treat ssh://hg.mozilla.org/try as the staging repo you want, and
developers can push the relevant tryserver commits to mozilla-central after
they've passed a tryserver run.

This also avoids adding a lot of moving parts to the checkin process and
further complicating the buildbot system which is already pretty complicated.

--BDS

Zack Weinberg

Oct 28, 2008, 12:54:20 PM
to dev-pl...@lists.mozilla.org
Benjamin Smedberg <benj...@smedbergs.us> wrote:

> Zack Weinberg wrote:
> This all seems like a very complicated solution to the problem. It
> seems to me that the core problem here is that the tryserver doesn't
> do testing. If the tryserver did testing, we could tell it to pull
> arbitrary revisions (older revisions if necessary) of any repo, or a
> repo+patch, and build+test it. AFAIK this request has been filed.
>
> When the tryserver gains this important capability, it then becomes
> pretty easy to treat ssh://hg.mozilla.org/try as the staging repo you
> want, and developers can push the relevant tryserver commits to
> mozilla-central after they've passed a tryserver run.
>
> This also avoids adding a lot of moving parts to the checkin process
> and further complicating the buildbot system which is already pretty
> complicated.

This makes sense as an intermediate stage, but I think my full plan has
significant advantages over it. The most important is that we can
prohibit anyone but the automation from pushing changes to -central,
thus eliminating the possibility of an insufficiently tested change
hitting -central. At the same time, the automatic propagation of good
and only good changes means that developers can push their changes and
forget about them; they don't have to take a second positive action to
land their patches after they've been tested, they don't have to watch
tinderbox after landing the patch, they don't have to worry about
backing stuff out. (There should be some sort of interrupt-type
notification - email, for instance - if a patch doesn't land.) In
essence, we already have all the "moving parts" you don't like, but
they're in developers' heads when they could be in automation.

My plan also gives meaning to the revision history in -staging. The
present -try repository has no meaningful history and therefore makes
it much harder to carry out a scenario like my last example in the
original post (where Zeta fixes up Dave's botched commit).

Also, it may not have been obvious, but my full plan involves only one
more piece of automation than your plan -- the script that checks the
labels attached to staging-tree revisions and does the merges and
propagation back to -central. Thus, it shouldn't be a major additional
IT burden or anything like that.

zw

Benjamin Smedberg

Oct 28, 2008, 1:19:52 PM
Zack Weinberg wrote:

> This makes sense as an intermediate stage, but I think my full plan has
> significant advantages over it. The most important is that we can
> prohibit anyone but the automation from pushing changes to -central,
> thus eliminating the possibility of an insufficiently tested change
> hitting -central. At the same time, the automatic propagation of good

I think this is not a positive step. Even with additional automation, there
will be changes that make it to mozilla-central which cause regressions
(perf regressions being the most obvious, but also regressions discovered by
manual testing). Requiring an intermediate staging build before "reopening"
the tree is pretty drastic, especially if the commit is just a straight backout.

> and only good changes means that developers can push their changes and
> forget about them; they don't have to take a second positive action to
> land their patches after they've been tested, they don't have to watch
> tinderbox after landing the patch, they don't have to worry about
> backing stuff out. (There should be some sort of interrupt-type

So who is going to watch mozilla-central, if not the committers? An
additional intermediate step shouldn't absolve committers of the
responsibility for the final push to mozilla-central.

> notification - email, for instance - if a patch doesn't land.) In
> essence, we already have all the "moving parts" you don't like, but
> they're in developers' heads when they could be in automation.

The advantage of the tryserver is that I am not "on the hook" for changes
pushed to it. I can push to it at the end of my day, wait for results
overnight, and then, if all goes well, do the final push to mozilla-central
in the morning.

> My plan also gives meaning to the revision history in -staging. The
> present -try repository has no meaningful history and therefore makes
> it much harder to carry out a scenario like my last example in the
> original post (where Zeta fixes up Dave's botched commit).

How does the -try repository not have meaningful history?

>
> Also, it may not have been obvious, but my full plan involves only one
> more piece of automation than your plan -- the script that checks the
> labels attached to staging-tree revisions and does the merges and
> propagation back to -central. Thus, it shouldn't be a major additional
> IT burden or anything like that.

You need the following bits of automation:

* gather the results of builds from a particular revision -- this is
non-trivial, even with buildbot
** buildbot doesn't have a simple way to collect results of multiple builds
started from a single push and act on that information... you'd be writing a
lot of custom buildbot code, and that code would require attention at every
buildbot version upgrade
** if you include performance tests, we need a way to automatically detect a
performance regression. We don't have that now, and because of the variance
it's quite difficult
* "label" that revision somewhere: you can't do after-the-fact labeling of
hg revisions within mercurial itself, so you've either got to do this
entirely within buildbot or maintain an external list of ready-to-merge
revisions in a database or some other external tool
* perform the merge and push (pretty easy, once you know what you're doing,
but I really think this needs to have good merge commit messages, not just
"automatic merge to mozilla-central").
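For the "external list" option in the labeling bullet, the store itself is the easy part; a sketch with a hypothetical schema (this is not existing Mozilla infrastructure, and the hard problem bsmedberg names — feeding it reliably from buildbot — is untouched here):

```python
# A minimal external "ready to merge" store: per-revision, per-suite results
# in a sqlite table, since hg itself can't carry after-the-fact labels.
import sqlite3

db = sqlite3.connect(":memory:")   # a real deployment would use a file
db.execute("""CREATE TABLE results (
    rev TEXT, suite TEXT, passed INTEGER,
    PRIMARY KEY (rev, suite))""")

def record(rev, suite, passed):
    # Upsert one test result for one changeset.
    db.execute("INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
               (rev, suite, int(passed)))

def ready_to_merge(rev, suites):
    # True only if every required suite has reported a pass for this rev.
    rows = dict(db.execute(
        "SELECT suite, passed FROM results WHERE rev = ?", (rev,)))
    return all(rows.get(s) == 1 for s in suites)

record("abc123", "unit", True)
record("abc123", "talos", True)
print(ready_to_merge("abc123", ["unit", "talos"]))             # True
print(ready_to_merge("abc123", ["unit", "talos", "reftest"]))  # False
```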

The benefits of your proposal seem to be

* fire-and-forget commits, and
* layer of indirection so people can't push bad stuff to mozilla-central

I don't think we should be going for fire-and-forget commits, and I think
the layer of indirection has costs that are not proportional to the benefits.

--BDS

Zack Weinberg

Oct 30, 2008, 8:48:04 PM
to dev-pl...@lists.mozilla.org
Benjamin Smedberg <benj...@smedbergs.us> wrote:

> Zack Weinberg wrote:
> > we can prohibit anyone but the automation from pushing changes to
> > -central, thus eliminating the possibility of an insufficiently
> > tested change hitting -central.
>

> I think this is not a positive step. Even with additional automation,
> there will be changes that make it to mozilla-central which cause
> regressions (perf regressions being the most obvious, but also
> regressions discovered by manual testing). Requiring an intermediate
> staging build before "reopening" the tree is pretty drastic,
> especially if the commit is just a straight backout.

Well, we could allow a straight backout to be done directly to
-central, perhaps, but I think this will not be a problem in practice,
because *-staging never closes*. The automatic testing and merging go
on; the only thing that ceases is propagation to -central. Thus,
having -central not making forward progress under my scheme is not
nearly as troublesome as it is now.

> So who is going to watch mozilla-central, if not the committers? An
> additional intermediate step shouldn't absolve committers of the
> responsibility for the final push to mozilla-central.

The sheriff watches -central, and has the authority to back out
patches that cause regressions that were only detected there (however
that actually gets done). Committers are responsible for fixing their
patches once backed out, but nothing else.

I think absolving committers of responsibility for watching -central is
highly desirable; it is a huge burden on casual or even paid
contributors. It is, for instance, the major reason I am not actively
pursuing commit rights.

> How does the -try repository not have meaningful history?

I guess I don't actually know that; I was going by what it says on the
wiki page (words to the effect of "don't pull -try, it doesn't have
meaningful history").

(One thing, though, is that -try may contain patches that are not meant
to land as is. I have several times sent something to -try when it
hadn't been reviewed yet, for instance. If my plan gets implemented I
would like to keep the try server around for just that kind of usage.)

> You need the following bits of automation:
>
> * gather the results of builds from a particular revision -- this is
> non-trivial, even with buildbot
> ** buildbot doesn't have a simple way to collect results of multiple
> builds started from a single push and act on that information...
> you'd be writing a lot of custom buildbot code, and that code would
> require attention at every buildbot version upgrade
> ** if you include performance tests, we need a way to automatically
> detect a performance regression. We don't have that now, and because
> of the variance it's quite difficult

I would argue that we want all of the things you list, up to this
point, even if we don't do my plan. They would make manual regression
hunting easier. (And yes, I meant to include perf tests.) If buildbot
integration is a problem, we ought to work with buildbot upstream till
it isn't.

> * "label" that revision somewhere: you can't do after-the-fact
> labeling of hg revisions within mercurial itself

Crud. I thought that was possible. You're right that this is a
major obstacle; I would say that we ought to work with mercurial
upstream to make it possible.

zw

Bernd

Oct 31, 2008, 2:46:54 AM
to dev-pl...@lists.mozilla.org, Zack Weinberg
Zack Weinberg wrote:

> I think absolving committers of responsibility for watching -central is
> highly desirable; it is a huge burden on casual or even paid
> contributors. It is, for instance, the major reason I am not actively
> pursuing commit rights.

This is unfair to the people who help you get your stuff into the tree.

I know the burden is high; I can't commit during weekdays, as I do not
have 6 hours in a row to watch these damned slow talos boxes that need
3 hours to cycle. But I do not agree that somebody else has to watch
for me if I break something. (Thanks to timeless, bz, bsmedberg for
helping me.)

Now we come to the central question:

It's far beyond my understanding why the Mozilla Corporation accepts the
current state of affairs at build.
- unclear orange
- damned slow boxes
- no guilty column
- (and don't get me started that one cannot mark in hg web as one could
easily with bonsai.)

The first 3 entries are already a constant waste of paid time and spare
time of contributors.

Just put an estimate on how much time one wastes by being distracted as
one has to look at tinderbox multiple times during 4 hours (that's a fast
round). Multiply it by typical hourly wages and you will see that the
current state of tinderbox is expensive.

Normal management would put firm dates on getting this into a normal
state. Why aren't there deadlines for when guilty will work, or for when
the slowest machine will cycle in no more than an hour? When will there
be an accessible environment to reproduce and fix the semi-orange tree?
Put a date on that too.

I know this is not my money that needs to be thrown at it, but watching
how mozilla burns money isn't fun either.

Bernd
