Goals for "landing code" experience?

Axel Hecht

Dec 28, 2011, 3:01:37 PM

Hi,

it so happens that I need to touch our trees more often these days than
in the past few years, and it made me grumpy. Couldn't avoid it this
time, though, I need to land the stuff so that we can ship a localized
Fennec 11.

Do we have goals on what the experience *should* be when landing code on
our mainline repos?

Why I'm asking:

Inbound has proven to make the tree just more opaque to me, and given
that I'm coding because otherwise we don't ship *this* version on aurora
to begin with, the delays are awful. And of course it doesn't help at
all if your code needs to be on aurora.

Things I see range from many colors across the board to builds popping up
and going away, being any of green, blue, red, orange, or purple, for
no reason I can figure out. I follow orange star bugs, and they
don't seem to be doing anything but allowing us to ignore the orange?

I really felt that I couldn't land code without other people doing my
deeds to make that code stick to the tree.

Which sounds wrong for me as a gecko module owner.

Axel

Ehsan Akhgari

Dec 28, 2011, 3:59:56 PM

to Axel Hecht, dev-pl...@lists.mozilla.org

On Wed, Dec 28, 2011 at 3:01 PM, Axel Hecht <ax...@pike.org> wrote:

> Hi,
>
> it so happens that I need to touch our trees more often these days than in
> the past few years, and it made me grumpy. Couldn't avoid it this time,
> though, I need to land the stuff so that we can ship a localized Fennec 11.
>
> Do we have goals on what the experience *should* be when landing code on
> our mainline repos?
>

Not that I know of. We've been trying to make things more streamlined
though.

> Why I'm asking:
>
> Inbound has proven to make the tree just more opaque to me, and given that
> I'm coding because otherwise we don't ship *this* version on aurora to
> begin with, the delays are awful.

I'm not sure what you mean here. If you go to
<https://tbpl.mozilla.org/?tree=Mozilla-Inbound>, you'll see a link to
<https://wiki.mozilla.org/Tree_Rules>, where we explain the check-in rules
for our various trees. Is there anything that you feel we should add to
those pages?

> And of course it doesn't help at all if your code needs to be on aurora.
>

It's true. The traffic on the Aurora and Beta trees is too low to warrant
an inbound tree for them, especially since mozilla-inbound is merged
by people volunteering their time, which makes it less desirable to have
more inbound trees IMO.

> Things I see range from many colors across the board, build popping up and
> going away, being anything of green, blue, red, orange, purple, and for no
> reason I can figure out.

I believe that the Legend link at the top of every TBPL page should
describe the meaning of different colors. Is there anything you think we
can improve there?

About builds popping up and disappearing, that should mostly happen as a
result of either a bug in buildbot or one in TBPL. You should file those
bugs just like any other bug in Mozilla products.

> I follow orange star bugs, and they don't seem to be doing anything but
> allowing us to ignore the orange?
>

The "War on Orange" project never got too much traction, not as a result of
anybody deciding that we should ignore intermittent test failures, but as a
result of lack of resources.

But with the introduction of mozilla-inbound, the impact of those bugs was
decreased dramatically for most people, perhaps with the exception of
people landing more often on Aurora/Beta than on central.

> I really felt that I couldn't land code without other people doing my
> deeds to make that code stick to the tree.
>
> Which sounds wrong for me as a gecko module owner.
>

I agree that the situation is far from ideal, but I think more concrete
suggestions would be more helpful in improving the situation. This is not
a problem which can be solved overnight. Small and consistent improvements
are the path to success.

Cheers,
--
Ehsan
<http://ehsanakhgari.org/>

Ryan VanderMeulen

Dec 28, 2011, 4:30:50 PM

On 12/28/2011 3:59 PM, Ehsan Akhgari wrote:
> The "War on Orange" project never got too much traction, not as a result of
> anybody deciding that we should ignore intermittent test failures, but as a
> result of lack of resources.
>
> But with the introduction of mozilla-inbound, the impact of those bugs was
> decreased dramatically for most people, perhaps with the exception of
> people landing more often on Aurora/Beta than on central.
>

Isn't the bigger issue the possible real bugs that are being exposed by
the orange that aren't being addressed?

Axel Hecht

Dec 28, 2011, 6:22:57 PM

On 28.12.11 21:59, Ehsan Akhgari wrote:
> On Wed, Dec 28, 2011 at 3:01 PM, Axel Hecht<ax...@pike.org> wrote:
>
>> Hi,
>>
>> it so happens that I need to touch our trees more often these days than in
>> the past few years, and it made me grumpy. Couldn't avoid it this time,
>> though, I need to land the stuff so that we can ship a localized Fennec 11.
>>
>> Do we have goals on what the experience *should* be when landing code on
>> our mainline repos?
>>
>
> Not that I know of. We've been trying to make things more streamlined
> though.
>
>
>> Why I'm asking:
>>
>> Inbound has proven to make the tree just more opaque to me, and given that
>> I'm coding because otherwise we don't ship *this* version on aurora to
>> begin with, the delays are awful.
>
>
> I'm not sure what you mean here. If you go to<
> https://tbpl.mozilla.org/?tree=Mozilla-Inbound>, you'll see a link to<
> https://wiki.mozilla.org/Tree_Rules>, where we explain the check-in rules
> for our various trees. Is there anything that you feel we should add to
> those pages?

Where would be the information on what to star, and how, when to
retrigger a build, and how, etc. Inbound is really just a manifestation
that there are only a handful of volunteers that can actually drive our
tree.

>> And of course it doesn't help at all if your code needs to be on aurora.
>>
>
> It's true. The traffic on the Aurora and Beta trees is too low to warrant
> an inbound tree for them, especially since that mozilla-inbound is merged
> by people volunteering their time, which makes it less desirable to have
> more inbound trees IMO.

IMHO, it's a shame that we think we need inbound.

>> Things I see range from many colors across the board, build popping up and
>> going away, being anything of green, blue, red, orange, purple, and for no
>> reason I can figure out.
>
>
> I believe that the Legend link at the top of every TBPL page should
> describe the meaning of different colors. Is there anything you think we
> can improve there?
>
> About builds popping up and disappearing, that should mostly happen as a
> result of either a bug in buildbot or one in TBPL. You should file those
> bugs just like any other bug in Mozilla products.

As far as I grok it, tbpl doesn't show pending builds. That might be the
reason why we ended up closing the tree, way late, for builds that were
completely missing the other day.

Anyway, my patches are far from "when it's on central, it'll ship". I'm
bound to make things work on a milestone.

>> I follow orange star bugs, and they don't seem to be doing anything but
>> allowing us to ignore the orange?
>>
>
> The "War on Orange" project never got too much traction, not as a result of
> anybody deciding that we should ignore intermittent test failures, but as a
> result of lack of resources.

Too bad.

> But with the introduction of mozilla-inbound, the impact of those bugs was
> decreased dramatically for most people, perhaps with the exception of
> people landing more often on Aurora/Beta than on central.

Worse, even. I don't think we should create tests to impact developers,
but to impact the code we ship to people.

>> I really felt that I couldn't land code without other people doing my
>> deeds to make that code stick to the tree.
>>
>> Which sounds wrong for me as a gecko module owner.
>>
>
> I agree that the situation is far from ideal, but I think more concrete
> suggestions would be more helpful in improving the situation. This is not
> a problem which can be solved overnight. Small and consistent improvements
> are the path to success.

I'd call success if I feel comfortable about landing code. Selfish, but
true.

Right now, we're in a state where I'd do a lot to not have to hack on
gecko patches. I'm personally in the fortunate position that I can move
mozilla ahead most of the time by not touching our apps.

I personally don't see the incremental path between what I perceive as
the status quo and a state where I'd feel as confident about our tree as
I did, say, 10 years ago (yuck).

Which is why I was asking if there was a goal or plan that I could test
my expectations against.

Axel

Bobby Holley

Dec 28, 2011, 7:14:59 PM

to Axel Hecht, dev-pl...@lists.mozilla.org

I agree that there are lots of things that could be improved. But I'd like
to offer my different perspective on the general direction of things.

I find it easier to land code today than at any other time during my three
and a half years as a Mozilla committer. There have been lots of
improvements, the big ones being tryserver, TBPL, and mozilla-inbound.

I think that mozilla-inbound has made an incredible difference, and my
conversations with contributors at MozCamp EU suggest that sentiment is
shared beyond the cadre of full-time MoCo gecko hackers. Not needing to
watch the tree is a huge productivity and focus boon, and I owe it to the
tireless volunteers who watch over the tree (ehsan, philor, edmorley,
mbrubeck, khuey, and others).

Intermittent orange is a very longstanding and hard problem. I remember
schrep expressing concern in 2008 that it was going to make our whole test
suite useless. Given that, I think we've done a pretty decent job of moving
forward. TBPL makes it very easy to detect and star them, and
mozilla-inbound means that you don't even need to (except on try).

There's always room for improvement, but resources are limited. I'm sure
everyone would welcome concrete proposals on how to improve things, but I
think it's important to recognize the tremendous work done and being done
to keep the tree spinning.

Just my 2c.

bholley

On Wed, Dec 28, 2011 at 3:22 PM, Axel Hecht <ax...@pike.org> wrote:

> On 28.12.11 21:59, Ehsan Akhgari wrote:
>
>> On Wed, Dec 28, 2011 at 3:01 PM, Axel Hecht<ax...@pike.org> wrote:
>>
>> Hi,
>>>
>>> it so happens that I need to touch our trees more often these days than
>>> in
>>> the past few years, and it made me grumpy. Couldn't avoid it this time,
>>> though, I need to land the stuff so that we can ship a localized Fennec
>>> 11.
>>>
>>> Do we have goals on what the experience *should* be when landing code on
>>> our mainline repos?
>>>
>>>
>> Not that I know of. We've been trying to make things more streamlined
>> though.
>>
>>
>> Why I'm asking:
>>>
>>> Inbound has proven to make the tree just more opaque to me, and given
>>> that
>>> I'm coding because otherwise we don't ship *this* version on aurora to
>>> begin with, the delays are awful.
>>>
>>
>>
>> I'm not sure what you mean here. If you go to
>> <https://tbpl.mozilla.org/?tree=Mozilla-Inbound>, you'll see a link to
>> <https://wiki.mozilla.org/Tree_Rules>,

Matt Brubeck

Dec 28, 2011, 7:21:13 PM

to Axel Hecht

On 12/28/2011 12:01 PM, Axel Hecht wrote:
> Things I see range from many colors across the board, build popping up
> and going away, being anything of green, blue, red, orange, purple, and
> for no reason I can figure out. I follow orange star bugs, and they
> don't seem to be doing anything but allowing us to ignore the orange?
>
> I really felt that I couldn't land code without other people doing my
> deeds to make that code stick to the tree.

It sounds like we need better documentation for how to deal with test
failures in tinderbox (or we need to make the existing documentation
easier to find). I had to pick up this knowledge mostly by poking
around and trying things, and asking in #developers whenever I couldn't
figure something out. There have also been training courses (there was
one at a 2010 All Hands week), but I couldn't easily find materials from
them online.

I've tried to document what I know and make it accessible, for example,
by bringing the "Tree Rules" and "Committing Rules and Responsibilities"
pages up to date, and adding links to them in prominent places. There's
certainly more information that's not on those pages and I don't know
exactly which information people need or want. If you run into
situations that aren't covered in those pages, please (a) ask what to
do, and (b) help me make sure the answers get added to the help pages!

The process for watching the tree is not *that* involved. It becomes
much easier if you do it regularly, which means that the few people who
do it most often are now very efficient at it, and we may end up relying
on them too heavily because everyone else feels that it's not worth
learning to do it themselves. But it's really a very simple process
that you *can* do yourself, and the experienced tree watchers will still
be there to help you.

In concrete terms, here's what I do when I land a patch:

A) Before pushing, open Tinderbox Pushlog (TBPL) for the tree I'm
landing on and make sure the tree is open and there are no unstarred
failures.

B) After pushing, keep the TBPL page open in an app tab. When new
failures appear, star them. Continue doing this until builds and tests
have completed on all platforms.

During either (A) or (B), if there is an unstarred failure (a red,
orange, or purple test result without an asterisk), then I do the following:

1) Click on the failing job to see a list of suggested bugs. If the
failure clearly matches a known bug, click on the star next to that bug
and then click "Add a comment" and then submit the comment.

2) If the failure might match a known bug but I am not sure, I
middle-click the bug number to open it in a new tab, and on the failing
job to open its log in a new tab. If the log and bug do match, then I
add a comment as in step (1).

3) If the summary does not seem to match any suggested bugs, I search
Bugzilla for the name of the failing test or the error message, to see
if I can find a matching bug. If I find one, I add a comment in the bug
and to the job in TBPL.

4) If I can't figure out whether a known bug exists (because I can't
figure out what part of the log I should search for), then I look on
TBPL to see if there are other similar failures nearby, and I ask on
#developers to see if anyone recognizes it as a known failure.

5) If there is no matching bug, I back out the change (if I suspect that
the failure was caused by my changeset) or retrigger the job (if I
suspect it's an unrelated intermittent failure). After more test runs
it may become clear that the failure is a new regression, and then I
back out the offending changeset.

6) If it turns out to be an unknown intermittent failure, then I file a
bug report with whiteboard:[orange] and blocks:randomorange, with the
name of the test file in the summary, and a link and excerpt from the
log in the description.
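
The numbered steps above can be sketched as a small decision function. This is only an illustration of the procedure as described in this message, not anything TBPL or Bugzilla provides: the `Job` fields, the `next_action` name, and the returned strings are hypothetical, invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Result colors that count as a failure needing a star (from the text above).
FAILURE_COLORS = {"red", "orange", "purple"}


@dataclass
class Job:
    """Hypothetical summary of one job as a tree watcher sees it."""
    color: str                          # result color shown on TBPL
    starred: bool = False               # already annotated with a bug?
    matching_bug: Optional[int] = None  # known bug found via suggestions or search
    suspect_own_change: bool = False    # do I think my push caused it?


def next_action(job: Job) -> str:
    """Return the next step for a job, following steps 1 to 6 above."""
    if job.color not in FAILURE_COLORS or job.starred:
        return "nothing to do"
    if job.matching_bug is not None:
        # Steps 1-3: a known bug matches, so star the job with it.
        return f"star with bug {job.matching_bug}"
    if job.suspect_own_change:
        # Step 5: no known bug and my change is the likely cause.
        return "back out"
    # Steps 5-6: probably an unrelated intermittent failure.
    return "retrigger, then file a [orange] bug if it is intermittent"
```

For example, `next_action(Job("orange", matching_bug=123456))` (the bug number is made up) returns `"star with bug 123456"`. Step 4, asking in #developers, is deliberately left out because it is a human step that applies at any point.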

That's about it. And at *any* of the above steps you can ask in
#developers for someone with sheriff experience to help out or answer
questions. I'll try to find a place to add this to the documentation
and make it findable.

Matt Brubeck

Dec 28, 2011, 7:40:52 PM

to Axel Hecht

On 12/28/2011 04:21 PM, Matt Brubeck wrote:
> I'll try to find a place to add this to the documentation and
> make it findable.

I cleaned up my brief TBPL guide and posted it here, which you can find
via the "Tree Rules" link on TBPL, or via other MDN documentation on the
development process:

https://developer.mozilla.org/En/Developer_Guide/Committing_Rules_and_Responsibilities#Dealing_with_test_failures

Ed Morley

Dec 28, 2011, 7:27:47 PM

to dev-pl...@lists.mozilla.org

> ...and I owe it to the

> tireless volunteers who watch over the tree (ehsan, philor, edmorley,
> mbrubeck, khuey, and others).

And mak :-)

(I don't expect anyone to know the complete list, and I'm probably still missing people out, but wanted to at least credit him as well for now :-))

Matt Brubeck

Dec 28, 2011, 8:38:05 PM

to Axel Hecht

On 12/28/2011 03:22 PM, Axel Hecht wrote:
> IMHO, it's a shame that we think we need inbound.

Hopefully we all agree that we should back out patches that cause builds
or tests to fail.

Given that, who should do the work of backing out those patches? Before
inbound, the answer was every developer who landed changes. This meant
that everyone had to stay online and keep an eye on the tree for up to
seven or eight hours after landing any change. Now, only a handful of
people need to watch the tree and back out changes, and the total number
of person-hours is far less.

Do you feel the old system was better? (If so, why?) Or do you have a
concrete alternative that is better than either of these? Note that
even if we fixed all intermittent test failures, we'd still need some
process for dealing with broken patches.

Personally, I think that we can and will improve things by automating
more of the process. For example, the work to automatically push
patches from Bugzilla->Try->Inbound. But this will be an ongoing
process; the code will not appear overnight.

> As much as I grokked it, tbpl doesn't show pending builds. Might be the
> reason why we actually ended up closing the tree for builds completely
> missing the other day, way late.

TBPL shows pending builds in light gray.

You mentioned builds "disappearing" - this may be caused by coalescing
(when machines are scarce, jobs on adjacent pushes get combined) or by
delays between jobs finishing and results appearing in tinderbox (which
has improved by orders of magnitude in the past year, but still needs to
improve further). Or it might just be a bug in TBPL or elsewhere.
Please ask on #developers or #build if you're not sure!
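
A toy model may make coalescing easier to picture. The sketch below is a simplification written for this thread, not buildbot's actual scheduling code: it assumes a single machine for one job type, and `run_next` is a hypothetical name.

```python
def run_next(pending):
    """Model one machine becoming free for a given job type.

    `pending` lists the pushes still waiting for this job, oldest first.
    All waiting requests are merged into a single job on the newest push;
    the older pushes never get a result of their own for this job.
    Returns (push_that_runs, pushes_left_without_a_result).
    """
    if not pending:
        return None, []
    newest = pending[-1]
    skipped = list(pending[:-1])
    return newest, skipped


# Three pushes queued while machines were scarce: only "c" gets a result.
ran_on, skipped = run_next(["a", "b", "c"])
print(ran_on, skipped)
```

Under this model, a failure that shows up on push "c" could have been introduced by "a", "b", or "c", which is why a coalesced job that fails can take extra retriggers to pin down.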

>> But with the introduction of mozilla-inbound, the impact of those bugs
>> was decreased dramatically for most people, perhaps with the exception
>> of people landing more often on Aurora/Beta than on central.
>
> Worse, even. I don't think we should create tests to impact developers,
> but to impact the code we ship to people.

Many intermittent test failures do *not* impact our users, which is
exactly why I think that prioritizing them below other bugs is often the
correct decision.

When we suspect that randomorange bugs do reflect real problems with the
product, I do see us working to fix them (often working quite hard, as
in bug 687972). But many other randomorange bugs reflect problems in
our tests, not our products. And even when they represent real bugs,
they often (a) are unreproducible and therefore have a higher cost to
find and fix, (b) have no reports from actual users, and (c) happen only
in very rare circumstances.

I trust our module owners to prioritize these bugs correctly along with
all the other bugs in their modules. Random orange bugs *do* impact
developers who push code to our repositories and we should fix them to
minimize that impact, but fixing every intermittent test failure need
not be our first priority.

Ryan VanderMeulen

Dec 28, 2011, 9:21:40 PM

On 12/28/2011 8:38 PM, Matt Brubeck wrote:
> Many intermittent test failures do *not* impact our users

How is that determined?

Robert O'Callahan

Dec 28, 2011, 9:29:07 PM

to Ryan VanderMeulen, dev-pl...@lists.mozilla.org

Because they are later resolved to be bugs in the tests.

Rob
--
"If we claim to be without sin, we deceive ourselves and the truth is not
in us. If we confess our sins, he is faithful and just and will forgive us
our sins and purify us from all unrighteousness. If we claim we have not
sinned, we make him out to be a liar and his word is not in us." [1 John
1:8-10]

Ryan VanderMeulen

Dec 28, 2011, 9:34:37 PM

On 12/28/2011 9:29 PM, Robert O'Callahan wrote:
> On Thu, Dec 29, 2011 at 3:21 PM, Ryan VanderMeulen<rya...@gmail.com> wrote:
>
>> On 12/28/2011 8:38 PM, Matt Brubeck wrote:
>>
>>> Many intermittent test failures do *not* impact our users
>>>
>>
>> How is that determined?
>
>
> Because they are later resolved to be bugs in the tests.
>
> Rob

I'm asking the question because (as pointed out in this thread already)
there are many intermittent orange bugs where the only activity in them
is the tbpl starring. It seems little has been done to determine the
cause of the failure.

Edmund

Dec 29, 2011, 3:57:50 AM

Matt Brubeck wrote:
>>
>> I really felt that I couldn't land code without other people doing my
>> deeds to make that code stick to the tree.
>
> It sounds like we need better documentation for how to deal with test
> failures in tinderbox (or we need to make the existing documentation
> easier to find). I had to pick up this knowledge mostly by poking
> around and trying things, and asking in #developers whenever I couldn't
> figure something out. There have also been training courses (there was
> one at a 2010 All Hands week), but I couldn't easily find materials from
> them online.

I believe I'm a relative newcomer 'round these parts, so what I say
might seem 'stupid' or ignorant, for which I apologize beforehand.

I've been 'watching' tbpl and #developers for practically 1 1/2 years
(off and on) and I'm still ignorant in the ways of the tree. Sure,
there is a legend and it clearly marks what is what and what colour
means what. However, I'm still nowhere near the level of understanding
needed to figure out the ins and outs of the tree. In other words,
it's highly complicated; kinda like rocket science.

I am sure that if I just sat and watched the veteran devs who do the
starring (philor and bz come to mind), without asking questions as that'd
probably interrupt the flow, I'd probably have a chance of figuring it
out within my lifetime. At the get-go, when I was first exposed to tbpl
(1 tree with quite a few characters, but not as many as now), I was
overwhelmed, and I believe I asked someone whether or not there was some
documentation on this whole thing. The simple answer back then was, "No."

But right now, your procedure does help (now that I've got my L3 access)
to understand when to push. However, the documentation on what/how to
help star builds is still lacking (IMO). I realize this is all learned by
experience, but if this experience isn't written down, philor et al.
will most likely bear the brunt of the starring job.
(No offense to anyone else that I've forgotten to mention.)

As an anecdote, I tried my hand at starring a build for comm-central
(specifically CalendarTrunk, which was burning like there was no
tomorrow). My first problem: what do I do first? I took a look
at the full log of the burning build. I noticed the same pattern in the
other burnings, so I went to Bugzilla and filed a bug (fwiw,
bug #714016). I took a look at other burning bugs from m-c and
noticed that the comments were basically just posts of the log,
so I pasted a link to the log into the comments. Next problem:
how do I add the bug # to the burning red B? (I think the terminology
is "star the red build".) I asked on #developers and darktrojan (thanks!)
said to put "bug <#>" in the comments. I did that, and that was
my first experience with starring a red build. Of course, there is
a chance that a "clobber" would solve the issue. I wouldn't know,
and those who are in the know are usually offline when I'm online.

Now this raises some questions. Should I have done that? Should I have
asked someone else whether or not I should star the build? I looked
at CalendarTrunk and it had been burning for some time. Was there a
reason why no one had starred any build? Did I overstep the boundaries of
what I should/should not do?

(My calendar build just finished and I definitely can reproduce the
broken build, so it isn't a clobber-required, but a coding issue.)

> I've tried to document what I know and make it accessible, for example,
> by bringing the "Tree Rules" and "Committing Rules and Responsibilities"
> pages up to date, and adding links to them in prominent places. There's
> certainly more information that's not on those pages and I don't know
> exactly which information people need or want. If you run into
> situations that aren't covered in those pages, please (a) ask what to
> do, and (b) help me make sure the answers get added to the help pages!

Do the rules apply to m-c, m-i and c-c? Or are they specific to m-c?
Now I understand that there is such a lack of resources in the c-c camp
that 'we' can't star everything. I believe Callek's doing something
along the lines of fixing this.

>
> The process for watching the tree is not *that* involved. It becomes
> much easier if you do it regularly, which means that the few people who
> do it most often are now very efficient at it, and we may end up relying
> on them too heavily because everyone else feels that it's not worth
> learning to do it themselves. But it's really a very simple process
> that you *can* do yourself, and the experienced tree watchers will still
> be there to help you.

I don't know if I've mentioned this anywhere (aside from on IRC), but
#developers is an intimidating channel. It's like walking into
an ATC and asking "what are those blips on the screen?".
(Of course, it's not comparable to the life-or-death situation
in an ATC, but the feeling is similar.)

Again, no offense to anyone. It's a good experience watching
people on #developers interact, even though I have no clue as
to what's going on.

Edmund

Edmund

Dec 29, 2011, 4:16:42 AM

I was reading this and would like a clarification.

"Once you have checked in, you need to watch the tree
and make sure the next cycle for every machine is
green. A good rule of thumb is that it will take 1.5
hours to make sure your change compiles correctly on
all platforms, 2.5 hours to make sure the unit tests
pass, and 4 hours to make sure the "Talos" performance
tests don't regress and don't crash on your changes.
Therefore it is unwise to check in if you won't be
available for the next 4 hours."

Pardon my ignorance, but are the Talos performance tests
done in parallel with the compilation and unit testing, or
is it after the compilation and unit testing (thus it would
be a total of 8 hours)?

Thanks

Edmund

Axel Hecht

Dec 29, 2011, 7:37:06 AM

On 29.12.11 02:38, Matt Brubeck wrote:
> On 12/28/2011 03:22 PM, Axel Hecht wrote:
>> IMHO, it's a shame that we think we need inbound.
>
> Hopefully we all agree that we should back out patches that cause builds
> or tests to fail.

In theory yes, but in practice, are we?

I guess we do for permanent bustage or test failure that's obviously
bound to the patch landed.

It doesn't seem that we have traction on the orangeness that is
random, and I'm sure that we're adding more of it, just by the
likelihood of it.

> Given that, who should do the work of backing out those patches? Before
> inbound, the answer was every developer who landed changes. This meant
> that everyone had to stay online and keep an eye on the tree for up to
> seven or eight hours after landing any change. Now, only a handful of
> people need to watch the tree and back out changes, and the total number
> of person-hours is far less.
>
> Do you feel the old system was better? (If so, why?) Or do you have a
> concrete alternative that is better than either of these? Note that even
> if we fixed all intermittent test failures, we'd still need some process
> for dealing with broken patches.

The "old" system was really fewer people hitting way less tests, and
well, either you burned the tree or you didn't. Doesn't really compare.

> Personally, I think that we can and will improve things by automating
> more of the process. For example, the work to automatically push patches
> from Bugzilla->Try->Inbound. But this will be an ongoing process; the
> code will not appear overnight.

I would hope that automation relies on good metrics. And that's what I
think we're missing, at least to some extent.

>> As much as I grokked it, tbpl doesn't show pending builds. Might be the
>> reason why we actually ended up closing the tree for builds completely
>> missing the other day, way late.
>
> TBPL shows pending builds in light gray.

My experience is that running builds are gray, and pending builds don't
show at all.

> You mentioned builds "disappearing" - this may be caused by coalescing
> (when machines are scarce, jobs on adjacent pushes get combined) or by
> delays between jobs finishing and results appearing in tinderbox (which
> has improved by orders of magnitude in the past year, but still needs to
> improve further). Or it might just be a bug in TBPL or elsewhere. Please
> ask on #developers or #build if you're not sure!

Disappearing might not be that much of an issue, in particular it's hard
for me to point a finger on it. It was mostly popping.

>>> But with the introduction of mozilla-inbound, the impact of those bugs
>>> was decreased dramatically for most people, perhaps with the exception
>>> of people landing more often on Aurora/Beta than on central.
>>
>> Worse, even. I don't think we should create tests to impact developers,
>> but to impact the code we ship to people.
>
> Many intermittent test failures do *not* impact our users, which is
> exactly why I think that prioritizing them below other bugs is often the
> correct decision.
>
> When we suspect that randomorange bugs do reflect real problems with the
> product, I do see us working to fix them (often working quite hard, as
> in bug 687972). But many other randomorange bugs reflect problems in our
> tests, not our products. And even when they represent real bugs, they
> often (a) are unreproducible and therefore have a higher cost to find
> and fix, (b) have no reports from actual users, and (c) happen only in
> very rare circumstances.
>
> I trust our module owners to prioritize these bugs correctly along with
> all the other bugs in their modules. Random orange bugs *do* impact
> developers who push code to our repositories and we should fix them to
> minimize that impact, but fixing every intermittent test failure need
> not be our first priority.

As others on this thread, I have hit oranges that don't show any trace
of module owner triage.

Axel

Marco Bonardo

Dec 29, 2011, 8:21:21 AM

On 29/12/2011 13:37, Axel Hecht wrote:
>> Hopefully we all agree that we should back out patches that cause builds
>> or tests to fail.
>
> In theory yes, but in practice, are we?

Yes, as soon as the bustage is detectable. Several people have spent a
bunch of time tracking new random oranges and backing out patches, even
ones from days before.
Clearly, cases exist where this is just impossible; that's the nature of
random failures.

> It doesn't seem to be that we have traction on the orangeness that is
> random, and I'm sure that we're adding new of that. Just by the
> likelihood of it.

No doubt we may be adding new random failures every day, but it's a hard
problem to avoid. Luckily, more and more reviewers and developers are
aware of the common issues that lead to them.

> The "old" system was really fewer people hitting way less tests, and
> well, either you burned the tree or you didn't. Doesn't really compare.

I partially disagree, based on my experience in fixing random oranges in
the "old" system. It was not really different from what happens today;
there were just fewer changesets, but random failures still did not burn
the tree and were hard to detect in time.

> My experience is that running builds are gray, and pending builds don't
> show at all.

There are small delays before results appear on tbpl. By the way, dark
gray is running builds and light gray is pending builds. A missing entry
is either "changing status" or "I have been coalesced to some of the next
pushes, so you won't ever know my result, and if I fail you'll lose hours
trying to figure me out".

> As others on this thread, I have hit oranges that don't show any trace
> of module owner triage.

This is absolutely true and still a problem to solve. War on orange is
nice, but some of the bugs are just being ignored, and overall it looks
like nobody is driving them.
Philor pointed out that one defect of inbound is that fewer people look
at failures, while in the past some expert developers could have noticed
a failure and had a fix in minutes. So maybe we could address this by
better exposing random failures to the public: a weekly post on Planet
could report the top ten random failures and the new ones for the past
week. Shouldn't be too hard to automate that through war on orange, I guess.

-m
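
As a rough illustration of the weekly digest suggested above, here is a minimal sketch. It assumes failure data is available as (bug number, date) pairs; that input format, the function name `weekly_digest`, and bug 700001 are hypothetical and not taken from any existing war-on-orange tooling.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_digest(occurrences, week_end, top_n=10):
    """Return (top failures this week, bugs first seen this week).

    occurrences: iterable of (bug_id, day) pairs, one per starred failure.
    This input format is an assumption made for the sketch.
    """
    week_start = week_end - timedelta(days=7)
    this_week = Counter()
    seen_before = set()
    for bug_id, day in occurrences:
        if week_start < day <= week_end:
            this_week[bug_id] += 1      # failure inside the reporting week
        elif day <= week_start:
            seen_before.add(bug_id)     # bug was already known before this week
    top = this_week.most_common(top_n)
    new = sorted(bug for bug in this_week if bug not in seen_before)
    return top, new


# Hypothetical sample data: bug 482975 is mentioned in this thread,
# bug 700001 is made up for the example.
sample = [
    (482975, date(2011, 12, 20)),
    (482975, date(2011, 12, 27)),
    (482975, date(2011, 12, 28)),
    (700001, date(2011, 12, 28)),
]
top, new = weekly_digest(sample, date(2011, 12, 29))
```

The two lists could then be formatted into a plain-text post; where the data would actually come from is left open here.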

Axel Hecht

Dec 29, 2011, 8:30:55 AM

to

Thanks for the updates on the doc.

Derailing the discussion a tad further, here were my landings, with
comments:
https://tbpl.mozilla.org/?tree=Mozilla-Aurora&rev=1ab6d10f385a
Two android xul opt M6 oranges, with empty summary.
Hit bug 482975, our king of orange, no traction for a year?
Purples and reds that restarting fixed.
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=caf52bced4d3
One blue, but no restarted build for that blue?
https://tbpl.mozilla.org/?tree=Mozilla-Aurora&rev=9d9046c6cc6d
One empty summary, two summaries without suggestions.

I had another landing, which I can't find right now, where I couldn't
figure out why I was offered the choices I was offered. The likely
candidate was among the choices, luckily, but I checked another one and
couldn't figure out why it was there.

FWIW, my questions in #developers or #mobile (for the android bustage)
didn't turn up answers, I was lucky enough to get folks to help silently
though.

Axel

Marco Bonardo

Dec 29, 2011, 8:45:07 AM

to

On 29/12/2011 14:30, Axel Hecht wrote:
> https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=caf52bced4d3
> One blue, but no restarted build for that blue?

Self-serve shows the green retriggered build, so this may be an
automation failure where the retriggered test was not reported.
Usually, when something weird happens, you can click on the self-serve
link and verify there.
-m

Ehsan Akhgari

Dec 29, 2011, 12:03:09 PM

to Marco Bonardo, dev-pl...@lists.mozilla.org

On Thu, Dec 29, 2011 at 8:21 AM, Marco Bonardo <ma...@supereva.it> wrote:

>> It doesn't seem to be that we have traction on the orangeness that is
>> random, and I'm sure that we're adding new of that. Just by the
>> likelihood of it.
>>
>
> No doubt we may be adding new random failures every day, but that is a
> hard problem to avoid. Luckily, more and more reviewers and developers
> are aware of the common issues that lead to them.
>
>> The "old" system was really fewer people hitting way less tests, and
>> well, either you burned the tree or you didn't. Doesn't really compare.
>>
>
> I partially disagree, based on my experience in fixing random oranges in
> the "old" system. It was not really different from what happens today;
> there were just fewer changesets, but random failures still did not burn
> the tree and were hard to detect in time.

FWIW, I've written up some documentation on avoiding writing tests which
are prone to some known patterns which may cause intermittent oranges: <
https://developer.mozilla.org/en/QA/Avoiding_intermittent_oranges>.

But with software systems which rely on things like event loops or multiple
threads, it's just impossible to get perfectly deterministic behavior. No
matter how hard you try.

I should also say that I strongly disagree with the assertion of the "old"
system being better because we had fewer tests and fewer code changes!

>> My experience is that running builds are gray, and pending builds don't
>> show at all.
>>
>
> There are small delays before results appear on tbpl. By the way, dark
> gray is running builds and light gray is pending builds. A missing entry
> is either "changing status" or "I have been coalesced to some of the next
> pushes, so you won't ever know my result, and if I fail you'll lose hours
> trying to figure me out".
>
>

If you never see light gray jobs, then please file a bug. I've seen them
tons of times, and so have others.

>
>> As others on this thread, I have hit oranges that don't show any trace
>> of module owner triage.
>>
>
> This is absolutely true and still a problem to solve. War on orange is
> nice, but some of the bugs are just being ignored, and overall it looks
> like nobody is driving them.
> Philor pointed out that one defect of inbound is that fewer people look at
> failures, while in the past some expert developers could have noticed a
> failure and had a fix in minutes. So maybe we could address this by
> better exposing random failures to the public: a weekly post on Planet
> could report the top ten random failures and the new ones for the past
> week. Shouldn't be too hard to automate that through war on orange, I guess.
>

It's not really that complicated, somebody just needs to triage the orange
bugs, and ping the people responsible for the component in question, and
then ping them some more! As I've said before, the problem is lack of
man-power.

Matt Brubeck

Dec 29, 2011, 1:05:19 PM

to

On 12/29/2011 09:03 AM, Ehsan Akhgari wrote:
>> I partially disagree, based on my experience in fixing random oranges in
>> the "old" system. It was not really different from what happens today,
>> there were just less changesets, but random failures were still not burning
>> the tree and hard to detect in time.

> I should also say that I strongly disagree with the assertion of the "old"
> system being better because we had fewer tests and fewer code changes!

And fewer platforms!

At one point we -- mostly Ehsan :) -- had reduced the average orange
count to less than 2 per push. Since then it's climbed back up to 4 or
more. During that time, in addition to adding new tests and code, we've
also added OS X 10.7 builds, native Android builds, non-PGO builds on
Linux and Windows, etc. So even if the number of intermittently
failing tests had stayed the same, we would still expect to see the
number of failures per push growing.

Hopefully this part of the trend will level off now since I'm not aware
of any imminent new platforms. Eventually we will even retire some,
like Android XUL and OS X 10.5.
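
The point above about platforms can be put as simple arithmetic. The sketch below assumes every job in a push fails intermittently with the same independent probability, which is a simplification; the job counts and the 2% rate are hypothetical numbers chosen only to illustrate the effect, not measurements.

```python
def expected_oranges_per_push(jobs_per_push: int, intermittent_rate: float) -> float:
    """Expected number of intermittent failures in one push.

    Assumes each job fails independently with the same probability,
    which is a simplification made for this sketch.
    """
    return jobs_per_push * intermittent_rate


def chance_of_any_orange(jobs_per_push: int, intermittent_rate: float) -> float:
    """Probability that a push shows at least one intermittent failure."""
    return 1.0 - (1.0 - intermittent_rate) ** jobs_per_push


# Hypothetical numbers: the per-job failure rate stays the same, but the
# number of jobs per push doubles as platforms are added.
before = expected_oranges_per_push(100, 0.02)
after = expected_oranges_per_push(200, 0.02)
```

With the per-job rate held fixed, the expected count scales linearly with the number of jobs, so adding platforms alone raises the oranges per push.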

Christian Legnitto

Dec 29, 2011, 1:31:10 PM

to Matt Brubeck, dev-pl...@lists.mozilla.org

Also, for those that don't know where we track this:

http://brasstacks.mozilla.com/orangefactor/

I've also looked at what webkit does a little. I really like this view:

http://test-results.appspot.com/dashboards/flakiness_dashboard.html (warning, large table)

If we had a view like that I could see "oh, this test has been flaky, probably not worth bugging people" vs "oh, this test has been green forever and it's now orange, I should look into it". The key part is that flakiness is not really a number...the trend matters.

Thanks,
Christian
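
To make the "trend matters" idea concrete, here is a minimal sketch of how a per-test history could be classified. The function name, the three-run window, and the 5% threshold are all assumptions picked for illustration; this is not how the WebKit dashboard or OrangeFactor works.

```python
def classify_trend(history, recent=3, flaky_rate=0.05):
    """Classify one test from its run history.

    history: list of booleans, oldest first, True = green, False = orange.
    recent and flaky_rate are arbitrary values chosen for this sketch.
    """
    if not history:
        return "no data"
    older = history[:-recent]
    latest = history[-recent:]
    older_failures = sum(1 for ok in older if not ok)
    older_rate = older_failures / len(older) if older else 0.0
    if older_rate >= flaky_rate:
        return "flaky"          # has been failing on and off for a while
    if any(not ok for ok in latest):
        return "newly orange"   # was reliably green, now failing: look into it
    return "green"


long_green_then_orange = [True] * 20 + [False]
on_and_off = [True, False] * 10
```

Here `classify_trend(long_green_then_orange)` falls in the "should look into it" bucket, while `classify_trend(on_and_off)` is treated as a known flaky test.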

Ehsan Akhgari

Dec 29, 2011, 2:07:27 PM

to Christian Legnitto, dev-pl...@lists.mozilla.org, Matt Brubeck

On Thu, Dec 29, 2011 at 1:31 PM, Christian Legnitto
<cleg...@mozilla.com> wrote:

>
> On Dec 29, 2011, at 10:05 AM, Matt Brubeck wrote:
>

> Also, for those that don't know where we track this:
>
> http://brasstacks.mozilla.com/orangefactor/
>
> I've also looked at what webkit does a little. I really like this view:
>

> http://test-results.appspot.com/dashboards/flakiness_dashboard.html (warning, large table)

>
> If we had a view like that I could see "oh, this test has been flaky,
> probably not worth bugging people" vs "oh, this test has been green forever
> and it's now orange, I should look into it". The key part is that flakiness
> is not really a number...the trend matters.
>

We do have something similar to that, see <
http://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=482975&startday=2011-12-01&endday=2011-12-29&tree=mozilla-central>
for example.

--
Ehsan
<http://ehsanakhgari.org/>

Nicholas Nethercote

Dec 29, 2011, 3:10:43 PM

to Ehsan Akhgari, dev-pl...@lists.mozilla.org, Christian Legnitto, Matt Brubeck

One thing that hasn't been said: the fact that (a) TBPL auto-matches
oranges against bugs, and (b) people like philor are very good at
filing bugs for oranges, means that things are much better than they
used to be. 90% of the oranges I get have a bug suggested for them
that is clearly a match. A year or two ago we didn't have these
features and every orange was a head-scratcher.

Furthermore, inbound is nice because (a) you don't have to watch the
whole time (though I usually do) and (b) there's no "oh my god you
can't break the tree" pressure the way there is with
mozilla-central/aurora/beta.

And TBPL is *so* much better than the old tinderbox.

Overall, in the 3 years I've been working for Mozilla, the situation
is far better now than it was in the past, IMO.

I wonder if the auto-coalescing should be turned off, though.

Nick

Marco Bonardo

Dec 29, 2011, 3:34:54 PM

to

On 29/12/2011 20:07, Ehsan Akhgari wrote:
> On Thu, Dec 29, 2011 at 1:31 PM, Christian Legnitto
> <cleg...@mozilla.com> wrote:

>> If we had a view like that I could see "oh, this test has been flaky,
>> probably not worth bugging people" vs "oh, this test has been green forever
>> and it's now orange, I should look into it". The key part is that flakiness
>> is not really a number...the trend matters.
>>
>
> We do have something similar to that, see <
> http://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=482975&startday=2011-12-01&endday=2011-12-29&tree=mozilla-central>
> for example.

My complaint is that this is a system that people have to look at
explicitly, and unfortunately it's not among the things developers look
at daily or weekly. I suspect that if we ran a poll, very few developers
would have looked at the orange factor in the last 2 weeks (never mind
that it was Christmas; my point is more general).
So as a first step we could find a way to expose this to developers
periodically in places they usually look at. Hence my suggestion to
publish an oranges digest on Planet, say once a week. In the future, when
everyone is more used to caring about oranges, maybe the orange factor
will get more attention by itself.

-m

Mark Cote

Jan 3, 2012, 2:16:47 PM

to

Top 10 weekly oranges *should* have been going to the
dev.tree-management mailing list for some time now. I still have to hook
up NNTP support so that it goes to the actual newsgroup, though. A blog
post is also a possibility.

Mark

Mark Cote

Jan 3, 2012, 8:14:22 PM

to

Okay, OrangeFactor reports should now be sent to the dev.tree-management
group every Tuesday morning. I also ran it manually just now; the first
such report is at

http://groups.google.com/group/mozilla.dev.tree-management/browse_thread/thread/62e5415f403f68b4#

(NB: There was a hiccup, or actually a few hiccups, in the last couple
weeks in which we lost the test-run count, hence the "?" OF from the
previous week.)

Is it still worthwhile to have it sent to Planet as well? Or perhaps
cross-posted to other groups?

Mark

Axel Hecht

Jan 6, 2012, 7:15:54 AM

to

I did some research, and it seems that the last patch I landed for the
module I own went in six years ago yesterday. The glory of owning RDF.
Just to clarify which past I'm talking about.

I'm not debating that we had darker times since then for the tree, but
that's also not really relevant to how I feel about landing code today.

Axel

Robert O'Callahan

Jan 6, 2012, 7:32:47 AM

to Axel Hecht, dev-pl...@lists.mozilla.org

On Sat, Jan 7, 2012 at 1:15 AM, Axel Hecht <ax...@pike.org> wrote:

> I did some research, and it seems that the last patch I landed for the
> module I own went in six years ago yesterday. The glory of owning RDF.
> Just to clarify which past I'm talking about.
>
> I'm not debating that we had darker times since then for the tree, but
> that's also not really relevant to how I feel about landing code today.
>

Like others on this thread, I feel that the current environment for landing
code is the best it's ever been (for a non-inbound-wrangler), and I've been
committing code regularly for over ten years now.

Six years ago we had fewer platforms and hardly any tests so it's not even
close to a fair comparison, and even so watching the Tinderbox waterfall
was a real pain. And if you did regress a test, e.g. Tp, good luck with no
try-server.