Andreas
On Aug 10, 2010, at 10:45 AM, sayrer wrote:
> See https://wiki.mozilla.org/Platform/2010-08-10#Tree_Health.
>
> What are some steps we can take to improve the situation?
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
> I disagree with item 2). The try server is my compiler for platforms I am not running on my machine (windows + linux). Developer time is a lot more costly than CPU time. Making me install 2 VMs and wait for compiles costs a ton more than scaling up try server. If we compile on cheap beefy boxes and have a ton of those (hundreds, not dozens) instead of costly and slow VMs, we can scale try server and our build infrastructure arbitrarily. Or even better, use a cloud computing service.
I think you're making some assumptions, and I know from talking to Justin and John that cloud computing services have been tested and shown not to be a viable route for scaling. As the wiki notes state, work is being done to increase capacity as well.
If you personally require a bank of Minis to act as your compilers (so you don't have to do it across VMs) I'm sure that we can arrange for that hardware to be made available. I don't think that's a universal solution, but I think that you're a pretty special case :)
In the meantime, the issue is people who aren't even trying to compile & test even on one development platform, not across all three, breaking all builds and tests. I think that it's reasonable to ask that people compile and test locally before pushing to try to get the cross-platform coverage.
cheers,
mike
On Aug 10, 2010 2:10 PM, "Andreas Gal" <andre...@gmail.com> wrote:
I disagree with item 2). The try server is my compiler for platforms I am
not running on my machine (windows + linux). Developer time is a lot more
costly than CPU time. Making me install 2 VMs and wait for compiles costs a
ton more than scaling up try server. If we compile on cheap beefy boxes and
have a ton of those (hundreds, not dozens) instead of costly and slow VMs,
we can scale try server and our build infrastructure arbitrarily. Or even
better, use a cloud computing service.
Andreas
On Aug 10, 2010, at 10:45 AM, sayrer wrote:
I haven't seen the notes you refer to, but it must be possible to scale a compiler farm in some way. Justin and John are the experts here. If cloud computing isn't the right answer, I am sure they will find some other way. Building is the most parallelizable task in CS I can think of. This problem can be solved.
Andreas
On Aug 10, 2010, at 11:15 AM, Mike Beltzner wrote:
> On 2010-08-10, at 11:10 AM, Andreas Gal wrote:
>
>> I disagree with item 2). The try server is my compiler for platforms I am not running on my machine (windows + linux). Developer time is a lot more costly than CPU time. Making me install 2 VMs and wait for compiles costs a ton more than scaling up try server. If we compile on cheap beefy boxes and have a ton of those (hundreds, not dozens) instead of costly and slow VMs, we can scale try server and our build infrastructure arbitrarily. Or even better, use a cloud computing service.
>
- Kyle
I think it makes sense to say that your cset has to build and start (and pass
the new tests) on your primary platform. But even running mochitests on a
platform takes a couple hours of local computer time, and even worse, you
can't really touch that computer while the tests are running because of
focus issues. So I think that perhaps we need a more nuanced version of the
rule!
--BDS
Yes, that is what I meant. Your build should compile on your platform,
and some relevant set of tests should be run.
We are having problems with uncompiled and/or untested patches being
pushed to shared infrastructure.
- Rob
Given that our tests now require focus to remain on the test window
while running, so effectively require a dedicated machine to run, or
hours of downtime on the part of the person running them, I'm not sure
it's reasonable to ask that people test locally...
-Boris
It is reasonable to ask that relevant tests are run.
- Rob
I think we need to
- eliminate the tests that are focus-sensitive from the default run
- make the test infrastructure run multiple suites in parallel, since
most developers have multiple cores
- figure out if we have too many tests, and whether they have value
commensurate in their cost
- figure out why tests take so long to run on your computer, because
on *minis* we see all of mochitest taking 1h to 1h20m depending on OS,
and I think that includes transferring the builds!
(http://bit.ly/9uJjaB)
And probably other things too. In the interim, people need to at
least not be *reckless* by using try just because (and I am not making
this up) they don't know how to run our test suites.
But most people don't get review turned around in a few hours either,
so I think putting the patch up for review and then doing the full
test run overnight or whatever is pretty OK. It might mean that we
find some bugs twice (once in review, once in test suite) but I don't
think that's a big deal.
Mike
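Shaver's "run multiple suites in parallel" item could be sketched roughly like this; the suite commands below are placeholders (plain `echo`), not the real mochitest/reftest invocations, and the pool layout is just one obvious way to use the cores:

```python
# Rough sketch of running several test suites in parallel, one worker per
# core. The commands are placeholders, not the real harness invocations.
from multiprocessing.pool import ThreadPool
import os
import subprocess

SUITES = {
    "mochitest": ["echo", "mochitest done"],
    "reftest": ["echo", "reftest done"],
    "xpcshell": ["echo", "xpcshell done"],
}

def run_suite(item):
    """Run one suite command; return (suite name, exit code)."""
    name, cmd = item
    return name, subprocess.call(cmd, stdout=subprocess.DEVNULL)

pool = ThreadPool(processes=min(len(SUITES), os.cpu_count() or 2))
results = dict(pool.map(run_suite, sorted(SUITES.items())))
pool.close()
pool.join()
print(results)  # all zero exit codes if every suite "passed"
```

As the thread notes, this only becomes practical for GUI suites once the focus requirement goes away, since two suites can't share the screen.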
Yes, agreed.
-Boris
I don't think that's practical for a large chunk of our tests.
> - make the test infrastructure run multiple suites in parallel, since
> most developers have multiple cores
That would be very helpful indeed, but it depends on the focus
requirement to go away.
> - figure out if we have too many tests, and whether they have value
> commensurate in their cost
I don't think that we have too many tests. Although the number is
very high, I have witnessed tons of bugs in my own patches being
uncovered by seemingly unrelated (and sometimes seemingly unneeded)
tests. Bugs that I would not be able to detect if it were not for
those tests.
> - figure out why tests take so long to run on your computer, because
> on *minis* we see all of mochitest taking 1h to 1h20m depending on OS,
> and I think that includes transferring the builds!
> (http://bit.ly/9uJjaB)
I've experienced very similar test run times on my own machine. (And
I've had to run the entire mochitest-plain suite quite a few times).
> But most people don't get review turned around in a few hours either,
> so I think putting the patch up for review and then doing the full
> test run overnight or whatever is pretty OK. It might mean that we
> find some bugs twice (once in review, once in test suite) but I don't
> think that's a big deal.
I think that's a very good suggestion. I've sometimes posted
follow-up patches for review once the try server runs finish and
uncover a problem. This will save the reviewer's time if (s)he has
looked at the patch before the try server finishes.
--
Ehsan
<http://ehsanakhgari.org/>
But continue running them on tbox, yes? All the focus tests, for
example, fall into this category.
> - figure out if we have too many tests, and whether they have value
> commensurate in their cost
Fwiw, we don't have enough tests... In my opinion.
> - figure out why tests take so long to run on your computer, because
> on *minis* we see all of mochitest taking 1h to 1h20m depending on OS,
> and I think that includes transferring the builds!
> (http://bit.ly/9uJjaB)
Picking numbers at random, that link shows "all of mochitest" (adding up
parts 1-5) taking about 2h50min on Mac OS X 10.5.5 debug. Where did
that 1h to 1h20min number come from?
(For reference, the comparable numbers on other OSes are: OSX 10.6 ==
1h50min, F12 == 2h21min, F12x64 == 2h1min, Win5.1 == 2h16min, win6.1 ==
2h26min, win6.1x64 == 2h26min (in fact looks like copy-paste from the
32-bit times).)
> But most people don't get review turned around in a few hours either,
> so I think putting the patch up for review and then doing the full
> test run overnight or whatever is pretty OK. It might mean that we
> find some bugs twice (once in review, once in test suite) but I don't
> think that's a big deal.
Is there a single command that will do the full test run (mochitest,
reftest, chrome test, browser test, etc)? If there is, I'd love to run
tests overnight using it...
-Boris
AFAIK no, but that would be fairly easy to add.
Cheers,
Shawn
Sure, or even running them nightly.
>> - figure out if we have too many tests, and whether they have value
>> commensurate in their cost
>
> Fwiw, we don't have enough tests... In my opinion.
I agree that we don't have enough test coverage, yeah; I was
advocating my own devilry, or similar. It may be that consolidating
tests to reduce the amount of time in setup/teardown or similar would
pay dividends, though.
>> - figure out why tests take so long to run on your computer, because
>> on *minis* we see all of mochitest taking 1h to 1h20m depending on OS,
>> and I think that includes transferring the builds!
>> (http://bit.ly/9uJjaB)
>
> Picking numbers at random, that link shows "all of mochitest" (adding up
> parts 1-5) taking about 2h50min on Mac OS X 10.5.5 debug. Where did that 1h
> to 1h20min number come from?
I was looking at opt builds (1-5 + other), which are the test results
that people mostly watch for on try server as well, I think.
> Is there a single command that will do the full test run (mochitest,
> reftest, chrome test, browser test, etc)? If there is, I'd love to run
> tests overnight using it...
I don't believe there is, though I thought there was until I went
looking for it a couple of weeks ago. :-/
Mike
Note, that doesn't do reftests (not sure how to do that offhand), but it
covers most of our tests.
Cheers,
Shawn
Then I would think that
make mochitest check xpcshell-tests reftest crashtest
is the holy grail here...
--
Ehsan
<http://ehsanakhgari.org/>
- Kyle
On Tue, Aug 10, 2010 at 12:48 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> On Tue, Aug 10, 2010 at 3:45 PM, Shawn Wilsher <sdw...@mozilla.com> wrote:
>> On 8/10/2010 12:32 PM, Boris Zbarsky wrote:
>>>
>>> Is there a single command that will do the full test run (mochitest,
>>> reftest, chrome test, browser test, etc)? If there is, I'd love to run
>>> tests overnight using it...
>>
>> In your object directory:
>> [py]make mochitest check xpcshell-tests
>
> Then I would think that
>
> make mochitest check xpcshell-tests reftest crashtest
>
> is the holy grail here...
>
> --
> Ehsan
> <http://ehsanakhgari.org/>
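A minimal "overnight" wrapper around the make targets Ehsan lists might look like this. The target names come from the thread; the script structure, logging, and dry-run flag are purely illustrative, not an existing tool:

```python
#!/usr/bin/env python
# Sketch of an overnight full-test-run wrapper. Only the make target names
# are from the thread; everything else is an illustrative assumption.
import subprocess
import sys
import time

SUITES = ["mochitest", "check", "xpcshell-tests", "reftest", "crashtest"]

def run_all(objdir, make="make", dry_run=False):
    """Run each suite in sequence, keep going on failure, report results."""
    results = {}
    for target in SUITES:
        cmd = [make, "-C", objdir, target]
        if dry_run:
            print("would run: " + " ".join(cmd))
            results[target] = None
            continue
        start = time.time()
        results[target] = subprocess.call(cmd)
        status = "OK" if results[target] == 0 else "FAILED"
        print("%s: %s (%.0fs)" % (target, status, time.time() - start))
    return results

if __name__ == "__main__":
    # Pass your objdir; set dry_run=False for a real overnight run.
    run_all(sys.argv[1] if len(sys.argv) > 1 else ".", dry_run=True)
```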
I don't think that backing people out aggressively should be discussed
as long as #developers is basically 50% about how our infrastructure
just failed to report/build/star/whichever.
Axel
The ability to stop a try server run partway through would be great.
Eg. if I have a stupid windows compile error that'll usually show up
really quickly. I've heard that RelEng is working on this, it'll be
great.
I wonder if the ability to do a try server run on a subset of machines
is useful (eg. just windows machines). It could get complicated,
though.
N
There's no self serve way to do this yet, as you mention, but RelEng is
more than happy to do it for you. You can file a bug or ping the
buildduty person to get this done. Be sure to mention the changeset and
which jobs you want killed.
Platform / job selection is also being worked on.
In reverse polish notation:
My numbers aren't skewed, they're biased by a European timezone. Which
means that I get a bit of silence, and then, most of the time, a
struggle to get the tree back to life.
To your first question, backing people out because the logs on tbpl are
cached as empty and the star magic won't work sounds like a bad
solution. And I'm surprised to see real orange among all that random orange.
Practically, if I land in my daytime, folks that could help me to
diagnose the test results beyond what tbpl does wouldn't be awake before
someone else stepped up and quoted a rule to back me out. Yielding yet
another rule to make me not land code.
Axel
> Practically, if I land in my daytime, folks that could help me to
> diagnose the test results beyond what tbpl does wouldn't be awake before
> someone else stepped up and quoted a rule to back me out. Yielding yet
> another rule to make me not land code.
I was under the impression that we had build folks covering basically
every waking hour of the day. Is this no longer true?
Cheers,
Shawn
Perhaps "immediately" in my original post was the wrong word to use.
I'm not advocating setting up a bot that backs people out 30 seconds
after a build turns orange. Both Friday and Monday the tree was
closed for the better part of the day because once we discovered that
a patch was causing orange/red/whatever we tried to fix it on
Tinderbox rather than back it out. Once we know that a given *push*
(and I say push, not changeset, because we've wasted a lot of time in
the past trying to figure out which changeset in a six changeset push
causes an issue, and then we back out changeset N and find an hour
later that changeset N+2 depended on it to work properly) has caused
red or new orange that push should be backed out. We can't afford to
use mozilla-central to debug and fix patches.
And FWIW, I don't think potentially new randomorange is a big issue
here. At least over the past few days these patches that have failed
have failed cleanly and clearly, either as red or as orange on
multiple platforms (or in platform specific tests).
- Kyle
> And I'm surprised to see real orange among all that random orange.
They're not random. They are intermittent, and we need to understand what is causing them to be so, and fix that. Thinking of them as random means that you are permitting yourselves and others to ignore them. That's a problem.
Honestly, I'd rather we enforce "do not check in on orange" even if the cost is that we lose code commits. The alternative is ignoring tests, which is dangerous.
cheers,
mike
FWIW, running tests in a local VM is a good way to ameliorate those
sorts of problems.
Rob
This is possible already. e.g.
http://blog.mozilla.com/cjones/2010/08/10/filter-your-tryserver-builds-a-bit-more-easily/
Rob
That script is awesome, but it has a hardwired, probably already
out-of-date list of platforms in it. (It appears, for instance, that it
cannot disable the win7 tests, which are less-than-useful on the
tryserver at present due to permaorange.) I'm not sure if we even
*have* a canonical, guaranteed-up-to-date list of try server platforms
anywhere in scrapeable format, but wouldn't it be nice if the script
didn't have to be hand-edited?
(Putting my money where my mouth is: if releng commits to maintain a
URL with a machine-parseable list of identifiers that can follow
"mozconfig-extra-", I'll make the script use it.)
zw
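Zack's scrape-the-platform-list idea might look something like the sketch below. The `mozconfig-extra-` prefix comes from the thread, but the listing format (and any URL serving it) is hypothetical; releng would define the real one:

```python
# Sketch: derive the try platform list from a machine-parseable listing of
# "mozconfig-extra-" identifiers instead of a hardcoded table. The listing
# format here is invented for illustration.
def parse_platforms(text, prefix="mozconfig-extra-"):
    """Return the identifiers that follow `prefix`, ignoring everything else."""
    platforms = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            platforms.append(line[len(prefix):])
    return platforms

# A fake listing, standing in for whatever releng would publish.
sample = """\
mozconfig-extra-linux
mozconfig-extra-linux64
mozconfig-extra-win32
# comments and unrelated lines are skipped
mozconfig-extra-macosx64
"""

print(parse_platforms(sample))  # → ['linux', 'linux64', 'win32', 'macosx64']
```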
On 10-08-10 7:14 PM, Shawn Wilsher wrote:
>> Practically, if I land in my daytime, folks that could help me to
>> diagnose the test results beyond what tbpl does wouldn't be awake before
>> someone else stepped up and quoted a rule to back me out. Yielding yet
>> another rule to make me not land code.
We can definitely help with infrastructure issues, but we're far from
experts when it comes to random orange and other such things.
> I was under the impression that we had build folks covering basically
> every waking hour of the day. Is this no longer true?
Somebody is probably around most of the time (we have people in -8, -5,
+3, and +12 of UTC), except for some parts west coast Fridays and much
of that weekend. We only have one person in +3 and one in +12, so if
either of them are gone for any reason we lose coverage. They've also
got other work to do too, and aren't at a buildduty level of
responsiveness all of the time.
That'd be great, I'd love that. FTR the script was just a quick hack to
save me some time. Another option I thought of a while ago was an hg
extension in the same spirit as qimportbz, where one could just |hg pull
-u| its repo to update. Should be more flexible and would have access
to better mq state checking etc. (Maybe they should merge into an mozhg
extension?)
> (Putting my money where my mouth is: if releng commits to maintain a
> URL with a machine-parseable list of identifiers that can follow
> "mozconfig-extra-", I'll make the script use it.)
>
Sure, script or hg extension, this sounds really useful.
Cheers,
Chris
Note that the recommended way of "skipping" a build (adding 'exit' in
mozconfig-extra-$platform) will actually turn that build red.
So in cases where a developer is using TryServer to debug an issue with a
specific platform that they don't have local builds for, they're likely to
get near-full-red or full-red on their TryServer push, simply because
they've "turned off" the other platforms where they already know their
patch builds correctly.
(I'm not claiming that this is the most common cause of full-red TryServer
cycles -- I just wanted to point out that a full-red TryServer cycle
doesn't necessarily mean that the developer neglected to test locally --
in this case, it'd actually mean they're being a good citizen of the tree
by passing on TryServer cycles that they don't need.)
~Daniel
On 08/10/2010 10:45 AM, sayrer wrote:
> See https://wiki.mozilla.org/Platform/2010-08-10#Tree_Health.
>
> What are some steps we can take to improve the situation?
https://bugzilla.mozilla.org/show_bug.cgi?id=578895#c1
:-)
--
Ehsan
<http://ehsanakhgari.org/>
That's a good point. I updated
http://people.mozilla.com/~cjones/tryselect to print "CANCELLED-BUILD"
before exiting, so that when we have tbpl parsing logs client-side (bug
585187), it can filter these out based on "CANCELLED-BUILD".
Cheers,
Chris
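Putting dholbert's and Chris's points together, a hypothetical `mozconfig-extra-<platform>` fragment for skipping a platform would then be just the following (the `TinderboxPrint:` form is the one suggested later in the thread for making tinderbox/TBPL display it):

```shell
# Hypothetical mozconfig-extra-<platform> fragment: skip this platform's
# build but leave a marker that log parsers / TBPL can recognize as a
# deliberate cancellation rather than a broken build.
echo "TinderboxPrint: CANCELLED-BUILD"
exit 0
```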
> On Tue, Aug 10, 2010 at 3:15 PM, Mike Shaver <mike....@gmail.com> wrote:
>> I think we need to
>> - eliminate the tests that are focus-sensitive from the default run
> I don't think that's practical for a large chunk of our tests.
Maybe we could split up the focus-sensitive tests, and run those
first... then the other tests could run, possibly in parallel.
--
Warning: May contain traces of nuts.
Please file a bug!
I have thought about that a few times idly as well, but I rarely am
working on anything where I need it.
People like you surely would profit from it though, and it should be
easy to add.
I'll make sure it's ported to comm-central as well.
Robert Kaiser
--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community needs answers to. And most of the time,
I even appreciate irony and fun! :)
> On Tue, Aug 10, 2010 at 8:37 PM, Zack Weinberg
> <zwei...@mozilla.com> wrote:
> > Robert O'Callahan <rob...@ocallahan.org> wrote:
> >
> > That script is awesome, but it has a hardwired, probably already
> > out-of-date list of platforms in it. (It appears, for instance,
> > that it cannot disable the win7 tests, which are less-than-useful
> > on the tryserver at present due to permaorange.) I'm not sure if
> > we even *have* a canonical, guaranteed-up-to-date list of try
> > server platforms anywhere in scrapeable format, but wouldn't it be
> > nice if the script didn't have to be hand-edited?
> >
> > (Putting my money where my mouth is: if releng commits to maintain a
> > URL with a machine-parseable list of identifiers that can follow
> > "mozconfig-extra-", I'll make the script use it.)
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=578895#c1
>
> :-)
That's ... less conveniently parseable than I'd like, but I'll see what
I can do. Tomorrow :)
zw
I think trying to optimize the tests is a better first approach.
Aren't a lot of these starting up and shutting down the engine? Running
several tests in the same JS engine instance, maybe in a separate
context, might speed things up by a factor of 10.
Also, at least in the Thunderbird tests, there's a lot of "wait 10
seconds", because there were concurrency issues that nobody bothered to
find out and just put that hack in. That costs a lot of time as well,
aggregated.
> be *reckless* by using try just because (and I am not making
> this up) they don't know how to run our test suites.
Yes, that's a bad reason. However, the testsuites should be easier to
run. For several suites, there's barely any documentation and I had to
ask people to know how to run them, and I was told I need a certain,
freaky setup.
Ignoring for the moment the fanciful builders that finish jobs in one
minute, an automated backout system is probably impossible because of
the intermittent failures.
- Kyle
Actually, AFAIK most of those have been taken out, except in the bloat
tests (which afaik most people don't actually run).
Standard8
Ah yes? I happen to have such a "fanciful" builder: "make -s -j4",
assuming the tree is already fully built, i.e. ./configure && time make
-s -j4 && touch netwerk/protocol/http/nsHttpChannel.cpp && time make -s
-j4. The first build takes ~35 minutes, the second takes ~1 minute. All
make should do is check which files changed, compile that, and link it.
On 2-core Athlon which sells for 200-400 bucks for the whole machine.
To get to the good tree: You just use hg.
> an automated backout system is probably impossible because of
> the intermittent failures.
Intermittent red? I specifically wrote not to back out on orange. Have
you even read and mentally processed my "bunch of stuff" before tossing
it away?
And Beltzner wrote that intermittent oranges are bugs that *need* to be
fixed, and I totally agree. If you care about the tests, you'd better make
them reliable. It's been years now that this state has existed.
Please, let's keep it civil and not accuse people of not reading posts.
Cheers,
Shawn
I'm sure Releng would be really interested (and I know I am!) in your
configuration because our builders have some fairly beefy hardware and
are nowhere near that fast.
>> an automated backout system is probably impossible because of
>> the intermittent failures.
>
> Intermittent red? I specifically wrote not to back out on orange. Have you
> even read and mentally processed my "bunch of stuff" before tossing it away?
We have a decent amount of intermittent red (as dholbert notes). You
said backing out on orange was optional, which certainly wouldn't be
possible. I did read your post, I just don't think that most of the
solutions are practical (mostly because of the speed issue, but I'm
more than happy to be proven wrong).
> And Beltzner wrote that intermittent oranges are bugs that *need* to be
> fixed, and I totally agree. If you care about the tests, you'd better make
> them reliable. It's been years now that this state has existed.
I completely agree, but right now our choice is to spend resources
fixing orange or fixing blockers. That situation is not likely to
change until Fx 4 gets close to shipping.
- Kyle
I agree that this is a serious problem. And the amount of intermittent
orange is very substantial - I see a few known intermittent oranges on
try pretty much each time I push, no matter what I push. Time is
wasted figuring out what is a real orange and what isn't, both in
nervously looking at your own pushes, and also at others' (when
deciding if to push after them), and far worse than that, it prevents
the possibility of automated backing out of bad pushes.
This may be a naive thing to say, but is there a reason not to, at
some future point in time, focus almost entirely on fixing important
intermittent oranges, and after that, 'weaken' the remaining ones, so
that when they do fail, they show up as less than orange but more than
success (that is, a warning, not a failure)?
If we did that, then we could automatically back out any test that
causes an orange. Zero tolerance. The tree is always green (+ some
warnings, which are all on tests known to be unreliable, but of low
importance). No need for human intervention at all to fix oranged
trees. No need for people to carefully look at the tree before
pushing, in order to decide if it's "too orange" to push - just push
away (assuming what you are pushing passed try), as the automatic
system will back out any patch before yours that causes failures
anyhow. Push and forget, and get an email back later sometime whether
it stuck or not.
The preparation for this (fixing all important intermittent oranges)
would be hard, and could only be started sometime in the future,
obviously - but maybe it would be worth it?
- azakai
There's nothing special about it, completely standard opt build. Try the
commands above.
> We have a decent amount of intermittent red (as dholbert notes).
Ah, I didn't know that (might have seen a few of those, but considered
them a puzzling hiccup). Since when are compiles non-deterministic? I'd
think there's something seriously wrong, then.
> I completely agree, but right now our choice is to spend resources
> fixing orange or fixing blockers. That situation is not likely to
> change until Fx 4 gets close to shipping.
You're also spending a huge amount of time on creating and reviewing
tests (on these very blockers), so obviously the tests are considered
important, so that's not an argument. A testsuite which is ignored
because it's crying wolf all the time is not terribly useful. As this
very argument (we can't do certain reactions to orange, because it may
be wrong) shows.
> On Wed, Aug 11, 2010 at 2:12 PM, Ben Bucksch
> <ben.buck...@beonex.com> wrote:
> > Ah yes? I happen to have such a "fanciful" builder: "make -s -j4",
> > assuming the tree is already full build, i.e. ./configure && time
> > make -s -j4 && touch netwerk/protocol/http/nsHttpChannel.cpp &&
> > time make -s -j4. First build takes ~35 minutes, the second takes
> > ~1 minute. All make should do is check which files changed, compile
> > that, link it.
> >
> > On 2-core Athlon which sells for 200-400 bucks for the whole
> > machine.
>
> I'm sure Releng would be really interested (and I know I am!) in your
> configuration because our builders have some fairly beefy hardware and
> are nowhere near that fast.
My development machine cost on the order of US$1500, has eight cores, I run
Linux and use ccache.
$ time make -j8 # no changes at all
...
real 0m40.191s
user 0m35.034s
sys 0m9.501s
The numbers are similar if there really is only one file to recompile -
nearly all of the additional 30s here is constructing libgklayout.a and
then libxul.so:
$ touch ../moz-central/layout/style/nsCSSParser.cpp
$ time make -j8
real 1m9.679s
user 0m51.615s
sys 0m17.389s
But let's take a look at a nontrivial rebuild, eh?
$ (cd ../moz-central && hg pull -u)
...
183 files updated, 0 files merged, 1 files removed, 0 files unresolved
...
$ time make -j8
...
real 8m32.606s
user 36m44.846s
sys 3m4.316s
So I don't think <1 minute turnaround for typical pushes is gonna
happen.
zw
There are a number of causes -- probably the most frequent is network
congestion (which can make hg clone or wget commands time out on build
boxes).
There's also occasional redness from filesystem corruption or other
random machine-specific blips.
One red that I saw just yesterday is as-yet-undiagnosed:
https://bugzilla.mozilla.org/show_bug.cgi?id=579790
These reds are all relatively infrequent, but the point is that they
*do* happen -- and our developer-caused redness is (mercifully)
infrequent-enough that these sporadic reds would probably represent a
significant proportion of any auto-backouts. And that'd be bad.
~Daniel
Yes, I should have been more polite. I felt offended by that "> A bunch
of stuff" etc.
It was just a proposal, an idea, I felt it would dramatically improve
the situation. I'm not a build person and can't help. Sorry if that idea
was old news.
Ben
echo "TinderboxPrint: CANCELLED-BUILD" and you can do this with
tinderbox & TBPL already.
Cool. Done.
Cheers,
Chris
> Also, at least in the Thunderbird tests, there's a lot of "wait 10
> seconds", because there were concurrency issues that nobody bothered to
> find out and just put that hack in. That costs a lot of time as well,
> aggregated.
We removed all the waits in Firebug's FBTest suite; they're not reliable
and they slow down testing. We watch mutation events until the UI has what
the test expects or we fail. Our tests are more reliable now, if more
painful to write.
jjb
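The wait-for-the-condition pattern John describes can be sketched like this in Python; the function and variable names are illustrative, not any real harness API:

```python
# Poll for the state the test expects instead of sleeping a fixed interval:
# fast when the condition is already true, and fails promptly at the timeout.
import time

def wait_for(condition, timeout=10.0, interval=0.05):
    """Poll condition() until it returns true; fail fast at the timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    raise AssertionError("condition not met within %.1fs" % timeout)

# Example: a condition that only becomes true on the third poll.
calls = {"n": 0}
def ui_ready():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_for(ui_ready))  # → True
```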
Yes. That would mean spending a long period of time working on issues
which mostly do not affect our users.
We need to make it easier to fix intermittent oranges. We've taken some
steps here --- crash stacks, hang stacks, increasing use of VM record
and replay --- but more could be done.
Rob
It's not being ignored. We frequently back out patches because they
caused test failures. Even more so, people find and fix lots of bugs
before checking in by noticing test failures on try-server.
Intermittent orange sucks, and we need to get better at fixing it, but
we need to keep the problem in perspective. We run hundreds of thousands
of tests on every push and usually we get a handful of intermittent
failures.
Rob
I think one thing we need to get better at is backing out changes
that cause new high-frequency intermittent oranges.
Given a graph of when an orange occurred, it's often pretty easy to
tell approximately when it started, and what changesets might have
been likely to cause it. Better tools would make looking at this
sort of thing easier.
I've seen very high frequency intermittent oranges stay in the tree
for weeks, and when I looked, I figured out which change caused them
in only a few minutes. We need to get better about doing that
rather than just starring and moving on.
-David
--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/
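The tooling dbaron asks for could start as simply as the sketch below: given one test's per-push pass/fail history, find the push where the failure rate jumps the most. The window size and scoring here are arbitrary choices for illustration:

```python
# Sketch: locate where an intermittent orange started. `failures` is a list
# of 0/1 per push (0 = green, 1 = orange), oldest first; we compare the
# orange rate in a window before and after each candidate push.
def likely_regression_range(failures, window=20):
    """Return the index of the push after which the orange rate rises most."""
    best_idx, best_jump = None, 0.0
    for i in range(1, len(failures)):
        before = failures[max(0, i - window):i]
        after = failures[i:i + window]
        jump = sum(after) / float(len(after)) - sum(before) / float(len(before))
        if jump > best_jump:
            best_idx, best_jump = i, jump
    return best_idx

# 30 green pushes, then an intermittent orange appears.
history = [0] * 30 + [0, 1, 0, 1, 1, 0, 1] * 3
print(likely_regression_range(history))  # points near push 30
```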
Is the list around line 1079 equivalent? That is, grep for
"BRANCHES['tryserver']" and "TRY_SLAVES".
(I am not the build team, don't trust anything I say)
--
Mook
> On Aug 10, 5:07 pm, Mike Beltzner <beltz...@mozilla.com> wrote:
>> On 2010-08-10, at 4:03 PM, Axel Hecht wrote:
>>
>> > And I'm surprised to see real orange among all that random orange.
>>
>> They're not random. They are intermittent, and we need to understand
>> what is causing them to be so, and fix that.
>
> I agree that this is a serious problem. And the amount of intermittent
> orange is very substantial - I see a few known intermittent oranges on
> try pretty much each time I push, no matter what I push. Time is wasted
> figuring out what is a real orange and what isn't, both in nervously
> looking at your own pushes, and also at others' (when deciding if to
> push after them), and far worse than that, it prevents the possibility
> of automated backing out of bad pushes.
Can someone explain to me why we can't temporarily remove whatever test
generates an intermittent orange after a bug has been opened on it?
I'm sure there are very good reasons, just I don't know them ;)
To me it seems like if a test is known to be an intermittent orange then
people won't tend to care about it anyway when submitting, and maybe
worse, they might not care about it after it has actually been fixed.
^__^
MikeK
Actually I interpreted "cause orange" as excluding known intermittent
orange.
>On 08/11/2010 03:22 PM, Ben Bucksch wrote:
>
>
>>>We have a decent amount of intermittent red (as dholbert notes).
>>>
>>>
>>Ah, I didn't know that (might have seen a few of those, but considered them a puzzling hiccup). Since when are compiles non-deterministic? I'd think there's something seriously wrong, then.
>>
>>
>There are a number of causes -- probably the most frequent is network congestion (which can make hg clone or wget commands time out on build boxes).
>
I can't wait for purple to be implemented.
That's certainly one perspective which is quite true. However, if you
take a look at kaie's stats [1], the tree has only been green for about
1% of the time since the stats began.
I'm quite sure that this adds up to a significant amount of developer
time spent figuring out which oranges are known intermittent bugs (and
which aren't), starring them, etc.
Plus you may also start to get the depression factor of: I want to push,
but oh no, the tree is orange again, do I really want to star all these
oranges? (This has been even worse in the past, when people would check
in on top of unstarred oranges while you were starring them.)
Standard8
[1] http://www.kuix.de/mozilla/tinderboxstat/index.php?tree=Firefox
> Can someone explain to me why we can't temporarily remove whatever test
> generates an intermittent orange once a bug has been opened on it?
This has been done quite a few times for tests that are falling over
very frequently.
It's a little less clear what to do when the test only fails
infrequently, but is doing worthwhile testing the rest of the time
(perhaps in an undertested area!)... Intermittent orange sucks, but so
do real regressions.
Justin
> Plus you may also start to get the depression factor of: I want to push,
> but oh no, the tree is orange again, do I really want to star all these
> oranges? (This has been even worse in the past, when people would check
> in on top of unstarred oranges while you were starring them.)
The sheriff should be starring builds. If sheriffs aren't doing their
job, this should be raised separately.
Cheers,
Shawn
When it's the test's fault, sure.
But it's also not uncommon for some low-level breakage to cause
intermittent orange across a decent number of tests. This happened
last week for a few of the style system mochitests as a result of
something in the tracemonkey merge (bug 583262, I think), and I know
this because people started commenting in the random orange bugs
(e.g., bug 527614) for those tests quite frequently.
When that sort of thing happens, we shouldn't start disabling tests.
We should back out the change that caused the intermittent orange.
This. Even tests that time out or otherwise fail intermittently can
(and do) catch actual regressions by failing differently.
- Kyle
Good point. But why not do this: whenever a push has an orange, the
system re-runs that same failing test 10 more times. If all 10 pass,
the test is marked green, with a comment noting an intermittent
failure, and the push sticks. If one or more fail, it's treated as an
actual failure and the push is automatically backed out.
(Obviously there would still be some false positives, but 10, or some
other number, can be chosen to reduce those to almost zero.)
- azakai
The main obstacle to doing this would be that sometimes a test fails
and leaves the browser in a state that causes later failures (tabs
lying around, focus lost, etc). We'd probably have to change the test
harness to restart the browser on the test in question, but that
doesn't sound too difficult (I know, famous last words ...).
- Kyle
If re-running the test itself 10 times doesn't settle it (we still see
failures), then I guess we could run the entire set of tests it belongs
to 10 times after that. This would take a while, but only for
intermittent tests that *also* mess up the browser for later tests (in
that case you need to re-run all those tests anyhow, since they were
also marked as failures). So probably a rare occurrence.
(This would add some overhead to *real* test failures. But not all 10
additional runs would be done: since they all fail, you can stop after
the first few. So this seems worth the benefits of completely
automating pushing and backing out, and of removing random oranges
from the tree.)
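Here's a rough sketch of that "stop after the first few" idea; all the names and thresholds are illustrative assumptions, not a real harness interface:

```python
# Illustrative sketch only (hypothetical names and thresholds): if
# re-running a single test is unreliable because earlier tests polluted
# the browser state, re-run the whole suite, bailing out early once
# repeated failures confirm the regression is real.

def rerun_suite(run_suite, retries=10, stop_after=3):
    failures = 0
    for _ in range(retries):
        if not run_suite():
            failures += 1
            if failures >= stop_after:
                # Consistent failures: stop early, no need to burn the
                # remaining runs on what is clearly a real regression.
                return "real failure"
    # No failures in any re-run: the original orange was intermittent.
    return "intermittent" if failures == 0 else "inconclusive"
```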
- azakai
* Our intern Andrew has been looking into ways we can solve the
"focus" problem through changes to the frameworks and has made some
good progress on this.
* We are looking into what we need to do to improve "Topfails" which
should enhance our ability to detect, track and deal with test
failures.
* We are trying to create one or more metrics to measure where we
currently are and to tell what kind of progress we are making against
the "Oranges" (the "Orange Factor", if you will).
* We are working on developing a mechanism with which we can run/re-
run/skip individual or groups of tests.
* We are looking into how we can run tests in parallel and/or with a
remote webserver (in mobile testing, we have seen a significant
speed-up, not to mention the reduction in required resources).
Any and all feedback is welcome.
Additional references for the interested.
Jesse's etherpad: http://etherpad.mozilla.com:9000/WarOnOrange
Clint's wiki page: https://wiki.mozilla.org/User:Ctalbert/WarOnOrange
Joel's blog post: http://elvis314.wordpress.com/2010/07/05/improving-personal-hygiene-by-adjusting-mochitests/
Is there a bug, user repo, or person that those of us interested in
this should look at or talk to?
- Kyle
I don't see what's doing the hg merging here?
PS: tinderbox doesn't operate anything; it's kind of sad that we hardly
visualize at all what is actually happening in the releng infra. Most of
that gets thrown away, then dumped onto tinderbox, and then tbpl
comes in and makes up a completely new story on top of that incomplete
data. It may not be too far from what happens, but it's not the
complete picture either.
Axel
Actually, just re-running an entire test job (as in, Debug Linux
Mochitest #5, etc.) would be great. With that we could entirely
automate pushing and backing out (as discussed earlier).
Being able to run individual tests or small groups of them would be a
significant optimization, but the move from manually looking for
oranges and manually backing out to an automatic system that handles
oranges and backouts would be much more important.
- azakai
Clint or Joel would be the right folks to chat with.
Bob
The JS team has their own tracemonkey repo, and Sayre merges it to m-c
every few days. This seems to work well. From my point of view, I
find landing a patch on TM much less stressful than I would landing it
on mozilla-central, because if I stuff it up I'll be inconveniencing a
much smaller group of people.
(A similar thing happens with Nanojit -- it has its own repo because
it's shared with Adobe, and I'm responsible for merging changes to TM
every so often. It too works well.)
Would having more staging repos like this cause problems?
Nick
I have no idea, but Lukas might!
Ehsan
Hmm... Do we have any indication of typical frequency ranges that we see
our intermittent failures?
Now, statistics was never my strong point, but assuming I understand
the above correctly, wouldn't it follow that a push which doesn't fail
100% of the time has a significant chance of passing its first test
run, and hence of making following pushes more likely to be backed out
in error?
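To put rough numbers on that worry (the failure rates here are illustrative assumptions, not measured data):

```python
# Back-of-the-envelope numbers for the concern above. The failure
# rates are illustrative assumptions, not measured data.

def p_all_retries_pass(fail_rate, retries=10):
    """Chance that a test failing with probability fail_rate per run
    passes all `retries` re-runs through sheer luck."""
    return (1 - fail_rate) ** retries

# A bad push that fails only 50% of the time passes its first run half
# the time, so the blame can land on whoever pushes next.
p_first_run_passes = 1 - 0.5  # 0.5

# But once it does fail, ten re-runs almost never all pass by luck:
p_lucky = p_all_retries_pass(0.5)  # 0.5 ** 10, about 0.001
```

So, if these assumptions hold, the 10x policy mostly risks misattributing blame when the bad push happens to pass its *first* run, which is exactly why re-running the previous version as well might be worth considering.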
I don't know if we should do the 10x run on the previous version to
detect if it was indeed the latest push that introduced the error.
I'm all in for automation of anything that can be automated, but I think
we should consider that if we automate the push process too much, there
is a risk that some people will think "Oh, I'll just try to push my
patch, it will be backed out automatically anyway if there is a problem
with it" (like what has happened on the try server and some people seem
to think is a misuse of shared resources).
^__^
MikeK
btw: Thank you to all of you who helped me understand the reasons for
not automatically disabling tests that fail intermittently after
opening bugs on them; still digesting the explanations.
I think it is a good idea, but as individual patches now land in
bigger chunks, I would guess that the risk of merge problems increases?
Not sure if the total merging effort increases or decreases, though...
^__^
MikeK
Not if the code that's being worked on is something that's mostly
self-contained, or not otherwise worked on in mozilla-central. If not,
then merging from mozilla-central into the staging repo regularly should
(usually) make any merge issues minor. Then once you merge back into
mozilla-central (which would happen less often), there shouldn't be any
merge issues. Of course, that strategy does require someone to actively
maintain the staging repo.
- Blair
I have working code in staging right now for bug 73184, so using your
commit message to specify builds should become available sometime
during the next week. Please see:
https://wiki.mozilla.org/User:Lukasblakk/TryServerSyntax#Usage_Examples
for details on how that will work. The bonus of the commit message is
that you will be able to not only ask for certain desktop/mobile
platform options but you will also be able to run a particular test or
talos suite.
Cheers,
Lukas
The assumption is that a test with intermittent failures will fail,
for no reason, at some frequency - but that if it *passes*, it must be
valid. So we can rule out those intermittent failures using
statistics.
If that is not the case - if bad patches can succeed through luck -
then we have far worse problems than this: it would mean we are
*currently* landing bad patches and have very little way to prevent
that. In other words, not only would an automatic system fail here,
but so does the current one. But I don't think that is the case.
(Please correct me if I'm wrong, though.)
>
> I don't know if we should do the 10x run on the previous version to
> detect if it was indeed the latest push that introduced the error.
>
> I'm all in for automation of anything that can be automated, but I think
> we should consider that if we automate the push process too much, there
> is a risk that some people will think "Oh, I'll just try to push my
> patch, it will be backed out automatically anyway if there is a problem
> with it" (like what has happened on the try server and some people seem
> to think is a misuse of shared resources).
This is a big risk, I fully agree. It can be dealt with in various
ways. But I don't think the solution is "keep things slow and
unautomated, because if we make things fast then people will abuse the
speed" ;)
- azakai
> I have working code in staging right now for bug 73184,
Sorry - that should have been https://bugzilla.mozilla.org/show_bug.cgi?id=473184
Cheers,
Lukas
The syntax looks pretty good. The only comment I have is that instead of
'b' for both, maybe allow 'od'/'do'? That's probably slightly easier to
remember and more extensible if we ever add other build types.
Rob
As a general comment, it seems to me that we do things backwards: The
reverse of our "check in, then test" paradigm seems much more
sensible.
I realize this isn't a place we can get to tomorrow, but I think
there's a bigger picture here than just the pain points of our current
system (intermittent orange, having to watch the tree for hours,
difficulty of deciding what to back out and when).
> I realize this isn't a place we can get to tomorrow, but I think
> there's a bigger picture here than just the pain points of our current
> system (intermittent orange, having to watch the tree for hours,
> difficulty of deciding what to back out and when).
We can get pretty close today with the Tryserver, though. Two big projects (Tab Candy, Firefox Sync) recently landed on mozilla-central with very little surprise and pain; both were tested repeatedly and thoroughly on tryserver for test and performance regressions beforehand, and many changes were made based on those test cycles.
cheers,
mike
We have some tests that will almost always fail on bad patches but might
succeed through a freak coincidence every so often. Not many, but a few.
-Boris