Improving Mac OS X 10.6 test wait times by reducing 10.7 load

Armen Zambrano G.

Apr 25, 2013, 1:30:38 PM

(please follow up through mozilla.dev.planning)

Hello all,
I have recently been looking into our Mac OS X test wait times, which
have been bad for many months and are progressively getting worse.
Less than 80% of test jobs on OS X 10.6 and 10.7 are able to start
within 15 minutes of being requested.
This slows down getting test results for OS X and makes tree closures
longer if we have Mac OS X test backlogs.
Unfortunately, we can't buy any more revision 4 Mac minis (they're not
sold anymore), as Apple discontinues old hardware when new models come out.
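
For readers unfamiliar with the wait-time figure quoted above, here is a minimal sketch of how such a number could be computed. The function name, the (requested, started) data shape, and the sample jobs are all hypothetical; the thread does not say how release engineering actually computes the metric.

```python
from datetime import datetime, timedelta


def fraction_started_within(jobs, threshold=timedelta(minutes=15)):
    """Return the share of jobs whose wait (started - requested) is <= threshold.

    `jobs` is an iterable of (requested, started) datetime pairs.
    This data shape is an assumption made for illustration only.
    """
    jobs = list(jobs)
    if not jobs:
        return 0.0
    on_time = sum(1 for requested, started in jobs
                  if started - requested <= threshold)
    return on_time / len(jobs)


# Hypothetical sample: two jobs start within 15 minutes, two do not.
t0 = datetime(2013, 4, 25, 10, 0)
sample_jobs = [
    (t0, t0 + timedelta(minutes=5)),
    (t0, t0 + timedelta(minutes=14)),
    (t0, t0 + timedelta(minutes=40)),
    (t0, t0 + timedelta(minutes=90)),
]

print(fraction_started_within(sample_jobs))  # 0.5
```

With real data, a result below 0.8 would correspond to the "less than 80%" situation described above.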

In order to improve the turnaround time for Mac testing, we have to look
into reducing our test load in one of these two OSes (both of them run
on revision 4 minis).
We have over a third of our OS X users running 10.6. Eventually, down
the road, we could drop 10.6, but we still have a significant number of
our users there, even though Apple has not shipped major updates for it
since July 2011 [1].

Our current Mac OS X distribution looks like this:
* 10.6 - 43%
* 10.7 - 30%
* 10.8 - 27%
OS X 10.8 is the only version that is growing.

In order to improve our wait times, I propose that we stop testing on
tbpl per-checkin [2] on OS X 10.7 and re-purpose the 10.7 machines as
10.6 to increase our capacity.

Please let us know if this plan is unacceptable and needs further
discussion.

best regards,
Armen Zambrano - Mozilla's Release Engineering

Andreas Gal

Apr 25, 2013, 1:40:34 PM

to Armen Zambrano G., dev-pl...@lists.mozilla.org

How many 10.7 machines do we operate in that pool?

Andreas


Justin Lebar

Apr 25, 2013, 1:55:03 PM

to Andreas Gal, dev-pl...@lists.mozilla.org, Armen Zambrano G.

It would be nice if we had data indicating how often tests fail on
just one version of MacOS, so we didn't have to guess how useful having
10.6, 10.7, and 10.8 tests is. That's bug 860870. It's currently
blocked on treeherder, but maybe it should be re-prioritized, since we
keep running into cases where this data would be helpful.

Anyway, disabling the 10.7 tests sounds reasonable to me given no
data, but maybe we continue running these tests on m-c? Maybe we also
deprecate the 10.7 tests on tryserver, so you only get the tests if
you really really want them?

On Thu, Apr 25, 2013 at 1:40 PM, Andreas Gal <g...@mozilla.com> wrote:
>
> How many 10.7 machines do we operate in that pool?
>
> Andreas
>
> On Apr 25, 2013, at 10:30 AM, "Armen Zambrano G." <arm...@mozilla.com> wrote:
>

Armen Zambrano G.

Apr 25, 2013, 2:02:24 PM

to Justin Lebar, Andreas Gal

On 2013-04-25 1:40 PM, Andreas Gal wrote:
> How many 10.7 machines do we operate in that pool?
>
> Andreas

84 of them are 10.6
86 of them are 10.7

Unfortunately, a lot of them (maybe a dozen) are down while we try to fix
them (broken hard drives, bad memory, NICs). They are out of warranty.
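
As a rough back-of-the-envelope check on what re-purposing would buy, the arithmetic can be sketched as follows. The variable names are illustrative, and the calculation ignores the roughly a dozen machines currently down, since the thread does not say how they split between the two pools.

```python
# Machine counts as stated in this message.
machines_10_6 = 84
machines_10_7 = 86

# If every 10.7 machine were re-imaged as 10.6 (ignoring broken machines):
combined = machines_10_6 + machines_10_7
growth = combined / machines_10_6

print(combined, round(growth, 2))  # 170 2.02
```

In other words, the 10.6 pool would roughly double under these assumptions.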

On 2013-04-25 1:55 PM, Justin Lebar wrote:
> It would be nice if we had data indicating how often tests fail on
> just one version of MacOS, so we didn't have to guess how useful having
> 10.6, 10.7, and 10.8 tests is. That's bug 860870. It's currently
> blocked on treeherder, but maybe it should be re-prioritized, since we
> keep running into cases where this data would be helpful.
>

It would be nice indeed.

> Anyway, disabling the 10.7 tests sounds reasonable to me given no
> data, but maybe we continue running these tests on m-c? Maybe we also
> deprecate the 10.7 tests on tryserver, so you only get the tests if
> you really really want them?
>

We could come to the compromise of running them on m-c, m-a, m-b and
m-r. Only this would help a lot since most of the load comes from m-i
and try. We could make it a non-by-default platform on try.
I assume that the wait times for 10.6 should be good enough but we
should be willing to revisit later down the road if they get bad again.

We can start by decreasing the load and revisit down the road.

Sounds good?

cheers,
Armen

Alex Keybl

Apr 25, 2013, 2:35:39 PM

to Armen Zambrano G., Justin Lebar, Andreas Gal, dev-pl...@lists.mozilla.org

> We could come to the compromise of running them on m-c, m-a, m-b and m-r. Only this would help a lot since most of the load comes from m-i and try. We could make it a non-by-default platform on try.

This strategy would prevent any holes in our coverage, but accomplish the goal of reducing load. Seems very reasonable, given how infrequently I've seen tests fail for one OS X version but not another.

-Alex

Justin Lebar

Apr 25, 2013, 2:39:46 PM

to Alex Keybl, dev-pl...@lists.mozilla.org, Andreas Gal, Armen Zambrano G.

>> We could come to the compromise of running them on m-c, m-a, m-b and m-r. Only this would help a lot since most of the load comes from m-i and try. We could make it a non-by-default platform on try.

I wonder if we should do the same for debug 10.6 tests (and maybe builds).

The fact of the matter is that coalescing reduces our test coverage on
m-i as it is; so long as we run these tests on central and we're OK
with occasional bustage there, this seems pretty reasonable to me.

On Thu, Apr 25, 2013 at 2:35 PM, Alex Keybl <ake...@mozilla.com> wrote:
>> We could come to the compromise of running them on m-c, m-a, m-b and m-r. Only this would help a lot since most of the load comes from m-i and try. We could make it a non-by-default platform on try.
>

> This strategy would prevent any holes in our coverage, but accomplish the goal of reducing load. Seems very reasonable, given how infrequently I've seen tests fail for one OS X version but not another.
>
> -Alex
>

Andrew McCreight

Apr 25, 2013, 2:44:45 PM

to Justin Lebar, dev-pl...@lists.mozilla.org

----- Original Message -----
> >> We could come to the compromise of running them on m-c, m-a, m-b
> >> and m-r. Only this would help a lot since most of the load comes
> >> from m-i and try. We could make it a non-by-default platform on
> >> try.
>
> I wonder if we should do the same for debug 10.6 tests (and maybe
> builds).

I think all three platforms use a single build, so it is just the number of tests run that will be reduced.

Andrew

Armen Zambrano Gasparnian

Apr 25, 2013, 2:47:50 PM

to Justin Lebar, Alex Keybl, Andreas Gal, dev-pl...@lists.mozilla.org

On 2013-04-25 2:39 PM, Justin Lebar wrote:
>>> We could come to the compromise of running them on m-c, m-a, m-b and m-r. Only this would help a lot since most of the load comes from m-i and try. We could make it a non-by-default platform on try.

> I wonder if we should do the same for debug 10.6 tests (and maybe builds).

Is this what you're saying?
* 10.6 opt tests - per-checkin (no change)
* 10.6 debug tests - reduced
* 10.7 opt tests - reduced
* 10.7 debug tests - reduced

* reduced --> m-c, m-a, m-b, m-r, esr17
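
The proposal in the list above can be sketched as a small lookup table. This is only an illustration of the matrix, not buildbot configuration: the names `REDUCED`, `PER_CHECKIN`, `SCHEDULE`, and `runs_tests` are hypothetical, and treating "per-checkin" as "the reduced branches plus m-i and try" is an assumption based on the earlier remark that most of the load comes from m-i and try.

```python
# Branch abbreviations as used in the thread.
REDUCED = frozenset({"m-c", "m-a", "m-b", "m-r", "esr17"})

# Assumption: "per-checkin" additionally covers m-i and try.
PER_CHECKIN = REDUCED | {"m-i", "try"}

# (OS X version, build type) -> branches on which tests would run.
SCHEDULE = {
    ("10.6", "opt"): PER_CHECKIN,    # no change
    ("10.6", "debug"): REDUCED,
    ("10.7", "opt"): REDUCED,
    ("10.7", "debug"): REDUCED,
}


def runs_tests(os_version, build_type, branch):
    """Return True if the sketched schedule runs tests for this combination."""
    return branch in SCHEDULE.get((os_version, build_type), frozenset())


print(runs_tests("10.6", "opt", "m-i"))    # True
print(runs_tests("10.7", "debug", "m-i"))  # False
print(runs_tests("10.7", "opt", "m-c"))    # True
```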

>
> The fact of the matter is that coalescing reduces our test coverage on
> m-i as it is; so long as we run these tests on central and we're OK
> with occasional bustage there, this seems pretty reasonable to me.

Great!

Justin Lebar

Apr 25, 2013, 3:14:10 PM

to Armen Zambrano Gasparnian, Andreas Gal, Alex Keybl, dev-pl...@lists.mozilla.org

> Is this what you're saying?
> * 10.6 opt tests - per-checkin (no change)
> * 10.6 debug tests - reduced
> * 10.7 opt tests - reduced
> * 10.7 debug tests - reduced
>
> * reduced --> m-c, m-a, m-b, m-r, esr17

Yes.

Now that I think about this more, maybe we should go big or go home:
change 10.6 opt tests to reduced as well, and see how it goes. We can
always change it back.

If it goes well, we can try to do the same thing with the Windows tests.

We should get the sheriffs to sign off.

On Thu, Apr 25, 2013 at 2:47 PM, Armen Zambrano Gasparnian
<arm...@mozilla.com> wrote:
> On 2013-04-25 2:39 PM, Justin Lebar wrote:
>>>>

>>>> We could come to the compromise of running them on m-c, m-a, m-b and
>>>> m-r. Only this would help a lot since most of the load comes from m-i and
>>>> try. We could make it a non-by-default platform on try.
>>

Ed Morley

Apr 25, 2013, 4:12:16 PM

to Justin Lebar, Armen Zambrano Gasparnian, Andreas Gal, Alex Keybl, dev-pl...@lists.mozilla.org

On 25 April 2013 20:14:10, Justin Lebar wrote:
>> Is this what you're saying?
>> * 10.6 opt tests - per-checkin (no change)
>> * 10.6 debug tests - reduced
>> * 10.7 opt tests - reduced
>> * 10.7 debug tests - reduced
>>
>> * reduced --> m-c, m-a, m-b, m-r, esr17
>
> Yes.
>
> Now that I think about this more, maybe we should go big or go home:
> change 10.6 opt tests to reduced as well, and see how it goes. We can
> always change it back.
>
> If it goes well, we can try to do the same thing with the Windows tests.
>
> We should get the sheriffs to sign off.

Worth a shot, we can always revert :-) Only thing I might add, is that
we'll need a way to opt into 10.6 test jobs on Try, in case someone has
to debug issues found on mozilla-central (eg using sfink's undocumented
OS version specific syntax).

Ed

jmaher

Apr 26, 2013, 9:10:03 AM

Just on Wednesday I had to revert a talos change on inbound due to 10.6-only failures. This was due to a different version of python on 10.6 :(

-Joel

Armen Zambrano G.

Apr 26, 2013, 9:49:18 AM

Maybe we can keep one of the talos jobs around? (until releng fixes the
various python versions' story)

IIUC this was more of an infra issue rather than a Firefox testing issue.

jmaher

Apr 26, 2013, 10:21:32 AM

On Friday, April 26, 2013 9:49:18 AM UTC-4, Armen Zambrano G. wrote:
> Maybe we can keep one of the talos jobs around? (until releng fixes the
> various python versions' story)
>
> IIUC this was more of an infra issue rather than a Firefox testing issue.

It was infra-related, but it was specific to the 10.6 platform. Even knowing that, I fully support the proposed plan. We could have easily determined the root cause of the 10.6-specific failure a day later on a different branch.

Phil Ringnalda

Apr 26, 2013, 10:53:22 AM

So what we're saying is that we are going to completely reverse our
previous tree management policy?

Currently, m-c is supposed to be the tree that's safely unbroken, and we
know it's unbroken because the tests that we run on it have already been
run on the tree that merged into it, and you should almost never push
directly to it unless you're in a desperate hurry to hit a nightly.

This change would mean that we expect to have merges of hundreds of
csets from inbound sometimes break m-c with no idea which one broke it,
that we expect to sometimes have permaorange on it for days, and that
it's better to push your widget/cocoa/ pushes directly to m-c than to
inbound.

Justin Lebar

Apr 26, 2013, 11:11:59 AM

to Phil Ringnalda, dev-pl...@lists.mozilla.org

> So what we're saying is that we are going to completely reverse our
> previous tree management policy?

Basically, yes.

Although, due to coalescing, do you always have a full run of tests on
the tip of m-i before merging to m-c?

A better solution would be to let you trigger a full set of tests (w/o
coalescing) on m-i before merging to m-c. We've been asking for a
similar feature for tryserver (let us add new jobs to my push) for a
long time. Perhaps if we made this change, we could get releng to
implement that feature sooner rather than later, particularly if this
change caused pain to other teams who pull from a broken m-c.

I am not above effecting a sense of urgency in order to get bugs fixed. :)

> Currently, m-c is supposed to be the tree that's safely unbroken, and we
> know it's unbroken because the tests that we run on it have already been
> run on the tree that merged into it, and you should almost never push
> directly to it unless you're in a desperate hurry to hit a nightly.
>
> This change would mean that we expect to have merges of hundreds of
> csets from inbound sometimes break m-c with no idea which one broke it,
> that we expect to sometimes have permaorange on it for days, and that
> it's better to push your widget/cocoa/ pushes directly to m-c than to
> inbound.

Ryan VanderMeulen

Apr 26, 2013, 11:19:17 AM

On 4/26/2013 11:11 AM, Justin Lebar wrote:
>> So what we're saying is that we are going to completely reverse our
>> previous tree management policy?
>
> Basically, yes.
>
> Although, due to coalescing, do you always have a full run of tests on
> the tip of m-i before merging to m-c?
>

Yes. Note that we generally aren't merging inbound tip to m-c - we're
taking a known-green cset (including PGO tests).

Justin Lebar

Apr 26, 2013, 11:29:30 AM

to Ryan VanderMeulen, Phil Ringnalda, dev-pl...@lists.mozilla.org

As a compromise, how hard would it be to run the Mac 10.6 and 10.7
tests on m-i occasionally, like we run the PGO tests? (Maybe we could
trigger them on the same csets as we run PGO; it seems like that would
be useful.)

Phil Ringnalda

Apr 26, 2013, 11:45:25 AM

On 4/26/13 8:11 AM, Justin Lebar wrote:
>> So what we're saying is that we are going to completely reverse our
>> previous tree management policy?
>
> Basically, yes.
>
> Although, due to coalescing, do you always have a full run of tests on
> the tip of m-i before merging to m-c?

It's not just coincidence that the tip of most m-i -> m-c merges is a
backout - for finding a mergeable cset in the daytime, you're usually
looking at the last backout during a tree closure, when we sat and
waited to get tests run on it. Otherwise, you pick one that looks
possible, and then figure out what got coalesced up and see how that did
where it got coalesced.

Armen Zambrano G.

Apr 26, 2013, 11:50:27 AM

Would we be able to go back to where we disabled 10.7 altogether?
Product (Asa in separate thread) and release drivers (Akeybl) were OK with
the compromise of removing version-specific test coverage completely.

Side note: adding Mac PGO would increase the build load (besides this, we
have to place a large PO, as we expect Mac wait times to start showing up
as general load increases).

Not all approaches to reducing load are easy to implement (due to the way
that buildbot is designed), and they do not guarantee that we would reduce
it enough. It's expensive enough to support 3 different versions of Mac
as is without bringing 10.9 to the table. We have to cut things at times.

One compromise that would be easy to implement and *might* reduce the
load is to disable all debug jobs for 10.7.

cheers,
Armen

Armen Zambrano G.

Apr 26, 2013, 12:10:07 PM

Just disabling debug and talos jobs for 10.7 should reduce more than 50%
of the load on 10.7. That might be sufficient for now.

Any objections to this plan?
We can re-visit later on if we need more disabled.

cheers,
Armen

Justin Lebar

Apr 26, 2013, 12:14:56 PM

to Armen Zambrano G., dev-pl...@lists.mozilla.org

> Would we be able to go back to where we disabled 10.7 altogether?

On m-i and try only, or everywhere?

Armen Zambrano G.

Apr 26, 2013, 1:03:46 PM

On 2013-04-26 12:14 PM, Justin Lebar wrote:
>> Would we be able to go back to where we disabled 10.7 altogether?
>

> On m-i and try only, or everywhere?

The initial proposal was for disabling everywhere.

We could leave 10.7 opt jobs running everywhere as a compromise and
re-visit after I re-purpose the first batch of machines.

best regards,
Armen

Justin Lebar

Apr 26, 2013, 1:31:52 PM

to Armen Zambrano G., dev-pl...@lists.mozilla.org

I don't think I'm comfortable disabling this platform across the
board, or even disabling debug-only runs across the board.

As jmaher pointed out, there are platform differences here. If we
disable this platform entirely, we lose visibility into rare but, we
seem to believe, possible events.

It seems like the only reason to disable everywhere instead of only on
m-i/try (or running less frequently on m-i, like we do with PGO) is
that the former is easier to implement. It seems like we're proposing
taking a lot of risk here to work around our own failings...

Armen Zambrano G.

Apr 26, 2013, 2:25:49 PM

On 2013-04-26 1:31 PM, Justin Lebar wrote:
> I don't think I'm comfortable disabling this platform across the
> board, or even disabling debug-only runs across the board.
>
> As jmaher pointed out, there are platform differences here. If we
> disable this platform entirely, we lose visibility into rare but, we
> seem to believe, possible events.
>

That was a python issue that was related to talos.
It was not a Firefox issue that would have only failed on a specific
version of Mac.

> It seems like the only reason to disable everywhere instead of only on
> m-i/try (or running less frequently on m-i, like we do with PGO) is
> that the former is easier to implement. It seems like we're proposing
> taking a lot of risk here to work around our own failings...
>

Yes, it is a lot of work to change the way that buildbot works in order
to optimize for a non-standard mode of operation.
Just running jobs alongside PGO, and not on every checkin, would already
make 10.7 a lesser platform than the other versions.

I could also have not even started the thread trying to improve our wait
times for 10.6 and when one day someone complained about wait times on
rev4 I would say "we can not buy more machines".

Just a little before on the thread you were asking "go big or go home"
and asked to disable even 10.6 debug tests. I'm confused about the
different messages.

Armen Zambrano G.

Apr 26, 2013, 2:34:34 PM

After re-reading, I'm happy to disable just m-i/try for now.

Modifying things to trigger *some* jobs on m-i, though, would be a decent
amount of work (adding Mac PGO builders), would still differ from normal
operations, and would increase the 10.6/10.8 test load.


Justin Lebar

Apr 26, 2013, 2:42:00 PM

to Armen Zambrano G., dev-pl...@lists.mozilla.org

I'd be happy if we did the 10.7 and 10.6 tests only when we trigger
Windows PGO builds. That seems like a good compromise here, since the
sheriffs have to wait for PGO coverage on m-i before merging anyway.

Matt Brubeck

Apr 26, 2013, 7:21:47 PM

On 4/26/2013 9:10 AM, Armen Zambrano G. wrote:
> Just disabling debug and talos jobs for 10.7 should reduce more than 50%
> of the load on 10.7. That might be sufficient for now.

I'd be happy for us to disable all Talos jobs on 10.7, on all trees.
I've been keeping track of Talos stuff recently and I have not seen any
genuine regressions that are 10.7-specific, so I don't think it's
providing us much benefit to run these benchmarks on three Mac platforms
simultaneously.

In terms of tracking regressions, it would be better to have more
complete data on 10.6 alone than to have incomplete data (due to
coalescing) on 10.6 and 10.7.

Armen Zambrano G.

Oct 29, 2013, 4:31:33 PM

Hello all,
I would like to re-visit this.

I would like to look into no longer running tests and talos on 10.7 and
re-purposing those machines as 10.6 machines.
* We have many more users on 10.6 than on 10.7.
* No new updates have been given to 10.6 since July 2011 [1]
* No new updates have been given to 10.7 since October, 2012 [2]

This will improve our current Mac OSX testing wait times.

On another note, 10.9 has come out and I have already started seeing a
decent dip in 10.8 users (since it is a free update).

On another note, I would like to consider no longer running jobs on 10.8
and only running them on 10.9 once we have the infrastructure up and running.

cheers,
Armen

[1] https://en.wikipedia.org/wiki/Mac_OS_X_Snow_Leopard#Release_history
[2] https://en.wikipedia.org/wiki/Mac_OS_X_Lion#Release_history

Alexander Keybl

Oct 31, 2013, 9:07:10 AM

to Armen Zambrano G., dev-pl...@lists.mozilla.org

I think it makes a lot of sense to test the spread. +1

Ryan VanderMeulen

Oct 31, 2013, 11:14:13 AM

On 10/29/2013 4:31 PM, Armen Zambrano G. wrote:
>> In order to improve our wait times, I propose that we stop testing on
>> tbpl per-checkin [2] on OS X 10.7 and re-purpose the 10.7 machines as
>> 10.6 to increase our capacity.
>>
>> Please let us know if this plan is unacceptable and needs further
>> discussion.
>>
>> best regards,
>> Armen Zambrano - Mozilla's Release Engineering

+1 to repurposing all rev4s as 10.6 slaves and all rev5s as 10.9!

I guess the only question is how many people are stuck on 10.7 (my
understanding is that some 10.7-supporting hardware configurations
aren't supported on 10.9) and is that population large enough that we
explicitly need to test for them?

My offhand recollection is that the main discrepancies between the
different OSX versions we see in our test infrastructure largely have to
do with what hardware they're running on and whether OMTC is enabled or
not. So IMO, 10.6 on rev4 w/o OMTC and 10.9 on rev5 w/ OMTC is probably
representative enough that we aren't likely to miss any major regressions.

-Ryan