
Notes on ways to improve the build/automation turn around time


Clint Talbert

May 24, 2011, 3:27:43 PM
After the platform meeting, we talked a bit about how to reduce the
amount of turnaround time it takes to do a full build/test cycle in our
automation.

It's all on the meeting wiki, but for ease in reading, here's the set of
actions we came up with.

Short term
* (joduinn/releng) Try by default will not do anything
* (joduinn/releng) Stop running always failing tests automatically.
* (joduinn) Investigate bug 659222

Medium term
* (releng) Test suites in progress should be available on try and
selfserve even when those test suites are not being run automatically.
(This is an addition to the "stop running always failing tests" above)
* (releng) see if we can do anything to add machines before we get a new
colo.

Long term
* Experiment with moving tests into virtualization (bmoss and rsayre to
get a team to figure this out). ctalbert volunteers to help
** Rsayre will help with getting engineering help for a long-term
solution/fixing tests that prove intermittent in virtualization
* Figure out a way to not run tests that always pass on every test run.
(Ateam/Releng)

Thanks,
Clint

Justin Dolske

May 24, 2011, 4:03:33 PM
On 5/24/11 12:27 PM, Clint Talbert wrote:

> Short term
...


> * (joduinn/releng) Stop running always failing tests automatically.

What tests would those be?

Also, I see a thread in m.d.tree-management about combining some tests:

"(e.g. a11y and scroll tests are run as separate jobs, and only take a
few minutes of test time, which is pretty inefficient due to time
required to reboot, download and unpack new build and symbols, etc.)"

Sounds like this could also be a short-term and easy fix?


> Long term


> * Figure out a way to not run tests that always pass on every test run.

I don't understand this, conceptually.

Except for intermittent failures (which I don't think are relevant to
this?), every test should be green on every run. Until someone breaks
something, which is the point of having the tests. :)

Justin

Joshua Cranmer

May 24, 2011, 9:30:02 PM
On 05/24/2011 04:03 PM, Justin Dolske wrote:
>> Long term
>> * Figure out a way to not run tests that always pass on every test run.
>
> I don't understand this, conceptually.
>
> Except for intermittent failures (which I don't think are relevant to
> this?), every test should be green on every run. Until someone breaks
> something, which is the point of having the tests. :)
Another alternative is to try to figure out which tests "depend" on
which changes and not run them if those don't change.
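
To make that concrete, here's a rough sketch of the idea (the
directory-to-suite mapping and the suite names below are made up, not our
real ones):

  # Hypothetical sketch: choose test suites based on which directories a
  # push touches; fall back to everything when a path isn't covered.
  SUITE_DEPS = {
      "toolkit/components/passwordmgr/": {"mochitest-browser-chrome"},
      "layout/": {"mochitest", "reftest", "crashtest"},
      "js/src/": {"jsreftest", "xpcshell"},
  }
  ALL_SUITES = {"mochitest", "mochitest-browser-chrome", "reftest",
                "crashtest", "jsreftest", "xpcshell"}

  def suites_for_push(changed_files):
      suites = set()
      for path in changed_files:
          deps = [s for prefix, s in SUITE_DEPS.items()
                  if path.startswith(prefix)]
          if not deps:
              return ALL_SUITES  # unknown area: be conservative, run it all
          for d in deps:
              suites |= d
      return suites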

Justin Lebar

May 25, 2011, 8:30:44 AM
There are two separate issues: How long do builds take, and how long do tests take? To address only the first one:

We have an existence proof that there's a lot of headway we could make in terms of build speed. My Mac takes 12m to do a debug build from scratch. I have a Linux box that is similarly fast. So there's no question that we could speed up Mac builds and debug (i.e. non-PGO) Linux builds by getting faster machines, right?

IIRC, we don't use pymake on Windows builds. If we did, that would be a huge speedup for non-PGO builds, because we could use -j4 or greater.

Ted proposed not running PGO unless we ask for it; that would make release builds appear much faster on Linux and especially Windows.

Kyle Huey

May 25, 2011, 9:11:34 AM
to dev-pl...@lists.mozilla.org
On Wed, May 25, 2011 at 5:30 AM, Justin Lebar <justin...@gmail.com> wrote:

> There are two separate issues: How long do builds take, and how long do
> tests take? To address only the first one:
>

Which is the less interesting one; we're not backing up on the builders
here.

>
> We have an existence proof that there's a lot of headway we could make in
> terms of build speed. My mac takes 12m to do a debug build from scratch. I
> have a Linux box which is similarly fast. So there's no question that we
> could speed up mac builds and debug (i.e. non-PGO) Linux builds by getting
> faster machines, right?
>

It might help a bit. Remember that the builders reboot between every build,
so the build is generally a "cold" build (unless the hg clone pulled most of
the repo into memory). Without having done any measurements myself, I
imagine that IO performance is pretty important here compared to raw cpu
power.

>
> IIRC, we don't use pymake on Windows builds. If we did, that would be a
> huge speedup for non-PGO builds, because we could use -j4 or greater.
>

Timing on build slaves indicates that pymake is not a big win there.

> Ted proposed not running PGO unless we ask for it; that would make release
> builds appear much faster on Linux and especially Windows.
>

Yeah, we really should do that.

> _______________________________________________
> dev-planning mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-planning
>

- Kyle

Mike Hommey

May 25, 2011, 9:32:05 AM
to Kyle Huey, dev-pl...@lists.mozilla.org
On Wed, May 25, 2011 at 06:11:34AM -0700, Kyle Huey wrote:
> It might help a bit. Remember that the builders reboot between every build,
> so the build is generally a "cold" build (unless the hg clone pulled most of
> the repo into memory). Without having done any measurements myself, I
> imagine that IO performance is pretty important here compared to raw cpu
> power.

Why do they reboot between every build? Can't we skip that?

> > IIRC, we don't use pymake on Windows builds. If we did, that would be a
> > huge speedup for non-PGO builds, because we could use -j4 or greater.
> >
>

> Timing on build slaves indicates that pymake is not a big win there.

AFAIK these timings were without -j4.

Mike

Axel Hecht

May 25, 2011, 9:33:51 AM
On 25.05.11 15:11, Kyle Huey wrote:
> On Wed, May 25, 2011 at 5:30 AM, Justin Lebar <justin...@gmail.com> wrote:
>
>> There are two separate issues: How long do builds take, and how long do
>> tests take? To address only the first one:
>>
>
> Which is the less interesting one; we're not backing up on the builders
> here.
>
>>
>> We have an existence proof that there's a lot of headway we could make in
>> terms of build speed. My mac takes 12m to do a debug build from scratch. I
>> have a Linux box which is similarly fast. So there's no question that we
>> could speed up mac builds and debug (i.e. non-PGO) Linux builds by getting
>> faster machines, right?
>>
>
> It might help a bit. Remember that the builders reboot between every build,
> so the build is generally a "cold" build (unless the hg clone pulled most of
> the repo into memory). Without having done any measurements myself, I
> imagine that IO performance is pretty important here compared to raw cpu
> power.
>

Do the builders reboot or just the testers?

Also, how much time are we spending on reboot? Is that included in the
"few minutes" setup time that John mentioned?

Asking because a reboot sounds like an expensive and drastic way to work
around problems for which we may have more efficient solutions.

Axel

Armen Zambrano Gasparnian

May 25, 2011, 10:04:01 AM
to Mike Hommey, Kyle Huey, dev-pl...@lists.mozilla.org
We reboot both builders and testers after every job (with a few exceptions,
like L10n repacks).

Reboots for builders originally had two main purposes:
* clean state for unit tests (when we used to run them there)
* clean state for builds (nothing from previous builds is chewing memory)

There are also configuration management purposes:
* synchronize with puppet/opsi to have the right packages installed
* synchronize with the slave allocator to determine which master the
slave should be talking to

Are we trying to determine how much it buys us to build warm?
I believe catlee and others did some experiments to see how much it
would buy us, but he is away for a few days.

The reboot time is not included in the setup time that joduinn might
have mentioned. Setup time includes checking out repositories and
clobbering.

Even if a slave is rebooting, there is generally another idle slave
available to take a job if one comes in. The "build" and "try" wait times
say that we generally take jobs soon enough.

cheers,
Armen


Robert Kaiser

May 25, 2011, 1:42:03 PM
Justin Lebar schrieb:

> My mac takes 12m to do a debug build from scratch. I have a Linux box which is similarly fast.

Do those run "make buildsymbols" as well in that time? A significant
portion of the time our builders take is AFAIK for that step (which we
need if we want to be able to gather any sort of meaningful crash stats).

Robert Kaiser


--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)

Clint Talbert

May 25, 2011, 2:32:26 PM
On 5/24/2011 1:03 PM, Justin Dolske wrote:
> On 5/24/11 12:27 PM, Clint Talbert wrote:
>
>> Short term
> ...
>> * (joduinn/releng) Stop running always failing tests automatically.
>
> What tests would those be?
At the moment, the only one I know of is Jetpack on Windows. John
seemed to think there were others as well.

>
> Also, I see a thread in m.d.tree-management about combining some tests:
>
> "(e.g. a11y and scroll tests are run as separate jobs, and only take a
> few minutes of test time, which is pretty inefficient due to time
> required to reboot, download and unpack new build and symbols, etc.)"
>
> Sounds like this could also be a short-term and easy fix?
>

Yep, it's being tracked by bug 659328. It's probably a relatively minor
fix, but nonetheless something good to do.


>
>> Long term
>> * Figure out a way to not run tests that always pass on every test run.
>
> I don't understand this, conceptually.
>
> Except for intermittent failures (which I don't think are relevant to
> this?), every test should be green on every run. Until someone breaks
> something, which is the point of having the tests. :)

There are a ton of ways to implement this kind of thing. I prefer some
sort of cycling through the tests. As an example:
Over a period of time, you run full tests whenever the tree is free,
and you keep track of everything that is green on every run.

Then you take x% of those perma-green tests out of the "on change" runs
for the next week, and you only run full tests on nightlies. That way, if
you do break one of these perma-green tests, you'll find out about it
from the nightly build.

Each week, you reactivate/deactivate a different percentage of
perma-green tests.

This is a pretty complicated mechanism, but it has a benefit that you're
cycling through all the tests over time, and you have full runs each day.
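
For what it's worth, the selection itself could be pretty dumb; a rough
sketch (the 25% and the hashing scheme are just placeholders):

  import hashlib

  def tests_to_skip_this_week(perma_green_tests, week_number, fraction=0.25):
      # Deterministically pick a different slice of the perma-green tests
      # to drop from "on change" runs each week; the nightly full run
      # still covers everything.
      buckets = int(1 / fraction)
      def bucket(name):
          return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16) % buckets
      return {t for t in perma_green_tests
              if bucket(t) == week_number % buckets}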

Another way to do it is to wire it to code areas, and activate tests
based on the checkin's area of effect, but that's hard to measure well.

However we implement it, the purpose is to reduce cycle time by reducing
the set of tests we run on each change, while still running a full run
of tests periodically - either each day or at certain points through the
day - perhaps every 6 hours or some such.

Note that this would not be the case with Talos. We'd always run all of
talos, this would only affect correctness tests.

Does that hand-wavy outline explain the thinking behind the approach?

Clint

Daniel Holbert

May 25, 2011, 3:14:40 PM
to Clint Talbert, dev-pl...@lists.mozilla.org
On 05/25/2011 11:32 AM, Clint Talbert wrote:
> This is a pretty complicated mechanism, but it has a benefit that you're
> cycling through all the tests over time, and you have full runs each day.

So what happens when we get orange in one of these tests? How do we get
to a known-good tree state?

Right now, we have the option of backing out *just* the push that went
orange[1].

With your proposal, we'd have to back out everything *back to the last
time that test was run* (which sounds like it could be up to a day's
worth of pushes) in order to be in a known-good state. That sounds painful.

~Daniel

[1] (and possibly everything since that push, if the orange was an
'aborts test suite' type issue that prevented test coverage for
subsequent pushes)

Justin Lebar

May 25, 2011, 3:41:50 PM
> we're not backing up on the builders.

I admit to missing the meeting, but Clint said in the original post:

> After the platform meeting, we talked a bit about how to reduce the
> amount of turnaround time it takes to do a full build/test cycle in our
> automation.

It still takes a long time to get your results even when there's no backlog. Surely faster builds would help with that.

Additionally, although we don't back up on builders, we do occasionally kick builds off on VMs, which appear to be considerably slower than bare hardware. (The Linux-32 build VM I'm currently SSH'ed into has only one CPU!)

> It might help a bit. Remember that the builders reboot between every build,
> so the build is generally a "cold" build (unless the hg clone pulled most of
> the repo into memory). Without having done any measurements myself, I
> imagine that IO performance is pretty important here compared to raw cpu
> power.

To offer an alternative interpretation also not backed up by data: We should be able to hide disk latency by running more jobs than there are CPUs.

But the bigger point is that neither of us has a clue. If it's I/Os that matter, then knowing that would help guide us towards a solution (SSDs, running a different clone script so the source files stay in buffer cache).

My understanding, btw, is that we no longer do a full hg clone, but instead only pull the necessary files for the tip rev. Presumably they all stay in buffer cache. But maybe we only do this on try, or maybe I'm misremembering.

> Timing on build slaves indicates that pymake is not a big win there.

Do you know why that is?

Armen Zambrano Gasparnian

May 25, 2011, 4:08:56 PM
to mozilla.de...@googlegroups.com, Justin Lebar
On 11-05-25 3:41 PM, Justin Lebar wrote:
> But the bigger point is that neither of us has a clue. If it's I/Os that matter, then knowing that would help guide us towards a solution (SSDs, running a different clone script so the source files stay in buffer cache).
We are getting enterprise drives on the IX machines to speed I/O.

>
> My understanding, btw, is that we no longer do a full hg clone, but instead only pull the necessary files for the tip rev. Presumably they all stay in buffer cache. But maybe we only do this on try, or maybe I'm misremembering.

That is right. For try we do full clones; for everything else we just
add missing changesets (unless a clobber is requested).

cheers,
Armen

Ben Hearsum

May 25, 2011, 4:15:25 PM
to Armen Zambrano Gasparnian, mozilla.de...@googlegroups.com, Justin Lebar

Previously, we did limited clones -- pulling in only what was needed for
the revision we cared about.

However, we don't do full clones on try anymore, we keep a read-only
clone of the repository on the machines, and use "hg share" to update
the working copy. Pulling in the incremental changes to the read-only
clone and updating the working copy still takes 5-7 minutes, but it's
much faster than before.

sayrer

May 26, 2011, 12:54:07 AM
On Tuesday, May 24, 2011 1:03:33 PM UTC-7, Justin Dolske wrote:
>
> > Long term
> > * Figure out a way to not run tests that always pass on every test run.
>
> I don't understand this, conceptually.
>
> Except for intermittent failures (which I don't think are relevant to
> this?), every test should be green on every run.

We're looking to speed up test cycle times. One way to do that is to evaluate the odds that a test will fail. I'm sure there are, say, W3C DOM Level 2 Core tests that have never failed since they were checked in. Running them on every check-in is a waste of time, cycles, and greenhouse gases.

What if these tests that nearly always pass were only run once a day? Then you would still catch them in a reasonable amount of time, and it would probably be obvious which check-in did it. Tests that do fail could also re-enter the suite that's always run.
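
As a strawman, the split could start out as simple as this (the thresholds
are made up):

  def split_by_history(test_history, min_runs=1000, max_failures=0):
      # test_history maps test name -> (runs, failures) from past logs.
      # Tests with a long, perfectly green record move to the daily run;
      # anything that has ever failed stays in the per-checkin set.
      per_checkin, daily = set(), set()
      for name, (runs, failures) in test_history.items():
          if runs >= min_runs and failures <= max_failures:
              daily.add(name)
          else:
              per_checkin.add(name)
      return per_checkin, daily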

Below, Joshua suggests looking at data to determine which tests to execute. This is another, more sophisticated way to determine which tests might be worth running.

Getting to this point will take some time. Does that rationale make sense?

Mike Hommey

May 26, 2011, 1:58:46 AM
to dev-pl...@lists.mozilla.org

It does make sense, provided we have an easy way to trigger these tests
on intermediate csets when they start failing, to allow narrowing down
to the particular cset causing the regression.

Mike

Ehsan Akhgari

May 26, 2011, 12:57:17 PM
to Mike Hommey, dev-pl...@lists.mozilla.org
On 11-05-26 1:58 AM, Mike Hommey wrote:
> On Wed, May 25, 2011 at 09:54:07PM -0700, sayrer wrote:
> It does make sense, provided we have an easy way to trigger these tests
> on intermediate csets when they start failing, to allow to narrow down
> to one particular cset doing the regression.

Does this proposal also cover the try server? These tests might have
never failed on mozilla-central, but I'm pretty sure that they've
allowed people to catch regressions before hitting m-c.

Ehsan

jmaher

May 27, 2011, 1:22:46 PM
What about selectively running the long-running tests?

I took a look at mochitest (1-5) and found 191 test_* files which have
a runtime of >10 seconds; 4 tests actually take >2 minutes. All in
all, we could save about an hour of test time on a debug build by
ignoring these, and 15-20 minutes on an opt build. Keep
in mind these times would need to be divided by 5 (the number of
chunks we run).

It would require a manifest or some other mechanism to add the runtime
metadata, but all of that is possible. We could run the mSlow tests
on nightly builds or a few times/day.
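
The partition itself would be trivial once the runtime metadata exists; a
rough sketch (the data format is assumed, the threshold is the >10s
cut-off above):

  SLOW_THRESHOLD_MS = 10000  # the >10 second cut-off mentioned above

  def partition_by_runtime(runtimes_ms):
      # runtimes_ms maps test path -> measured runtime in ms.
      # Slow tests would move to a nightly/periodic schedule; the rest
      # stay in the per-push runs.
      slow = {t for t, ms in runtimes_ms.items() if ms > SLOW_THRESHOLD_MS}
      fast = set(runtimes_ms) - slow
      return fast, slow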

Boris Zbarsky

May 27, 2011, 2:22:41 PM
On 5/27/11 1:22 PM, jmaher wrote:
> what about selectively running the long running tests?
>
> I took a look at mochitest (1-5) and found 191 test_* files which have
> a runtime of >10 seconds. Actually 4 tests have >2 minutes.

Which ones, if I might ask?

(Note that I suspect that the difference here is that the granularity of
test_* files in mochitest is different; some might be "test this one DOM
feature" while others are "test that every CSS property we implement
round-trips correctly".)

-Boris

jmaher

May 27, 2011, 3:21:43 PM

The 4 longest tests are:
m1:
35404 INFO TEST-END | /tests/content/base/test/test_websocket.html |
finished in 128442ms
36529 INFO TEST-END | /tests/content/base/test/
test_ws_basic_tests.html | finished in 116905ms
40603 INFO TEST-END | /tests/content/canvas/test/webgl/
test_webgl_conformance_test_suite.html | finished in 133501ms
m4:
72059 INFO TEST-END | /tests/layout/style/test/test_value_cloning.html
| finished in 234695ms

Other lengthy tests are the layout/base (reftests?) inside of
mochitest (from m4 - a sample of about 123):
487 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1b.html |
finished in 11519ms
490 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1c.html |
finished in 22459ms
493 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1d.html |
finished in 22449ms
496 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1e.html |
finished in 24156ms
499 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2a.html |
finished in 19105ms
502 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2b.html |
finished in 19182ms
505 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2c.html |
finished in 23415ms

Boris Zbarsky

May 27, 2011, 3:33:11 PM
On 5/27/11 3:21 PM, jmaher wrote:
> 35404 INFO TEST-END | /tests/content/base/test/test_websocket.html |
> finished in 128442ms
> 36529 INFO TEST-END | /tests/content/base/test/
> test_ws_basic_tests.html | finished in 116905ms

As I recall those are just buggy: they use large setTimeouts all over
and stuff. Can we just fix them?

> 40603 INFO TEST-END | /tests/content/canvas/test/webgl/
> test_webgl_conformance_test_suite.html | finished in 133501ms

No idea what the deal is here.

> 72059 INFO TEST-END | /tests/layout/style/test/test_value_cloning.html
> | finished in 234695ms

This test is testing all sorts of stuff about lots of possible values
for each CSS property... but it does this by doing them one after the
other, with a new document being loaded for each test. I think we can
probably improve this. Let me take a look.

> Other lengthy tests are the layout/base (reftests?)

No, reftests is something different.

> 487 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1b.html |
> finished in 11519ms
> 490 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1c.html |
> finished in 22459ms
> 493 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1d.html |
> finished in 22449ms
> 496 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1e.html |
> finished in 24156ms
> 499 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2a.html |
> finished in 19105ms
> 502 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2b.html |
> finished in 19182ms
> 505 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2c.html |
> finished in 23415ms

These tests are exposing a bug in the mochitest harness, effectively:
they change preferences that affect the layout of the harness document
itself, and that document is _huge_... I was sure we had a bug about
this, but can't find it right now.

We should be able to make these tests _much_ faster if we actually want to.

-Boris

Joe Drew

May 27, 2011, 3:43:03 PM
to dev-pl...@lists.mozilla.org

On 2011-05-27 3:33 PM, Boris Zbarsky wrote:
>> 40603 INFO TEST-END | /tests/content/canvas/test/webgl/
>> test_webgl_conformance_test_suite.html | finished in 133501ms
>
> No idea what the deal is here.

"Run the entire WebGL conformance test suite." There are a lot of tests
in there.

Joe

Ehsan Akhgari

May 27, 2011, 4:04:09 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On 11-05-27 3:33 PM, Boris Zbarsky wrote:
> On 5/27/11 3:21 PM, jmaher wrote:
>> 35404 INFO TEST-END | /tests/content/base/test/test_websocket.html |
>> finished in 128442ms
>> 36529 INFO TEST-END | /tests/content/base/test/
>> test_ws_basic_tests.html | finished in 116905ms
>
> As I recall those are just buggy: they use large setTimeouts all over
> and stuff. Can we just fix them?

I think we should. There is no need here for any timeouts; we control
the whole stack and we should be able to figure out exactly what to wait
for. Who's our WebSocket person?

>> 40603 INFO TEST-END | /tests/content/canvas/test/webgl/
>> test_webgl_conformance_test_suite.html | finished in 133501ms
>
> No idea what the deal is here.

Apparently there are a bunch of things which we can do here. Filed bug
660322.

>> 72059 INFO TEST-END | /tests/layout/style/test/test_value_cloning.html
>> | finished in 234695ms
>
> This test is testing all sorts of stuff about lots of possible values
> for each CSS property... but it does this by doing them one after the
> other, with a new document being loaded for each test. I think we can
> probably improve this. Let me take a look.

Is there a bug on file for this?

>> Other lengthy tests are the layout/base (reftests?)
>
> No, reftests is something different.
>
>> 487 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1b.html |
>> finished in 11519ms
>> 490 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1c.html |
>> finished in 22459ms
>> 493 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1d.html |
>> finished in 22449ms
>> 496 INFO TEST-END | /tests/layout/base/tests/test_bug441782-1e.html |
>> finished in 24156ms
>> 499 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2a.html |
>> finished in 19105ms
>> 502 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2b.html |
>> finished in 19182ms
>> 505 INFO TEST-END | /tests/layout/base/tests/test_bug441782-2c.html |
>> finished in 23415ms
>
> These tests are exposing a bug in the mochitest harness, effectively:
> they change preferences that affect the layout of the harness document
> itself, and that document is _huge_... I was sure we had a bug about
> this, but can't find it right now.
>
> We should be able to make these tests _much_ faster if we actually want to.

This is bug 479352. If we did have a way to run a subset of reftests in
privileged mode so that they can change prefs, we would have been able
to avoid the mochitest harness cost for these altogether. ;-)

Ehsan

Armen Zambrano Gasparnian

May 27, 2011, 4:40:55 PM
to Boris Zbarsky
Could we change our harnesses to create a performance summary for each
individual test?
Perhaps a summary of the tests that take the longest, which we could triage
every once in a while to see if they are running as fast as they should.
Asking for a per-test-run regression tool might be asking too much, but
you guys can say.

cheers,
Armen

smaug

May 27, 2011, 5:35:25 PM
to Ehsan Akhgari, Boris Zbarsky, dev-pl...@lists.mozilla.org
On 05/27/2011 11:04 PM, Ehsan Akhgari wrote:
> On 11-05-27 3:33 PM, Boris Zbarsky wrote:
>> On 5/27/11 3:21 PM, jmaher wrote:
>>> 35404 INFO TEST-END | /tests/content/base/test/test_websocket.html |
>>> finished in 128442ms
>>> 36529 INFO TEST-END | /tests/content/base/test/
>>> test_ws_basic_tests.html | finished in 116905ms
>>
>> As I recall those are just buggy: they use large setTimeouts all over
>> and stuff. Can we just fix them?
>
> I think we should. There is no need here for any timeouts, we control
> the whole stack and we should be able to figure out what exactly to wait
> for. Who's our Web Socket person?

In general there are plenty of reasons for timeouts, especially when we
test networking. For example, when you want to test that something does
not happen, or that something happens X times but not X+1 times.
(Timeouts don't really guarantee either one, but they give a pretty good
estimate.)

Boris Zbarsky

May 27, 2011, 9:41:35 PM
On 5/27/11 3:33 PM, Boris Zbarsky wrote:

>> 72059 INFO TEST-END | /tests/layout/style/test/test_value_cloning.html
>> | finished in 234695ms

> This test is testing all sorts of stuff about lots of possible values
> for each CSS property... but it does this by doing them one after the
> other, with a new document being loaded for each test. I think we can
> probably improve this. Let me take a look.

https://bugzilla.mozilla.org/show_bug.cgi?id=660398 has a fix. Looks
like my hardware (which the numbers in the bug come from) is a bit
faster than the test machine here, but if things scale linearly this
test should be down to 10s or so in a debug build on the test machine.
Not great, but a lot better.

-Boris

Clint Talbert

May 28, 2011, 2:12:49 AM
Mochitest already outputs such a summary. Reftest doesn't: filed bug
660419. We can certainly add this.

Clint


Clint Talbert

May 28, 2011, 2:16:37 AM
On 5/26/2011 9:57 AM, Ehsan Akhgari wrote:

>
> Does this proposal also cover the try server? These tests might have
> never failed on mozilla-central, but I'm pretty sure that they've
> allowed people to catch regressions before hitting m-c.
>

I hadn't completely thought through all the details of the proposal yet.
But I think I would prefer try to always *by default* mirror
mozilla-central. Back when try didn't mirror mozilla-central, it was a
chronic issue.

That said, while try should mirror mozilla-central by default, we should
encourage and enhance try's ability to be configured. So there should be
some try chooser syntax for you to "ignore rules about the current
passing test set" if you want to in your patch.

Clint

Clint Talbert

May 28, 2011, 2:46:11 AM
On 5/25/2011 12:14 PM, Daniel Holbert wrote:
> With your proposal, we'd have to back out everything *back to the last
> time that test was run* (which sounds like it could be up to a day's
> worth of pushes) in order to be in a known-good state. That sounds painful.
>
> ~Daniel
>
> [1] (and possibly everything since that push, if the orange was an
> 'aborts test suite' type issue that prevented test coverage for
> subsequent pushes)

Good question. Yes, you run the risk of backing everything out to find
an orange. There are a couple of ways to mitigate that risk. Since
this is at a really early stage, I don't have data to make a concrete
proposal. But here are some thoughts:

* You optimize by running a full build of everything every X changesets.
* You provide hands-off regression hunting tools to find the changeset
that hit the orange and (possibly) auto-back it out. (the a-team has
already begun work on that regression hunting tool)
* Or, as Mike Hommey suggested, you provide a means (through an extension
to the self-service build API, for example) to run the orange test on
the intervening changesets that have happened between the last known
good run and the current orange one (a rough sketch of that bisection
follows below). This requires a sheriff, though, because the
tree-watching time in this instance could be very long, depending on how
often the full runs are performed.
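
Roughly, the hands-off part could look like this (trigger_test_job() and
wait_for_result() are placeholders for whatever the self-serve API
actually exposes, so treat it as a sketch rather than the real thing):

  def bisect_orange(changesets, test, trigger_test_job, wait_for_result):
      # changesets: ordered from the last known good run to the orange one.
      # Returns the first changeset on which `test` fails.
      good, bad = 0, len(changesets) - 1
      while bad - good > 1:
          mid = (good + bad) // 2
          job = trigger_test_job(changesets[mid], test)
          if wait_for_result(job):
              good = mid
          else:
              bad = mid
      return changesets[bad]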

Of course, new intermittent issues that are introduced will throw a
wrench in these plans. Regardless of what we do, I think that
intermittent oranges will be the Achilles heel of any automation
solution to increase build turnaround time. Our best hope is to
continue the war on orange and drive the orangefactor number down to <=1
and keep it there. But outside of the intermittent issues, we do have
some options to make this idea into a viable approach.

I'm glad you're all bringing up these issues and also finding concrete
tests to fix now. That's awesome. We've also started pulling a team
together to crunch the data and develop something more than just vague
bullet points.

Clint

Justin Dolske

May 29, 2011, 1:35:54 AM
On 5/25/11 9:54 PM, sayrer wrote:

>>> Long term
>>> * Figure out a way to not run tests that always pass on every test run.
>>
>> I don't understand this, conceptually.
>

> We're looking to speed up test cycle times. One way to do that is to evaluate the odds that a test will fail. I'm sure there are, say, W3C DOM Level 2 Core tests that have never failed since they were checked in. Running them on every check-in is a waste of time, cycles, and greenhouse gasses.

> [...]


> Below, Joshua suggests looking at data to determine which tests to execute. This is another, more sophisticated way to determine which tests might be worth running.

Ah, ok. Hmm.

So, I see potential value in a "dependency" system. Some front-end
changes (like, say, password manager) have well-defined tests that are
useful to run, and the rest are basically useless. I'm sure there are
other areas, though I'm a bit dubious about how well we can identify them
and how significant the wins will be for daily m-c activity.
[Conversely, there are other areas (like, say, xpconnect) where we'd
want to run everything, because there are different uses of it all over.]

I'm highly wary about disabling tests based just on probability, though.
I suspect there's a lot of code that, while any particular area is
changed infrequently, has a significant probability of breaking a test
even though on the average that test always passed.

As a hypothetical example: consider a project with 100 independent code
modules, and an incompetent programmer who makes a broken patch to 1
module each day. [Or a competent programmer working with incompetent
code, which might be closer to our situation ;-)] On average, the test
for any particular module will only have a 1% failure rate per day. But
in reality, the tree is always 100% broken.

OTOH, I suppose it's possible there are enough tests that just fail _so_
rarely that the cycle time savings are worth having larger, more complex
regression ranges. EG, it seems like a win if we got 50% faster cycle
times but once every couple of months had to close the tree to figure
out which non-obvious commit broke an infrequently run test.

There's a bright side here, though. We have historical logs -- data! If
we identify a set of tests we want to run less often, it would be
possible to use that data to determine how frequently there would be a
"delayed orange" in that set. As well as gauge how complex identifying
the cause and backing it out would be, based on actual checkins around
that time.
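
For example, something along these lines over per-push results (the data
format here is assumed) would give us the number directly:

  def delayed_orange_count(history, demoted_tests):
      # history: list of (push_id, {test_name: passed}) in push order.
      # Counts pushes whose only failures were in tests we proposed to
      # stop running per-push, i.e. oranges we would have seen late.
      delayed = 0
      for push_id, results in history:
          failures = {t for t, passed in results.items() if not passed}
          if failures and failures <= demoted_tests:
              delayed += 1
      return delayed, len(history)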

Justin

Robert Kaiser

May 29, 2011, 3:38:15 PM
Justin Dolske schrieb:

> So, I see potential value in a "dependency" system. Some front-end
> changes (like, say, password manager) have well-defined tests that are
> useful to run, and the rest are basically useless.

*In theory*, yes. We've at times seen how some change in one part of
the front end suddenly broke a seemingly unrelated area that depended on
something from it without people knowing that. It may not be the
usual case, but it sometimes happens.

Robert Kaiser



Armen Zambrano Gasparnian

May 30, 2011, 9:50:06 AM
to Clint Talbert
Worth noting that we have had the problem of try running fewer
things than mozilla-central, but never the other way around.

Justin Lebar

May 30, 2011, 10:54:39 AM
> Worth taking note that we have had the problem of try running less
> things than mozilla-central but never the other way around.

Isn't this almost always the case [1]? That is, try runs the tests, but they don't show up in TBPL.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=590526

Mike Connor

May 30, 2011, 11:03:07 AM
to Clint Talbert, dev-pl...@lists.mozilla.org
On 2011-05-28, at 2:46 AM, Clint Talbert wrote:

> Of course, new intermittent issues that are introduced will throw a wrench in these plans. Regardless of what we do, I think that intermittent oranges will be the Achilles heel of any automation solution to increase build turnaround time. Our best hope is to continue the war on orange and drive the orangefactor number down to <=1 and keep it there. But outside of the intermittent issues, we do have some options to make this idea into a viable approach.

So, before we invest in this approach, it would be good to get some idea of what we feel the benefit will be. I'm finding it hard to assess cost/benefit here, mostly because the benefit is sort of loosely defined. "Wasting cycles" is an easy goal to get behind, but we should argue based on data, as has been pointed out in many recent threads.

What I'd like to see addressed:

* How long are current cycles?
* What is our target cycle time?
* How close can we get simply by improving the test suites to run faster, without sacrificing coverage?
* Is cycle time on unit tests a significant proportion of our overall cycle times?

-- Mike

Chris AtLee

May 30, 2011, 11:34:25 AM
Coming in a bit late to this party...but here's my $0.02

On 25/05/11 08:30 AM, Justin Lebar wrote:
> There are two separate issues: How long do builds take, and how long
> do tests take? To address only the first one:
>
> We have an existence proof that there's a lot of headway we could
> make in terms of build speed. My mac takes 12m to do a debug build
> from scratch. I have a Linux box which is similarly fast. So
> there's no question that we could speed up mac builds and debug (i.e.
> non-PGO) Linux builds by getting faster machines, right?

I think our Linux builds are fine as they are. We get about 20 minute
builds for non-PGO. Mac and Windows builds are the real bottleneck here.

> IIRC, we don't use pymake on Windows builds. If we did, that would
> be a huge speedup for non-PGO builds, because we could use -j4 or
> greater.

We do not, correct. I tested a debug build (non-PGO) with gnu make -j1
vs pymake -j4, and the times were within a minute of each other. It'll
be worth testing again once we get the hard drives in the windows
builders replaced.

> Ted proposed not running PGO unless we ask for it; that would make
> release builds appear much faster on Linux and especially Windows.

Totally agree that we should do this. Is there consensus here?

Mike Connor

May 30, 2011, 11:39:33 AM
to Chris AtLee, dev-pl...@lists.mozilla.org

On 2011-05-30, at 11:34 AM, Chris AtLee wrote:

>> Ted proposed not running PGO unless we ask for it; that would make
>> release builds appear much faster on Linux and especially Windows.
>
> Totally agree that we should do this. Is there consensus here?

I believe the sum total is "yes, there is consensus, as long as we still get good coverage from the PGO nightly builds."

-- Mike

Chris AtLee

May 30, 2011, 11:57:53 AM
On 30/05/11 11:03 AM, Mike Connor wrote:
> On 2011-05-28, at 2:46 AM, Clint Talbert wrote:
>
>> Of course, new intermittent issues that are introduced will throw a wrench in these plans. Regardless of what we do, I think that intermittent oranges will be the Achilles heel of any automation solution to increase build turnaround time. Our best hope is to continue the war on orange and drive the orangefactor number down to <=1 and keep it there. But outside of the intermittent issues, we do have some options to make this idea into a viable approach.
>
> So, before we invest in this approach, it would be good to get some idea of what we feel the benefit will be. I'm finding it hard to assess cost/benefit here, mostly because the benefit is sort of loosely defined. "Wasting cycles" is an easy goal to get behind, but we should argue based on data, as has been pointed out in many recent threads.
>
> What I'd like to see addressed:
>
> * How long are current cycles?

For builds, the biggest offenders here are:
Windows opt builds (average 3h 4m 21s)
Mac opt builds (average 2h 35m 13s)

NB that Linux opt builds are now up to 1h 23m 37s on average since
enabling PGO.

For tests, debug tests take a long time. e.g.
XP debug mochitest-other (average 1h 24m 22s)
Fedora debug mochitests-4/5 (average 1h 21m 33s)
Win7 debug mochitest-other (average 1h 14m 54s)

The slowest build is Windows at 3h4m, and the slowest opt Windows test is
Win7 xpcshell at 41m, so that brings our cycle time up to 3h45m, assuming
we can start all builds/tests promptly.

> * What is our target cycle time?

As fast as possible?

> * How close can we get simply by improving the test suites to run faster, without sacrificing coverage?

For debug builds, tests are 50% of the cycle time. I'd SWAG that for
non-PGO linux and windows opt builds the same would hold true.

> * Is cycle time on unit tests a significant proportion of our overall cycle times?

Yes. Specifically debug unit tests. Of our opt tests, the slowest are:
Win7 talos dromaeo (0h 43m 1s)
WinXP talos dromaeo (0h 41m 51s)
Win7 opt xpcshell (0h 41m 48s) (which is about 2x the time it takes on
other platforms)

Robert Kaiser

May 30, 2011, 12:09:16 PM
Chris AtLee schrieb:

> For builds, the biggest offenders here are:
> Windows opt builds (average 3h 4m 21s)

That's probably the PGO cost, as mentioned a number of times in this thread.

> Mac opt builds (average 2h 35m 13s)

Those are universal builds, so actually we are doing two build runs
there. I guess the only things we can do there is either beef up the
hardware or make our build process faster in general (which is probably
quite hard).

> NB that Linux opt builds are now up to 1h 23m 37s on average since
> enabling PGO.

PGO has a cost everywhere as it's doing a second run of things - still
hugely better than Windows here, though.

>> * What is our target cycle time?
>
> As fast as possible?

Sure, but it helps to set a target for the maximum time we want to have
to wait for results.

Chris AtLee

May 30, 2011, 12:15:09 PM
On 30/05/11 12:09 PM, Robert Kaiser wrote:
> Chris AtLee schrieb:
>> For builds, the biggest offenders here are:
>> Windows opt builds (average 3h 4m 21s)
>
> That's probably the PGO cost, as mentioned a number of times in this
> thread.
>
>> Mac opt builds (average 2h 35m 13s)
>
> Those are universal builds, so actually we are doing two build runs
> there. I guess the only things we can do there is either beef up the
> hardware or make our build process faster in general (which is probably
> quite hard).

There are a few approaches here:
* Beefier build machines. We're already looking at options here.
* Single-pass build instead of two-pass build. This requires a lot of
make/configure/etc. work, but would be awesome to do.
* Build each architecture in parallel on different machines, and unify
them after. Theoretically possible, practically very difficult.

Chris AtLee

May 30, 2011, 12:17:45 PM
On 30/05/11 12:15 PM, Chris AtLee wrote:
> On 30/05/11 12:09 PM, Robert Kaiser wrote:
>> Chris AtLee schrieb:
>>> For builds, the biggest offenders here are:
>>> Windows opt builds (average 3h 4m 21s)
>>
>> That's probably the PGO cost, as mentioned a number of times in this
>> thread.
>>
>>> Mac opt builds (average 2h 35m 13s)
>>
>> Those are universal builds, so actually we are doing two build runs
>> there. I guess the only things we can do there is either beef up the
>> hardware or make our build process faster in general (which is probably
>> quite hard).
>
> There are a few approaches here:
> * Beefier build machines. We're already looking at options here.
> * Single-pass build instead of two-pass build. This requires a lot of
> make/configure/etc. work, but would be awesome to do.
> * Build each architecture in parallel on different machines, and unify
> them after. Theoretically possible, practically very difficult.

Actually, just had a thought here. Could we do 64-bit only opt builds
during the day, and have our nightlies and release builds be universal
32/64-bit?

Armen Zambrano Gasparnian

May 30, 2011, 12:22:35 PM
to mozilla.de...@googlegroups.com, Justin Lebar
What I was saying is that:
* mozilla-central's coverage >= try's coverage
but never
* mozilla-central's coverage < try's coverage

I will make a comment on the bug for a workaround.

cheers,
Armen

Mike Connor

May 30, 2011, 12:23:43 PM
to Chris AtLee, dev-pl...@lists.mozilla.org

Are we testing on 32 and 64 bit machines, or just 64 bit?

-- Mike

Armen Zambrano Gasparnian

May 30, 2011, 12:26:41 PM
to Mike Connor, Chris AtLee, dev-pl...@lists.mozilla.org
We would lose the ability to do optimized tests on 10.5 testers.
We would still have the 10.5 debug tests.

Mike Connor

May 30, 2011, 12:27:54 PM
to Armen Zambrano Gasparnian, Chris AtLee, dev-pl...@lists.mozilla.org

Do we have any data on failures that happen on 10.5 but not 10.6?

-- Mike

Zandr Milewski

May 30, 2011, 12:33:02 PM
to dev-pl...@lists.mozilla.org
[oops, meant to send this to the list]

On 5/30/11 9:23 AM, Mike Connor wrote:

>> Actually, just had a thought here. Could we do 64-bit only opt
>> builds during the day, and have our nightlies and release builds be
>> universal 32/64-bit?
>

> Are we testing on 32 and 64 bit machines, or just 64 bit?

All of the test machines are capable of running 64-bit binaries. I
don't know which binaries we actually test on which machines.

Having said that, none of the test machines run a 64-bit kernel. Apple
has only been enabling 64-bit kernels by default very recently.

http://support.apple.com/kb/HT3770

Zack Weinberg

May 30, 2011, 12:42:05 PM
On 2011-05-30 9:09 AM, Robert Kaiser wrote:
>>> * What is our target cycle time?
>>
>> As fast as possible?
>
> Sure, but it helps to set a target we really want to be the max of what
> we need to wait to have results.

If a complete cycle, push to all results available, took less than half
an hour, then IMO it would be reasonable to forbid pushes while results
from a previous cycle were pending. And that would render the "what do
we back out when we discover an orange" argument moot. (We would have
to have a landing queue all the time, but I think that's ok.)

So that's my suggestion for target cycle time.

zw

Armen Zambrano Gasparnian

May 30, 2011, 12:44:12 PM
to Mike Connor, Chris AtLee, dev-pl...@lists.mozilla.org
Hold on, I probably answered this incorrectly (from zandr's email).

I was under the assumption that we were testing the 32-bit side of the
Mac bundle, but if we are capable of running the 64-bit side of it on the
10.5 machines, then we should be fine.

Can anyone confirm this?

-- armenzg

Kyle Huey

May 30, 2011, 12:56:05 PM
to Chris AtLee, dev-pl...@lists.mozilla.org
Replies inline

On Mon, May 30, 2011 at 8:57 AM, Chris AtLee <cat...@mozilla.com> wrote:

> On 30/05/11 11:03 AM, Mike Connor wrote:
>
>> On 2011-05-28, at 2:46 AM, Clint Talbert wrote:
>>
>> Of course, new intermittent issues that are introduced will throw a
>>> wrench in these plans. Regardless of what we do, I think that intermittent
>>> oranges will be the Achilles heel of any automation solution to increase
>>> build turnaround time. Our best hope is to continue the war on orange and
>>> drive the orangefactor number down to <=1 and keep it there. But outside of
>>> the intermittent issues, we do have some options to make this idea into a
>>> viable approach.
>>>
>>
>> So, before we invest in this approach, it would be good to get some idea
>> of what we feel the benefit will be. I'm finding it hard to assess
>> cost/benefit here, mostly because the benefit is sort of loosely defined.
>> "Wasting cycles" is an easy goal to get behind, but we should argue based
>> on data, as has been pointed out in many recent threads.
>>
>> What I'd like to see addressed:
>>
>> * How long are current cycles?
>>
>
> For builds, the biggest offenders here are:
> Windows opt builds (average 3h 4m 21s)
>

There's not much we can do here.


> Mac opt builds (average 2h 35m 13s)
>

Bug 417044 would help *a lot* here.

> NB that Linux opt builds are now up to 1h 23m 37s on average since enabling
> PGO.
>
> For tests, debug tests take a long time. e.g.
> XP debug mochitest-other (average 1h 24m 22s)
>

Can you break this out by suite? I wouldn't be surprised if mochitest-a11y
is a large chunk of this for stupid reasons.


> Fedora debug mochitests-4/5 (average 1h 21m 33s)
>

What are the N (5, 10, whatever) tests here that take the longest?


> Win7 debug mochitest-other (average 1h 14m 54s)
>

Same as for XP.


> Slowest build is windows at 3h4m, and the slowest opt windows test is win7
> xpcshell at 41m, so that brings our cycle time up to 3h45m assuming we can
> start all builds/tests promptly.


AIUI xpcshell starts a new process for each test. I wonder if we could run
multiple tests in parallel? I believe the js shell tests do this.
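
Since each test is its own process anyway, something like this seems
plausible (the xpcshell invocation is a stand-in for whatever the harness
really passes, so this is only a sketch):

  import subprocess
  from concurrent.futures import ThreadPoolExecutor

  def run_xpcshell_tests(xpcshell, test_files, jobs=4):
      # Run each xpcshell test in its own child process, up to `jobs` at
      # a time. Threads suffice here since the work is in subprocesses.
      def run_one(test):
          proc = subprocess.run([xpcshell, test], capture_output=True)
          return test, proc.returncode == 0
      with ThreadPoolExecutor(max_workers=jobs) as pool:
          return dict(pool.map(run_one, test_files))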

>> * What is our target cycle time?
>
> As fast as possible?

Right.

>> * How close can we get simply by improving the test suites to run faster,
>> without sacrificing coverage?
>
> For debug builds, tests are 50% of the cycle time. I'd SWAG that for
> non-PGO linux and windows opt builds the same would hold true.
>
>> * Is cycle time on unit tests a significant proportion of our overall
>> cycle times?
>
> Yes. Specifically debug unit tests. Of our opt tests, the slowest are:
> Win7 talos dromaeo (0h 43m 1s)
> WinXP talos dromaeo (0h 41m 51s)
> Win7 opt xpcshell (0h 41m 48s) (which is about 2x the time it takes on
> other platforms)
>

> _______________________________________________
> dev-planning mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-planning
>

- Kyle

Armen Zambrano Gasparnian

May 30, 2011, 12:59:46 PM
to Mike Connor, Chris AtLee, dev-pl...@lists.mozilla.org
I checked with zandr that when the Mac build is run on 10.5, it enforces
the 32-bit side of it (we looked at Activity Monitor and checked the
"Kind" column, which said "Intel").

If we could force the 64-bit side of it on the 10.5 machines, would we
still care about double building?

-- armenzg

Mike Connor

May 30, 2011, 1:17:47 PM
to Chris AtLee, dev-pl...@lists.mozilla.org

On 2011-05-30, at 11:57 AM, Chris AtLee wrote:

> On 30/05/11 11:03 AM, Mike Connor wrote:
>> On 2011-05-28, at 2:46 AM, Clint Talbert wrote:
>>
>> * What is our target cycle time?
>
> As fast as possible?

I prefer achievable and incremental milestones for any project, since it makes it easier to prioritize well. Taken to the extreme, we could have a single machine for each test, and get cycle time to < 2 minutes.

How about we start by targeting 30 minutes as the timeframe from build-done to test-done? That likely means we need to cut run time to < 25 minutes for every test job. This would be a significant improvement on the current situation, and something we can target for Q3, just by optimizing specific jobs.

-- Mike

Chris AtLee

May 30, 2011, 2:18:16 PM
>> For tests, debug tests take a long time. e.g.
>> XP debug mochitest-other (average 1h 24m 22s)
>>
>
> Can you break this out by suite? I wouldn't be surprised if mochitest-a11y
> is a large chunk of this for stupid reasons.

mochitest-chrome is ~15 minutes
mochitest-browser-chrome is ~48 minutes
mochitest-a11y is ~5 minutes (although seems to range a bunch)
mochitest-ipcplugins is 1 minute

>> Fedora debug mochitests-4/5 (average 1h 21m 33s)
>>
>
> What are the N (5, 10, whatever) tests here that take the longest?

I'm not sure how to tell. You can probably find out by looking at some
test logs.
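
e.g. a quick pass over a mochitest log, keying off the TEST-END lines
quoted earlier in this thread, would do it (rough sketch):

  import re, sys

  TEST_END = re.compile(r"TEST-END \| (.+?) \| finished in (\d+)ms")

  def slowest_tests(log_path, n=10):
      # Collect (runtime, test) pairs and return the n slowest.
      times = []
      for line in open(log_path):
          m = TEST_END.search(line)
          if m:
              times.append((int(m.group(2)), m.group(1)))
      return sorted(times, reverse=True)[:n]

  if __name__ == "__main__":
      for ms, test in slowest_tests(sys.argv[1]):
          print("%8dms  %s" % (ms, test))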

Robert Kaiser

May 30, 2011, 2:22:06 PM
Chris AtLee schrieb:

> * Single-pass build instead of two-pass build. This requires a lot of
> make/configure/etc. work, but would be awesome to do.

Right, forgot about that. I wonder what roadblocks there still are; I
know we had some tries at this some time ago and there were problems,
but we have changed a number of things since then. And nobody would miss
unify, for sure. ;-)

Robert Kaiser


Mike Connor

May 30, 2011, 2:25:47 PM
to Chris AtLee, dev-pl...@lists.mozilla.org

On 2011-05-30, at 2:18 PM, Chris AtLee wrote:

>>> For tests, debug tests take a long time. e.g.
>>> XP debug mochitest-other (average 1h 24m 22s)
>>>
>>
>> Can you break this out by suite? I wouldn't be surprised if mochitest-a11y
>> is a large chunk of this for stupid reasons.
>
> mochitest-chrome is ~15 minutes
> mochitest-browser-chrome is ~48 minutes
> mochitest-a11y is ~5 minutes (although seems to range a bunch)
> mochitest-ipcplugins is 1 minute

Can we split -browser-chrome into a separate suite? Then we'd have ~21 and ~48 minutes. It doesn't save us actual machine time, but it should help end-to-end times.

-- Mike

Armen Zambrano Gasparnian

May 30, 2011, 2:40:51 PM
We are actually trying to unify jobs to reduce the cost of reboots (bug
659328).
We can improve end-to-end time but if we worsen the wait times I don't
think we are winning much.

-- armenzg

Boris Zbarsky

May 30, 2011, 4:20:07 PM
On 5/30/11 2:40 PM, Armen Zambrano Gasparnian wrote:
> We can improve end-to-end time but if we worsen the wait times I don't
> think we are winning much.

Given a fixed end-to-end time, wait times can be reduced by increasing
pool size (which we're doing anyway, right?).

So we should be looking into both.

-Boris

Boris Zbarsky

May 30, 2011, 4:21:03 PM
On 5/30/11 12:59 PM, Armen Zambrano Gasparnian wrote:
> I checked with zandr that the Mac build is run on the 10.5 it enforces
> the 32-bit side of it (We looked at Activity monitor and checked the
> "kind" column which said "Intel").
>
> If we could force the 64-bit side of it on the 10.5 machines would we
> care of double building?

Running the 64-bit build on 10.5 doesn't work right last I checked due
to OS-level bugs. That's why we default to 32-bit on 10.5 and 64-bit on
10.6.

-Boris

Chris AtLee

May 30, 2011, 4:22:25 PM

Correct. But I think we should also look at speeding up the tests,
and/or reducing the number that are run regularly.

Daniel Cater

May 30, 2011, 4:36:08 PM
On Monday, 30 May 2011 19:40:51 UTC+1, armenzg wrote:
> On 11-05-30 2:25 PM, Mike Connor wrote:
> > Can we split -browser-chrome into a separate suite? then we'd have ~21 and ~48 minutes. Doesn't save us actual machine time, but should help end to end times.
> We are actually trying to unify jobs to reduce the cost of reboots (bug
> 659328).
> We can improve end-to-end time but if we worsen the wait times I don't
> think we are winning much.

Can you not keep test suites as separate jobs (or split them up further to reduce end-to-end times) *and* win back the cost of reboots by removing the need to reboot between each test suite?

What exactly are the reasons for needing to reboot between each one, or is it just precautionary?

Axel Hecht

May 30, 2011, 4:46:08 PM

I don't think that's reasonable.

In particular, volunteers can't just sit there and wait for 1.5 hours
because they're third in line.

This is about making it easier to land, not about keeping it just as painful.

Axel

Armen Zambrano Gasparnian

May 30, 2011, 5:02:04 PM
to mozilla.de...@googlegroups.com, Daniel Cater
For talos, the performance numbers degrade if reboots do not happen.
We also need the reboots for deploying changes to the machines (they
sync up with puppet and OPSI).

cheers,
Armen

Daniel Cater

May 30, 2011, 5:42:45 PM
On Monday, 30 May 2011 22:02:04 UTC+1, armenzg wrote:
> For talos, the performance numbers degrade if reboots do not happen.

OK, but that doesn't mean you need to reboot between different unit-test suites.

> We also need the reboots for deploying changes to the machines (they
> sync up with puppet and OPSI).

That would still happen on the reboots for performance tests.

To make the question more specific: why do you need to reboot between different unit-test/non-performance test suites?

Armen Zambrano Gasparnian

May 30, 2011, 5:56:50 PM
to mozilla.de...@googlegroups.com, Daniel Cater
On 11-05-30 5:42 PM, Daniel Cater wrote:
> On Monday, 30 May 2011 22:02:04 UTC+1, armenzg wrote:
>> For talos, the performance numbers degrade if reboots do not happen.
>
> OK, but that doesn't mean you need to reboot between different unit-test suites.
>
>> We also need the reboots for deploying changes to the machines (they
>> sync up with puppet and OPSI).
>
> That would still happen on the reboots for performance tests.
>
Yes, we just take advantage of the fact that, before buildbot starts, the
machine is idle and can receive its packages without affecting any running jobs.

> To make the question more specific: why do you need to reboot between different unit-test/non-performance test suites?

There is not a way to know if after a given job we are going to run a
performance job. Starting from a clean state is what helps getting
consistent numbers.

I believe that starting from a clean state for unit tests is also
beneficial but I cannot give my word on that.

cheers,
Armen

Axel Hecht

unread,
May 30, 2011, 6:50:02 PM5/30/11
to
I've just been thinking about traffic jams. I'll try a hallway-mode
posting about that, because I can't wrap it up right without getting
some echo.

I remember too much about traffic jams and know too little, so my
intuition could be completely off. I'll just throw my thoughts out
there; feel free to shoot them down, or even run some Monte Carlos to
show the contrary ;-).

What we have are traffic jams: we're trying to get more cars through a
road than it can carry. So far so good. Now, car traffic is operated by
humans, and is a very non-linear process, which is the thing that may
very well break my reasoning.

Anyway:

- in traffic control, you get more cars through a street if you make
them go slower.

- we've got trucks and Porsches (long jobs and short ones)

- also, do we face some variant of a knapsack problem?

A counter-argument to the first point is that the effect might just be
the non-linear growth in safe following distance, plus the other
non-linearities.

In favor of the latter -- not really an argument, but the wait times
cluster. That makes me wonder if prioritizing lengthy jobs to run first
might help.

Do we have something that would show how much capacity we'd need, on
occasion, to have no wait times? If it really is only occasional.

Axel

Robert O'Callahan

unread,
May 30, 2011, 7:22:54 PM5/30/11
to Chris AtLee, dev-pl...@lists.mozilla.org
Has anyone done any profiling of general mochitest runs?

I wonder how much time is spent loading and parsing MochiKit/packed.js
(150K) into every single mochitest, or even SimpleTest/SimpleTest.js (28K).
I suspect almost all tests use very little of MochiKit. If it's a
significant amount of time, we could modify tests to not use MochiKit, or we
could even look at doing some kind of optimization so that when we load a
cached script we reuse the bytecode or something.

Rob
--
"Now the Bereans were of more noble character than the Thessalonians, for
they received the message with great eagerness and examined the Scriptures
every day to see if what Paul said was true." [Acts 17:11]

Mike Hommey

unread,
May 31, 2011, 12:37:43 AM5/31/11
to Boris Zbarsky, dev-pl...@lists.mozilla.org

I don't know about OS-level bugs, but we actually do target 10.6 for
64-bits, so in any case, these binaries can't work on 10.5.

Mike

Ted Mielczarek

unread,
May 31, 2011, 7:37:48 AM5/31/11
to rob...@ocallahan.org, Chris AtLee, dev-pl...@lists.mozilla.org
On Mon, May 30, 2011 at 7:22 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:
> Has anyone done any profiling of general mochitest runs?
>
> I wonder how much time is spent loading and parsing MochiKit/packed.js
> (150K) into every single mochitest, or even SimpleTest/SimpleTest.js (28K).
> I suspect almost all tests use very little of MochiKit. If it's a
> significant amount of time, we could modify tests to not use MochiKit, or we
> could even look at doing some kind of optimization so that when we load a
> cached script we reuse the bytecode or something.

Someone noted recently that we use very little of MochiKit, and we
might be better served by just writing a tiny shim that implements the
bits we actually use in our tests, and dropping the rest.

-Ted

Ted Mielczarek

unread,
May 31, 2011, 7:45:56 AM5/31/11
to Kyle Huey, mozilla.dev.planning group, Benjamin Smedberg
On Mon, May 30, 2011 at 12:56 PM, Kyle Huey <m...@kylehuey.com> wrote:
> AIUI xpcshell starts a new process for each test.  I wonder if we could run
> multiple tests in parallel?  I believe the js shell tests do this.

I thought about this at one point, but bsmedberg told me it wouldn't
work because we wrote compreg.dat in the app dir. I think with
manifest-based component registration this shouldn't be a problem
nowadays. I filed bug 660788 on implementing this.
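
A minimal sketch of what that could look like (not the real
runxpcshelltests.py; the xpcshell path, head.js location and arguments are
placeholders):

import subprocess
from concurrent.futures import ThreadPoolExecutor

XPCSHELL = "objdir/dist/bin/xpcshell"     # placeholder path
HEAD_JS = "testing/xpcshell/head.js"      # placeholder path

def run_one(test_path):
    # One xpcshell process per test; with manifest-based registration
    # nothing gets written into the shared app dir, so runs can overlap.
    proc = subprocess.run(
        [XPCSHELL, "-e", 'const _TEST_FILE = ["%s"];' % test_path,
         "-f", HEAD_JS],
        capture_output=True, text=True)
    return test_path, proc.returncode == 0, proc.stdout

def run_all(tests, workers=8):
    failures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, ok, log in pool.map(run_one, tests):
            if not ok:
                failures.append((path, log))
    return failures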

-Ted

Chris AtLee

unread,
May 31, 2011, 9:53:25 AM5/31/11
to
On 30/05/11 12:27 PM, Mike Connor wrote:
>
> On 2011-05-30, at 12:26 PM, Armen Zambrano Gasparnian wrote:
>
>> On 11-05-30 12:23 PM, Mike Connor wrote:
>>>
>>> On 2011-05-30, at 12:17 PM, Chris AtLee wrote:
>>>>
>>>> Actually, just had a thought here. Could we do 64-bit only opt builds during the day, and have our nightlies and release builds be universal 32/64-bit?
>>>
>>> Are we testing on 32 and 64 bit machines, or just 64 bit?
>>>
>>> -- Mike
>> We would loose the ability to do optimized tests on 10.5 testers.
>> We would still have the 10.5 debug tests.
>
> Do we have any data on failures that happen on 10.5 but not 10.6?

Is this something that the Orange Factor database would have?

To me, this is very similar to doing PGO vs non-PGO on Windows. If
non-PGO catches 99% of the same problems that PGO does in a fraction of
the time, then we win.

Mike Connor

unread,
May 31, 2011, 9:57:38 AM5/31/11
to Chris AtLee, dev-pl...@lists.mozilla.org

That is, precisely, the data which I seek. I'm not sure how I'd feel about <= 90%, but if it's "two, ever" then it's a no-brainer.

-- Mike

Ehsan Akhgari

unread,
May 31, 2011, 10:11:13 AM5/31/11
to Chris AtLee, dev-pl...@lists.mozilla.org
On 11-05-31 9:53 AM, Chris AtLee wrote:
> On 30/05/11 12:27 PM, Mike Connor wrote:
>>
>> On 2011-05-30, at 12:26 PM, Armen Zambrano Gasparnian wrote:
>>
>>> On 11-05-30 12:23 PM, Mike Connor wrote:
>>>>
>>>> On 2011-05-30, at 12:17 PM, Chris AtLee wrote:
>>>>>
>>>>> Actually, just had a thought here. Could we do 64-bit only opt
>>>>> builds during the day, and have our nightlies and release builds be
>>>>> universal 32/64-bit?
>>>>
>>>> Are we testing on 32 and 64 bit machines, or just 64 bit?
>>>>
>>>> -- Mike
>>> We would loose the ability to do optimized tests on 10.5 testers.
>>> We would still have the 10.5 debug tests.
>>
>> Do we have any data on failures that happen on 10.5 but not 10.6?
>
> Is this something that the Orange Factor database would have?

Only for known intermittent oranges. It won't tell you how many
non-intermittent failures have been caught in the 10.5 tests but not
10.6 (or vice versa).

Ehsan

Chris Cooper

unread,
May 31, 2011, 11:23:42 AM5/31/11
to dev-pl...@lists.mozilla.org
On 2011-05-30 12:22 PM, Armen Zambrano Gasparnian wrote:
> On 11-05-30 10:54 AM, Justin Lebar wrote:
>>> Worth taking note that we have had the problem of try running less
>>> things than mozilla-central but never the other way around.
>>
>> Isn't this almost always [1]? That is, try runs the tests, but they
>> don't show up in TBPL.
>>
>> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=590526
> I was talking that:
> * mozilla-centrals' coverage >= try's coverage
> but never
> * mozilla-centrals' coverage < try's coverage

At the platform mtg last week, didn't we talk about actually having
*more* tests available on try, but that you would explicitly have to ask
for them to get them run? The default set of tests on try could then be
greatly reduced.
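
Concretely, that could build on the existing TryChooser syntax, where you
only get what you explicitly request (the suite names below are just
illustrative):

  try: -b o -p linux64,macosx64 -u reftest,crashtest -t none

i.e. opt builds only, two platforms, only the suites named, and no talos;
everything else would have to be asked for explicitly.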

This would of course be coupled with examining current default test
coverage on m-c with an eye to fixing broken/long-duration tests, and
removing tests that duplicate functionality.

cheers,
--
coop


Steve Fink

unread,
May 31, 2011, 12:42:43 PM5/31/11
to Armen Zambrano Gasparnian, dev-pl...@lists.mozilla.org, Daniel Cater
On 05/30/2011 02:56 PM, Armen Zambrano Gasparnian wrote:
> On 11-05-30 5:42 PM, Daniel Cater wrote:
>> On Monday, 30 May 2011 22:02:04 UTC+1, armenzg wrote:
>>> For talos, the performance numbers degrade if reboots do not happen.
>>
>> OK, but that doesn't mean you need to reboot between different
>> unit-test suites.
>>
>>> We also need the reboots for deploying changes to the machines (they
>>> sync up with puppet and OPSI).
>>
>> That would still happen on the reboots for performance tests.
>>
> Yes, we just take advantage that before starting buildbot the machine
> is idle to receive its packages without affecting any jobs running.
>
>> To make the question more specific: why do you need to reboot between
>> different unit-test/non-performance test suites?
>
> There is not a way to know if after a given job we are going to run a
> performance job. Starting from a clean state is what helps getting
> consistent numbers.

What if reboots were their own jobs, and the scheduler knew that
performance tests required a reboot job similar to how they require a
preceding build job now?

I don't know anything about buildbot, so I may be talking nonsense. At
the very least, you'd need to ensure that the reboot happens *on the
same machine* as the later perf job... ;-)
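
In scheduler terms the dependency might look roughly like this (plain
pseudocode, not actual buildbot configuration; all names are invented):

class Job:
    def __init__(self, name, requires=(), same_machine_as=None):
        self.name = name
        self.requires = list(requires)          # jobs that must finish first
        self.same_machine_as = same_machine_as  # pin to another job's slave

def jobs_for_push(push_id):
    build = Job("build-%s" % push_id)
    unit = Job("unittests-%s" % push_id, requires=[build])
    reboot = Job("reboot-%s" % push_id, requires=[build])
    # Talos only runs after a fresh reboot, and must land on the very
    # machine that rebooted, otherwise the clean state is wasted.
    talos = Job("talos-%s" % push_id, requires=[reboot],
                same_machine_as=reboot)
    return [build, unit, reboot, talos]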

Also, I recognize that performance numbers require a reboot now, but
does anyone have a guess as to exactly what would be necessary in order
to remove that requirement? The answer is not going to be
cross-platform, I'm sure. But knowing how to put a running machine into
a state where a performance measurement is trustably repeatable would be
useful for more than our automated Talos metrics.

Chris AtLee

unread,
May 31, 2011, 1:01:59 PM5/31/11
to

IMO, we're spending way too much time talking about reboots. The reboot
overhead is minimal in most cases.

We need to be focusing on speeding up builds/tests, or reducing the
amount of work to do.

Daniel Cater

unread,
May 31, 2011, 1:35:13 PM5/31/11
to
On Tuesday, 31 May 2011 18:01:59 UTC+1, Chris AtLee wrote:
> IMO, we're spending way too much time talking about reboots. The reboot
> overhead is minimal in most cases.
>
> We need to be focusing on speeding up builds/tests, or reducing the
> amount of work to do.

Right, but splitting a test suite up and running the parts in parallel could save a large chunk of time, and armenzg said "We are actually trying to unify jobs to reduce the cost of reboots". If that's true, then avoiding the need to reboot is relevant, as it would allow the test suites to be split up further, potentially reducing the end-to-end time by a large amount.

Shawn Wilsher

unread,
May 31, 2011, 1:37:19 PM5/31/11
to dev-pl...@lists.mozilla.org
On 5/31/2011 10:01 AM, Chris AtLee wrote:
> IMO, we're spending way too much time talking about reboots. The reboot
> overhead is minimal in most cases.
This statement seems to be at odds with Armen's earlier in this thread
(re: bug 659328).

> We need to be focusing on speeding up builds/tests, or reducing the
> amount of work to do.

It seems to me that different people appear to have different goals
here, which I think is causing people to talk past each other to some
extent. As I understand things, some folks are looking at doing less
work to reduce turn-around times and wait times. Others only seem to be
focusing on turn-around times once jobs start. These don't have to be
at odds with each other, but I think until we can agree on goals here,
everyone is going to end up talking just a bit past each other, and
that's not going to get us very far.

Cheers,

Shawn

Chris AtLee

unread,
May 31, 2011, 1:51:13 PM5/31/11
to
On 31/05/11 01:37 PM, Shawn Wilsher wrote:
> On 5/31/2011 10:01 AM, Chris AtLee wrote:
>> IMO, we're spending way too much time talking about reboots. The reboot
>> overhead is minimal in most cases.
> This statement seems to be at odds with Armen's earlier in this thread
> (re: bug 659328).

That bug, and the one it references (586418), are talking about specific
cases where we're running tests that take only a few minutes, and so
all the overhead (which includes rebooting, downloading builds, symbols
and tests, unpacking, cleanup) is a large % of time for the job.

Moving these tests to run with other suites is more efficient all around
since the overhead costs can be shared.

>> We need to be focusing on speeding up builds/tests, or reducing the
>> amount of work to do.
> It seems to me that different people appear to have different goals
> here, which I think is causing people to talk past each other to some
> extent. As I understand things, some folks are looking at doing less
> work to reduce turn-around times and wait times. Others only seem to be
> focusing on turn-around times once jobs start. These don't have to be at
> odds with each other, but I think until we can agree on goals here,
> everyone is going to end up talking just a bit past each other, and
> that's not going to get us very far.

Agreed. We need clear goals and owners for pushing those goals.

Who's interested in pursuing this?

Joe Drew

unread,
May 31, 2011, 1:58:23 PM5/31/11
to dev-pl...@lists.mozilla.org

On 2011-05-30 12:44 PM, Armen Zambrano Gasparnian wrote:
> Hold down, I probably replied this incorrectly (from zandr's email).
>
> I was under the assumption that we were testing the 32-bit side of the
> Mac bundle but if we are capable of running the 64-bit side of it on the
> 10.5 machines we should then be fine.

We use 10.6-specific features in our 64-bit builds, so that won't work.
But even if we didn't, with a 64-bit-only build we can't run Flash at
all, on either 10.5 or 10.6.

Why not do a 32-bit only build instead? Or do 32-bit and 64-bit in
parallel on separate machines, then glue the two together?

Joe

Mike Shaver

unread,
May 31, 2011, 2:22:37 PM5/31/11
to Kyle Huey, Chris AtLee, dev-pl...@lists.mozilla.org
On Mon, May 30, 2011 at 9:56 AM, Kyle Huey <m...@kylehuey.com> wrote:
>> For builds, the biggest offenders here are:
>> Windows opt builds (average 3h 4m 21s)
>
> There's not much we can do here.

No offense intended, but I find that a bit hard to believe. There is
no hardware around that can turn a PGO build around in less than 3h
given pymake and -jbignum? I'd find that very surprising, and I'm
willing to go to Fry's and spend $2K to test the hypothesis.

I suspect, in fact, that there are developers with machines today who
can do that. There may be such a machine under my desk, in fact!

Mike

Chris AtLee

unread,
May 31, 2011, 2:33:46 PM5/31/11
to

If you do, keep in mind that the 3h 4m time includes other steps, such
as cloning/updating, make buildsymbols, make package, etc. Need to make
sure we're comparing apples to apples here.

Kyle Huey

unread,
May 31, 2011, 2:37:22 PM5/31/11
to Mike Shaver, Ted Mielczarek, Chris AtLee, dev-pl...@lists.mozilla.org
On Tue, May 31, 2011 at 11:22 AM, Mike Shaver <mike....@gmail.com> wrote:

> On Mon, May 30, 2011 at 9:56 AM, Kyle Huey <m...@kylehuey.com> wrote:
> >> For builds, the biggest offenders here are:
> >> Windows opt builds (average 3h 4m 21s)
> >
> > There's not much we can do here.
>
> No offense intended, but I find that a bit hard to believe. There is
> no hardware around that can turn a PGO build around in less than 3h
> given pymake and -jbignum? I'd find that very surprising, and I'm
> willing to go to Fry's and spend $2K to test the hypothesis.
>

Given that the average is 3h 4m I would believe that getting under 3h is
totally doable ....

Assuming you want to see substantive wins, pymake and -jbignum aren't going
to help you with PGO though. PGO serializes compilation, so no matter what
we do we're bound by the time it takes to compile all of the code that ends
up in xul.dll in series ... twice. On a beefy i7, Ted timed a total build at
about 2 hours. And as catlee points out, that doesn't include setup time,
buildsymbols, etc. Ted is going to respond in more detail.

- Kyle

Ted Mielczarek

unread,
May 31, 2011, 2:38:29 PM5/31/11
to Mike Shaver, Chris AtLee, Kyle Huey, dev-pl...@lists.mozilla.org
On Tue, May 31, 2011 at 2:22 PM, Mike Shaver <mike....@gmail.com> wrote:
> On Mon, May 30, 2011 at 9:56 AM, Kyle Huey <m...@kylehuey.com> wrote:
>>> For builds, the biggest offenders here are:
>>> Windows opt builds (average 3h 4m 21s)
>>
>> There's not much we can do here.
>
> No offense intended, but I find that a bit hard to believe.  There is
> no hardware around that can turn a PGO build around in less than 3h
> given pymake and -jbignum?  I'd find that very surprising, and I'm
> willing to go to Fry's and spend $2K to test the hypothesis.
>
> I suspect, in fact, that there are developers with machines today who
> can do that.  There may be such a machine under my desk, in fact!

Conveniently, I timed this on my desktop machine last week (Core i7,
8GB, SSD, pymake -j8), and it can do a clobber PGO build in 2 hours 5
minutes. I believe catlee said the median time on build slaves was
about 2 hours 45 minutes, so it's possible that we can shave some time
off of that by throwing hardware at it.

The lower bound here is probably the 65 minutes (on my machine) to
link libxul with optimization (the second link), which is
single-threaded and totally CPU-bound.

-Ted

Ted Mielczarek

unread,
May 31, 2011, 2:40:24 PM5/31/11
to Mike Shaver, Chris AtLee, Kyle Huey, dev-pl...@lists.mozilla.org
On Tue, May 31, 2011 at 2:22 PM, Mike Shaver <mike....@gmail.com> wrote:
> On Mon, May 30, 2011 at 9:56 AM, Kyle Huey <m...@kylehuey.com> wrote:
>>> For builds, the biggest offenders here are:
>>> Windows opt builds (average 3h 4m 21s)
>>
>> There's not much we can do here.
>
> No offense intended, but I find that a bit hard to believe.  There is
> no hardware around that can turn a PGO build around in less than 3h
> given pymake and -jbignum?  I'd find that very surprising, and I'm
> willing to go to Fry's and spend $2K to test the hypothesis.
>
> I suspect, in fact, that there are developers with machines today who
> can do that.  There may be such a machine under my desk, in fact!

But you know, this is all sort of irrelevant, we should just fix bug
658313 and drop the cycle time way down.

-Ted

Mike Shaver

unread,
May 31, 2011, 2:55:05 PM5/31/11
to Kyle Huey, Chris AtLee, dev-pl...@lists.mozilla.org, Ted Mielczarek
On Tue, May 31, 2011 at 11:37 AM, Kyle Huey <m...@kylehuey.com> wrote:
> Given that the average is 3h 4m I would believe that getting under 3h is
> totally doable ....

Sorry, I meant 2h, yeah.

> And as catlee points out, that doesn't include setup time,
> buildsymbols, etc.

Seems like we should be able to upload build symbols in parallel with
other work, and reduce setup time in the cases where we're not starved
for slaves, but I'm going to take a look at more of the timing data
that's around before I make specific suggestions.

Mike

Armen Zambrano Gasparnian

unread,
May 31, 2011, 3:01:13 PM5/31/11
to mozilla.de...@googlegroups.com, Daniel Cater
Easy example,
talos a11y takes on average 270 secs to finish.
52 secs are the actual performance run.
This means that only about 20% of the whole run is useful work; the rest
is setup overhead plus a reboot.

If we fold a11y into another suite, we would save the setup time
(3.5 mins) + 1 reboot per checkin.
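
Spelled out with rough numbers (the reboot estimate is a guess, not a
measurement):

total, useful = 270.0, 52.0      # seconds, from the a11y example above
setup, reboot = 210.0, 90.0      # 3.5 min setup; reboot time is a guess
print(useful / total)            # ~0.19 -> roughly 20% of the job is the test
print((setup + reboot) / 60.0)   # ~5 min of overhead saved per checkin,
                                 # per platform, by folding a11y in elsewhere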

On another note, please leave the reboots alone. They are part of keeping
the machines healthy, and over-discussing them is just taking my time.
We have already identified places where we can recover wasted CPU time.

BTW, which bug covers analyzing the actual test runs and reducing their
length on the development side, and who will be driving it? I can help by
providing data.

cheers,
Armen

Daniel Cater

unread,
May 31, 2011, 3:20:16 PM5/31/11
to
On Tuesday, 31 May 2011 20:01:13 UTC+1, armenzg wrote:
> Easy example,
> talos a11y takes on average 270 secs to finish.
> 52 secs are the actual performance run.
> This means that only 20% of the whole run is useful plus a reboot.
>
> Imagine we add a11y to another suite, this would mean that we would save
> setup time (3.5mins) + 1 reboot per checkin.

I said it "could save time", I didn't say it "will always save time". Obviously combining short-running test suites when the overhead is a significant percentage saves time. mconnor presented an example of where the opposite would presumably be true.

> On another note, please leave the reboots alone. It is part of keeping
> the machines healthy and over-discussing it is just taking my time.

I didn't realise that things which take up your time were off-limits for discussion...

I'm also not sure about the "healthiness" of rebooting constantly. If there is interference between one test suite and the next, then that is a problem that should be fixed, isn't it (aside from cold-start performance stuff)? Users aren't going to reboot between every run of the browser.

Boris Zbarsky

unread,
May 31, 2011, 3:42:00 PM5/31/11
to
On 5/30/11 7:22 PM, Robert O'Callahan wrote:
> Has anyone done any profiling of general mochitest runs?

I just did a bit...

Running mochitest near the beginning of the run, sampled for 10s.... 90%
of the time is in mach_msg_trap (basically idle time, as far as I can
tell). That matches my CPU meter, actually; it never went above 25% for
that process.

> I wonder how much time is spent loading and parsing MochiKit/packed.js
> (150K) into every single mochitest, or even SimpleTest/SimpleTest.js (28K).

This testcase:

<!DOCTYPE html>
<base href="http://mochi.test:8888/">
<script> var s = new Date(); </script>
<script type="text/javascript" src="/MochiKit/packed.js"></script>
<script> var m = new Date(); </script>
<script type="text/javascript" src="/tests/SimpleTest/SimpleTest.js"></script>
<script> var e = new Date(); alert("packed: " + (m - s) + ", SimpleTest:" + (e - m)); </script>

shows numbers on the order of 42ms and 1ms respectively on my machine (a
year-old MBP) when loaded in a browser that's supposed to be running
mochitest.

We have order of 3600 mochitest files, so the total overhead of
packed.js over the whole run is about 2.5 mins.

> or we could even look at doing some kind of optimization so that when we load a
> cached script we reuse the bytecode or something.

https://bugzilla.mozilla.org/show_bug.cgi?id=288473

-Boris

Armen Zambrano Gasparnian

unread,
May 31, 2011, 3:43:12 PM5/31/11
to mozilla.de...@googlegroups.com, Daniel Cater
On 11-05-31 3:20 PM, Daniel Cater wrote:
> On Tuesday, 31 May 2011 20:01:13 UTC+1, armenzg wrote:
>> Easy example,
>> talos a11y takes on average 270 secs to finish.
>> 52 secs are the actual performance run.
>> This means that only 20% of the whole run is useful plus a reboot.
>>
>> Imagine we add a11y to another suite, this would mean that we would save
>> setup time (3.5mins) + 1 reboot per checkin.
>
> I said it "could save time", I didn't say it "will always save time". Obviously combining short-running test suites when the overhead is a significant percentage saves time. mconnor presented an example of where the opposite would presumably be true.
>
>> On another note, please leave the reboots alone. It is part of keeping
>> the machines healthy and over-discussing it is just taking my time.
>
> I didn't realise that things which take up your time were off-limits for discussion...
Can we at least postpone the reboots topic while we fix other things
that will give us immediate benefits?

Besides machine health, rebooting is needed to ensure that within one
reboot a machine picks up the latest toolchain.


>
> I'm also not sure about the "healthiness" of rebooting constantly. If there is interference between one test suite and the next then that is a problem that should be fixed isn't it (aside from cold-start performance stuff)? Users aren't going to reboot between every run of the browser.

This sounds like something that could be taken on by anyone, regardless
of the infrastructure's load.

cheers,
Armen

Zack Weinberg

unread,
May 31, 2011, 6:24:08 PM5/31/11
to
On 2011-05-30 1:46 PM, Axel Hecht wrote:
> On 30.05.11 18:42, Zack Weinberg wrote:
>> On 2011-05-30 9:09 AM, Robert Kaiser wrote:
>>>>> * What is our target cycle time?
>>>>
>>>> As fast as possible?
>>>
>>> Sure, but it helps to set a target we really want to be the max of what
>>> we need to wait to have results.
>>
>> If a complete cycle, push to all results available, took less than half
>> an hour, then IMO it would be reasonable to forbid pushes while results
>> from a previous cycle were pending. And that would render the "what do
>> we back out when we discover an orange" argument moot. (We would have to
>> have a landing queue all the time, but I think that's ok.)
>>
>> So that's my suggestion for target cycle time.
>
> I don't think that's reasonable.
>
> In particular volunteers can't just sit there and wait for 1.5 hours
> because they're the third in line.

This problem, I would solve with automated test-and-merge-if-green, as I
have described several times in the past.

zw

Zack Weinberg

unread,
May 31, 2011, 6:27:33 PM5/31/11
to
On 2011-05-31 11:38 AM, Ted Mielczarek wrote:
>
> The lower bound here is probably the 65 minutes (on my machine) to
> link libxul with optimization (the second link), which is
> single-threaded and totally CPU-bound.

Have we tried Intel's allegedly drop-in better replacement for MSVC?
http://software.intel.com/en-us/articles/intel-compilers/

zw

Mike Shaver

unread,
May 31, 2011, 6:33:05 PM5/31/11
to Zack Weinberg, dev-pl...@lists.mozilla.org

Several times, yeah.

Mike

Zack Weinberg

unread,
May 31, 2011, 6:40:31 PM5/31/11
to

Just to be 100% sure ... Intel's compilers have the same problem where
the link-time optimization phase is totally single threaded and has to
crunch the entirety of libxul? And, like Microsoft, they have expressed
active lack of interest in fixing this?

zw

Lukas Blakk

unread,
May 31, 2011, 7:43:03 PM5/31/11
to dev-pl...@lists.mozilla.org
And that is being worked on, as we speak.
Our process documentation is at: https://wiki.mozilla.org/BugzillaAutoLanding
Tracking bug is : https://bugzilla.mozilla.org/show_bug.cgi?id=657828

More blogs/information/proof of concepts as they become available. This
project is largely being worked on by our summer intern and we hope to
have even the simplest (more strict criteria) version up and running
near the end of summer.

Cheers,
Lukas

Mike Shaver

unread,
May 31, 2011, 8:20:01 PM5/31/11
to Zack Weinberg, dev-pl...@lists.mozilla.org
On Tue, May 31, 2011 at 3:40 PM, Zack Weinberg <za...@panix.com> wrote:
> Just to be 100% sure ... Intel's compilers have the same problem where the
> link-time optimization phase is totally single threaded and has to crunch
> the entirety of libxul?  And, like Microsoft, they have expressed
> active lack of interest in fixing this?

No, for us they just didn't "drop in" (problems with resulting code,
and some compiler crashes), though those issues may have been resolved
now.

Has Microsoft expressed "active lack of interest"? I thought they
were interested in it, based on a blog post I read ages ago, but I've
been skimming this thread.

Mike

Robert O'Callahan

unread,
May 31, 2011, 8:57:29 PM5/31/11
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On Wed, Jun 1, 2011 at 7:42 AM, Boris Zbarsky <bzba...@mit.edu> wrote:

> On 5/30/11 7:22 PM, Robert O'Callahan wrote:
>
>> Has anyone done any profiling of general mochitest runs?
>>
>
> I just did a bit...
>
> Running mochitest near the beginning of the run, sampled for 10s.... 90% of
> the time is in mach_msg_trap (basically idle time, as far as I can tell).
> That matches my CPU meter, actually; it never went about 25% for that
> process.


Hmm, so what is it waiting for? httpd.js over the local network?


> I wonder how much time is spent loading and parsing MochiKit/packed.js
>> (150K) into every single mochitest, or even SimpleTest/SimpleTest.js
>> (28K).
>>
>
> This testcase:
>
> <!DOCTYPE html>
> <base href="http://mochi.test:8888/">
> <script> var s = new Date(); </script>
> <script type="text/javascript" src="/MochiKit/packed.js"></script>
> <script> var m = new Date(); </script>
> <script type="text/javascript"
> src="/tests/SimpleTest/SimpleTest.js"></script>
> <script> var e = new Date(); alert("packed: " + (m - s) + ", SimpleTest:" +
> (e - m)); </script>
>
> shows numbers on the order of 42ms and 1ms respectively on my machine (a
> year-old MBP) when loaded in the a browser that's supposed to be running
> mochitest.
>
> We have order of 3600 mochitest files, so the total overhead of packed.js
> over the whole run is about 2.5 mins.


OK, so the answer is "loading MochiKit doesn't matter (yet)".

Thanks,
Rob
--
"Now the Bereans were of more noble character than the Thessalonians, for
they received the message with great eagerness and examined the Scriptures
every day to see if what Paul said was true." [Acts 17:11]

Zack Weinberg

unread,
May 31, 2011, 11:01:34 PM5/31/11
to
On 05/31/2011 04:43 PM, Lukas Blakk wrote:
> On 11-05-31 3:24 PM, Zack Weinberg wrote:
>> On 2011-05-30 1:46 PM, Axel Hecht wrote:
>>>
>>> In particular volunteers can't just sit there and wait for 1.5 hours
>>> because they're the third in line.
>>
>> This problem, I would solve with automated test-and-merge-if-green, as
>> I have described several times in the past.
...

> More blogs/information/proof of concepts as they become available. This
> project is largely being worked on by our summer intern and we hope to
> have even the simplest (more strict criteria) version up and running
> near the end of summer.

Excellent. *cc:s self on bug*

zw
