Reminder on Try usage and infrastructure resources

Stuart Philp

Sep 14, 2017, 11:36:04 AM

to fx-team, dev-pl...@lists.mozilla.org, firefox-ci

Hello all,

As we near 57, the Firefox CI group felt it was important to send out a bit
of a reminder regarding infrastructure usage when you push.

*tl;dr* There is a real cost (both time and $) to using the 'all' flags in
pushes. They are there if you need them, but please remember to think about
what platforms and test suites you need to execute before you push, and
limit the scope of execution if you can.

A bit of background: our build and test infrastructure is a mix of physical
hardware and AWS cloud instances. AWS scales dynamically to our load, but
our physical hardware is limited. Occasionally you might see wait times and
queues build up; this is typically due to our hardware being overwhelmed.
When it gets really bad, we sometimes have to close the trees to allow the
machines to catch up. Obviously, that's not good for anyone. Specifically,
over the last few weeks we have seen a few long backlogs on our OSX
machines, once requiring tree closure. We never want to have to close
trees, it's a last resort, especially this close to beta.

Because of the physical hardware limitation, this is particularly
concerning for performance tests and tests that run on OSX (OSX builds are
now cross-compiled on Linux and not really affected). If you don't need to
run perf or OSX tests, please consider excluding them from your pushes.
ahal sent mail a few weeks ago about the new fuzzy
<https://ahal.ca/blog/2017/mach-try-fuzzy/> matching tool, which can be
useful here to help you figure out what to select.

To give you an idea of scale, we average 1000 pushes per week on
integration branches (excluding try). Our desktop tests alone (excluding
numbers for android, build jobs, and a handful of others) use roughly 900
machine hours per push, which comes to roughly 900k machine hours per week.
Including try and those other configurations, you can roughly double these
numbers. Needless to say, that's a lot of machine time, and so any savings we can get
can really add up.
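As a quick sanity check of the scale figures above, here is a back-of-envelope calculation in Python. The inputs are only the approximate numbers quoted in this email (1000 pushes/week, about 900 desktop test hours per push, "roughly double" with try and other configurations); the "machines busy 24/7" line is a derived illustration, not a figure from the CI team.

```python
# Back-of-envelope check of the scale figures quoted above.
# Inputs are the approximate numbers from this email, not measured data.
PUSHES_PER_WEEK = 1000             # integration branches, excluding try
DESKTOP_TEST_HOURS_PER_PUSH = 900  # desktop tests only
HOURS_PER_WEEK = 7 * 24

desktop_only = PUSHES_PER_WEEK * DESKTOP_TEST_HOURS_PER_PUSH
# "Including try and those other configurations you can roughly double these numbers."
rough_total = 2 * desktop_only
# How many machines would have to run around the clock to supply desktop_only.
machines_busy_24_7 = desktop_only / HOURS_PER_WEEK

print(f"desktop tests, integration only: {desktop_only:,} h/week")
print(f"rough total incl. try and others: {rough_total:,} h/week")
print(f"equivalent machines busy 24/7: {machines_busy_24_7:,.0f}")
```

In other words, the desktop test load on integration branches alone corresponds to several thousand machines running continuously, before try is counted.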

We are continuously monitoring our capacity requirements for today and for
the future (new platforms, updated OSes, new experiments, new tests, etc).
But it's a dynamic problem, and sometimes things pile up. While we accept
that today, it's a problem we want to further limit in the future. There
are a lot of interesting things we're working on here, such as selective
test execution, intermittent reduction strategies, smarter tooling, and
smarter infrastructure allocation that will hopefully go a long way to
reducing these issues. We'll continue to update everyone here as we make
those improvements.

In the meantime, just a reminder to be diligent with what platforms and
test suites you are running.

If you have any questions feel free to reach out.

Thanks!

Marco Bonardo

Sep 14, 2017, 11:48:58 AM

to Stuart Philp, dev-platform, fx-team, firefox-ci

When I need to retrigger a mochitest-browser test multiple times (to
investigate an intermittent), often I end up running all the
mochitest-browser tests, looking at every log until I find the chunk
where the test is, and retrigger just that chunk. The chunk number
changes based on the platform and debug/opt, so it's painful.
Is there a way to trigger only the chunk that will contain a given
test, so I can save running all of the other chunks?
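To illustrate why the chunk number moves around, here is a small Python sketch. It assumes the harness cuts a sorted test list into N contiguous, near-equal chunks; that scheme, the helper `chunk_for_test`, and the test names are hypothetical, not the actual mochitest chunking algorithm. Under that assumption, either a different chunk count or a different set of tests (for example, tests skipped on one configuration) is enough to move a test into another chunk.

```python
from typing import Sequence


def chunk_for_test(tests: Sequence[str], total_chunks: int, target: str) -> int:
    """Return the 1-based chunk containing `target`, ASSUMING the sorted test
    list is cut into `total_chunks` contiguous, near-equal slices.
    Illustrative only; this is not the harness's real chunking code."""
    if total_chunks < 1:
        raise ValueError("total_chunks must be >= 1")
    ordered = sorted(tests)
    index = ordered.index(target)  # raises ValueError if the test isn't in the list
    base, extra = divmod(len(ordered), total_chunks)
    start = 0
    for chunk in range(1, total_chunks + 1):
        # The first `extra` chunks each take one additional test.
        size = base + (1 if chunk <= extra else 0)
        if index < start + size:
            return chunk
        start += size
    raise AssertionError("unreachable: every index falls in some chunk")


tests = [f"browser/test_{i:02d}.js" for i in range(20)]
target = "browser/test_09.js"
print(chunk_for_test(tests, 4, target))      # 4 chunks
print(chunk_for_test(tests, 7, target))      # 7 chunks: same test, different chunk
print(chunk_for_test(tests[6:], 4, target))  # 4 chunks, but six tests skipped here
```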

Michael de Boer

Sep 14, 2017, 11:55:26 AM

to Marco Bonardo, Stuart Philp, Mozilla dev-platform mailing list mailing list, fx-team, firefox-ci

This! This! This! I’d love to be able to do this - it would make testing possible fixes for test failures sooo much easier.

Cheers,

Mike.

Cameron Dawson

Sep 14, 2017, 11:56:12 AM

to Marco Bonardo, Stuart Philp, dev-platform, fx-team, firefox-ci

Marco, I don’t know of a way to do exactly that yet, but it is on the roadmap for the Test-based UI in Treeherder. In the meantime, the existing UI may help you there.

On any push, click the down arrow (Action Menu) at the far right of the push status line and select “Experimental: Test-Centric UI”
From there you can see the list of tests that failed for that push (at this time, only for tests that use structured logging, which includes Mochitest)
For each test, you’ll see a link to the chunk back in Treeherder where that test ran. So you can go BACK to Treeherder to do your retrigger there. This side-UI will be moving back into the main Treeherder repo soon, so you’ll be able to trigger directly from there at some point.

I realize this workflow is a bit cumbersome, but perhaps better than poring through logs. :)

I’m actively working on this UI, so please give me any feedback you have in the form of bugs or in #treeherder.

-Cam

> On Sep 14, 2017, at 8:48 AM, Marco Bonardo <mbon...@mozilla.com> wrote:
>
> When I need to retrigger a mochitest-browser test multiple times (to
> investigate an intermittent), often I end up running all the
> mochitest-browser tests, looking at every log until I find the chunk
> where the test is, and retrigger just that chunk. The chunk number
> changes based on the platform and debug/opt, so it's painful.
> Is there a way to trigger only the chunk that will contain a given
> test, so I can save running all of the other chunks?
>


James Graham

Sep 14, 2017, 11:57:18 AM

to dev-pl...@lists.mozilla.org

On 14/09/17 16:48, Marco Bonardo wrote:
> When I need to retrigger a mochitest-browser test multiple times (to
> investigate an intermittent), often I end up running all the
> mochitest-browser tests, looking at every log until I find the chunk
> where the test is, and retrigger just that chunk. The chunk number
> changes based on the platform and debug/opt, so it's painful.
> Is there a way to trigger only the chunk that will contain a given
> test, so I can save running all of the other chunks?

You might be able to use

mach try -p linux64 <path to testdir>

in order to run a single chunk with just the chosen tests.

Marco Bonardo

Sep 14, 2017, 12:03:38 PM

to James Graham, dev-platform

On Thu, Sep 14, 2017 at 5:56 PM, James Graham <ja...@hoppipolla.co.uk> wrote:
> On 14/09/17 16:48, Marco Bonardo wrote:
> mach try -p linux64 <path to testdir>

Afaict, that runs a single folder, but the intermittent may be caused
by interactions across different tests in different folders. I'm not
up to date on what we do today: do we restart the test harness for
each folder? That'd basically solve my troubles.

Andrew Halberstadt

Sep 14, 2017, 12:10:03 PM

to Marco Bonardo, Stuart Philp, dev-platform, fx-team, firefox-ci

There's sort of a way to do this with try syntax. I say sort of because it
doesn't support all suites and there seem to be a few bugs with it. But
you can try:

./mach try -b o -p linux64 -u none path/to/dir/or/test

This should only run the directory or test you specified (it'll always show
up as chunk 1). I have vague plans to implement this a bit more robustly
for try_task_config.json based scheduling, but no time frame on when that
work might happen yet.

-Andrew

On Thu, Sep 14, 2017 at 11:48 AM Marco Bonardo <mbon...@mozilla.com> wrote:

> When I need to retrigger a mochitest-browser test multiple times (to
> investigate an intermittent), often I end up running all the
> mochitest-browser tests, looking at every log until I find the chunk
> where the test is, and retrigger just that chunk. The chunk number
> changes based on the platform and debug/opt, so it's painful.
> Is there a way to trigger only the chunk that will contain a given
> test, so I can save running all of the other chunks?
>

Cameron Dawson

Sep 14, 2017, 12:12:04 PM

to Gijs Kruitbosch, Marco Bonardo, Stuart Philp, dev-platform, fx-team, firefox-ci

That’s correct, yeah. If you don’t have a push where it’s failed already, then it won’t show in the Test Centric UI. Though I’ll write up a bug to explore adding this functionality. Perhaps there’s a way to mine Active Data to get this.

-Cam

> On Sep 14, 2017, at 9:05 AM, Gijs Kruitbosch <gkrui...@mozilla.com> wrote:
>
> This only works once you have a run that failed the test you're interested in, right? There's no way to tell the test-centric UI "find me the chunk for test with name X".
>
> ~ Gijs

>
> On 14/09/2017 16:55, Cameron Dawson wrote:
>> Marco— I don’t know of a way to do exactly that yet. But that is in the roadmap for the Test-based UI in Treeherder. And the existing UI may help you there.
>>
>> On any push, click the down arrow (Action Menu) at the far right of the push status line and select “Experimental: Test-Centric UI”
>> From there you can see the list of tests that failed for that push (at this time, only for tests that log with the structured logging, but they include Mochitest)
>> For each test, you’ll see a link to the chunk back in Treeherder where that test ran. So you can go BACK to Treeherder to do your retrigger there. This side-UI will be moving back into the main Treeherder repo soon, so you’ll be able to trigger directly from there at some point.
>>
>> I realize this workflow is a bit cumbersome, but perhaps better than poring through logs. :)
>>
>> I’m actively working on this UI, so please give me any feedback you have in the form of bugs or in #treeherder.
>>
>> -Cam
>>
>>

Andrew Halberstadt

Sep 14, 2017, 12:14:29 PM

to Marco Bonardo, James Graham, dev-platform

Yes, all mochitests except Android restart between manifests (which is
usually the same as folders).


Kyle Lahnakoski

Sep 14, 2017, 12:24:05 PM

to Michael de Boer, Marco Bonardo, Stuart Philp, Mozilla dev-platform mailing list mailing list, fx-team, firefox-ci

You can try ActiveData, which stores all test results from the past few
weeks. Here is an example query that shows the chunk number for each
run/build combo in the past day. Note that ActiveData is sometimes more
than a day behind.

https://activedata.allizom.org/tools/query.html#query_id=4HHuBgDu

{
    "from":"unittest",
    "select":[
        {"aggregate":"count"},
        {"value":"action.start_time","aggregate":"max"}
    ],
    "groupby":[
        "run.suite",
        "run.chunk",
        "result.test",
        "build.platform",
        "build.type",
        "run.type"
    ],
    "where":{"and":[
        {"eq":{"build.branch":"mozilla-inbound"}},
        {"prefix":{"run.suite":"moch"}},
        {"gt":{"action.start_time":{"date":"today-day"}}},
        {"regex":{"result.test":".*browser_623779.js.*"}}
    ]},
    "limit":1000
}
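Rather than editing the test name into the query by hand, the same JSON body can be generated. A minimal Python sketch: `chunk_lookup_query` is a hypothetical helper; the field names and operators are copied from the query above, and only the parameters (test name, branch, suite prefix, limit) are new. It only builds and prints the body (e.g. for pasting into the query tool linked above); submitting it programmatically is not shown.

```python
import json


def chunk_lookup_query(test_name: str, branch: str = "mozilla-inbound",
                       suite_prefix: str = "moch", limit: int = 1000) -> dict:
    """Build the ActiveData chunk-lookup query above for an arbitrary test.
    Hypothetical helper; the query structure is copied from the example."""
    return {
        "from": "unittest",
        "select": [
            {"aggregate": "count"},
            {"value": "action.start_time", "aggregate": "max"},
        ],
        "groupby": [
            "run.suite", "run.chunk", "result.test",
            "build.platform", "build.type", "run.type",
        ],
        "where": {"and": [
            {"eq": {"build.branch": branch}},
            {"prefix": {"run.suite": suite_prefix}},
            {"gt": {"action.start_time": {"date": "today-day"}}},
            # Same unanchored pattern style as the example ("." matches any char).
            {"regex": {"result.test": f".*{test_name}.*"}},
        ]},
        "limit": limit,
    }


print(json.dumps(chunk_lookup_query("browser_623779.js"), indent=4))
```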

On 2017-09-14 11:49, Michael de Boer wrote:

>> On 14 Sep 2017, at 17:48, Marco Bonardo <mbon...@mozilla.com> wrote:
>>
>> When I need to retrigger a mochitest-browser test multiple times (to
>> investigate an intermittent), often I end up running all the
>> mochitest-browser tests, looking at every log until I find the chunk
>> where the test is, and retrigger just that chunk. The chunk number
>> changes based on the platform and debug/opt, so it's painful.
>> Is there a way to trigger only the chunk that will contain a given
>> test, so I can save running all of the other chunks?

Mike Hommey

Sep 14, 2017, 4:54:51 PM

to Stuart Philp, dev-pl...@lists.mozilla.org, fx-team, firefox-ci

On Thu, Sep 14, 2017 at 11:35:53AM -0400, Stuart Philp wrote:
> Hello all,
>
> As we near 57 the Firefox CI group felt it was important to send out a bit
> of a reminder regarding infrastructure usage when you push.
>
> *tl;dr* There is a real cost (both time and $) to using the 'all' flags in
> pushes. They are there if you need them, but please remember to think about
> what platforms and test suites you need to execute before you push, and
> limit the scope of execution if you can.

Maybe it's time to kill the `all` flag, at least for -p. Why? For the
combined reason that you're saying we shouldn't be using it, and that
it's actually *not* running every platform.

Mike

Botond Ballo

Sep 14, 2017, 6:32:33 PM

to Mike Hommey, Stuart Philp, dev-platform, fx-team, firefox-ci

On Thu, Sep 14, 2017 at 4:54 PM, Mike Hommey <m...@glandium.org> wrote:
> Maybe it's time to kill the `all` flag, at least for -p. Why? For the
> combined reason that you're saying we shouldn't be using it, and that
> it's actually *not* running every platform.

I think "-p all" is still useful for "T pushes" (and it sounds like
build jobs aren't the main concern resource-wise).

Cheers,
Botond

Dustin Mitchell

Sep 14, 2017, 7:53:57 PM

to Botond Ballo, Mike Hommey, Stuart Philp, dev-platform, fx-team, firefox-ci

2017-09-14 18:32 GMT-04:00 Botond Ballo <bba...@mozilla.com>:
> I think "-p all" is still useful for "T pushes" (and it sounds like
> build jobs aren't the main concern resource-wise).

Correct -- all builds are in AWS.

I'd like to steer this away from "What legacy syntax should we use
instead?" and "How should we tweak the legacy try syntax?" to:

How can we use the modern tryselect functionality to achieve more
precise try pushes?

(tryselect is the task-selection logic behind ./mach try fuzzy)

Dustin

Kris Maglione

Sep 14, 2017, 9:07:46 PM

to Masayuki Nakano, Michael de Boer, Stuart Philp, Kyle Lahnakoski, firefox-ci, Marco Bonardo, Mozilla dev-platform mailing list mailing list, fx-team

Your best bet is probably to use `mach try` with a specific set
of test directories. It will generate a set of --try-test-paths
flags to restrict tests to those paths, and only run the first
chunk of any group. Without that, groups shift around too much
to be reliable.

On Fri, Sep 15, 2017 at 10:03:00AM +0900, Masayuki Nakano wrote:
>Even when I got the chunk numbers, specifying chunk numbers of
>mochitests wouldn't work, see this log:
>https://treeherder.mozilla.org/#/jobs?repo=try&revision=c09c7046ed0664e89f7224e1de5219c39c94c948
>After that, I needed to rerun mochitests with |-u mochitests|. IIRC, I
>tried to kick the specific chunks with "Add new jobs", but didn't
>work.
>And also, when I try to investigate random oranges which are not
>reproducible on my environments, I want an option like
>|--run-until-failure| and |--repeat REPEAT| in the try syntax. Because
>of no such options, I need to trigger a lot of jobs manually and that
>may/might cause too many oranges.

>--
>Masayuki Nakano <mna...@mozilla.com>
>Software Engineer, Mozilla

--
Kris Maglione
Senior Firefox Add-ons Engineer
Mozilla Corporation

The presence of those seeking the truth is infinitely to be preferred
to the presence of those who think they’ve found it.
--Terry Pratchett

James Graham

Sep 15, 2017, 5:28:22 AM

to dev-pl...@lists.mozilla.org

On 15/09/17 00:53, Dustin Mitchell wrote:
> 2017-09-14 18:32 GMT-04:00 Botond Ballo <bba...@mozilla.com>:
>> I think "-p all" is still useful for "T pushes" (and it sounds like
>> build jobs aren't the main concern resource-wise).
>
> Correct -- all builds are in AWS.
>
> I'd like to steer this away from "What legacy syntax should we use
> instead?" and "How should we tweak the legacy try syntax?" to:
>
> How can we use the modern tryselect functionality to achieve more
> precise try pushes?

I think that's a good discussion to have, but the original motivation
for this thread, aiui, was a set of recent incidents where there have been 12+ hour
backlogs on try, causing problems across the org. In general we ought
to solve this by being smarter about what's run automatically, but we
aren't there yet. We also don't have full uptake of |mach try fuzzy| and
in any case, people likely to be impacted by this all know try syntax.
So a discussion in those terms seems meaningful.

I think there are some fairly simple rules people can apply to help with
the observed, recurring, problem. These are not official, I'm not in a
position of authority here, but I assume people will correct anything
that's wrong or controversial:

* -p all is generally OK because builds are on cloud machines and we
aren't hardware constrained there. Obviously any unnecessary job,
including builds, does cost money.

* Bare -p all -u all generally isn't OK. In particular it shouldn't be
seen as the default "check before landing" try push. Of course, if you
have a large cross-cutting change that genuinely could affect any test
on any platform, it might be the right choice.

* A combination of selecting specific relevant suites and representative
platforms using -u <suite|all>[platform] is generally a good choice.
|mach try fuzzy| is a better way to schedule this kind of push.

* mach try allows specifying specific paths or directories. This allows
even finer grained test selection where you are interested in specific
tests.

* In general running tests on mac should be avoided if possible. This is
our most hardware constrained regression test platform. People only
using it when they think that their change will affect mac differently
to linux and windows will help a lot.

* If you know your try push failed before all jobs complete, or you land
a patch with jobs still pending, please take a moment to cancel all
pending jobs from treeherder. That is disproportionately helpful for
freeing up resources on backlogged platforms.

* I have no idea about performance tests.
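Rules of thumb like these are simple enough to check mechanically before pushing. A minimal Python sketch, assuming a deliberately simplified reading of legacy try syntax: `parse_try_flags` and `try_push_warnings` are hypothetical helpers, not part of `mach try`; the parser only understands `-x value` pairs and ignores many real syntax forms; and the performance-test check follows the original email's note that perf tests also depend on physical hardware.

```python
def parse_try_flags(message):
    """Very simplified parse of a legacy try syntax string such as
    'try: -b do -p linux64 -u mochitest-e10s -t none'.
    Only '-x value' pairs are understood; real try syntax has many more forms."""
    tokens = message.split()
    if tokens and tokens[0] == "try:":
        tokens = tokens[1:]
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-") and i + 1 < len(tokens):
            flags[tok] = tokens[i + 1]
            i += 2
        else:
            i += 1  # skip anything not understood here (e.g. test paths)
    return flags


def try_push_warnings(message):
    """Flag pushes that break the rules of thumb above (heuristic only)."""
    flags = parse_try_flags(message)
    platforms = flags.get("-p", "")
    unittests = flags.get("-u", "none")  # assumption: treat a missing -u as none
    talos = flags.get("-t", "none")
    warnings = []
    if platforms == "all" and unittests == "all":
        warnings.append("bare '-p all -u all': pick relevant suites and platforms instead")
    runs_tests = unittests != "none"
    # A bracket restriction like 'all[linux64]' limits which test platforms run.
    restricted = "[" in unittests and "mac" not in unittests.lower()
    if runs_tests and (platforms == "all" or "macosx" in platforms) and not restricted:
        warnings.append("mac tests requested: mac is the most hardware-constrained platform")
    if talos != "none":
        warnings.append("performance (talos) tests requested: these also need physical hardware")
    return warnings


for msg in ("try: -b do -p all -u all -t none",
            "try: -b do -p all -u all[linux64] -t none"):
    print(msg, "->", try_push_warnings(msg))
```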

Masayuki Nakano

Sep 15, 2017, 10:30:51 AM

to Kris Maglione, Michael de Boer, Stuart Philp, Kyle Lahnakoski, firefox-ci, Marco Bonardo, Mozilla dev-platform mailing list mailing list, fx-team

I was trying to make a different point. See the Treeherder log: mochitests
didn't run except on Win7 Debug and Android 4.3 API16+ Opt/Debug. So the try
syntax parser, or something else, is really broken. I often run into this kind of bug.

Masayuki Nakano

Sep 15, 2017, 10:31:28 AM

to Kyle Lahnakoski, Michael de Boer, Marco Bonardo, Stuart Philp, Mozilla dev-platform mailing list mailing list, fx-team, firefox-ci

Geoffrey Brown

Sep 15, 2017, 12:46:46 PM

to Masayuki Nakano, Kris Maglione, Michael de Boer, Stuart Philp, Kyle Lahnakoski, firefox-ci, Marco Bonardo, Mozilla dev-platform mailing list mailing list, fx-team

Masayuki, your try push had trouble because you requested
"mochitest-2" instead of "mochitest-e10s-2". Non-e10s mochitests only
run on Android and Windows now. You probably wanted something like:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d68382f17d63f0674c62acc7242a9e406793895f

This is a good example of how a small deviation from "correct" try
syntax can have unexpected and frustrating consequences.
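The specific trap here can be expressed as a small check. A minimal Python sketch, assuming platform names that start with "android" or "win" (an illustrative assumption) and encoding only the rule stated above as of this thread: non-e10s mochitests run only on Android and Windows. `suite_will_run` and `e10s_variant` are hypothetical helpers, and the rename only mirrors the `mochitest-2` to `mochitest-e10s-2` example; other suites may be named differently.

```python
# Per this thread (Sept 2017): non-e10s mochitests only run on Android and Windows.
# These platform-name prefixes are an assumption for illustration.
NON_E10S_PLATFORM_PREFIXES = ("android", "win")


def suite_will_run(suite, platform):
    """Heuristic: False if a non-e10s mochitest is requested on a platform
    where, per this thread, it no longer runs."""
    if suite.startswith("mochitest") and "e10s" not in suite:
        return platform.startswith(NON_E10S_PLATFORM_PREFIXES)
    return True  # this rule says nothing about other suites


def e10s_variant(suite):
    """Mirror the 'mochitest-2' -> 'mochitest-e10s-2' example above.
    Other suite names may follow a different naming pattern."""
    return suite.replace("mochitest", "mochitest-e10s", 1)


for suite, platform in [("mochitest-2", "linux64"), ("mochitest-2", "win64")]:
    if not suite_will_run(suite, platform):
        print(f"{suite} won't run on {platform}; did you mean {e10s_variant(suite)}?")
```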

- Geoff

On Thu, Sep 14, 2017 at 7:15 PM, Masayuki Nakano <mna...@mozilla.com> wrote:
> I tried to say different point. See the treehearder log, mochitests didn't
> run except on Win7 Debug, Android 4.3 API16+ Opt/Debug. So, try syntax
> parser or something is really broken. I often meet this kind of bug.
>
>
> On 9/15/2017 10:07 AM, Kris Maglione wrote:
>>
>> Your best bet is probably to use `mach try` with a specific set of test
>> directories. It will generate a set of --try-test-paths flags to restrict
>> tests to those paths, and only run the first chunk of any group. Without
>> that, groups shift around too much to be reliable.
>>
>> On Fri, Sep 15, 2017 at 10:03:00AM +0900, Masayuki Nakano wrote:
>>>


Dan Mosedale

Sep 15, 2017, 2:27:22 PM

to Geoffrey Brown, Kris Maglione, Masayuki Nakano, Michael de Boer, Stuart Philp, Kyle Lahnakoski, firefox-ci, Marco Bonardo, Mozilla dev-platform mailing list mailing list, fx-team

I wonder if this isn't (in large part) a design problem disguised as a
behavior problem. The existing try syntax (even with try chooser) is so
finicky and filled with abbreviations that even after years of working with
it, I still regularly have to look up stuff and sometimes when I've been in
a hurry, I've done something more general than I really needed because it
was just too painful to figure out the exact thing.

I'd be pretty surprised if developers newer to the mozilla infrastructure
than I didn't end up doing this sort of thing substantially more frequently.

https://ahal.ca/blog/2017/mach-try-fuzzy/ seems like a fine step in the
right direction, and maybe that'll be enough.

But I do wonder if the path to saving substantial time and money in the
long run is to invest some real user-research / UX / design time into
designing a try configurator where it requires effort to do the
unnecessarily expensive thing, as opposed to the current situation, where
it requires effort to avoid the expensive thing.

Dan

James Graham

Sep 15, 2017, 2:27:29 PM

to Dan Mosedale, Geoffrey Brown, Kris Maglione, Masayuki Nakano, Michael de Boer, Stuart Philp, Kyle Lahnakoski, firefox-ci, Marco Bonardo, Mozilla dev-platform mailing list mailing list, fx-team

On 15/09/17 18:45, Dan Mosedale wrote:
> I wonder if this isn't (in large part) a design problem disguised as a
> behavior problem. The existing try syntax (even with try chooser) is so
> finicky and filled with abbreviations that even after years of working with
> it, I still regularly have to look up stuff and sometimes when I've been in
> a hurry, I've done something more general than I really needed because it
> was just too painful to figure out the exact thing.
>
> I'd be pretty surprised if developers newer to the mozilla infrastructure
> than I didn't end up doing this sort of thing substantially more frequently.
>
> https://ahal.ca/blog/2017/mach-try-fuzzy/ seems like a fine step in the
> right direction, and maybe that'll be enough.
>
> But I do wonder if the path to saving substantial time and money in the
> long run is to invest some real user-research / UX / design time into
> designing a try configurator where it requires effort to do the
> unnecessarily expensive thing, as opposed to the current situation, where
> it requires effort to avoid the expensive thing.

I think that's a rather uncontroversial opinion. Historically we have
been hampered by the fact that the set of try jobs was basically unknown
and constantly changing, and the code was scattered across many
repositories. Now that taskcluster defines everything in a single place
and the majority of the code is in-tree it will be much easier to
experiment with different frontends that make it easy to select the
right jobs. That's what allowed ahal to write |mach try fuzzy|.

There is also a desire to have better change-based job selection, so
that the default behaviour can be "run the jobs that are most likely to
be affected by the change I just made".

However, all of these improvements will take time, and in the meantime
there are problems being caused by too-high backlog, so some changes in
user behaviour will be helpful as we work toward better tools.
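The change-based job selection mentioned above ("run the jobs that are most likely to be affected by the change I just made") can be illustrated with a deliberately naive Python sketch. Everything here is hypothetical: `select_test_dirs` and the example directories are made up, and directory proximity is only a stand-in for "most likely to be affected". This is not how any Mozilla tool is described as working in this thread.

```python
from pathlib import PurePosixPath


def select_test_dirs(changed_files, test_dirs):
    """Naive change-based selection: for each changed file, walk up its parent
    directories and take every test directory under the nearest ancestor that
    contains any. Falls back to all tests when only the repo root ('.') matches."""
    tests = [PurePosixPath(d) for d in test_dirs]
    selected = set()
    for changed in changed_files:
        for ancestor in PurePosixPath(changed).parents:  # nearest first, ends at '.'
            hits = [t for t in tests if t == ancestor or ancestor in t.parents]
            if hits:
                selected.update(str(t) for t in hits)
                break
    return sorted(selected)


TEST_DIRS = [
    "browser/base/content/test/general",
    "toolkit/components/places/tests/browser",
    "toolkit/components/places/tests/unit",
]
print(select_test_dirs(["toolkit/components/places/Database.cpp"], TEST_DIRS))
```

The fallback to "everything" for files with no nearby tests keeps the sketch conservative; a real selector would need far better signals than directory layout.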