Proposal to adjust testing to run on PGO builds only and not test on OPT builds

jmaher

Jan 3, 2019, 11:17:35 AM

I would like to propose that we do not run tests on linux64-opt, windows7-opt, and windows10-opt.

Why am I proposing this:
1) Most test regressions found on trunk show up on debug, and fewer on PGO. In the last 6 months (all the data I looked at), no regressions were exclusive to OPT builds.
2) On mozilla-beta, mozilla-release, and ESR, we only build/test PGO builds; we do not run tests on plain OPT builds.
3) This will reduce the jobs we run (by about 16%), which in turn reduces CPU time, money spent, turnaround time, intermittents, and complexity of the taskgraph.
4) PGO builds are very similar to OPT builds, but we add flags to generate profile data and make small adjustments to build scripts behind the MOZ_PGO flag in-tree; then we launch the browser, collect data, and repack our binaries for faster performance.
5) We ship PGO builds, not OPT builds

What are the risks associated with this?
1) try server build times will increase as we will be testing on PGO instead of OPT
2) we could miss a regression that only shows up on OPT, but if we only ship PGO and once we leave central we do not build OPT, this is a very low risk.

I would like to hear any concerns you might have on this or other areas which I have overlooked. Assuming there are no risks which block this, I would like to have a decision by January 11th, and make the adjustments on January 28th when Firefox 67 is on trunk.

Tom Ritter

Jan 3, 2019, 11:26:21 AM

to jmaher, Mozilla

Can we set it up so we can manually run tests on opt builds, but they
aren't run by default?

I've had many instances where opt (and pgo) fail; but I can't
reproduce a test failure locally and can only do it on try. Letting me
run that test on the opt build will save the additional pgo build time
(both the cloud-cost time and the developer turn-around time.)

-tom


James Graham

Jan 3, 2019, 11:36:18 AM

to dev-pl...@lists.mozilla.org

On 03/01/2019 16:17, jmaher wrote:
> What are the risks associated with this?
> 1) try server build times will increase as we will be testing on PGO instead of OPT
> 2) we could miss a regression that only shows up on OPT, but if we only ship PGO and once we leave central we do not build OPT, this is a very low risk.

Couldn't we leave opt enabled for try and just stop running it on
integration/central branches? That would allow faster/cheaper try but
preserve the benefits you list above without any additional increase in
risk compared to today. I do wonder how that would interact with
artifact builds though; maybe it would be worth running opt *builds*
just not opt *tests* (which I think is your proposal anyway).

Brian Grinstead

Jan 3, 2019, 11:43:52 AM

to jmaher, dev-pl...@lists.mozilla.org

Artifact builds don’t work with PGO, do they? When I do `-p all` on an artifact try push I get busted PGO builds (for example: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f8ead55ca97821c60ef38af4dec01b8bff0fdf3&selectedJob=219655864). What's needed to make it work? Requiring a full build for frontend-only changes would increase turnaround time and cut into the resource savings in (3).

Brian

Andrew Halberstadt

Jan 3, 2019, 11:44:59 AM

to jmaher, dev-platform, Justin Wood

CC Callek

How will this interact with the "shippable builds" project that Callek
posted about a while back? My understanding is there's a high probability PGO is
going away. Would it make sense to wait for that project to wrap up?

-Andrew

Brian Grinstead

Jan 3, 2019, 11:49:09 AM

to jmaher, Mozilla dev-platform mailing list

Would this apply to talos as well? I’ve wondered before if we should care at all about opt-only talos regressions for platforms where we ship PGO. IME quite a number of talos changes (both improvements and regressions) only show up on one or the other, so dropping one would simplify things.

Brian


Nicholas Alexander

Jan 3, 2019, 12:41:47 PM

to Brian Grinstead, jmaher, dev-platform

On Thu, Jan 3, 2019 at 8:43 AM Brian Grinstead <bgrin...@mozilla.com>
wrote:

> Artifact builds don’t work with PGO, do they? When I do `-p all` on an
> artifact try push I get busted PGO builds (for example:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f8ead55ca97821c60ef38af4dec01b8bff0fdf3&selectedJob=219655864).
> What's needed to make it work? Requiring a full build for frontend-only
> changes would increase the turnaround time and resource savings in (3).
>

I can partly address this. There are two things at play (at least):

1) automation builds need a special configuration piece in place to
properly support artifact builds. Almost certainly that's not in place for
PGO builds, since it's such an unusual thing to do: "you want to pack PGO
binaries into a development build... why?" But there's really no reason we
can't do that in automation so I've filed
https://bugzilla.mozilla.org/show_bug.cgi?id=15175323 for these things.
It's not high priority but we might as well capture the request; in
general, we always want try pushes to succeed with sensible results if we
can arrange it.

2) locally, we need to teach the artifact code to sniff whatever mozconfig
options say "I'm doing PGO" and fetch the right binaries based on that. I
think that enabling PGO locally is a little delicate, and I know that
chmanchester (and others?) is working hard to make this more robust, so
perhaps this is easy or becomes easy soon. I've filed
https://bugzilla.mozilla.org/show_bug.cgi?id=1517532 to track this.

If I'm wrong about the feasibility of these things, please update the
tickets!

Best,
Nick

Jonathan Kew

Jan 3, 2019, 12:51:57 PM

to dev-pl...@lists.mozilla.org

On 03/01/2019 16:17, jmaher wrote:

> I would like to propose that we do not run tests on linux64-opt, windows7-opt, and windows10-opt.
>
> Why am I proposing this:
> 1) All test regressions that were found on trunk are mostly on debug, and in fewer cases on PGO. There are no unique regressions found in the last 6 months (all the data I looked at) that are exclusive to OPT builds.
> 2) On mozilla-beta, mozilla-release, and ESR, we only build/test PGO builds; we do not run tests on plain OPT builds.
> 3) This will reduce the jobs we run (by about 16%), which in turn reduces CPU time, money spent, turnaround time, intermittents, and complexity of the taskgraph.
> 4) PGO builds are very similar to OPT builds, but we add flags to generate profile data and small adjustments to build scripts behind MOZ_PGO flag in-tree, then we launch the browser, collect data, and repack our binaries for faster performance.
> 5) We ship PGO builds, not OPT builds
>
> What are the risks associated with this?
> 1) try server build times will increase as we will be testing on PGO instead of OPT
> 2) we could miss a regression that only shows up on OPT, but if we only ship PGO and once we leave central we do not build OPT, this is a very low risk.

It's not just tryserver build times. Presumably this will also tend to
increase the time between a patch landing on inbound or autoland and any
resulting test failures showing up.

This seems like a negative in that it means more patches are likely to
have landed on top of the regressing one in the meantime, potentially
complicating backouts, and the original developer may be less likely to
still be around for a quick investigation/fix.

How long does it typically take for full PGO test results to be
available for a push? How does that compare to full Opt test results?
ISTM that if the increase is quite marginal, this may be OK, but if the
latency becomes substantially greater, there will be a continual cost in
increased developer and/or sheriff pain.

JK

Justin Wood

Jan 3, 2019, 1:07:21 PM

to Andrew Halberstadt, jmaher, dev-platform

I should say that the shippable build proposal (
https://groups.google.com/d/msg/mozilla.dev.planning/JomJmzGOGMY/vytPViZBDgAJ)
doesn't seem to intersect negatively with this.

And in fact I think these two proposals complement each other quite nicely.

Additionally I have no concerns over this work taking place prior to my
work being complete.

On the specific proposal front, I can envision us allowing tests to be run
on non-pgo builds via triggers (so never by default, but always
backfillable/selectable) should someone need to try and bisect an issue
that is discovered... I'm not sure if the code maintenance burden is worth
it for the benefit but I don't hold a strong opinion there.

~Justin Wood (Callek)


Steve Fink

Jan 3, 2019, 1:16:26 PM

to Jonathan Kew, dev-pl...@lists.mozilla.org

On 01/03/2019 09:51 AM, Jonathan Kew wrote:
> On 03/01/2019 16:17, jmaher wrote:
>>

>> What are the risks associated with this?
>> 1) try server build times will increase as we will be testing on PGO
>> instead of OPT
>> 2) we could miss a regression that only shows up on OPT, but if we
>> only ship PGO and once we leave central we do not build OPT, this is
>> a very low risk.
>

> It's not just tryserver build times. Presumably this will also tend to
> increase the time between a patch landing on inbound or autoland and
> any resulting test failures showing up.
>
> This seems like a negative in that it means more patches are likely to
> have landed on top of the regressing one in the meantime, potentially
> complicating backouts, and the original developer may be less likely
> to still be around for a quick investigation/fix.
>
> How long does it typically take for full PGO test results to be
> available for a push? How does that compare to full Opt test results?
> ISTM that if the increase is quite marginal, this may be OK, but if
> the latency becomes substantially greater, there will be a continual
> cost in increased developer and/or sheriff pain.

Good points, but given that most failures will show up on debug builds, it
seems like a more relevant metric is the difference between time(Opt) vs
min(time(debug), time(PGO)). Though debug builds may run slow enough
that it boils down to what you said?

Looking at Windows 64-bit jobs from a random push (
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=63027ff03effb04ed4bf53bbb0c9aa1bad4b4c9b
), I see:

pgo: build=119min + Wd1=15min
opt: build=55min + Wd1=13min
debug: build=46min + Wd1=22min

So by that, you get opt and debug Wd1 results back at the same time
(67-68min) and pgo Wd1 results take twice as long (134min). I imagine
there are much slower test jobs that make this situation cloudier, but
assuming the general picture holds then it seems like opt is mostly
redundant with debug.
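
As a quick check of the arithmetic above, here is a minimal Python sketch that uses only the three timings quoted from that push. The dictionary layout and variable names are illustrative and not part of any Mozilla tooling.

```python
# Build and Wd1 test durations in minutes, copied from the push above.
timings = {
    "pgo": {"build": 119, "test": 15},
    "opt": {"build": 55, "test": 13},
    "debug": {"build": 46, "test": 22},
}

# End-to-end time until Wd1 results are available for each configuration.
totals = {name: t["build"] + t["test"] for name, t in timings.items()}

# The metric suggested above: how much sooner opt reports than the
# earlier of debug and PGO.
first_without_opt = min(totals["debug"], totals["pgo"])
opt_advantage = first_without_opt - totals["opt"]

print(totals)
print("first result without opt:", first_without_opt, "min")
print("advantage of opt:", opt_advantage, "min")
```

For this one push the opt advantage comes out to 0 minutes, since debug reports at the same time as opt. A single push with a single test job is of course not representative.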

The majority of your currently opt-triggered backouts will still happen,
just using debug results now. This is assuming debug normally catches a
superset of the problems that opt would, which is asserted in #1 of
jmaher's post.

+1 from me for killing off opt tests.

Steve Fink

Jan 3, 2019, 1:22:26 PM

to Justin Wood, Andrew Halberstadt, dev-platform, jmaher

On 01/03/2019 10:07 AM, Justin Wood wrote:
> on the specific proposal front I can envision us allowing tests to be run
> on non-pgo builds via triggers (so never by default, but always
> backfillable/selectable) should someone need to try and bisect an issue
> that is discovered... I'm not sure if the code maintenance burden is worth
> it for the benefit but I don't hold a strong opinion there.

Is it a lot of maintenance? We have this for some other jobs
(linux64-shell-haz is the one I'm most familiar with, but it's a
standalone job so doesn't have non-toolchain graph dependencies). I get
quite a bit of value out of the resulting faster hack-try-debug cycles;
I would imagine it to be at least as useful to have a turnaround time of
1 hour for opt vs 2 hours for pgo.

James Graham

Jan 3, 2019, 1:28:27 PM

to dev-pl...@lists.mozilla.org

On 03/01/2019 18:16, Steve Fink wrote:

> Good points, but given that most failures will show up on debug builds, it
> seems like a more relevant metric is the difference between time(Opt) vs
> min(time(debug), time(PGO)). Though debug builds may run slow enough
> that it boils down to what you said?
>
> Looking at Windows 64-bit jobs from a random push (
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=63027ff03effb04ed4bf53bbb0c9aa1bad4b4c9b
> ), I see:
>
> pgo: build=119min + Wd1=15min
> opt: build=55min + Wd1=13min
> debug: build=46min + Wd1=22min
>
> So by that, you get opt and debug Wd1 results back at the same time
> (67-68min) and pgo Wd1 results take twice as long (134min). I imagine
> there are much slower test jobs that make this situation cloudier, but
> assuming the general picture holds then it seems like opt is mostly
> redundant with debug.

I think a good rule of thumb is that debug tests are about twice as slow
as opt, with the same chunking. So for a test job taking closer to an
hour on opt (which some do), you can easily wait 45 minutes longer for
debug results than for opt results. We could of course chunk more, but there's
overhead there that would eat some of the regained capacity.
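
To make the rule of thumb concrete, here is a rough Python sketch. It assumes the build times from the push Steve quoted and the "debug tests take about twice as long" factor; the function and constant names are made up for illustration.

```python
# Build times in minutes from the Windows 64-bit push quoted above.
OPT_BUILD_MIN = 55
DEBUG_BUILD_MIN = 46
# Rule of thumb from this message: debug tests take about twice as long as opt.
DEBUG_TEST_SLOWDOWN = 2


def result_times(opt_test_min):
    """Return (opt, debug) end-to-end minutes for one test job."""
    opt_total = OPT_BUILD_MIN + opt_test_min
    debug_total = DEBUG_BUILD_MIN + DEBUG_TEST_SLOWDOWN * opt_test_min
    return opt_total, debug_total


# A short job (like Wd1 at 13 minutes) and a job taking an hour on opt.
for opt_test_min in (13, 60):
    opt_total, debug_total = result_times(opt_test_min)
    print(opt_test_min, opt_total, debug_total, debug_total - opt_total)
```

Under these assumptions a 13-minute job finishes about 4 minutes later on debug than on opt, while a 60-minute job finishes about 51 minutes later. The factor of two is only approximate (the real Wd1 debug run above took 22 minutes, not 26).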

I wonder if an alternative would be running opt+debug on integration
branches and pgo+debug on central. That would have the obvious
disadvantage that pgo-only failures would be caught much later, but it
would keep current end-to-end times for integration and slightly better
capacity savings. I don't know how common pgo-only failures are compared
to other things that we are only catching on central.

Justin Wood

Jan 3, 2019, 1:37:10 PM

to Steve Fink, Andrew Halberstadt, dev-platform, jmaher

I don't think it's much burden, but code complexity adds up, and it comes
down to "how useful is this really?" Even if the maintenance
burden is low, it is still a tradeoff. I'm just saying I suspect it's
possible to do this, but I'm not sure it is useful in the end (and I'm not
looking to make the call on that).

~Justin Wood (Callek)

Jan de Mooij

Jan 3, 2019, 1:39:55 PM

to Steve Fink, Justin Wood, Andrew Halberstadt, jmaher, dev-platform

On Thu, Jan 3, 2019 at 7:22 PM Steve Fink <sf...@mozilla.com> wrote:

> I get
> quite a bit of value out of the resulting faster hack-try-debug cycles;
> I would imagine it to be at least as useful to have a turnaround time of
> 1 hour for opt vs 2 hours for pgo.
>

+1. The past week I've been Try-debugging (1) an intermittent Talos orange
(affected only Win64 opt and pgo, bug 1516679) and (2) an intermittent dt8
orange (affected only Win32 opt and pgo, bug 1516967). This was a pretty
annoying process, but pgo builds would have made this much worse. I'd
really appreciate it if we considered keeping "opt" as an optional
configuration for these use cases - it will save some people a lot of time.

Thanks,
Jan

Chris AtLee

Jan 3, 2019, 4:46:54 PM

to jmaher, dev-platform

Thank you Joel for writing up this proposal!

Are you also proposing that we stop the linux64-opt and win64-opt builds as
well, except for leaving them as an available option on try? If we're not
testing them on integration or release branches, there doesn't seem to be
much purpose in doing the builds.

Henrik Skupin

Jan 4, 2019, 3:37:50 AM

Nicholas Alexander wrote on 03.01.19 18:41:

> 1) automation builds need a special configuration piece in place to
> properly support artifact builds. Almost certainly that's not in place for
> PGO builds, since it's such an unusual thing to do: "you want to pack PGO
> binaries into a development build... why?" But there's really no reason we
> can't do that in automation so I've filed
> https://bugzilla.mozilla.org/show_bug.cgi?id=15175323 for these things.

This is actually: https://bugzilla.mozilla.org/show_bug.cgi?id=1517533

Thanks for filing those bugs.

--
Henrik Skupin
Senior Software Engineer
Mozilla Corporation

Nicholas Alexander

Jan 4, 2019, 11:57:15 AM

to Chris AtLee, jmaher, dev-platform

On Thu, Jan 3, 2019 at 1:47 PM Chris AtLee <cat...@mozilla.com> wrote:

> Thank you Joel for writing up this proposal!
>
> Are you also proposing that we stop the linux64-opt and win64-opt builds as
> well, except for leaving them as an available option on try? If we're not
> testing them on integration or release branches, there doesn't seem to be
> much purpose in doing the builds.
>

One reason we might not want to stop producing opt builds: we produce
artifact builds against opt (and debug, with --enable-debug in the local
mozconfig). It'll be very odd to have --enable-artifact-build and
_require_ --enable-pgo or whatever it is in the local mozconfig.

I expect that these opt build platforms will be relatively inexpensive to
preserve, because step one (IIUC) of pgo is to build the same source files
as the opt builds. So with luck we get sccache hits between the jobs.
Perhaps somebody with more knowledge of pgo and sccache can confirm or
refute that assertion?

Nick

Nathan Froyd

Jan 4, 2019, 12:04:32 PM

to Nicholas Alexander, dev-platform

On Fri, Jan 4, 2019 at 11:57 AM Nicholas Alexander
<nalex...@mozilla.com> wrote:
> One reason we might not want to stop producing opt builds: we produce
> artifact builds against opt (and debug, with --enable-debug in the local
> mozconfig). It'll be very odd to have --enable-artifact-build and
> _require_ --enable-pgo or whatever it is in the local mozconfig.

This seems reasonable. (I'm in agreement with the people upthread
that think we should have opt testing, but regardless of that
particular outcome, not requiring people to put goo in their
mozconfigs seems like a noble goal.)

> I expect that these opt build platforms will be relatively inexpensive to
> preserve, because step one (IIUC) of pgo is to build the same source files
> as the opt builds. So with luck we get sccache hits between the jobs.
> Perhaps somebody with more knowledge of pgo and sccache can confirm or
> refute that assertion?

PGO uses different compilation flags than a normal opt build in both
the profiling and the profile use phases (for instrumentation, etc.),
so I'd assume that opt builds and PGO builds would not share compiled
objects.
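
A toy illustration of why different flags would prevent sharing: a compiler cache looks up objects by a key derived from its inputs, so any change in flags yields a different key. The Python sketch below is a simplified model and not sccache's actual key derivation; the `cache_key` helper, the `-O2` baseline, and the `merged.profdata` file name are assumptions made for the example (`-fprofile-generate` and `-fprofile-use` are clang's instrumentation and profile-use flags, which may differ from what the Mozilla build passes).

```python
import hashlib


def cache_key(compiler, flags, source_text):
    """Toy cache key: a hash over compiler name, flags, and source contents."""
    h = hashlib.sha256()
    for part in (compiler, "\0".join(flags), source_text):
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


source = "int main(void) { return 0; }\n"

# Plain opt build.
opt_key = cache_key("clang", ["-O2"], source)
# PGO phase 1: instrumented build that generates profile data.
pgo_gen_key = cache_key("clang", ["-O2", "-fprofile-generate"], source)
# PGO phase 2: build that consumes the collected profile.
pgo_use_key = cache_key("clang", ["-O2", "-fprofile-use=merged.profdata"], source)

# Number of distinct keys for the same source file.
print(len({opt_key, pgo_gen_key, pgo_use_key}))
```

In this model the same source file produces three distinct keys, so an opt build would not get cache hits from either PGO phase, and vice versa.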

-Nathan

jmaher

Jan 4, 2019, 3:24:08 PM

Thanks everyone for your comments on this. It sounds like, from a practical standpoint, this is not a desirable change until we can get the runtimes of PGO builds on try and in integration to be less than debug build times.

A few common responses:
* artifact opt builds on try are fast for quick iterations, a must have
* can we do artifact builds for PGO? (thanks :nalexander for bug 1517533 and bug 1517532)
* what about talos? we need to investigate this more, I have always argued against pgo only for talos, but maybe we can revisit that (bug 1514829)
* do we turn off builds as well? I had proposed just the tests, if we decide to turn off talos it would make sense to turn off builds.

Thanks all for the quick feedback, when the bugs in this thread are further along, or if I see another simpler solution for reducing the duplication, I will follow up.

Aki Sasaki

Jan 7, 2019, 11:20:12 AM

to Andrew Halberstadt, jmaher, Justin Wood, dev-platform

+1.

The goal of shippable builds is twofold:

1. to make sure opt builds+tests, or similar (artifact builds?) answer the
question "is my commit good?" as fast as possible, and
2. to make sure shippable builds+tests answer the question "are these
binaries correct and ready to ship, if we decide to ship this revision?"

I agree that we should run a full suite of tests against shippable builds,
which probably includes things like performance testing.
We still need some class of builds+tests that answer the question "is my
commit good?" quickly. If debug builds are sufficient for the most part,
and opt builds+tests on try fill in the gaps, then yes. (That appears to be
what this thread is largely about.) If not, I could see us having at least
some subset of tests running against opt or artifact builds.

If we switch talos to PGO now, we'll probably switch them to shippable
builds at some point in the near future.


Randell Jesup

Jan 7, 2019, 4:39:16 PM

>* do we turn off builds as well? I had proposed just the tests, if we decide to turn off talos it would make sense to turn off builds.

Would turning off opt builds cause problems if you want to mozregression
an opt build? And would this be an issue? (obviously it might be for
opt-only failures, or trying to verify if a regression identified in
mozregression for PGO was a PGO bug or not, though that could be checked
at the cost of a build or 4 even if we don't build opt, probably).

--
Randell Jesup, Mozilla Corp
remove "news" for personal email

Chris M.

Jan 9, 2019, 2:36:26 AM

to jmaher, dev-platform

Earlier today I landed a fix for bug 1517532 that will mean that an
artifact build with MOZ_PGO set will pull artifacts from an automation pgo
build. As a result, artifact pgo builds as triggered by a "-p all
--artifact..." will succeed now as well (and consume pgo'd artifacts).

If we end up wanting to turn off opt builds in automation after all we may
be able to pull artifacts from pgo builds for local artifact builds by
default. The behavior of the compiled code shouldn't be different -- this
probably wouldn't matter to people developing front end code locally.

Chris

jmaher

Jan 17, 2019, 11:42:39 AM

Following up on this: thanks to Chris, we have fast artifact builds for PGO, so the time to develop and use try server is at parity with current opt solutions for many cases (front end development, most bisection cases).

I have also looked in depth at what the impact on the integration branches would be. In the data set from July-December (H2 2018), there were 11 instances where a regressing test was originally scheduled only in the OPT config, with no PGO or Debug test jobs to point out the regression (this is due to scheduling choices). The worst-case scenario is finding the regression on PGO up to 1 hour later, 11 times in six months, or roughly 2x/month. Backfilling to find the offending patch, as we do now 24% of the time, would take a similar amount of time. In fact, running those OPT jobs on Debug instead would give the same time for all 11 instances (due to more chunks on debug and similar runtimes). In short, little to no impact.
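
The worst-case figures above work out as follows. This is a back-of-the-envelope Python sketch using only the numbers stated in this message; the variable names are illustrative.

```python
# Figures stated in the message above.
opt_only_regressions = 11    # regressions caught only on OPT, July-December 2018
months = 6
worst_case_delay_hours = 1   # "up to 1 hour later" per regression

# Average number of affected regressions per month.
per_month = opt_only_regressions / months
# Upper bound on total added detection latency over the six months.
total_delay_hours = opt_only_regressions * worst_case_delay_hours

print(round(per_month, 2), total_delay_hours)
```

That is about 1.83 regressions per month (the "roughly 2x/month" above) and at most 11 hours of added detection latency over the six-month window.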

Lastly, there was a pending question about talos. There is an edge case where we can see a regression on talos that is PGO-only, but it is unrelated to the code and just a side effect of how PGO works. I looked into that in https://bugzilla.mozilla.org/show_bug.cgi?id=1514829. I found that if we had not received opt alerts, we would not have missed any regressions. Furthermore, the PGO-only regressions were very rare, and they coincided with many other regressions (say a build change, or test change, etc.); usually these were accepted changes, backed out, or investigated on a different test or platform. In the past, when we have determined a regression is a PGO artifact, we have resolved it as WONTFIX and moved on.

Given this summary, I feel that most concerns around removing testing for OPT are addressed. I would also like to extend the proposal to remove the OPT builds since no unit or perf tests would run on there.

As my original timeline is not realistic, I would like to collect comments until next Wednesday, January 23rd; then I can follow up on remaining issues or start working out how to make this happen and what the right timeline is.

James Graham

Jan 17, 2019, 12:04:21 PM

to dev-pl...@lists.mozilla.org

On 17/01/2019 16:42, jmaher wrote:
> Following up on this, thanks to Chris we have fast artifact builds for PGO, so the time to develop and use try server is in parity with current opt solutions for many cases (front end development, most bisection cases).

Even as someone not making frequent changes to compiled code I
occasionally want to both rebuild and run tests on opt (e.g. because
some test changes also require changes to moz.build files that could
break the build in a way that isn't caught by an artifact build). In
this case adding an extra hour of end-to-end time on try is a pretty
serious regression.

For my specific use case it might be enough if we could schedule
artifact builds for PGO and full builds for debug. But I suspect it's
going to work better for more people — and save more resources overall —
to simply keep the default try configuration as-is and just turn off
non-PGO opt builds (or at least tests) on integration branches / central.

Jan de Mooij

Jan 17, 2019, 12:52:31 PM

to jmaher, dev-platform

Hi Joel,

Can you say more about this point in your original email: "3) This will
reduce the jobs (about 16%) we run which in turn reduces, cpu time, money
spent, turnaround time, intermittents, complexity of the taskgraph." It
seems to me that if we remove non-PGO opt builds even on Try, we might use
more cpu time because there are so many Try pushes requesting opt builds.
Do we have data on this?

Thanks,
Jan

Joel Maher

Jan 18, 2019, 4:36:24 PM

to Jan de Mooij, dev-platform

Thanks for asking Jan. I think 16% is the maximum we can save. In talking
with a few more people, I think a middle of the road proposal would be to:
Turn off linux64/windows7/windows10 opt builds+tests on autoland and
mozilla-inbound. Leave them on for mozilla-central and try.

This allows try to stay fast where needed, continues to offer
peace of mind by running the tests on m-c (and sheriffs can backfill if
needed), and removes confusion about building/testing locally vs try. This
would be similar to what we already see, where many people only test opt on
try and land, and if a pgo test regresses we need to back out.
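
The branch/platform rule in this proposal can be written down as a small predicate. The Python sketch below is a hypothetical model for discussion only and is not the actual taskgraph configuration; the names `OPT_DISABLED_PLATFORMS`, `OPT_DISABLED_BRANCHES`, and `runs_opt` are made up, and `macosx64` appears only as an example of a platform the proposal does not mention.

```python
# Hypothetical model of the proposal; not actual taskgraph configuration.
OPT_DISABLED_PLATFORMS = {"linux64", "windows7", "windows10"}
OPT_DISABLED_BRANCHES = {"autoland", "mozilla-inbound"}


def runs_opt(branch, platform):
    """Return True if opt builds+tests would still run for this pair."""
    disabled = branch in OPT_DISABLED_BRANCHES and platform in OPT_DISABLED_PLATFORMS
    return not disabled


# Opt is turned off only on the integration branches for the listed platforms.
for branch in ("autoland", "mozilla-inbound", "mozilla-central", "try"):
    print(branch, runs_opt(branch, "linux64"))

# Platforms outside the proposal (macosx64 is an assumed example) are unaffected.
print("autoland", "macosx64", runs_opt("autoland", "macosx64"))
```

Writing it as a single predicate makes the middle-of-the-road nature of the proposal visible: opt stays available on try and mozilla-central, and only the two integration branches change.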

Are there any concerns with this latest proposal?


Jan de Mooij

Jan 21, 2019, 5:18:36 AM

to Joel Maher, dev-platform

On Fri, Jan 18, 2019 at 10:36 PM Joel Maher <joel....@gmail.com> wrote:

> Are there any concerns with this latest proposal?
>

This proposal sounds great to me. Thank you!

Jan


James Graham

Jan 21, 2019, 5:38:46 AM

to dev-pl...@lists.mozilla.org

On 21/01/2019 10:18, Jan de Mooij wrote:
> On Fri, Jan 18, 2019 at 10:36 PM Joel Maher <joel....@gmail.com> wrote:
>
>> Are there any concerns with this latest proposal?
>>
>
> This proposal sounds great to me. Thank you!

+1. This seems like the right first step to me.

Eric Rahm

Feb 25, 2019, 12:35:58 PM

to dev-platform

Just a heads up, it looks like this landed sometime last week for platforms
that support PGO.

This has an unintended consequence of making it look like perf data for
integration branches went awol, but in fact you need to switch from looking
at "opt" data to "pgo" data. Unfortunately, since we didn't use to run PGO
on integration branches you'll also need to include opt data in your view
for continuity. For example I've resorted to using an opt/pgo mashup to
track memory regressions [1]. This seems to work okay for memory, but I can
imagine it won't be a fair comparison for talos tests.

-e

[1]
https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1888424,1,4&series=mozilla-inbound,1890241,1,4&series=autoland,1885459,1,4&series=mozilla-central,1684866,1,4&series=mozilla-inbound,1684808,1,4&series=autoland,1684872,1,4