
The future of PGO on Windows


Ehsan Akhgari

Jan 30, 2013, 11:03:27 PM
to dev-pl...@lists.mozilla.org, dev-planning@lists.mozilla.org planning, Andreas Gal, Vladimir Vukicevic, David Mandelin, Johnathan Nightingale
(Follow-ups to dev-platform, please)

Dear all,

This email summarizes the results of our investigation on our options
with regard to the future of PGO optimizations on Windows. I will first
describe the work that happened as part of the investigation, and will
then propose a set of options on what solutions are available to us. If
you're interested in the tl;dr version, please scroll to the bottom.
For the details, see the dependencies of bug 833881.

(Note that we're only talking about PGO for libxul. Anything outside of
libxul, specifically the JS engine, is not going to be affected by the
decision coming out of this thread. And obviously, this discussion is
only about Windows.)

The first thing that we tried to investigate was whether or not
upgrading to Visual Studio 2012 Update 1 makes the memory usage of the
PGO linker drop down by a significant amount. Thanks to the
investigation done by jimm, we know that it will actually increase the
memory usage, and therefore is not an option.

Then, we tried to see how much breathing room we would have if we
disabled PGO but not link-time code generation (LTCG), and if we
disabled them both together. It turns out that disabling PGO but keeping LTCG
enabled reduces the memory usage by ~200MB, which means that it's not an
effective measure. Disabling both LTCG and PGO brings down the linker's
virtual memory usage to around 1GB, which means that we will not hit the
maximum virtual memory size of 4GB for a *long* time. (Unfortunately,
the Microsoft toolchain cannot perform PGO builds without LTCG.)
Therefore, for the rest of this email, I will talk about disabling both
PGO and LTCG.
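
For context, the coupling is visible directly in the toolchain's flags:
MSVC expresses PGO as a mode of the LTCG link, so there is no way to ask
for profile-guided optimization without whole-program code generation.
A rough sketch of the three configurations (flag spellings per the
VS2010-era toolchain; illustrative, not our actual build invocations):

```bat
:: No LTCG, no PGO: plain per-object codegen, lowest linker memory.
cl /c /O2 foo.cpp
link /OUT:xul.dll foo.obj

:: LTCG only: /GL objects carry IL; the linker does whole-program codegen.
cl /c /O2 /GL foo.cpp
link /LTCG /OUT:xul.dll foo.obj

:: PGO: only available as a *mode* of /LTCG, hence inseparable from it.
link /LTCG:PGINSTRUMENT /OUT:xul.dll foo.obj
```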

We then tried to get a sense of how much of a win the PGO optimizations
are. Thanks to a series of measurements by dmandelin, we know that
disabling PGO/LTCG will result in a regression of about 10-20% on
benchmarks which examine DOM and layout performance such as Dromaeo and
guimark2 (and 40% in one case), but no significant regressions in the
startup time, and gmail interactions. Thanks to a series of telemetry
measurements performed by Vladan on a Nightly build we did last week
which had PGO/LTCG disabled, there are no telemetry probes which show a
significant regression on builds without PGO/LTCG. Vladan is going to
try to get this data out of a Tp5 run tomorrow as well, but we don't
have any evidence to believe that the results of that experiment will
be any different.


Given the above, I'd like to propose the following long-term solutions:

1. Disable PGO/LTCG now. The downside is that we would take a hit in
microbenchmarks, specifically Dromaeo. But we have no reason to believe
that is going to affect any of the performance characteristics observed
by our users. And it means that engineers can stop worrying about this
problem once and for all.

2. Try to delay disabling PGO/LTCG as much as possible. Given the
tracking implemented in bug 710840, we can now watch those graphs so
that we know when this problem is going to hit next, and come up with a
mitigation strategy. In order to effectively implement this solution,
we're going to need:
* A person to own watching the graphs and report back when we step
inside "the danger zone" again.
* A detailed plan of action on what we'll do to mitigate this problem
the next time, as opposed to reacting in fire-drill mode. One possible plan of
action could be disabling PGO for everything except
content/dom/layout/xpcom/gfx, no questions asked.
* A group of engineers to own performing the above action.
* Going back through the historical data from the past year to
determine the causes behind the large spikes in the gradual memory-usage
increase, and finding solutions to them to buy as much time as possible.
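
As an aside, the graph-watching in the first bullet could be mechanized
with a simple extrapolation. A minimal sketch (hypothetical helper, with
illustrative numbers rather than real measurements):

```python
# Hypothetical sketch (not a real tool): extrapolate the tracked linker
# virtual-memory numbers (the bug 710840 graphs) toward the 4GB
# address-space cap, so "the danger zone" becomes a number rather than
# a judgment call.

def days_until_limit(samples, limit=4 * 2**30):
    """Least-squares linear fit of (day, vm_bytes) samples, solved for
    the day index at which the fit crosses `limit`."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples)
             / sum((x - mean_x) ** 2 for x, _ in samples))
    intercept = mean_y - slope * mean_x
    return (limit - intercept) / slope

# Illustrative numbers only: ~1GB of growth over ~10 months, starting
# from ~2.6GB of linker virtual memory.
samples = [(0, 2.6 * 2**30), (300, 3.6 * 2**30)]
print(days_until_limit(samples))  # ~420 days until the 4GB cap
```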

3. Try to delay disabling PGO/LTCG until the next time that we hit the
limit, and disable PGO/LTCG then once and for all. In order to
implement this solution, we're going to need:
* A person to own watching the graphs and report back when we step
inside the danger zone again.
* A build-system patch which makes it possible to disable PGO/LTCG
for libxul by toggling a switch.
* Clear documentation on what that switch is, so that anybody can
toggle it when we need to take action the next time.
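
For the sake of illustration, such a switch could be as small as a
mozconfig toggle; MOZ_PGO is the variable the build system already uses
to drive PGO builds, though a libxul-only knob would be new:

```sh
# Illustrative mozconfig: build without the two-pass PGO/LTCG link.
ac_add_options --enable-optimize
# mk_add_options MOZ_PGO=1   # left unset: no PGO/LTCG
```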


I think given the information that we currently have, the best course of
action is #3, followed by #1 and #2. I'd like to explicitly recommend
against #2, because I don't think we have the evidence to support that
spending that much effort will bring any noticeable gains to our users.
This effort is better spent elsewhere.


Please let me know if you have any questions, if I have missed anything,
and do provide your feedback on the above proposal. As we ultimately
need a decision to come out of this thread, and given that it affects
Firefox Desktop, I have asked johnath to be the person who makes the
final call, or to delegate that to someone whom he trusts.

Last but not least, hats off to everyone who helped during this
investigation!

Cheers,
Ehsan

Robert O'Callahan

Jan 30, 2013, 11:11:51 PM
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
What about leaving PGO/LTCG enabled for a subset of our modules? Is that
not a possible solution?

Rob
--
Jesus called them together and said, “You know that the rulers of the
Gentiles lord it over them, and their high officials exercise authority
over them. Not so with you. Instead, whoever wants to become great among
you must be your servant, and whoever wants to be first must be your
slave — just
as the Son of Man did not come to be served, but to serve, and to give his
life as a ransom for many.” [Matthew 20:25-28]

Ehsan Akhgari

Jan 30, 2013, 11:34:12 PM
to rob...@ocallahan.org, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-01-30 11:11 PM, Robert O'Callahan wrote:
> What about leaving PGO/LTCG enabled for a subset of our modules? Is that
> not a possible solution?

I did in fact measure that by disabling PGO/LTCG on all directories
except content, dom, layout and xpcom. I can't seem to find the try
push right now (and bug 836626 doesn't help), but IIRC that bought us
600MB-700MB. To put that number in context, we have increased the
linker memory usage by about 1GB over the past 10 months. Therefore,
this will fall under option #2.

Cheers,
Ehsan

Robert O'Callahan

Jan 30, 2013, 11:40:38 PM
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
Can we do it at finer granularity than that? There's a lot under content
and dom that isn't critical.

Also, reducing the number of directories that are built with PGO/LTCG
should mean that the rate of growth decreases proportionally. Even more
than proportionally, if we flip our default for entirely new modules to
be non-PGO/LTCG, as I assume we would.

Ehsan Akhgari

Jan 30, 2013, 11:49:01 PM
to rob...@ocallahan.org, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-01-30 11:40 PM, Robert O'Callahan wrote:
> On Thu, Jan 31, 2013 at 5:34 PM, Ehsan Akhgari <ehsan....@gmail.com
> <mailto:ehsan....@gmail.com>> wrote:
>
> On 2013-01-30 11:11 PM, Robert O'Callahan wrote:
>
> What about leaving PGO/LTCG enabled for a subset of our modules?
> Is that
> not a possible solution?
>
>
> I did in fact measure that by disabling PGO/LTCG on all directories
> except content, dom, layout and xpcom. I can't seem to find the try
> push right now (and bug 836626 doesn't help), but IIRC that bought
> us 600MB-700MB. To put that number in context, we have increased
> the linker memory usage by about 1GB over the past 10 months.
> Therefore, this will fall under option #2.
>
>
> Can we do it at finer granularity than that? There's a lot under content
> and dom that aren't critical.

Sure, that is definitely an option. Note that I have already turned off
PGO/LTCG for things like mathml and svg under content, dom and layout
(see the dependencies of bug 832992).

> Also, reducing the number of directories that are PGO/LTCG should mean
> that the rate of growth decreases proportionally. Even more than
> proportionally, if we flip our default for entirely new modules to be
> non-PGO/LTCG, as I assume we would.

The decrease is unfortunately not linear, as it seems like the big
memory eater is LTCG, and unfortunately we cannot opt out of that if we
want to do any PGO.

Cheers,
Ehsan

Mike Hommey

Jan 31, 2013, 2:32:56 AM
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org, rob...@ocallahan.org
On Wed, Jan 30, 2013 at 11:49:01PM -0500, Ehsan Akhgari wrote:
> The decrease is unfortunately not linear, as it seems like the big
> memory eater is LTCG, and unfortunately we cannot opt out of that if
> we want to do any PGO.

Well, LTCG is only going to compile objects that have been compiled
with -GL.
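
In other words, exclusion is per-object: a directory compiled without
-GL contributes ordinary COFF objects that the /LTCG link leaves alone.
An illustrative fragment (not mozilla-central's actual build rules):

```sh
# Illustrative only: objects compiled without -GL carry no IL, so the
# /LTCG link skips them and only opted-in directories pay the PGO cost.
HOT_DIRS_CXXFLAGS  = -O2 -GL    # e.g. content/, dom/, layout/, xpcom/
COLD_DIRS_CXXFLAGS = -O2        # everything else: no LTCG work
```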

Mike

Nicholas Nethercote

Jan 31, 2013, 3:37:41 AM
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On Thu, Jan 31, 2013 at 3:03 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> Given the above, I'd like to propose the following long-term solutions:
>
> 1. Disable PGO/LTCG now.
>
> 2. Try to delay disabling PGO/LTCG as much as possible.
>
> 3. Try to delay disabling PGO/LTCG until the next time that we hit the
> limit, and disable PGO/LTCG then once and for all.

In the long run, 1 and 3 are the same. If we know we're going to turn
it off, why not bite the bullet and do it now? One big advantage of
that is that we'd immediately stop suffering through PGO-only bugs.
(I'm not necessarily advocating this, BTW, just observing that the two
options are basically equivalent.)

Also, stupid question time: is it possible to build on Windows with
GCC and/or clang?

Nick

smaug

Jan 31, 2013, 4:43:02 AM
to Nicholas Nethercote, Ehsan Akhgari, Vladimir Vukicevic, Johnathan Nightingale, David Mandelin, Andreas Gal
On 01/31/2013 10:37 AM, Nicholas Nethercote wrote:
> On Thu, Jan 31, 2013 at 3:03 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> Given the above, I'd like to propose the following long-term solutions:
>>
>> 1. Disable PGO/LTCG now.
>>
>> 2. Try to delay disabling PGO/LTCG as much as possible.
>>
>> 3. Try to delay disabling PGO/LTCG until the next time that we hit the
>> limit, and disable PGO/LTCG then once and for all.
>
> In the long run, 1 and 3 are the same. If we know we're going to turn
> it off, why not bite the bullet and do it now?


Because we're still missing plenty of optimizations in our code
to be fast in microbenchmarks. It would be quite a huge PR loss if we suddenly
were 10-20% slower in benchmarks.
But we're getting better (that last spike is because bz managed to effectively optimize out one test).
http://graphs.mozilla.org/graph.html#tests=[[73,1,1]]&sel=none&displayrange=365&datatype=running

Has anyone run other than dromaeo? Peacekeeper perhaps?


-Olli

Neil

Jan 31, 2013, 5:38:45 AM
smaug wrote:

> On 01/31/2013 10:37 AM, Nicholas Nethercote wrote:
>
>> If we know we're going to turn it off, why not bite the bullet and do
>> it now?
>
> Because we're still missing plenty of optimizations in our code to be
> fast in microbenchmarks.

Do we know (e.g. via profiling) where these optimisations need to be? (I
don't know how feasible it would be but another approach would be to PGO
less and less code until it starts affecting the microbenchmarks.)

--
Warning: May contain traces of nuts.

jmat...@mozilla.com

Jan 31, 2013, 6:39:53 AM
to dev-pl...@lists.mozilla.org, dev-planning@lists.mozilla.org planning, Andreas Gal, Vladimir Vukicevic, David Mandelin, Johnathan Nightingale
> We then tried to get a sense of how much of a win the PGO optimizations
> are. Thanks to a series of measurements by dmandelin, we know that
> disabling PGO/LTCG will result in a regression of about 10-20% on
> benchmarks which examine DOM and layout performance such as Dromaeo and
> guimark2 (and 40% in one case), but no significant regressions in the
> startup time, and gmail interactions. Thanks to a series of telemetry
> measurements performed by Vladan on a Nightly build we did last week
> which had PGO/LTCG disabled, there are no telemetry probes which show a
> significant regression on builds without PGO/LTCG. Vladan is going to
> try to get this data out of a Tp5 run tomorrow as well, but we don't
> have any evidence to believe that the results of that experiments will
> be any different.


Are the test run stats we're using here published somewhere? We should be tracking all this testing some place (a wiki page maybe?) so people can do their own investigation.

I've always wondered if the tests we run during the pgo phase are sufficient to get good coverage over the entire app. Is it possible that we don't see gains in other areas because our pgo tests don't hit those areas? (I think there was an effort under way to expose these tests so they could be modified in try runs for better experimentation.)

Generally I would make disabling pgo the last option after exhausting all other options.

Ted Mielczarek

Jan 31, 2013, 8:08:43 AM
to dev-pl...@lists.mozilla.org
As a historical note, when we first enabled PGO support for Windows our
profiling scenario was "start Firefox, wait 10 seconds, shut down
Firefox". Enabling PGO with this profiling run provided us with 20-25%
perf improvements in many of our benchmarks on Talos. We later changed
it to the current set of profiling data[1] (Blueprint CSS samples, the
SunSpider benchmark), and there was almost no visible change in the
Talos numbers. I'm sure for very specific benchmarks we could improve
perf by adding those things to the profiling run, but it's a very
delicate art. The optimizer has to balance code size against speed, and
it's not always obvious which way makes things better. (For example, we
historically built with -Os + a few extra optimize flags on Linux
instead of -O2 because producing smaller code generally made us faster
than optimizing everything for speed.)

-Ted

1. https://bugzilla.mozilla.org/show_bug.cgi?id=472706

Joshua Cranmer

Jan 31, 2013, 8:21:06 AM
On 1/31/2013 2:37 AM, Nicholas Nethercote wrote:
> Also, stupid question time: is it possible to build on Windows with
> GCC and/or clang?

It's definitely possible to build with MinGW GCC, but that is a major
ABI-breaking change, and I think we lose the ability to compile against
any Microsoft IDL interfaces. Clang has producing an MSVC-compatible ABI
as a long-term goal, but it is not there yet, and I think it may still
have problems parsing <windows.h>.

papa...@gmail.com

Jan 31, 2013, 8:22:38 AM
to dev-pl...@lists.mozilla.org, dev-planning@lists.mozilla.org planning, David Mandelin
How separable is the analysis phase from the optimization based on the collected data? How are the results of the PGO runs stored? Can the optimization part be run independently? If yes, would it be possible to collect the data through other means, let's say by doing an x86-64 build or only statically building modules?
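
For reference, my understanding of the cycle is roughly as below; the
on-disk formats behind it are opaque (command lines illustrative):

```bat
:: 1. Instrumented link; creates a .pgd database next to the binary.
link /LTCG:PGINSTRUMENT /OUT:xul.dll *.obj
:: 2. Run the training scenario; each run writes xul!N.pgc count files.
:: 3. Optimizing link; merges the .pgc counts via the .pgd database.
link /LTCG:PGOPTIMIZE /OUT:xul.dll *.obj
```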

Gregory Szorc

Jan 31, 2013, 9:14:01 AM
to Ehsan Akhgari, Andreas Gal, Vladimir Vukicevic, dev-pl...@lists.mozilla.org, Johnathan Nightingale, David Mandelin
On 1/30/13 8:03 PM, Ehsan Akhgari wrote:
> We then tried to get a sense of how much of a win the PGO optimizations
> are. Thanks to a series of measurements by dmandelin, we know that
> disabling PGO/LTCG will result in a regression of about 10-20% on
> benchmarks which examine DOM and layout performance such as Dromaeo and
> guimark2 (and 40% in one case), but no significant regressions in the
> startup time, and gmail interactions. Thanks to a series of telemetry
> measurements performed by Vladan on a Nightly build we did last week
> which had PGO/LTCG disabled, there are no telemetry probes which show a
> significant regression on builds without PGO/LTCG. Vladan is going to
> try to get this data out of a Tp5 run tomorrow as well, but we don't
> have any evidence to believe that the results of that experiments will
> be any different.

Given the headaches PGO has caused and will likely continue to cause, I
believe KISS applies and it is up to PGO advocates to justify the
continued use of PGO with data showing a clear benefit.

My reading of Ehsan's summary is that there is no significant *user*
benefit (read: perf win) of PGO.

If there is no *user* benefit, then the only data that remains to
justify PGO are the benchmark results.

Therefore, I believe we should disable PGO unless there is a convincing
argument for the benchmark results that sufficiently offsets the pain
PGO inflicts. Is there?

Ted Mielczarek

Jan 31, 2013, 9:27:10 AM
to dev-pl...@lists.mozilla.org
On 1/31/2013 8:22 AM, papa...@gmail.com wrote:
> How separate the analysis phase from the optimization based on the collected data? How are the results of the PGO runs stored? Can the optimization part be run independently? If yes would it be possible to collect the data through other means, let's say by doing a x86-64 build or only statically building modules?
>
Any crazy toolchain hacks like this are virtually unworkable. The
internals of the PGO optimizer and data it uses are a black box. Without
a doubt it would be less hassle to simply disable PGO than to try to
hack around a deficient closed-source toolchain in this much detail.

-Ted

Ryan VanderMeulen

Jan 31, 2013, 9:34:39 AM
On 1/31/2013 9:14 AM, Gregory Szorc wrote:
>
> Given the headaches PGO has caused and will likely continue to cause, I
> believe KISS applies and it is up to PGO advocates to justify the
> continued use of PGO with data showing a clear benefit.
>
> My reading of Ehsan's summary is that there is no significant *user*
> benefit (read: perf win) of PGO.
>
> If there is no *user* benefit, then the only data that remains to
> justify PGO are the benchmark results.
>
> Therefore, I believe we should disable PGO unless there is a convincing
> argument for the benchmark results that sufficiently offsets the pain
> PGO inflicts. Is there?

We should also note that there would be an infra load win from
disabling Windows PGO builds.

Ehsan Akhgari

Jan 31, 2013, 10:30:23 AM
to Nicholas Nethercote, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-01-31 3:37 AM, Nicholas Nethercote wrote:
> On Thu, Jan 31, 2013 at 3:03 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> Given the above, I'd like to propose the following long-term solutions:
>>
>> 1. Disable PGO/LTCG now.
>>
>> 2. Try to delay disabling PGO/LTCG as much as possible.
>>
>> 3. Try to delay disabling PGO/LTCG until the next time that we hit the
>> limit, and disable PGO/LTCG then once and for all.
>
> In the long run, 1 and 3 are the same. If we know we're going to turn
> it off, why not bite the bullet and do it now? One big advantage of
> that is that we'd immediately stop suffering through PGO-only bugs.
> (I'm not necessarily advocating this, BTW, just observing that the two
> options are basically equivalent.)

The PGO miscompilation bugs is a good point, thanks for bringing it up.

> Also, stupid question time: is it possible to build on Windows with
> GCC and/or clang?

I don't have a lot of experience with mingw32, but to the best of my
knowledge, it's based on older versions of gcc (4.6?), and lacks 64-bit
support, a number of C++ runtime library features, and a number of MSVC
features, such as SEH.

Clang on Windows is further away, with incomplete support for the
Microsoft C++ ABI, inline assembly, and some compiler intrinsics. We
may be able to consider this when the situation improves.

Cheers,
Ehsan

Till Schneidereit

Jan 31, 2013, 10:33:31 AM
to smaug, Andreas Gal, Vladimir Vukicevic, dev-pl...@lists.mozilla.org, Johnathan Nightingale, David Mandelin
>> In the long run, 1 and 3 are the same. If we know we're going to turn
>> it off, why not bite the bullet and do it now?
>
>
>
> Because we're still missing plenty of optimizations in our code
> to be fast in microbenchmarks. It would be quite huge pr loss if we suddenly
> were 10-20% slower in benchmarks.
> But we're getting better (that last spike is because bz managed to
> effectively optimize out one test).
> http://graphs.mozilla.org/graph.html#tests=[[73,1,1]]&sel=none&displayrange=365&datatype=running

Do we think the planned optimizations cause the gains through PGO to
be less pronounced? If not, then slowdown in benchmarks and associated
PR loss would be the same whenever we finally pulled the plug on PGO,
right?

(And thanks to smaug for pointing out my earlier direct-only reply just now.)

Ehsan Akhgari

Jan 31, 2013, 10:34:08 AM
to Neil, dev-pl...@lists.mozilla.org
On 2013-01-31 5:38 AM, Neil wrote:
> smaug wrote:
>
>> On 01/31/2013 10:37 AM, Nicholas Nethercote wrote:
>>
>>> If we know we're going to turn it off, why not bite the bullet and do
>>> it now?
>>
>> Because we're still missing plenty of optimizations in our code to be
>> fast in microbenchmarks.
>
> Do we know (e.g. via profiling) where these optimisations need to be?

No. But imitating the optimizations performed by the PGO compiler will
not be trivial.

> (I
> don't know how feasible it would be but another approach would be to PGO
> less and less code until it starts affecting the microbenchmarks.)

That is kind of what we've been doing so far, and it has proven not to
scale well.

Cheers,
Ehsan

Ehsan Akhgari

Jan 31, 2013, 10:39:36 AM
to jmat...@mozilla.com, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, mozilla.de...@googlegroups.com, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-01-31 6:39 AM, jmat...@mozilla.com wrote:
>> We then tried to get a sense of how much of a win the PGO optimizations
>> are. Thanks to a series of measurements by dmandelin, we know that
>> disabling PGO/LTCG will result in a regression of about 10-20% on
>> benchmarks which examine DOM and layout performance such as Dromaeo and
>> guimark2 (and 40% in one case), but no significant regressions in the
>> startup time, and gmail interactions. Thanks to a series of telemetry
>> measurements performed by Vladan on a Nightly build we did last week
>> which had PGO/LTCG disabled, there are no telemetry probes which show a
>> significant regression on builds without PGO/LTCG. Vladan is going to
>> try to get this data out of a Tp5 run tomorrow as well, but we don't
>> have any evidence to believe that the results of that experiments will
>> be any different.
>
>
> Are the test run stats we're using here published somewhere? We should be tracking all this testing some place (a wiki page maybe?) so people can do their own investigation.

See the dependencies of bug 833881 for the details of the measurements.

> I've always wondered if the tests we run during the pgo phase are sufficient to get good coverage over the entire app. Is it possible that we don't see gains in other areas because our pgo tests don't hit those areas? (I think there was an effort under way to expose these tests so they could be modified in try runs for better experimentation.)

Yes, it's definitely possible to get more PGO coverage, but as far as I
can see, the last time we changed the pageset we load during PGO was in
2009 (bug 472706).

Cheers,
Ehsan

Ed Morley

Jan 31, 2013, 10:40:02 AM
to dev-pl...@lists.mozilla.org
----- Original Message -----
> We should also remind that there would be an infra load win from
> disabling Windows PGO builds.

Plus less of a lead time waiting for PGO results before an inbound -> mozilla-central merge can be performed :-D
(even if we keep PGO on other platforms, Windows was always the long pole)

Ed

Ehsan Akhgari

Jan 31, 2013, 10:42:34 AM
to Till Schneidereit, Vladimir Vukicevic, Johnathan Nightingale, smaug, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-01-31 10:33 AM, Till Schneidereit wrote:
>>> In the long run, 1 and 3 are the same. If we know we're going to turn
>>> it off, why not bite the bullet and do it now?
>>
>>
>>
>> Because we're still missing plenty of optimizations in our code
>> to be fast in microbenchmarks. It would be quite huge pr loss if we suddenly
>> were 10-20% slower in benchmarks.
>> But we're getting better (that last spike is because bz managed to
>> effectively optimize out one test).
>> http://graphs.mozilla.org/graph.html#tests=[[73,1,1]]&sel=none&displayrange=365&datatype=running
>
> Do we think the planned optimizations cause the gains through PGO to
> be less pronounced? If not, then slowdown in benchmarks and associated
> PR loss would be the same whenever we finally pulled the plug on PGO,
> right?

I'm not sure what you mean. If we get 10% slower in Dromaeo by
disabling PGO and take a patch which makes us 20% faster regardless of
PGO, then we should expect an approximate 10% win as a result. But
generally, the game of trying to beat the compiler in its optimizations
is futile, since it can outsmart most programmers on their worst day. :-)

Ehsan

jmat...@mozilla.com

Jan 31, 2013, 10:59:43 AM
to dev-pl...@lists.mozilla.org
> As a historical note, when we first enabled PGO support for Windows our
> profiling scenario was "start Firefox, wait 10 seconds, shut down
> Firefox". Enabling PGO with this profiling run provided us with 20-25%
> perf improvements in many of our benchmarks on Talos. We later changed
> it to the current set of profiling data[1] (Blueprint CSS samples, the
> SunSpider benchmark), and there was almost no visible change in the
> Talos numbers.

This seems to indicate our current coverage isn't oriented toward
performance gains users will see, and that there are potential
gains to be found. All the more reason to keep pgo around a while
longer and figure out how we can simplify testing with different test
runs.


> We should also remind that there would be an infra load win from
> disabling Windows PGO builds.

IMHO, if it's a choice between infra load and better performance
in the end product, performance should win out.

Till Schneidereit

Jan 31, 2013, 11:03:14 AM
to Ehsan Akhgari, Vladimir Vukicevic, Johnathan Nightingale, smaug, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On Thu, Jan 31, 2013 at 4:42 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> On 2013-01-31 10:33 AM, Till Schneidereit wrote:
>>>>
>>>> In the long run, 1 and 3 are the same. If we know we're going to turn
>>>> it off, why not bite the bullet and do it now?
>>>
>>>
>>>
>>>
>>> Because we're still missing plenty of optimizations in our code
>>> to be fast in microbenchmarks. It would be quite huge pr loss if we
>>> suddenly
>>> were 10-20% slower in benchmarks.
>>> But we're getting better (that last spike is because bz managed to
>>> effectively optimize out one test).
>>>
>>> http://graphs.mozilla.org/graph.html#tests=[[73,1,1]]&sel=none&displayrange=365&datatype=running
>>
>>
>> Do we think the planned optimizations cause the gains through PGO to
>> be less pronounced? If not, then slowdown in benchmarks and associated
>> PR loss would be the same whenever we finally pulled the plug on PGO,
>> right?
>
>
> I'm not sure what you mean. If we get 10% slower in Dromaeo by disabling
> PGO and take a patch which makes us 20% faster regardless of PGO, then we
> should expect an approximate 10% win as a result. But generally, the game
> of trying to beat the compiler in its optimizations is futile, since they
> can outsmart most programmers on their worst day. :-)

Sure. What I was asking is precisely if the optimizations we're
betting on will have the property of not being affected by PGO. If we
take a patch now that makes us 10% faster with PGO enabled and then
later lose 10% by switching off PGO, then the PR effect will be the
same as switching off PGO right now. If, OTOH, we switch off PGO while
at the same time pushing patches that get us back all of the
performance we lost through it, we shouldn't have any PR fallout at
all.

>
> Ehsan
>

jmat...@mozilla.com

Jan 31, 2013, 11:38:42 AM
to dev-pl...@lists.mozilla.org
On Thursday, January 31, 2013 8:14:01 AM UTC-6, Gregory Szorc wrote:
> My reading of Ehsan's summary is that there is no significant *user*
> benefit (read: perf win) of PGO.
>
> If there is no *user* benefit, then the only data that remains to
> justify PGO are the benchmark results.
>
> Therefore, I believe we should disable PGO unless there is a convincing
> argument for the benchmark results that sufficiently offsets the pain
> PGO inflicts. Is there?

http://graphs.mozilla.org/graph.html#tests=[[83,94,12],[83,1,12]]&sel=none&displayrange=365&datatype=running

Ts, Paint shows an improvement of 14%. This is with Firefox and Firefox-Non-PGO, which I believe to be mc. Also while I can't seem to find it on pertastic, Tresize appears to enjoy a 9% improvement with pgo on mc.

Ted Mielczarek

Jan 31, 2013, 11:41:37 AM
to dev-pl...@lists.mozilla.org
On 1/31/2013 11:38 AM, jmat...@mozilla.com wrote:
> http://graphs.mozilla.org/graph.html#tests=[[83,94,12],[83,1,12]]&sel=none&displayrange=365&datatype=running
>
> Ts, Paint shows an improvement of 14%. This is comparing Firefox and Firefox-Non-PGO builds, which I believe are from mozilla-central. Also, while I can't seem to find it on pertastic, Tresize appears to enjoy a 9% improvement with PGO on mozilla-central.
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform

I've said before that there's ample Talos evidence that PGO is a perf
win. I guess you can argue that none of our Talos results measure
real-world performance, but in that case you are arguing that we aren't
measuring anything that is important to our users, so you're going down
a whole separate rabbit hole...

-Ted

Kyle Huey

unread,
Jan 31, 2013, 11:43:26 AM1/31/13
to Ehsan Akhgari, dev-pl...@lists.mozilla.org
On Wed, Jan 30, 2013 at 8:03 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> We then tried to get a sense of how much of a win the PGO optimizations
> are. Thanks to a series of measurements by dmandelin, we know that
> disabling PGO/LTCG will result in a regression of about 10-20% on
> benchmarks which examine DOM and layout performance such as Dromaeo and
> guimark2 (and 40% in one case), but no significant regressions in the
> startup time, and gmail interactions. Thanks to a series of telemetry
> measurements performed by Vladan on a Nightly build we did last week which
> had PGO/LTCG disabled, there are no telemetry probes which show a
> significant regression on builds without PGO/LTCG. Vladan is going to try
> to get this data out of a Tp5 run tomorrow as well, but we don't have any
> evidence to believe that the results of that experiments will be any
> different.
>

Isn't PGO worth something like 15% on Ts?

- Kyle

Ehsan Akhgari

unread,
Jan 31, 2013, 11:48:41 AM1/31/13
to Till Schneidereit, Vladimir Vukicevic, Johnathan Nightingale, smaug, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-01-31 11:03 AM, Till Schneidereit wrote:
> On Thu, Jan 31, 2013 at 4:42 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> On 2013-01-31 10:33 AM, Till Schneidereit wrote:
>>>>>
>>>>> In the long run, 1 and 3 are the same. If we know we're going to turn
>>>>> it off, why not bite the bullet and do it now?
>>>>
>>>>
>>>>
>>>>
>>>> Because we're still missing plenty of optimizations in our code
>>>> to be fast in microbenchmarks. It would be quite a huge PR loss if we
>>>> suddenly were 10-20% slower in benchmarks.
>>>> But we're getting better (that last spike is because bz managed to
>>>> effectively optimize out one test).
>>>>
>>>> http://graphs.mozilla.org/graph.html#tests=[[73,1,1]]&sel=none&displayrange=365&datatype=running
>>>
>>>
>>> Do we think the planned optimizations cause the gains through PGO to
>>> be less pronounced? If not, then slowdown in benchmarks and associated
>>> PR loss would be the same whenever we finally pulled the plug on PGO,
>>> right?
>>
>>
>> I'm not sure what you mean. If we get 10% slower in Dromaeo by disabling
>> PGO and take a patch which makes us 20% faster regardless of PGO, then we
>> should expect an approximate 10% win as a result. But generally, the game
>> of trying to beat the compiler at its own optimizations is futile, since
>> compilers can outsmart most programmers on their worst day. :-)
>
> Sure. What I was asking is precisely if the optimizations we're
> betting on will have the property of not being affected by PGO. If we
> take a patch now that makes us 10% faster with PGO enabled and then
> later lose 10% by switching off PGO, then the PR effect will be the
> same as switching off PGO right now. If, OTOH, we switch off PGO while
> at the same time pushing patches that get us back all of the
> performance we lost through it, we shouldn't have any PR fallout at
> all.

The optimizations that we usually work on are orthogonal to the
optimizations performed by the PGO compiler.

Cheers,
Ehsan

Ehsan Akhgari

unread,
Jan 31, 2013, 11:51:00 AM1/31/13
to Kyle Huey, dev-pl...@lists.mozilla.org
That was what I thought, but local measurements performed by dmandelin
proved otherwise.

Cheers,
Ehsan

chris...@gmail.com

unread,
Jan 31, 2013, 11:53:07 AM1/31/13
to dev-pl...@lists.mozilla.org
> 3. Try to delay disabling PGO/LTCG until the next time that we hit the
> limit, and disable PGO/LTCG then once and for all. In order to
> implement this solution, we're going to need:
> * A person to own watching the graphs and report back when we step
> inside the danger zone again.

I think this could be done directly in the build system. We could hard code our warning threshold into the build system, and turn the build orange once we exceed it.

Ehsan Akhgari

unread,
Jan 31, 2013, 11:54:50 AM1/31/13
to jmat...@mozilla.com, dev-pl...@lists.mozilla.org, mozilla.de...@googlegroups.com
On 2013-01-31 10:59 AM, jmat...@mozilla.com wrote:
>> As a historical note, when we first enabled PGO support for Windows our
>> profiling scenario was "start Firefox, wait 10 seconds, shut down
>> Firefox". Enabling PGO with this profiling run provided us with 20-25%
>> perf improvements in many of our benchmarks on Talos. We later changed
>> it to the current set of profiling data[1] (Blueprint CSS samples, the
>> SunSpider benchmark), and there was almost no visible change in the
>> Talos numbers.
>
> This seems to indicate our current coverage isn't oriented toward
> performance gains users will see, and that there are potential
> gains to be found. All the more reason to keep pgo around a while
> longer and figure out how we can simplify testing with different test
> runs.

There are costs to keeping PGO enabled, and while we can argue that we
_could_ be getting more PGO gain by providing a better profile, one
could counter-argue that engineers can spend more time optimizing other
things if they didn't need to find the perfect profile (which is sort of
a black art.)

>> We should also note that there would be an infra load win from
>> disabling Windows PGO builds.
>
> IMHO, if it's a choice between infra load and better performance
> in the end product, performance should win out.

I agree, infrastructure load is not relevant to this conversation.
There are tons of other ways to improve that besides disabling PGO.

Cheers,
Ehsan

Kyle Huey

unread,
Jan 31, 2013, 11:58:05 AM1/31/13
to Ehsan Akhgari, dev-pl...@lists.mozilla.org
Uh, don't we have a bigger problem then?

- Kyle

Ben Hearsum

unread,
Jan 31, 2013, 12:06:59 PM1/31/13
to
On 01/31/13 10:59 AM, jmat...@mozilla.com wrote:
> IMHO, if it's a choice between infra load and better performance
> in the end product, performance should win out.

We're not talking about infrastructure load here, we're talking about
whether or not we can compile at all.

Ehsan Akhgari

unread,
Jan 31, 2013, 12:09:15 PM1/31/13
to Kyle Huey, dev-pl...@lists.mozilla.org
The problem being?

Ehsan

Boris Zbarsky

unread,
Jan 31, 2013, 12:10:16 PM1/31/13
to
On 1/31/13 10:33 AM, Till Schneidereit wrote:
> Do we think the planned optimizations cause the gains through PGO to
> be less pronounced?

It... depends. There are a few things at play here.

First of all, our current profiling at least for DOM and layout stuff is
largely looking for the wallet where the light is good, that is to say
happening on Mac. Which happens to be the one desktop platform on which
we don't do PGO.

As a result, code micro-optimized based on those profiling results tends
to inline a lot of things (in many cases forcing inlining via
MOZ_ALWAYS_INLINE where the compiler wasn't otherwise doing it) and hope
for the best. This works pretty well for microbenchmarks; how it works
at scale is an interesting question that we don't have a great answer to
yet because we don't have good things to measure. The flip side of that
is that we've been a bit more resistant to the "write the code and let
PGO sort it out" approach some have advocated, so turning off PGO won't
be a total disaster for such code.

Second, in any testcase that involves both jitcode and C++ code, turning
off PGO will only affect the C++ code, of course. So to the extent that
we speed up the C++ parts of the app relative to the jitcode, turning
off PGO becomes less of a hit in testcases that involve both. Of course
as we optimize the JIT the balance swings in the other direction.

The real benefit of PGO is its ability to somewhat easily optimize the
actual workload you care about instead of microbenchmarks.... For
microbenchmarks proper we can always make them faster manually; the
question is at what cost.

> If not, then slowdown in benchmarks and associated
> PR loss would be the same whenever we finally pulled the plug on PGO,
> right?

Even if we posit the slowdown is the same, the PR loss is not.

Say browser A takes time T to run a test, browser B takes time 1.2T and
browser C takes time 0.8T.

Say browsers B and C both suffer a 10% regression on that test. Now the
times are:

A: T, B: 1.32T, C: 0.88T

From a PR point of view, the key part is that browser B is now 30%
slower than A, but C is still 12% faster. So it's not clear to me that
there would be any particularly bad PR for C at all.

-Boris

Kyle Huey

unread,
Jan 31, 2013, 12:12:38 PM1/31/13
to Ehsan Akhgari, dev-pl...@lists.mozilla.org
That Ts is bogus?

- Kyle

Ben Hearsum

unread,
Jan 31, 2013, 12:18:02 PM1/31/13
to
On 01/31/13 12:10 PM, Boris Zbarsky wrote:
> Even if we posit the slowdown is the same, the PR loss is not.
>
> Say browser A takes time T to run a test, browser B takes time 1.2T and
> browser C takes time 0.8T.
>
> Say browsers B and C both suffer a 10% regression on that test. Now the
> times are:
>
> A: T, B: 1.32T, C: 0.88T
>
> From a PR point of view, the key part is that browser B is now 30%
> slower than A, but C is still 12% faster. So it's not clear to me that
> there would be any particularly bad PR for C at all.
>
> -Boris

While not directly relevant to this, it's worth noting that Chrome
doesn't use PGO for exactly the reason we're looking at turning it off.
(https://code.google.com/p/chromium/issues/detail?id=21932)

Joshua Cranmer

unread,
Jan 31, 2013, 12:17:44 PM1/31/13
to
For what it's worth, reading
<https://bugzilla.mozilla.org/show_bug.cgi?id=833890>, I do not get the
impression that dmandelin "proved" otherwise. His startup tests have
very low statistical confidence (n=2, n=3), and he himself disclaims
his own findings. It may be evidence that PGO is not a Ts win, but it is
weak evidence at best. Our Talos results may be measuring imperfect
things, but we have enough datapoints that we can draw statistical
conclusions from them confidently. If you want to argue to me that
they're wrong, you're going to have to produce more compelling evidence.

Dave Mandelin

unread,
Jan 31, 2013, 1:05:09 PM1/31/13
to
I could certainly run a larger number of trials to see what happens. In that case, I stopped because the min values for warm startup were about equal (and also happened to be about equal to other warm startup times I had measured recently). For many timed benchmarks, "base value + positive random noise" seems like a good model, in which case mins seem like good things to compare.

> Our Talos results may be measuring imperfect things, but we have
> enough datapoints that we can draw statistical conclusions from
> them confidently.

Statistics doesn't help if you're measuring the wrong things. Whether Ts is measuring the wrong thing, I don't know. It would be possible to learn something about that question by measuring startup with a camera, Telemetry simple measures, and Talos on the same machine and seeing how they compare.

By the way, there is a project (in a very early phase now) to do accurate measurements of startup time, both cold and warm, on machines that model user hardware, etc.

Dave

Ted Mielczarek

unread,
Jan 31, 2013, 1:07:39 PM1/31/13
to dev-pl...@lists.mozilla.org
On 1/30/2013 11:03 PM, Ehsan Akhgari wrote:
> (Follow-ups to dev-platform, please)
>
> Dear all,
>
> This email summarizes the results of our investigation on our options
> with regard to the future of PGO optimizations on Windows. I will
> first describe the work that happened as part of the investigation,
> and will then propose a set of options on what solutions are available
> to us. If you're interested in the tl;dr version, please scroll to
> the bottom. For the details, see the dependencies of bug 833881.
>
> (Note that we're only talking about PGO for libxul. Anything outside
> of libxul, specifically the JS engine, is not going to be affected by
> the decision coming out of this thread. And obviously, this
> discussion is only about Windows.)
>
> The first thing that we tried to investigate was whether or not
> upgrading to Visual Studio 2012 Update 1 makes the memory usage of the
> PGO linker drop down by a significant amount. Thanks to the
> investigation done by jimm, we know that it will actually increase the
> memory usage, and therefore is not an option.
>
> Then, we tried to see how much breathing room we're going to have if
> we disabled PGO but not link-time code generation (LTCG), and if we
> disable them both together. It turns out that disabling PGO but
> keeping LTCG enabled reduces the memory usage by ~200MB, which means
> that it's not an effective measure. Disabling both LTCG and PGO
> brings down the linker's virtual memory usage to around 1GB, which
> means that we will not hit the maximum virtual memory size of 4GB for
> a *long* time. (Unfortunately, the Microsoft toolchain cannot perform
> PGO builds without LTCG.) Therefore, for the rest of this email, I
> will talk about disabling both PGO and LTCG.
>
> We then tried to get a sense of how much of a win the PGO
> optimizations are. Thanks to a series of measurements by dmandelin,
> we know that disabling PGO/LTCG will result in a regression of about
> 10-20% on benchmarks which examine DOM and layout performance such as
> Dromaeo and guimark2 (and 40% in one case), but no significant
> regressions in the startup time, and gmail interactions. Thanks to a
> series of telemetry measurements performed by Vladan on a Nightly
> build we did last week which had PGO/LTCG disabled, there are no
> telemetry probes which show a significant regression on builds without
> PGO/LTCG. Vladan is going to try to get this data out of a Tp5 run
> tomorrow as well, but we don't have any evidence to believe that the
> results of that experiments will be any different.
>
>
> Given the above, I'd like to propose the following long-term solutions:
>
> 1. Disable PGO/LTCG now. The downsides are that we should take a hit
> in microbenchmarks, specifically Dromaeo. But we have no reason to
> believe that is going to affect any of the performance characteristics
> observed by our users. And it means that engineers can stop worrying
> about this problem once and for all.
>
> 2. Try to delay disabling PGO/LTCG as much as possible. Given the
> tracking implemented in bug 710840, we can now watch those graphs so
> that we know when this problem is going to hit next, and come up with
> a mitigation strategy. In order to effectively implement this
> solution, we're going to need:
> * A person to own watching the graphs and report back when we step
> inside "the danger zone" again.
> * A detailed plan of action on what we'll do to mitigate this
> problem the next time as opposed to acting on a firedrill. One
> possible plan of action could be disabling PGO for everything except
> content/dom/layout/xpcom/gfx, no questions asked.
> * A group of engineers to own performing the above action.
> * Going back through the historical data over the past year,
> determine the causes behind the large spikes in the gradual memory
> usage increase, and find solutions to them to buy as much time as
> possible.
>
> 3. Try to delay disabling PGO/LTCG until the next time that we hit the
> limit, and disable PGO/LTCG then once and for all. In order to
> implement this solution, we're going to need:
> * A person to own watching the graphs and report back when we step
> inside the danger zone again.
> * A build-system patch which makes it possible to disable PGO/LTCG
> for libxul by toggling a switch.
> * Clear documentation on what that switch is, so that anybody can
> toggle it when we need to take action the next time.
>
>
> I think given the information that we currently have, the best course
> of action is #3, followed by #1 and #2. I'd like to explicitly
> recommend against #2, because I don't think we have the evidence to
> support that spending that much effort will bring any noticeable gains
> to our users. This effort is better spent elsewhere.

After consideration, I think we ought to just bite the bullet and
disable PGO. We have no other way to fix this issue. All other work we
can do simply pushes it down the road. As our recent history has shown,
we simply don't have the ability to fix this in any long-term sense. If
Microsoft doesn't fix their toolchain, there's nothing we can do.

Related, I think we ought to seriously investigate funding work on
making clang a viable toolchain for building Firefox on Windows. Having
a non-open toolchain makes compiler bugs and limitations much more
painful, where this PGO issue is the extreme example.

-Ted

Mike Hommey

unread,
Jan 31, 2013, 1:18:35 PM1/31/13
to Ted Mielczarek, dev-pl...@lists.mozilla.org
On Thu, Jan 31, 2013 at 01:07:39PM -0500, Ted Mielczarek wrote:
> After consideration, I think we ought to just bite the bullet and
> disable PGO. We have no other way to fix this issue. All other work we
> can do simply pushes it down the road. As our recent history has shown,
> we simply don't have the ability to fix this in any long-term sense. If
> Microsoft doesn't fix their toolchain, there's nothing we can do.

I'd say that if we can keep our benchmark scores by selectively
enabling PGO in some directories (instead of the current scheme of
selectively disabling it), we should go for that.

Mike

L. David Baron

unread,
Jan 31, 2013, 1:31:29 PM1/31/13
to dev-pl...@lists.mozilla.org
Is it possible we might be able to make MOZ_LIKELY and MOZ_UNLIKELY
meaningful on Windows (they currently only do anything on gcc or
clang builds)? If we did, might that get back some of the gain from
turning off PGO?

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla http://www.mozilla.org/ 𝄂

Ehsan Akhgari

unread,
Jan 31, 2013, 1:51:02 PM1/31/13
to Kyle Huey, dev-pl...@lists.mozilla.org
On Thu, Jan 31, 2013 at 12:12 PM, Kyle Huey <m...@kylehuey.com> wrote:

>
> On Thu, Jan 31, 2013 at 9:09 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>
>> On 2013-01-31 11:58 AM, Kyle Huey wrote:
>>
>>> On Thu, Jan 31, 2013 at 8:51 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>>>
>>> On 2013-01-31 11:43 AM, Kyle Huey wrote:
>>>
>>> Isn't PGO worth something like 15% on Ts?
>>>
>>>
>>> That was what I thought, but local measurements performed by
>>> dmandelin proved otherwise.
>>>
>>>
>>> Uh, don't we have a bigger problem then?
>>>
>>
>> The problem being?
>>
>
> That Ts is bogus?
>

Could be, but that is a completely different conversation which is off
topic for this thread.

--
Ehsan
<http://ehsanakhgari.org/>

Nathan Froyd

unread,
Jan 31, 2013, 1:59:16 PM1/31/13
to dev-pl...@lists.mozilla.org
----- Original Message -----
> Is it possible we might be able to make MOZ_LIKELY and MOZ_UNLIKELY
> meaningful on Windows (they currently only do anything on gcc or
> clang builds)? If we did, might that get back some of the gain from
> turning off PGO?

Nope:

http://social.msdn.microsoft.com/forums/en-US/vclanguage/thread/2dbdca4d-c0c0-40a3-993b-dc78817be26e/

An MSVC team member recommended using PGO instead. =/

-Nathan

Ehsan Akhgari

unread,
Jan 31, 2013, 1:59:36 PM1/31/13
to L. David Baron, dev-pl...@lists.mozilla.org
MSVC supports __assume, which is similar but not quite the same. I'm very
skeptical that by simply using __assume we'll regain the benchmark hit
resulting from turning PGO off.

--
Ehsan
<http://ehsanakhgari.org/>


On Thu, Jan 31, 2013 at 1:31 PM, L. David Baron <dba...@dbaron.org> wrote:

> Is it possible we might be able to make MOZ_LIKELY and MOZ_UNLIKELY
> meaningful on Windows (they currently only do anything on gcc or
> clang builds)? If we did, might that get back some of the gain from
> turning off PGO?
>
> -David
>
> --
> 𝄞 L. David Baron http://dbaron.org/ 𝄂
> 𝄢 Mozilla http://www.mozilla.org/ 𝄂

Mike Hommey

unread,
Jan 31, 2013, 2:07:56 PM1/31/13
to Ehsan Akhgari, L. David Baron, dev-pl...@lists.mozilla.org
On Thu, Jan 31, 2013 at 01:59:36PM -0500, Ehsan Akhgari wrote:
> MSVC supports __assume, which is similar but not quite the same. I'm very
> skeptical that by simply using __assume we'll regain the benchmark hit
> resulting from turning PGO off.

__assume is not even close to similar, and it's actually dangerous
because it can make the compiler remove code, which no use of MOZ_LIKELY
is expected to do.

Mike

L. David Baron

unread,
Jan 31, 2013, 2:21:56 PM1/31/13
to Ehsan Akhgari, dev-pl...@lists.mozilla.org
> On Thu, Jan 31, 2013 at 1:31 PM, L. David Baron <dba...@dbaron.org> wrote:
> > Is it possible we might be able to make MOZ_LIKELY and MOZ_UNLIKELY
> > meaningful on Windows (they currently only do anything on gcc or
> > clang builds)? If we did, might that get back some of the gain from
> > turning off PGO?

On Thursday 2013-01-31 13:59 -0500, Ehsan Akhgari wrote:
> MSVC supports __assume, which is similar but not quite the same. I'm very
> skeptical that by simply using __assume we'll regain the benchmark hit
> resulting from turning PGO off.

I certainly wouldn't expect to regain anything close to the whole
benchmark hit, but I could imagine regaining 10% or 20% of it with
something similar (which Mike's post notes __assume isn't).

Joshua Cranmer

unread,
Jan 31, 2013, 2:32:52 PM1/31/13
to
On 1/31/2013 12:05 PM, Dave Mandelin wrote:
> On Thursday, January 31, 2013 9:17:44 AM UTC-8, Joshua Cranmer wrote:
>> For what it's worth, reading
>> <https://bugzilla.mozilla.org/show_bug.cgi?id=833890>, I do not get
>> the impression that dmandelin "proved" otherwise. His startup tests
>> have very low statistical confidence (n=2, n=3), and he himself
>> disclaims his own findings. It may be evidence that PGO is not a Ts
>> win, but it is weak evidence at best.
> I could certainly run a larger number of trials to see what happens. In that case, I stopped because the min values for warm startup were about equal (and also happened to be about equal to other warm startup times I had measured recently). For many timed benchmarks, "base value + positive random noise" seems like a good model, in which case mins seem like good things to compare.
From a statistical hypothesis testing perspective, I think (I haven't
actually done the math) that the given data is unable to reject either
the hypothesis that PGO gives a benefit on startup time or the
hypothesis that it does not. Mostly, I was cringing at ehsan's statement
that your results "proved" the hypothesis. About what the best
statistical criteria are, I don't wish to argue here.

>
>> Our Talos results may be measuring imperfect things, but we have
>> enough datapoints that we can draw statistical conclusions from
>> them confidently.
> Statistics doesn't help if you're measuring the wrong things. Whether Ts is measuring the wrong thing, I don't know. It would be possible to learn something about that question by measuring startup with a camera, Telemetry simple measures, and Talos on the same machine and seeing how they compare.

I should clarify my previous statement: I want to avoid confirmation
bias in this decision. The proper way to do that is to lay out all the
criterion for acceptance or rejection before you run experiments and
measure the results. This, obviously, is impossible at this point, since
we have a mountain of data which has already biased our thought processes.

> By the way, there is a project (in a very early phase now) to do accurate measurements of startup time, both cold and warm, on machines that model user hardware, etc.
>
This is really starting to get off-topic, but I do think we need clear
guidelines on evaluating performance results, which includes things like
ensuring proper statistical testing on results, etc.

Jim Mathies

unread,
Jan 31, 2013, 2:44:28 PM1/31/13
to


> > Our Talos results may be measuring imperfect things, but we have
> > enough datapoints that we can draw statistical conclusions from
> > them confidently.
>
> Statistics doesn't help if you're measuring the wrong things. Whether Ts
> is measuring the wrong thing, I don't know. It would be possible to learn
> something about that question by measuring startup with a camera,
> Telemetry simple measures, and Talos on the same machine and seeing how
> they compare.

"Ts, Paint" measures the time between a call to window.open to the first
MozAfterPaint for that window in a running process. It's analogous to
hitting ctrl-n in Firefox. The page that gets loaded is lightweight. It
measures pretty much everything associated with window creation, including
widget, dom, layout, and browser front end window startup and rendering.
It's a very good test IMHO.

Jim

David Anderson

unread,
Jan 31, 2013, 2:49:50 PM1/31/13
to jmat...@mozilla.com, dev-pl...@lists.mozilla.org
On Thursday, January 31, 2013 8:54:50 AM UTC-8, Ehsan Akhgari wrote:
I'm weighing in a little late here, but from the JS team's perspective, PGO is a nightmare. It introduces subtle compiler bugs (often topcrashes) that are extremely difficult to track down. We end up littering the codebase with de-PGO hints. To date I have yet to see a PGO-only crash that manifests in the JS engine, that was not directly caused by PGO.

Related, I don't think MOZ_LIKELY/UNLIKELY are either a good idea or would cover the PGO gap. PGO does way more than just branch prediction, it has all sorts of speculative partial inlining and register allocation tricks. (That, unfortunately, are buggy.)

Anyway, disabling PGO is music to my ears - I'd bet money that our overall crash-stats will improve.

-David

Zack Weinberg

unread,
Jan 31, 2013, 2:58:52 PM1/31/13
to
On 2013-01-31 1:07 PM, Ted Mielczarek wrote:
> After consideration, I think we ought to just bite the bullet and
> disable PGO. We have no other way to fix this issue. All other work we
> can do simply pushes it down the road. As our recent history has shown,
> we simply don't have the ability to fix this in any long-term sense. If
> Microsoft doesn't fix their toolchain, there's nothing we can do.
>
> Related, I think we ought to seriously investigate funding work on
> making clang a viable toolchain for building Firefox on Windows. Having
> a non-open toolchain makes compiler bugs and limitations much more
> painful, where this PGO issue is the extreme example.

I think funding PGO/LTCG improvements in clang and/or gcc is a great
idea, but let's not forget that Windows is a legacy platform that's only
tier-1 because it happens to have some absurdly large proportion of our
users. We should be minimizing the amount of time and effort we put
into it.

zw

Dave Mandelin

unread,
Jan 31, 2013, 2:59:09 PM1/31/13
to
On Thursday, January 31, 2013 11:32:52 AM UTC-8, Joshua Cranmer wrote:
> On 1/31/2013 12:05 PM, Dave Mandelin wrote:
> > On Thursday, January 31, 2013 9:17:44 AM UTC-8, Joshua Cranmer wrote:
> >> For what it's worth, reading
> >> <https://bugzilla.mozilla.org/show_bug.cgi?id=833890>, I do not get
> >> the impression that dmandelin "proved" otherwise. His startup tests
> >> have very low statistical confidence (n=2, n=3), and he himself
> >> disclaims his own findings. It may be evidence that PGO is not a Ts
> >> win, but it is weak evidence at best.
>
> > I could certainly run a larger number of trials to see what happens. In that case, I stopped because the min values for warm startup were about equal (and also happened to be about equal to other warm startup times I had measured recently). For many timed benchmarks, "base value + positive random noise" seems like a good model, in which case mins seem like good things to compare.
>
> From a statistical hypothesis testing perspective, I think (I haven't
> actually done the math) that the given data is unable to reject either
> the hypothesis that PGO gives a benefit on startup time or the
> hypothesis that it does not. Mostly, I was cringing at ehsan's statement
> that your results "proved" the hypothesis. About what the best
> statistical criteria are, I don't wish to argue here.

I don't think statistics ever claims to be able to "prove" anything about the "actual reality", assuming I understood my stats class at all. Instead, you have some assumptions about the distribution of your data (is it normal, exponential, etc.; is the variance the same for all conditions or possibly variable) and based on that you can compute the probability of your experimental outcome. Interpreting that probability is outside the scope of the math and is more of a judgment call. It's all very subtle and complex.

For example, in the SunSpider comparison, I did 10 trials each of PGO and non-PGO and then ran a t test. The t test then said: "If the data are normally distributed, the PGO and non-PGO populations share the same variance, and their true means are equal, then there is a 0.06 probability of observing a difference in sample averages at least as large as the one seen in this experiment". From that plus general knowledge, I judged that there 'probably' was some real difference, but that it's hard to know for sure. SunSpider scores do not have a normal distribution, though, so the 0.06 is talking about a fictional world.

> >> Our Talos results may be measuring imperfect things, but we have
> >> enough datapoints that we can draw statistical conclusions from
> >> them confidently.
>
> > Statistics doesn't help if you're measuring the wrong things. Whether Ts is measuring the wrong thing, I don't know. It would be possible to learn something about that question by measuring startup with a camera, Telemetry simple measures, and Talos on the same machine and seeing how they compare.
>
> I should clarify my previous statement: I want to avoid confirmation
> bias in this decision. The proper way to do that is to lay out all the
> criteria for acceptance or rejection before you run experiments and
> measure the results. This, obviously, is impossible at this point, since
> we have a mountain of data which has already biased our thought processes.

It's rare that you can tell exactly what experiments and criteria you will want to use before you start. In practice, the main cues I use are to be sure to understand the limits of the information I'm picking up, and to try to prove myself wrong when I get a chance.

> > By the way, there is a project (in a very early phase now) to do accurate measurements of startup time, both cold and warm, on machines that model user hardware, etc.
>
> This is really starting to get off-topic, but I do think we need clear
> guidelines on evaluating performance results, which includes things like
> ensuring proper statistical testing on results, etc.

That is a primary goal of the project.

Dave

Dave Mandelin

unread,
Jan 31, 2013, 3:01:38 PM1/31/13
to
I looked at the code and saw that. It looked to me like it was probably measuring about the right thing, although it is measuring the times inside JS/Python code that could conceivably introduce delays. It's part of an entire system, though, and the results are only correct if the whole system works correctly, and I don't know that that's been tested recently.

Dave

Chris Peterson

unread,
Jan 31, 2013, 3:04:12 PM1/31/13
to
On 1/31/13 11:21 AM, L. David Baron wrote:
>> On Thu, Jan 31, 2013 at 1:31 PM, L. David Baron <dba...@dbaron.org> wrote:
>>> Is it possible we might be able to make MOZ_LIKELY and MOZ_UNLIKELY
>>> meaningful on Windows (they currently only do anything on gcc or
>>> clang builds)? If we did, might that get back some of the gain from
>>> turning off PGO?

Patrick McManus benchmarked the benefit of gcc's likely/unlikely macros
on the Linux kernel (where they are very commonly used). He found _no_
measurable differences after redefining likely/unlikely to nops.

Patrick ran his tests in 2008, so perhaps the results would be different
with a recent version of gcc. I would also be interested in seeing test
results when reversing the macros' definitions (#define likely <->
unlikely). :)

http://bitsup.blogspot.com/2008/04/measuring-performance-of-linux-kernel.html

chris


Cameron Kaiser

unread,
Jan 31, 2013, 3:29:01 PM1/31/13
to
> http://bitsup.blogspot.com/2008/04/measuring-performance-of-linux-ker...

Drive-by comment: likely/unlikely is of course highly architecture
dependent. On PowerPC, for example, these may change the way the
likely bit is set on the branch, which is used as part of branch
prediction. On that and similar architectures, likely/unlikely can
significantly improve the performance of branchy code. I've started
writing this into TenFourFox in certain places. On cross-platform code
like the Linux kernel, I would hazard that it makes quite a
difference across the spectrum.

But even where this feature doesn't exist (post-Netburst x86, most
ARM), it is my understanding that it will still cause code generation
to "favour" the likely branch such as making it more likely to stay in
the I-cache, etc.

Cameron Kaiser

Alex Keybl

unread,
Jan 31, 2013, 3:30:42 PM1/31/13
to David Anderson, jmat...@mozilla.com, dev-pl...@lists.mozilla.org
Just to echo what David said, PGO builds cause amorphous stability bugs and even graphics/layout bugs (for instance bug 831296) that we're forced to investigate in engineering and QA for a specific release, even though the issues aren't typically caused by actual in-product regressions. Additionally, inconsistent PGO builds have been known to cause one-off crash spikes (like bug 799118) which prove to be a waste of time for our engineering/stability/QA groups.

It's definitely a world of pain that we should avoid if at all possible. I'm of the opinion that we should nuke it from orbit (option #1) if there's no obvious/significant user performance gains.

-Alex

Ehsan Akhgari

unread,
Jan 31, 2013, 3:40:13 PM1/31/13
to David Anderson, jmat...@mozilla.com, dev-pl...@lists.mozilla.org
As I explained in my first email, this thread does not cover PGO in the
JS engine.

Cheers,
Ehsan

David Anderson

unread,
Jan 31, 2013, 4:11:09 PM1/31/13
to David Anderson, jmat...@mozilla.com, dev-pl...@lists.mozilla.org
I have debugged such crashes outside of JS too. I expect PGO to be unstable everywhere.

-David

Jim Mathies

unread,
Jan 31, 2013, 4:28:14 PM1/31/13
to

cja...@gmail.com

unread,
Jan 31, 2013, 7:00:56 PM1/31/13
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
> Also, stupid question time: is it possible to build on Windows with
> GCC and/or clang?

Yes, even better, it's possible to build on Linux for Windows using GCC, see:

https://developer.mozilla.org/en-US/docs/Cross_Compile_Mozilla_for_Mingw32

It should also be possible to build on Windows, but AFAICS no one has done it for a while, so it would probably require some fixes.

Those builds are not yet feature complete, but it's just a matter of (usually not too hard) fixes.

cja...@gmail.com

unread,
Jan 31, 2013, 7:14:02 PM1/31/13
to
On Thursday, 31 January 2013 at 14:21:06 UTC+1, Joshua Cranmer wrote:
> On 1/31/2013 2:37 AM, Nicholas Nethercote wrote:
>
> > Also, stupid question time: is it possible to build on Windows with
>
> > GCC and/or clang?
>
>
> It's definitely possible to build with Mingw GCC, but that is a major
> ABI-breaking change,

Both extern "C" and XPCOM functions are ABI-compatible. The problem is only with "plain C++" calls, but those are mostly internal calls, so ABI breakage is not really a problem.

> and I think we lose the ability to compile against
> any Microsoft IDL interfaces.

This is a common misunderstanding. Compiling against IDL-based interfaces has always been possible. The problem was with compiling (MS) IDL files themselves (which is an uncommon thing to do; only the accessibility module does that in the whole m-c tree), and even that's possible now.

Jacek

cja...@gmail.com

unread,
Jan 31, 2013, 7:21:38 PM1/31/13
to Nicholas Nethercote, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
> I don't have a lot of experience with mingw32, but to the best of my
> knowledge, it's based on older versions of gcc (4.6?),
> and lacks 64-bit support

Currently the best option for mingw is mingw-w64, which (despite what the name suggests) supports both 32-bit and 64-bit targets. It also works with any version of GCC newer than 4.4, AFAIR.

> plus a number of C++ runtime library features,

mingw uses GCC's libstdc++ and adds only a few limitations compared to other libstdc++ targets.

> and a number of MSVC features, such as SEH.

Yes, SEH is a problem. FWIW, GCC 4.8 will have limited SEH support for 64-bit targets.


Jacek

Anthony Jones

unread,
Jan 31, 2013, 11:20:44 PM1/31/13
to dev-pl...@lists.mozilla.org
On 31/01/13 17:40, Robert O'Callahan wrote:
> Also, reducing the number of directories that are PGO/LTCG should mean that
> the rate of growth decreases proportionally. Even more than proportionally,
> if we flip our default for entirely new modules to be non-PGO/LTCG, as I
> assume we would.

Profile guided PGO which does PGO only for files that show up as
hotspots in the profile :-P

Jonathan Kew

unread,
Feb 1, 2013, 4:45:26 AM2/1/13
to dev-pl...@lists.mozilla.org
How does the resulting performance compare to our current MSVC builds?
Or to non-PGO MSVC builds?

JK

Jean-Marc Desperrier

unread,
Feb 1, 2013, 10:41:11 AM2/1/13
to
cja...@gmail.com wrote:
>> I don't have a lot of experience with mingw32, but to the best of my
>> knowledge, it's based on older versions of gcc (4.6?),
>> and lacks 64-bit support

> Currently the best option for mingw is mingw-w64 for that (besides
> what the name suggests) supports both 32 and 64-bit targets. Also it
> works with any version of GCC newer than 4.4, AFAIR.

The question here is about performance, and gcc-compiled code does not
perform as well as VC-compiled code, by a good margin.


Nathan Froyd

unread,
Feb 1, 2013, 10:50:30 AM2/1/13
to Jean-Marc Desperrier, dev-pl...@lists.mozilla.org
----- Original Message -----
> cja...@gmail.com wrote:
> > Currently the best option for mingw is mingw-w64 for that (besides
> > what the name suggests) supports both 32 and 64-bit targets. Also
> > it
> > works with any version of GCC newer than 4.4, AFAIR.
>
> The question here is about performance, and gcc-compiled code does not
> perform as well as VC-compiled code, by a good margin.

Do you have examples that you can point to? I'm sure the GCC folks would be interested in hearing about concrete examples...

-Nathan

Jean-Marc Desperrier

unread,
Feb 1, 2013, 10:52:57 AM2/1/13
to
Ehsan Akhgari wrote:
> I don't have a lot of experience with mingw32, but to the best of my
> knowledge, it's based on older versions of gcc (4.6?), and lacks 64-bit
> support

Ehsan, did you forget that there would be no memory problem with 64-bit
MSVC PGO builds?

The trouble with going 64-bit is that the JIT would then see some
significant regressions, for cache-pressure and instruction-set related
reasons (i.e. gcc won't do it any better, probably worse).

Boris Zbarsky

unread,
Feb 1, 2013, 11:09:34 AM2/1/13
to
On 2/1/13 10:50 AM, Nathan Froyd wrote:
> Do you have examples that you can point to?

This certainly used to be the case for Mozilla code at one point, but it
may be worth remeasuring. Lots of gcc (and MSVC, and Mozilla code)
changes since then.

-Boris

Boris Zbarsky

unread,
Feb 1, 2013, 11:10:56 AM2/1/13
to
On 2/1/13 10:52 AM, Jean-Marc Desperrier wrote:
> The trouble with going 64-bits is that the jit would then see some
> significant regression, for cache pressure/instruction set related
> reasons

Do you have numbers here?

I'm aware of some regressions for things that involve traversing DOM
trees in a microbenchmark with 64-bit builds, but not for the JIT per
se. Well, apart from us having focused a bit more on 32-bit JIT perf
than 64-bit JIT perf, I guess.

-Boris

Jean-Marc Desperrier

unread,
Feb 1, 2013, 11:36:03 AM2/1/13
to
Nathan Froyd wrote:
> Do you have examples that you can point to? I'm sure the GCC folks
> would be interested in hearing about concrete examples...

OK, there were many examples with older GCC versions, but it's not
guaranteed to still be true with the newest GCC, which has had
significant enhancements since 4.3
(some people do still find slower results:
http://www.g-truc.net/post-0372.html#menu ).

This requires tests to tell for sure.

Jean-Marc Desperrier

unread,
Feb 1, 2013, 11:51:57 AM2/1/13
to
Boris Zbarsky wrote:
I was just repeating what I had understood you and Mandelin were saying
in October 2011 and last July :-)

Rereading it more carefully, you were actually saying it needed tests,
and could not be assumed to be either faster or slower, because some
parts could be faster, but some others also slower (for the reasons above).

I realize now that what caused the 64-bit option to be axed back then
was probably more the concerns about binary compatibility.

However today, breaking binary compatibility with third-party tools
might be seen as more advantageous than a problem :-)

Ehsan Akhgari

unread,
Feb 1, 2013, 11:54:09 AM2/1/13
to Jean-Marc Desperrier, dev-pl...@lists.mozilla.org
On 2013-02-01 10:52 AM, Jean-Marc Desperrier wrote:
> Ehsan Akhgari wrote:
>> I don't have a lot of experience with mingw32, but to the best of my
>> knowledge, it's based on older versions of gcc (4.6?), and lacks 64-bit
>> support
>
> Ehsan, did you forget that there would be no memory problem with 64-bits
> MSVC PGO builds ?

No, I didn't. However, we still need to have x86 builds, and deciding
to focus on a 64-bit release is a discussion off topic for this thread.
Please see the previous numerous threads on this on dev.platform.

Cheers,
Ehsan

Ted Mielczarek

unread,
Feb 1, 2013, 12:22:44 PM2/1/13
to dev-pl...@lists.mozilla.org
On 2/1/13 10:52 AM, Jean-Marc Desperrier wrote:
> Ehsan Akhgari wrote:
This would be great, except for the fact that Microsoft's 64-bit
toolchain is even buggier. You cannot currently do a 64-bit PGO build of
Firefox without hitting an internal compiler error:
https://connect.microsoft.com/VisualStudio/feedback/details/686117

-Ted

Daniel Veditz

unread,
Feb 1, 2013, 1:51:45 PM2/1/13
to Ehsan Akhgari, dev-planning@lists.mozilla.org planning
On 1/30/2013 8:03 PM, Ehsan Akhgari wrote:
> It turns out that disabling PGO but keeping LTCG enabled reduces the
> memory usage by ~200MB, which means that it's not an effective
> measure. Disabling both LTCG and PGO brings down the linker's
> virtual memory usage to around 1GB, which means that we will not hit
> the maximum virtual memory size of 4GB for a *long* time.

Are the MS tools limited to 4GB? If not, why aren't we using machines
with 8GB or more? They are cheap compared to the cost of engineering
time we're spending trying to fit into a too-small box.

-Dan Veditz

Ted Mielczarek

unread,
Feb 1, 2013, 2:12:58 PM2/1/13
to dev-pl...@lists.mozilla.org
The Microsoft x86 toolchain is only available as x86 binaries, which
means it's limited to a 32-bit address space, thus 4GB of addressable
virtual memory. This is the root cause of this issue.

-Ted

Ehsan Akhgari

unread,
Feb 1, 2013, 2:26:20 PM2/1/13
to dev-pl...@lists.mozilla.org, Andreas Gal, Vladimir Vukicevic, David Mandelin, Johnathan Nightingale
On 2013-01-30 11:03 PM, Ehsan Akhgari wrote:
> We then tried to get a sense of how much of a win the PGO optimizations
> are. Thanks to a series of measurements by dmandelin, we know that
> disabling PGO/LTCG will result in a regression of about 10-20% on
> benchmarks which examine DOM and layout performance such as Dromaeo and
> guimark2 (and 40% in one case), but no significant regressions in the
> startup time, and gmail interactions. Thanks to a series of telemetry
> measurements performed by Vladan on a Nightly build we did last week
> which had PGO/LTCG disabled, there are no telemetry probes which show a
> significant regression on builds without PGO/LTCG. Vladan is going to
> try to get this data out of a Tp5 run tomorrow as well, but we don't
> have any evidence to believe that the results of that experiments will
> be any different.

Vladan performed the analysis on telemetry measures reported out of a
Tp5 run and the results seem to indicate that the performance of several
things such as GC and CC, image decoding, page loading, session restore,
search service initialization, etc. is heavily affected by turning off
PGO. Please see
<https://bugzilla.mozilla.org/show_bug.cgi?id=834003#c8> for the details
of the measurements, but this is new evidence in favor of options #2 and
#3, which unfortunately makes this a harder decision to make.

Cheers,
Ehsan

Nicholas Nethercote

unread,
Feb 1, 2013, 3:55:54 PM2/1/13
to Ehsan Akhgari, Andreas Gal, Vladimir Vukicevic, dev-pl...@lists.mozilla.org, Johnathan Nightingale, David Mandelin
On Sat, Feb 2, 2013 at 6:26 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>
> Vladan performed the analysis on telemetry measures reported out of a Tp5
> run and the results seem to indicate that the performance of several things
> such as GC and CC, image decoding, page loading, session restore, search
> service initialization, etc. are heavily affected by turning off PGO.

Has anyone tried running PGO and non-PGO builds of the same changeset
to see how they feel during ordinary browsing?

Nick

Brian Smith

unread,
Feb 1, 2013, 10:19:04 PM2/1/13
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
Ehsan Akhgari wrote:
> Given the above, I'd like to propose the following long-term
> solutions:

1. Did we try escalating a support request to Microsoft regarding this issue? I know it is kind of an odd thing, but it seems like if you are insistent enough and/or pay enough money, Microsoft engineers from the affected product will get assigned to help you with the problem [1]. Paid support is how Microsoft compensates for being closed-source. I would not be surprised if somebody with knowledge of the internals of the linker/compiler + experience with dealing with other customers' PGO issues could give us some very helpful advice.

2. AFAICT, we did not seriously investigate the possibility of splitting things out of libxul more. So far we've tried cutting things off the top of the dependency tree. Maybe now we need to try cutting things off the bottom of the dependency tree.

3. What is the performance difference between Visual Studio 2012 PGO builds and Visual Studio 2010 builds? IMO, before we decide whether to disable PGO on Windows, we need to get good benchmark results for Visual Studio **2012** PGO builds, to make sure we're not throwing away wins that could come "just" solving this problem in a different way + upgrading the compiler.

Cheers,
Brian

[1] Evidence: https://www.google.com/search?q=inurl%3Ahttp%3A%2F%2Fblogs.msdn.com%2Fb%2Foldnewthing%2F+"a+customer"

Benoit Jacob

unread,
Feb 1, 2013, 11:32:15 PM2/1/13
to Jean-Marc Desperrier, dev-pl...@lists.mozilla.org
As someone who's maintained a c++ scientific library (eigen.tuxfamily.org)
until 2 years ago:
- GCC >= 4.4 generated the fastest code of any compiler I tried when it
came out (MSVC, ICC, Clang); 4.5 was even better; I stopped tracking after
that. ICC had less bad auto-vectorization, but it was still bad enough that
you would still want to write vectorized code yourself anyways.
- But GCC's x86-32bit backend was not great; it's with x86-64bit that GCC
really shined.
- So if we adopted GCC on Windows, that would change the data in the 32bit
vs 64bit debate.
- Time has passed, Clang has improved a lot since I was working on that...

Benoit

2013/2/1 Jean-Marc Desperrier <jmd...@gmail.com>

> Nathan Froyd wrote:
>
>> Do you have examples that you can point to? I'm sure the GCC folks
>> would be interested in hearing about concrete examples...
>>
>
> OK, there was many examples with older GCC versions, but it's not
> guaranteed to be still true with the newest GCC which had significant
> enhancements since 4.3
> (some people do still find slower results
> http://www.g-truc.net/post-0372.html#menu ).
>
> This requires tests to tell for sure.
>
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>

Ehsan Akhgari

unread,
Feb 4, 2013, 12:24:08 PM2/4/13
to Brian Smith, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-02-04 11:44 AM, Ehsan Akhgari wrote:
> 3. What is the performance difference between Visual Studio 2012 PGO
> builds and Visual Studio 2010 builds? IMO, before we decide whether
> to disable PGO on Windows, we need to get good benchmark results for
> Visual Studio **2012** PGO builds, to make sure we're not throwing
> away wins that could come "just" solving this problem in a different
> way + upgrading the compiler.
>
>
> That's something that we should probably measure as well. Filed bug
> 837724 for that.

Note that I misread this and thought you were talking about VS2010 PGO
builds versus VS2012 non-PGO builds, and that's what bug 837724 is
about. As I've already said in this thread, VS2012 uses more memory for
PGO compilations than VS2010, so upgrading to that for PGO builds is out
of the question.

Cheers,
Ehsan

Brian Smith

unread,
Feb 4, 2013, 4:27:28 PM2/4/13
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
Ehsan Akhgari wrote:
> Brian Smith wrote:
> > 2. AFAICT, we did not seriously investigate the possibility of
> > splitting things out of libxul more. So far we've tried cutting
> > things off the top of the dependency tree. Maybe now we need to try
> > cutting things off the bottom of the dependency tree.
>
> Can you please give some examples? Let's remember the days before
> libxul. It's hard to always make sure that you're accessing things
> that are properly exported from the target library, deal with
> internal and external APIs, etc.

Any of the non-XPCOM code (most of it imported?), like ipc/ and a huge chunk of the underlying code for WebRTC. (I know we are already planning to split out that WebRTC code from libxul.)

Unfortunately, I am mostly a post-libxul Mozillian, so it would be better to have the XPCOM old-timers weigh in on the difficulty of having multiple libraries in Gecko.

I guess the problem with splitting is that almost everything depends on XPCOM which is at the bottom of the libxul dependency tree. So, if we try to split things from the bottom, the first thing we'd have to move is XPCOM. But then we have to deal with internal vs. external XPCOM interfaces. Because I didn't have to deal with that issue before (which was pre-libxul), I cannot estimate how much effort and/or how much performance cost (e.g. relocation overhead) that would have.

Also, I am not sure how much the internal vs. external issue was to deal with the need for a stable ABI vs. solving other problems like relocation overhead. Definitely we wouldn't need to have the stable ABI requirement for a split libxul, so perhaps the internal vs. external issue wouldn't be as painful as before? Also, from looking at a few parts of the code, it looks like we're already having to deal with this internal vs. external API problem to a certain extent for WebRTC. I do agree that this would be a non-trivial project.

Cheers,
Brian

Ehsan Akhgari

unread,
Feb 4, 2013, 4:38:08 PM2/4/13
to Brian Smith, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2013-02-04 4:27 PM, Brian Smith wrote:
> Ehsan Akhgari wrote:
>> Brian Smith wrote:
>>> 2. AFAICT, we did not seriously investigate the possibility of
>>> splitting things out of libxul more. So far we've tried cutting
>>> things off the top of the dependency tree. Maybe now we need to try
>>> cutting things off the bottom of the dependency tree.
>>
>> Can you please give some examples? Let's remember the days before
>> libxul. It's hard to always make sure that you're accessing things
>> that are properly exported from the target library, deal with
>> internal and external APIs, etc.
>
> Any of the non-XPCOM code (most of it imported?) like ipc/ a huge chunk of the underlying code for WebRTC. (I know we already are planning to split out that WebRTC code from libxul.)

Yeah, part of this code should be "easily" movable to an external
library (by easily, I mean, much easier than many other things.)

> Unfortunately, I am mostly a post-libxul Mozillian, so it would be better to have the XPCOM old-timers weigh in on the difficulty of having multiple libraries in Gecko. I guess the problem with splitting is that almost everything depends on XPCOM which is at the bottom of the libxul dependency tree. So, if we try to split things from the bottom, the first thing we'd have to move is XPCOM. But then we have to deal with internal vs. external XPCOM interfaces. Because I didn't have to deal with that issue before (which was pre-libxul), I cannot estimate how much effort and/or how much performance cost (e.g. relocation overhead) that would have. Also, I am not sure how much the internal vs. external issue was to deal with the need for a stable ABI vs. solving other problems like relocation overhead. Definitely we wouldn't need to have the stable ABI requirement for a split libxul, so perhaps the internal vs. external issue wouldn't be as painful as before? Also, from looking at a few parts of the code, it looks like we're already having to deal with this internal vs. external API problem to a certain extent for WebRTC. I do agree that this would be a non-trivial project.

A big part of the problem is that exporting C++ classes across modules
is a pain, so in the past we used to put everything behind XPCOM
interfaces, so that we could call into the code using the vtable at
runtime as opposed to statically. Moving everything inside libxul has
allowed us perform huge cleanups by removing unneeded XPCOM interfaces,
etc. It also lets you benefit from compiler optimizations such as
inlining, devirtualization, etc.

Cheers,
Ehsan

Brian Smith

unread,
Feb 4, 2013, 4:39:46 PM2/4/13
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
That seems to be assuming that there is nothing reasonable we can do to make VS2012 PGO builds work. However, in order to know what is a reasonable amount of effort, you have to know what the benefits would be. For example, let's say we lived in a magical alternate universe where VS2012 PGO builds cut Firefox's memory usage by 50% and made everything twice as fast compared to VS2010 PGO builds. Then, we would consider even man-years of effort to be reasonable. On the other hand, if Firefox were twice as slow when built with VS2012 PGO builds, then no amount of effort would be reasonable. So, you have to know the performance difference between VS2012 PGO builds and VS2010 PGO builds before we can reject the possibility of VS2012 PGO.

Also, I want to echo khuey's comment: It seems like a lot of the argument against PGO is that, while our benchmarks are faster, users won't actually notice any difference. If that is true, then I agree with khuey that that is a massive systemic failure; we shouldn't be guiding development based on benchmarks that don't correlate positively with user-visible improvement. If all of our benchmarks showing the benefits of PGO are useless and there really isn't any difference between PGO and non-PGO builds, then I'm not going to push for us to continue doing PGO builds any more. But, in that case I hope we also come up with a plan for making better benchmarks.

And, also, if PGO doesn't have a significant positive performance difference, I would be very curious as to why not. Is PGO snake oil in general? Is there something about our codebase that is counter-productive to PGO? And, if the latter, then is there anything we can do to undo that counter-productivity?

Cheers,
Brian

Benjamin Smedberg

unread,
Feb 4, 2013, 5:01:25 PM2/4/13
to Brian Smith, Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On 2/4/2013 4:27 PM, Brian Smith wrote:
> Ehsan Akhgari wrote:
>> Brian Smith wrote:
>>> 2. AFAICT, we did not seriously investigate the possibility of
>>> splitting things out of libxul more. So far we've tried cutting
>>> things off the top of the dependency tree. Maybe now we need to try
>>> cutting things off the bottom of the dependency tree.
>> Can you please give some examples? Let's remember the days before
>> libxul. It's hard to always make sure that you're accessing things
>> that are properly exported from the target library, deal with
>> internal and external APIs, etc.
> Any of the non-XPCOM code (most of it imported?) like ipc/ a huge chunk of the underlying code for WebRTC. (I know we already are planning to split out that WebRTC code from libxul.)
The big problem didn't really have anything to do with XPCOM. It had to
do with the number and kind of relocations and calls. Relocations were
(and I believe still are) pretty expensive on many systems. The details
differ on ELF/Mach-O/PE, but it shows up the most on ELF systems, so
I'll discuss that in most detail. On ELF systems, calls against
hidden-visibility symbols in the same shared library can be made with
relative addressing and do not require any relocation. Calls against
default-visibility symbols in other libraries requires functions to load
the GOT (typically in the function prolog) and then the calls happen via
an indirect jump which the processor typically cannot predict.

libxul reduced the number of PLT jumps from about 400k to about 15k and
made it so that most gecko functions never load the GOT address in their
prolog. It decreased code size by about 6%, and this improved runtime
performance by about 5% (all of these stats are from memory, I'm not
sure whether the original data still exists). It also reduced the number
of relocations by about 50%, improving startup time by 2%.

Both Mach-O and PE are more efficient formats that don't use a PLT, so
they didn't have the same runtime cost associated with inter-library
calls. But the compiler can still generate more efficient code using
relative addressing for intra-library calls; the windows codesize from
libxul improved by about 1.5% and the runtime performance was improved
by about 1%.

If you have modules that are behind a C (not C++) API, the number of
exported symbols is typically minimal (you can count them), and it's
probably relatively safe to move that code to a separate shared library.
But since C++ APIs typically have many more symbols and those symbols
are sometimes small functions used more in more places, the cost of
relocations is far higher.

So basically the line for what we included in libxul was not precisely
"stuff that used XPCOM" but "stuff that used C++ APIs". The internal
XPCOM string API happens to be the most pervasive C++ API, so it became
a natural dividing line for many things.

I haven't read webRTC at all, so I can't comment on the relocation load
from splitting it off, but the IPC code is very C++-heavy and I fear
that we'd end up with lots of functions making cross-library calls
again, and suffer the codesize/performance costs associated with that.

--BDS

Ehsan Akhgari

unread,
Feb 4, 2013, 8:07:27 PM2/4/13
to Brian Smith, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
I'm entirely puzzled by what you're suggesting here... VS2012 PGO
builds work fine. It's just that the linker max vmem usage is higher,
as I have mentioned in my first post to this thread. So for the current
discussion, we don't get any gains from switching to VS2012 (and in fact
we lost ~200MB of memory.) As long as that stands, any other changes in
the characteristics of Firefox compiled with VS2012 (such as Firefox's
memory usage or speed) are irrelevant for the current discussion.

If and when we decide to look into upgrading to VS2012 for other
reasons, those other factors can be investigated.

> Also, I want to echo khuey's comment: It seems like a lot of the
> argument against PGO is that, while our benchmarks are faster, users
> won't actually notice any difference. If that is true, then I agree with
> khuey that that is a massive systemic failure; we shouldn't be guiding
> development based on benchmarks that don't correlate positively with
> user-visible improvement. If all of our benchmarks showing the benefits
> of PGO are useless and there really isn't any difference between PGO and
> non-PGO builds, then I'm not going to push for us to continue doing PGO
> builds any more. But, in that case I hope we also come up with a plan
> for making better benchmarks.

Nobody knows whether that's true or not for a fact. As mentioned
previously in this thread, the measurements that Vladan performed seem
to suggest that PGO builds can have a user visible effect in terms of
some performance characteristics we measure. Measurements performed by
dmandelin based on what appears on the screen shows that the difference
for a typical small work load is smaller than the resolution of the
measurement. These two can happen at the same time, there is no
contradiction between them.

> And, also, if PGO doesn't have a significant positive performance
> difference, I would be very curious as to why not. Is PGO snake oil in
> general? Is there something about our codebase that is
> counter-productive to PGO? And, if the latter, then is there anything we
> can do to undo that counter-productivity?

We don't know of anything going particularly bad in PGO builds of our
code base. If you look at the generated PGO code for most functions in
Gecko, you'll see that the compiler has heavily optimized just about
everything that there is in the non-PGO optimized version of the same
function.
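For readers unfamiliar with the mechanics, the reason PGO cannot be separated from LTCG is visible in the build cycle itself. A rough sketch of the MSVC workflow of that era follows (VS2010/2012-era flags; the file names are illustrative, not our actual build commands):

```shell
:: Sketch of the MSVC PGO cycle (flags from the VS2010/2012 era; file
:: names illustrative). PGO operates on /GL intermediate objects, which
:: is why it cannot be performed without link-time code generation.
cl /c /O2 /GL xul.cpp

:: 1. Link an instrumented binary; this also creates a .pgd database.
link /LTCG:PGINSTRUMENT /DLL /OUT:xul.dll xul.obj

:: 2. Run training scenarios against the instrumented binary; each run
::    writes a .pgc profile file next to the .pgd.

:: 3. Re-link, letting the linker merge the profiles and optimize. This
::    is the memory-hungry step discussed in this thread.
link /LTCG:PGOPTIMIZE /DLL /OUT:xul.dll xul.obj
```

Because all the profile-driven optimization happens inside that final link, the linker has to hold whole-program state in memory at once, which is what runs into the 32-bit process address-space limit.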

Cheers,
Ehsan

Dave Mandelin

unread,
Feb 4, 2013, 9:59:17 PM2/4/13
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org, dand...@mozilla.com, Brian Hackett, sst...@mozilla.com
I was talking to Taras and Naveed about this today, and what also came up was:

4. Do the work to make 64-bit JS jit perf as good as 32-bit JS jit perf, and then switch to x64 builds for Windows. There are of course many issues involved with such a switch, but I believe that would fix the linker memory limit problem, and "make x64 jit perf real good" seems much more useful and rewarding long-term than trying to patch up a 32-bit build process that MS may not even be that interested in supporting any more.

Dave

Ryan VanderMeulen

unread,
Feb 4, 2013, 10:07:45 PM2/4/13
to
On 2/4/2013 9:59 PM, Dave Mandelin wrote:
> I was talking to Taras and Naveed about this today, and what also came up was:
>
> 4. Do the work to make 64-bit JS jit perf as good as 32-bit JS jit perf, and then switch to x64 builds for Windows. There are of course many issues involved with such a switch, but I believe that would fix the linker memory limit problem, and "make x64 jit perf real good" seems much more useful and rewarding long-term than trying to patch up a 32-bit build process that MS may not even be that interested in supporting any more.
>
> Dave
>

Except that MSVC uses the 32-bit linker for 64-bit builds too (as was
said earlier in this thread). Not sure if that's the case for MSVC
2012, though.
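One caveat worth noting: the 32-bit-hosted linker is what the default x86_amd64 cross tools give you, but full (non-Express) Visual Studio installs of that era also shipped an amd64-hosted toolchain. A sketch of selecting it from a command prompt (install path illustrative, assuming a VS2012 install):

```shell
:: Sketch: switch the build environment to the amd64-hosted tools so that
:: link.exe itself runs as a 64-bit process (path illustrative; these
:: tools ship with full, non-Express installs only).
call "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\vcvarsall.bat" amd64

:: Confirm which linker the environment now resolves to; it should point
:: at the VC\bin\amd64 directory rather than VC\bin or VC\bin\x86_amd64.
where link
```

Whether our build automation could rely on those tools is a separate question, but a 64-bit-hosted linker would not be subject to the 4GB virtual address-space ceiling at issue here.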

Dave Mandelin

unread,
Feb 4, 2013, 10:08:29 PM2/4/13
to Ehsan Akhgari, Vladimir Vukicevic, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org
On Monday, February 4, 2013 1:39:46 PM UTC-8, Brian Smith wrote:
> Also, I want to echo khuey's comment: It seems like a lot of the argument against PGO is that, while our benchmarks are faster, users won't actually notice any difference. If that is true, then I agree with khuey that that is a massive systemic failure; we shouldn't be guiding development based on benchmarks that don't correlate positively with user-visible improvement. If all of our benchmarks showing the benefits of PGO are useless and there really isn't any difference between PGO and non-PGO builds, then I'm not going to push for us to continue doing PGO builds any more. But, in that case I hope we also come up with a plan for making better benchmarks.
>
> And, also, if PGO doesn't have a significant positive performance difference, I would be very curious as to why not. Is PGO snake oil in general? Is there something about our codebase that is counter-productive to PGO? And, if the latter, then is there anything we can do to undo that counter-productivity?

My measurements convinced me that PGO [1] is not useless, even if it's not also incredibly useful.

First, benchmarks do matter, even if we think they are not measuring the right thing. Also, benchmarks usually have problems but aren't complete disasters, and optimizing for imperfect benchmarks has yielded real gains.

Second, I did find a page that did dynamic rendering that had a 14% higher frame rate with PGO builds. Interestingly, Chrome, which I'm told does not use PGO for the same reasons we find it difficult to use, matches our PGO frame rate, which suggests there is something we could do in the code to make it faster. (With that code, would PGO make it 14% faster yet, or not?) Admittedly, that page isn't practical, but it is some sort of real page.

One thing I did notice today is that on a MacBook Air rendering time
is nontrivial, so we may be able to observe a CPU usage difference
and/or pageload time difference on similar Windows hardware.

For the most part, at some point we just need to make our choice about the inevitable tradeoffs, see what happens, and revise if necessary. But I'm personally very happy that the discussion has gone on for so long, because more and more ideas keep coming up, both for alternative paths and for figuring out which way is better. Correct me if I'm wrong, but as I see it, this discussion has turned into a great example of collaborative analysis and idea generation.

Dave

[1] as applied in our current setup, to our current code base, yada yada yada...

Ehsan Akhgari

unread,
Feb 5, 2013, 10:53:09 AM2/5/13
to Dave Mandelin, Vladimir Vukicevic, Brian Hackett, sst...@mozilla.com, dev-planning@lists.mozilla.org planning, Johnathan Nightingale, mozilla.de...@googlegroups.com, David Mandelin, Andreas Gal, dev-pl...@lists.mozilla.org, dand...@mozilla.com
I agree, as long as we do have concrete plans for everything else which
would be involved in such a switch. The JS jit perf is really a small
piece of the puzzle, and a relatively easy one to solve since we control
all of the ends there.

Cheers,
Ehsan
