mozilla-central/inbound closed -- we hit the Windows PGO memory limit

Ehsan Akhgari

Jan 21, 2013, 3:47:53 PM
to dev-pl...@lists.mozilla.org, JP Rosevear
I'm sorry that I have to be the bearer of bad news about our trees, but
we have finally started to hit the linker maximum memory size when
linking libxul as part of our Windows PGO builds, and as a result,
mozilla-central and inbound are CLOSED for now.

I propose the following:

1. mozilla-central/inbound will remain closed to all check-ins except
the ones that do not add any code which gets linked into libxul on Windows.
2. We need to start moving code out of libxul again. I have filed bug
827985 to move webrtc/trunk out of libxul, and we need a patch for that
*ASAP*. If you have ideas on other code which can be moved out of
libxul, please share them. We need ideas and patches immediately as
this blocks all development on mozilla-central.
3. We fix bug 827985 and any other bugs that will hopefully get proposed
in this thread, and measure the difference in PGO builds, and hopefully
get things to a point where we can reopen the tree temporarily.

Once we do the above, we can start having a conversation about what we
can do about this in the longer run. I'd appreciate it if we keep this
thread focused on the immediate problem, and not on the larger problem
of what we can do in order to mitigate this in the long run.

Cheers,
Ehsan

Marco Bonardo

Jan 21, 2013, 4:23:39 PM
On 21/01/2013 21:47, Ehsan Akhgari wrote:
> 1. mozilla-central/inbound will remain closed to all check-ins except
> the ones that do not add any code which gets linked into libxul on Windows.

Both inbound and central are APPROVAL REQUIRED as of now; any patches
that don't touch libxul can land with a=nonlibxul.

-m

Robert O'Callahan

Jan 21, 2013, 4:52:39 PM
to Ehsan Akhgari, JP Rosevear, dev-pl...@lists.mozilla.org
Exactly what control do we have over what gets PGOed? In particular:

1) Are we able to exclude particular object files or libraries from PGO
when we link libxul, reducing PGO memory usage for the final link? I think
you said "yes" on IRC.

2) Are we able to collect a set of object files, link them with PGO into a
static library, and then link the result into libxul, reducing PGO memory
usage for the final link?

Given the answer to #1 is yes, I think it's worth going through the
contents of libxul and disabling PGO on stuff that we deem unimportant. If
the answer to #2 is yes, we can independently-PGO those bits to limit the
perf hit.

Rob
--
Jesus called them together and said, “You know that the rulers of the
Gentiles lord it over them, and their high officials exercise authority
over them. Not so with you. Instead, whoever wants to become great among
you must be your servant, and whoever wants to be first must be your
slave — just
as the Son of Man did not come to be served, but to serve, and to give his
life as a ransom for many.” [Matthew 20:25-28]

Mike Hommey

Jan 21, 2013, 4:56:28 PM
to Robert O'Callahan, JP Rosevear, Ehsan Akhgari, dev-pl...@lists.mozilla.org
On Tue, Jan 22, 2013 at 10:52:39AM +1300, Robert O'Callahan wrote:
> Exactly what control do we have over what gets PGOed? In particular:
>
> 1) Are we able to exclude particular object files or libraries from PGO
> when we link libxul, reducing PGO memory usage for the final link? I think
> you said "yes" on IRC.

Yes, we can disable PGO for given object files or entire subtrees
(theoretically).

> 2) Are we able to collect a set of object files, link them with PGO into a
> static library, and then link the result into libxul, reducing PGO memory
> usage for the final link?

No, unfortunately.

Mike

Kyle Huey

Jan 21, 2013, 4:58:36 PM
to rob...@ocallahan.org, JP Rosevear, Ehsan Akhgari, dev-pl...@lists.mozilla.org
On Mon, Jan 21, 2013 at 1:52 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:

> Exactly what control do we have over what gets PGOed? In particular:
>
> 1) Are we able to exclude particular object files or libraries from PGO
> when we link libxul, reducing PGO memory usage for the final link? I think
> you said "yes" on IRC.
>
> 2) Are we able to collect a set of object files, link them with PGO into a
> static library, and then link the result into libxul, reducing PGO memory
> usage for the final link?
>
> Given the answer to #1 is yes, I think it's worth going through the
> contents of libxul and disabling PGO on stuff that we deem unimportant. If
> the answer to #2 is yes, we can independently-PGO those bits to limit the
> perf hit.
>

The answer to #2 is definitely no.

- Kyle

Mike Hommey

Jan 21, 2013, 4:59:09 PM
to Robert O'Callahan, JP Rosevear, Ehsan Akhgari, dev-pl...@lists.mozilla.org
On Mon, Jan 21, 2013 at 10:56:28PM +0100, Mike Hommey wrote:
> On Tue, Jan 22, 2013 at 10:52:39AM +1300, Robert O'Callahan wrote:
> > Exactly what control do we have over what gets PGOed? In particular:
> >
> > 1) Are we able to exclude particular object files or libraries from PGO
> > when we link libxul, reducing PGO memory usage for the final link? I think
> > you said "yes" on IRC.
>
> Yes, we can disable PGO for given object files or entire subtrees
> (theoretically).

NO_PROFILE_GUIDED_OPTIMIZE=1 for some files or directories should do it.
Note it's now possible to inherit such configs in subdirectories instead
of adding them to each and every Makefile.in in the subtree: add them in
a defs.mk file in the topmost directory.

Mike
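
A minimal sketch of the defs.mk approach described above, for readers who want to opt several directories out at once. The variable name NO_PROFILE_GUIDED_OPTIMIZE comes from the thread; the exact assignment syntax inside defs.mk, the helper name write_pgo_opt_out, and the directory names in the usage example are assumptions made for illustration, not something the thread confirms.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Assumed file content: the variable name is taken from the thread, but the
# assignment form actually used in the tree is not shown there.
DEFS_MK_CONTENT = "NO_PROFILE_GUIDED_OPTIMIZE = 1\n"


def write_pgo_opt_out(srcdir: Path, subdirs: Iterable[str]) -> List[Path]:
    """Create <srcdir>/<subdir>/defs.mk for each subdir and return the paths.

    Hypothetical helper: it only writes the file, it does not touch any
    Makefile.in and does not verify that the build system picks it up.
    """
    written = []
    for name in subdirs:
        target = srcdir / name / "defs.mk"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(DEFS_MK_CONTENT)
        written.append(target)
    return written


if __name__ == "__main__":
    # Run against a throwaway directory so nothing in a real checkout changes.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_pgo_opt_out(Path(tmp), ["rdf", "image", "accessible"]):
            print(path.relative_to(tmp), "->", path.read_text().strip())
```

Writing one file at the top of each subtree relies on the inheritance behaviour Mike describes; if a subtree did not inherit defs.mk, each Makefile.in below it would need the setting instead.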

Ehsan Akhgari

Jan 21, 2013, 5:36:52 PM
to dev-pl...@lists.mozilla.org, JP Rosevear
Status update: we have landed three patches on mozilla-inbound which
disable PGO on the rdf/, image/ and accessible/ directories, and I have
triggered PGO builds on top of them to see how much they can shave off
of the linker's vmem usage. Randell is also working on taking some
webrtc code out of libxul in the meantime.

If all of this proves to be ineffective, we can look into de-PGO-ing
more code.

Cheers,
Ehsan

Ehsan Akhgari

Jan 21, 2013, 11:32:34 PM
to dev-pl...@lists.mozilla.org, JP Rosevear, Randell Jesup
Second status update:

The numbers from disabling PGO on image, accessible and webrtc are in, and
the linker max vmem size is down by only ~200MB, which is quite
disappointing, especially since according to Randell, putting webrtc
outside of libxul should buy us something around 600MB...

So, as desperate times require desperate measures, I went ahead and
disabled PGO on the following components as well: rdf (the original patch
there busted the tree so I backed it out), editor, svg, mathml, xslt,
embedding, storage, and the old HTML parser. I will not be awake long
enough tonight to see what the progress looks like, but those
interested can follow along here:
<https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=WINNT%205.2%20.*%20pgo-build>.

I'm planning to keep the tree APPROVAL REQUIRED for now. I will
re-evaluate the situation tomorrow, but I do expect that we will be able to
temporarily reopen the tree tomorrow. In the meantime, if you can think
of more components on which disabling PGO would not cause a big
performance problem, please file a bug and make it block bug 832992
(and even better, copy a file like this one to their top-level directory
to disable PGO on them:
https://hg.mozilla.org/integration/mozilla-inbound/file/357b9a855e10/rdf/defs.mk).

Thanks!

--
Ehsan
<http://ehsanakhgari.org/>

Mike Hommey

Jan 22, 2013, 7:30:50 AM
to Ehsan Akhgari, JP Rosevear, Randell Jesup, dev-pl...@lists.mozilla.org
On Mon, Jan 21, 2013 at 11:32:34PM -0500, Ehsan Akhgari wrote:
> Second status update:
>
> The numbers from disabling PGO on image, accessible and webrtc are in, and
> the linker max vmem size is down by only ~200MB, which is quite
> disappointing, especially since according to Randell, putting webrtc
> outside of libxul should buy us something around 600MB...

I doubt this is true. Since I was doing Windows PGO builds for something
else, I figured I'd try --disable-webgl. The build failed with a
different internal compiler error (reliably, btw, which makes me wonder
if try uses the same MSVC version) and linker max vsize was around
3700000000 bytes (the changeset pushed to try predates any of the PGO
disabling).

Mike
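
To put the 3700000000 figure in perspective, here is a small arithmetic sketch. The thread never states the exact ceiling the linker hits, so the 4 GiB value below is an assumption (the usual address-space limit for a large-address-aware 32-bit process on 64-bit Windows); the function name headroom_bytes is likewise hypothetical.

```python
GIB = 2**30
MIB = 2**20

# Assumption, not stated in the thread: the linker is a 32-bit process
# that can address at most 4 GiB.
ASSUMED_LIMIT_BYTES = 4 * GIB


def headroom_bytes(peak_vsize: int, limit: int = ASSUMED_LIMIT_BYTES) -> int:
    """Return how many bytes remain between the peak vsize and the limit."""
    return limit - peak_vsize


if __name__ == "__main__":
    peak = 3_700_000_000  # linker max vsize quoted in the thread
    left = headroom_bytes(peak)
    print(f"peak vsize: {peak / GIB:.2f} GiB")
    print(f"headroom:   {left / MIB:.0f} MiB "
          f"({left / ASSUMED_LIMIT_BYTES:.1%} of the assumed limit)")
```

Under that assumption the remaining headroom is a few hundred megabytes, the same order of magnitude as the savings being discussed, which is why each de-PGO'd directory matters.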

Mike Hommey

Jan 22, 2013, 7:35:14 AM
to Ehsan Akhgari, JP Rosevear, Randell Jesup, dev-pl...@lists.mozilla.org
On Tue, Jan 22, 2013 at 01:30:50PM +0100, Mike Hommey wrote:
> On Mon, Jan 21, 2013 at 11:32:34PM -0500, Ehsan Akhgari wrote:
> > Second status update:
> >
> > The numbers from disabling PGO on image, accessible and webrtc are in, and
> > the linker max vmem size is down by only ~200MB, which is quite
> > disappointing, especially since according to Randell, putting webrtc
> > outside of libxul should buy us something around 600MB...
>
> I doubt this is true. Since I was doing windows PGO builds for something
> else, I figured I'd try --disable-webgl.

Err, --disable-webrtc.

Ehsan Akhgari

Jan 22, 2013, 9:06:07 AM
to dev-pl...@lists.mozilla.org, JP Rosevear, Randell Jesup
Status update #3:

It seems like with PGO disabled for all of the above modules, we've now
decreased the linker max vmem size by about 500MB, which is nice. There is
one PGO build bustage
<https://tbpl.mozilla.org/php/getParsedLog.php?id=19006659&tree=Mozilla-Inbound>
which has been re-triggered, and I think we should wait to make sure that
it goes green, but then we should be able to reopen mozilla-inbound
temporarily, with mozilla-central following when we merge inbound to
central the next time. We should get the results of the re-triggered build
in about two hours. Stay tuned!

Cheers,

--
Ehsan
<http://ehsanakhgari.org/>


On Mon, Jan 21, 2013 at 11:32 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> Second status update:
>
> The numbers from disabling PGO on image, accessible and webrtc are in, and
> the linker max vmem size is down by only ~200MB, which is quite
> disappointing, especially since according to Randell, putting webrtc
> outside of libxul should buy us something around 600MB...
>

Ehsan Akhgari

Jan 22, 2013, 9:07:39 AM
to Mike Hommey, JP Rosevear, Randell Jesup, dev-pl...@lists.mozilla.org
On Tue, Jan 22, 2013 at 7:35 AM, Mike Hommey <m...@glandium.org> wrote:

> On Tue, Jan 22, 2013 at 01:30:50PM +0100, Mike Hommey wrote:
> > On Mon, Jan 21, 2013 at 11:32:34PM -0500, Ehsan Akhgari wrote:
> > > Second status update:
> > >
> > > The numbers from disabling PGO on image, accessible and webrtc are in, and
> > > the linker max vmem size is down by only ~200MB, which is quite
> > > disappointing, especially since according to Randell, putting webrtc
> > > outside of libxul should buy us something around 600MB...
> >
> > I doubt this is true. Since I was doing windows PGO builds for something
> > else, I figured I'd try --disable-webgl.
>
> Err, --disable-webrtc.
>
> > The build failed with a
> > different internal compiler error (reliably, btw, which makes me wonder
> > if try uses the same msvc version) and linker max vsize was around
> > 3700000000 (the changeset pushed to try being before any PGO disabling)
>

Yes, Randell just confirmed this on IRC. It seems like the 600MB number
was incorrect.

Axel Hecht

Jan 22, 2013, 9:28:20 AM
How are the perf numbers looking?

One of the reasons for asking is that I expect RDF to be part of the
startup and window-open codepaths, at least.

I'm not overly concerned, but wanted to make sure we look.

Axel

Benjamin Smedberg

Jan 22, 2013, 9:36:59 AM
to Axel Hecht, dev-pl...@lists.mozilla.org
On 1/22/2013 9:28 AM, Axel Hecht wrote:
> How are the perf numbers looking?
>
> One of the reasons for asking is that I expect RDF to be part of the
> startup and window-open codepaths, at least.
I would not expect PGO to optimize any of the RDF code for speed, even
if they were in the startup codepaths.

Boy howdy do we need to get RDF out of the tree, though!

--BDS

Marco Bonardo

Jan 22, 2013, 9:45:47 AM
I honestly have the same concerns regarding Storage. Unfortunately, we
don't have data to tell whether disabling PGO on it will affect its many
consumers, apart from the few Talos measurements :(
It's possible those won't be affected because sqlite is built separately
and storage just wraps it, but there's no way to tell if, for example,
the awesomebar or startup may end up being slower until we get enough
telemetry.
-m

Ehsan Akhgari

Jan 22, 2013, 10:31:30 AM
to Marco Bonardo, dev-pl...@lists.mozilla.org
If we determine that disabling PGO on storage actually regresses
performance, we can re-enable it. But note that unless a given code
path is examined throughout the profiling phase of a PGO build, PGO will
probably have negligible effect on it, if any. The PGO compiler looks
for hot code paths and tries to optimize those, so for example if the
awesomebar doesn't get examined during the profiling (which it isn't),
it is extremely unlikely that turning off PGO on the code responsible
for it would have any noticeable change on performance.

Cheers,
Ehsan

Ehsan Akhgari

Jan 22, 2013, 12:52:17 PM
to dev-pl...@lists.mozilla.org, JP Rosevear, Randell Jesup
OK, everyone. Both mozilla-central and mozilla-inbound are *temporarily*
reopened now. Please be gentle.

Cheers,

--
Ehsan
<http://ehsanakhgari.org/>


Robert O'Callahan

Jan 22, 2013, 4:40:15 PM
to Ehsan Akhgari, Marco Bonardo, dev-pl...@lists.mozilla.org
On Wed, Jan 23, 2013 at 4:31 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> But note that unless a given code path is examined throughout the
> profiling phase of a PGO build, PGO will probably have negligible effect on
> it, if any. The PGO compiler looks for hot code paths and tries to
> optimize those, so for example if the awesomebar doesn't get examined
> during the profiling (which it isn't), it is extremely unlikely that
> turning off PGO on the code responsible for it would have any noticeable
> change on performance.
>

I don't think this is a safe assumption. Our PGO builds not only do PGO but
also "Link Time Code Generation" which enables cross-module optimizations.
I have seen code being heavily optimized under PGO that I would not have
expected to be significant in our PGO profile.

It wouldn't be that hard to do an experiment to test the impact of PGO/LTCG
on code that's not in the profile.

Ehsan Akhgari

Jan 22, 2013, 5:09:10 PM
to rob...@ocallahan.org, Marco Bonardo, dev-pl...@lists.mozilla.org
On 2013-01-22 4:40 PM, Robert O'Callahan wrote:
> On Wed, Jan 23, 2013 at 4:31 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>
> > But note that unless a given code path is examined throughout the
> > profiling phase of a PGO build, PGO will probably have negligible
> > effect on it, if any. The PGO compiler looks for hot code paths and
> > tries to optimize those, so for example if the awesomebar doesn't
> > get examined during the profiling (which it isn't), it is extremely
> > unlikely that turning off PGO on the code responsible for it would
> > have any noticeable change on performance.
> >
>
> I don't think this is a safe assumption. Our PGO builds not only do PGO
> but also "Link Time Code Generation" which enables cross-module
> optimizations. I have seen code being heavily optimized under PGO that I
> would not have expected to be significant in our PGO profile.
>
> It wouldn't be that hard to do an experiment to test the impact of
> PGO/LTCG on code that's not in the profile.

Yeah, that would probably be an interesting experiment.

Cheers,
Ehsan

Mike Hommey

Jan 22, 2013, 5:27:10 PM
to Ehsan Akhgari, Marco Bonardo, dev-pl...@lists.mozilla.org, rob...@ocallahan.org
FWIW, IIRC my experiments last time we had this problem, LTCG alone
accounts for less than a third of the performance boost we get from
PGO on Talos.

Mike

Joe Drew

Jan 23, 2013, 3:38:05 PM
to Mike Hommey, Marco Bonardo, Ehsan Akhgari, dev-pl...@lists.mozilla.org, rob...@ocallahan.org
On 2013-01-22 5:27 PM, Mike Hommey wrote:
> FWIW, IIRC my experiments last time we had this problem, LTCG alone
> accounts for less than a third of the performance boost we get from
> PGO on Talos.

Did you happen to measure how big the linker got in memory doing only LTCG?

joe

Mike Hommey

Jan 23, 2013, 3:40:21 PM
to Joe Drew, Marco Bonardo, Ehsan Akhgari, dev-pl...@lists.mozilla.org, rob...@ocallahan.org
IIRC, it wasn't significantly lower than LTCG+PGO. We could certainly
try again to have accurate figures.

Mike

Ehsan Akhgari

Jan 23, 2013, 4:46:35 PM
to Joe Drew, Marco Bonardo, Mike Hommey, dev-pl...@lists.mozilla.org, rob...@ocallahan.org
On 2013-01-23 3:38 PM, Joe Drew wrote:
> On 2013-01-22 5:27 PM, Mike Hommey wrote:
>> FWIW, IIRC my experiments last time we had this problem, LTCG alone
>> accounts for less than a third of the performance boost we get from
>> PGO on Talos.
>
> Did you happen to measure how big the linker got in memory doing only LTCG?

https://bugzilla.mozilla.org/show_bug.cgi?id=833915#c6 suggests
something around 600-700MB, which is not sustainable in the long run.

Cheers,
Ehsan