Gecko Is Too Big (Or, Why the Tree Is Closed)


Kyle Huey

Dec 11, 2011, 9:27:06 PM
to dev-platform, release, dev-tree-management
At the end of last week our Windows PGO builds started failing on
mozilla-inbound (https://bugzilla.mozilla.org/show_bug.cgi?id=709193).
After some investigation we determined that the problem seems to be that
the linker is running out of virtual address space during the optimization
phase.

This is not the first time we've run into this problem (e.g. Bug 543034).
A couple years ago we hit the 2 GB virtual address space limit. The build
machines were changed to use /3GB and that additional GB of address space
bought us some time. This time unfortunately the options aren't as easy as
flipping a switch.

As a temporary measure, we've turned off or ripped out a few new pieces of
code (Graphite, SPDY, libreg) which has brought us back down under the
limit for the moment. We don't really know how much breathing space we
have (but it's probably pretty small).

Our three options at this point:

1) Make libxul smaller - Either by removing code entirely or by splitting
things into separate shared libraries.
2) Move to MSVC 2010 - We know that changesets that reliably failed to link
on MSVC 2005 linked successfully with MSVC 2010. What we don't know is how
much this helps (I expect the answer is somewhere between a lot and a
little). We can't really do this for (at the bare minimum) a couple more
weeks anyway due to product considerations about what OSs we support.
3) Do our 32 bit builds on machines running a 64 bit OS. This will allow
the linker to use 4 GB of address space.
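[Editorial note: on (3), a 32-bit process only gets the full 4 GB of address space on a 64-bit OS if its PE header has the IMAGE_FILE_LARGE_ADDRESS_AWARE bit set (the linker's /LARGEADDRESSAWARE option). As an illustrative sketch, not part of the actual build tooling, that bit can be checked like this:]

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # COFF Characteristics flag

def is_large_address_aware(data: bytes) -> bool:
    """Check the /LARGEADDRESSAWARE bit in a PE image's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the signature; Characteristics is the
    # 2-byte field at offset 18 within it.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

[Without that bit set, a 32-bit process stays at 2 GB (3 GB with /3GB on 32-bit Windows) regardless of the OS the builders run.]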

I think we need to pursue a combination of (1) in the short term and (3) in
the slightly less short term. Gal has some ideas on what we can do for (1)
that I'm investigating.

In the meantime, mozilla-inbound is closed, and mozilla-central is
restricted to approvals only. The only things currently allowed to land on
mozilla-central are:

- Test-only/NPOTB changes
- Changes that only touch SpiderMonkey (which is not part of libxul on
Windows, and thus not contributing to the problem).
- Changes that only touch other cpp code that doesn't end up in libxul
(cpp code in browser/, things like sqlite, angle, nss, nspr, etc).
- JS/XUL/HTML changes.

I'm hopeful that we can hack libxul enough to get the tree open
provisionally soon.

- Kyle

Kyle Huey

Dec 11, 2011, 9:27:52 PM
to dev-platform
Also, I forgot to mention that Ed Morley deserves major thanks for all of
the investigation he's done on this.

- Kyle

Benoit Jacob

Dec 11, 2011, 9:53:33 PM
to Kyle Huey, dev-platform
(Replying only to dev-platform)

If needed, WebGL offers some opportunities for splitting stuff away from libxul:
- the ANGLE shader compiler can easily be split into a separate lib.
- so, probably, could the WebGL implementation itself.

The ANGLE implementation of OpenGL ES2 on top of D3D9 already ships as separate DLLs.

Notice that external libs are already dlopen'd when one creates a
WebGL context: libGL.so.1 on Linux, the ANGLE GLES2 and
D3DX/D3DCompiler DLLs on Windows, etc. So it wouldn't make a big
difference. The WebGL impl is 180 K:

$ nm -S --radix=d libxul.so | grep -i WebGL | awk '{ SUM += $2} END {
print SUM/1024 }'
181.326

so, adding the ANGLE shader compiler, we'd probably have a library
weighing around 300 K of code (file size would be bigger).
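[Editorial note: the one-liner above generalizes to a small script that buckets symbol sizes per component. This is a sketch that assumes the four-column `address size type name` lines that `nm -S --radix=d` emits; the component names are just examples:]

```python
from collections import defaultdict

def sizes_by_component(nm_lines, components):
    """Sum symbol sizes in KiB per component keyword.

    Expects `nm -S --radix=d libxul.so` style lines:
    <address> <size> <type> <name>; lines without a decimal size
    field (e.g. undefined symbols) are skipped.
    """
    totals = defaultdict(int)
    for line in nm_lines:
        parts = line.split()
        if len(parts) < 4 or not parts[1].isdigit():
            continue  # no size field on this line
        size, name = int(parts[1]), parts[3]
        for comp in components:
            if comp.lower() in name.lower():
                totals[comp] += size
    return {c: totals[c] / 1024 for c in components}
```

[Usage would be something like `sizes_by_component(open('syms.txt'), ['WebGL', 'SVG'])` after `nm -S --radix=d libxul.so > syms.txt`.]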

Benoit


Andreas Gal

Dec 11, 2011, 9:59:00 PM
to Benoit Jacob, Kyle Huey, dev-platform
> so, adding the ANGLE shader compiler, we'd probably have a library
> weighing around 300 K of code (file size would be bigger).

This sounds good. Can you please start on this? We aren't sure how much we have to take out to safely reopen the tree until we have a better fix (64-bit linker).

If any other module owners know of large chunks they can split out without affecting startup, please file bugs.

Andreas

Mike Hommey

Dec 12, 2011, 3:01:55 AM
to Benoit Jacob, Kyle Huey, dev-platform
On Sun, Dec 11, 2011 at 09:53:33PM -0500, Benoit Jacob wrote:
> (Replying only to dev-platform)
>
> If needed, WebGL offers some opportunities for splitting stuff away from libxul:
> - the ANGLE shader compiler can easily be split to a separate lib.
> - so could probably the WebGL implementation itself.
>
> The ANGLE implementation of OpenGL ES2 on top of D3D9 is already separate DLLs.
>
> Notice that external lib's are dlopen'd already when one creates a
> WebGL context: libGL.so.1 on linux, the ANGLE GLES2 and
> D3DX/D3DCompiler DLLs on Windows, etc. So it wouldn't make a big
> difference. The WebGL impl is 180 K:
>
> $ nm -S --radix=d libxul.so | grep -i WebGL | awk '{ SUM += $2} END {
> print SUM/1024 }'
> 181.326
>
> so, adding the ANGLE shader compiler, we'd probably have a library
> weighing around 300 K of code (file size would be bigger).

I'm pretty sure the same could be done with the various media libraries
(vp8, ogg, etc.).

Mike

Robert Kaiser

Dec 12, 2011, 10:28:55 AM
Kyle Huey schrieb:
> 1) Make libxul smaller - Either by removing code entirely or by splitting
> things into separate shared libraries.

If we're going with this, we should take a look at what code is not in the
hot startup path and split that out. AFAIK, the reason for linking
everything into libxul was that startup is faster if we only need to
open one library instead of multiple. If we split off parts we don't
usually need at startup, we probably even make startup faster because
the library to be loaded is smaller - and we work around the Windows PGO
limit as well.
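[Editorial note: the load-on-demand idea can be illustrated with a toy ctypes wrapper. This is a Python analogue of deferring the dlopen until a symbol is first needed, not Gecko's actual mechanism; the class name is made up:]

```python
import ctypes
import ctypes.util

class LazyLibrary:
    """Defer dlopen until a symbol is first requested, keeping startup cheap."""

    def __init__(self, name):
        self._name = name
        self._handle = None  # not loaded yet

    def __getattr__(self, symbol):
        # Only called for attributes not found normally, i.e. library symbols.
        if self._handle is None:
            path = ctypes.util.find_library(self._name) or self._name
            self._handle = ctypes.CDLL(path)  # the dlopen happens here
        return getattr(self._handle, symbol)

# Nothing is mapped at startup; the library loads at the first symbol lookup:
libc = LazyLibrary("c")
```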

Robert Kaiser

Chris AtLee

Dec 12, 2011, 10:36:11 AM
I'd like to propose

4) Stop doing PGO.

I think it's worth looking at what PGO is buying us these days. It costs
a lot in terms of build times and therefore build machine capacity. It's
also non-deterministic, which scares me a lot.

If we can determine where PGO is helping us, maybe move just those
pieces into a distinct library where we can do PGO.

Cheers,
Chris

Jean-Paul Rosevear

Dec 12, 2011, 10:47:25 AM
to Chris AtLee, dev-pl...@lists.mozilla.org
Whether or not it's the right thing to do long term, turning it off short term should let us quickly re-open the tree, correct? We can still go down the path of splitting out big libxul chunks, and when we feel that's ready based on try results, we can turn PGO on again if need be.

I'll make this the main discussion topic for the engineering meeting tomorrow if people agree.

-JP

Jonathan Kew

Dec 12, 2011, 11:12:31 AM
to Jean-Paul Rosevear, Chris AtLee, dev-pl...@lists.mozilla.org
On 12 Dec 2011, at 15:47, Jean-Paul Rosevear wrote:

> Whether or not it's the right thing to do long term, turning it off short term should let us quickly re-open the tree, correct?

Yes, IMO. This would mean shipping non-PGO nightlies for the time being, which would presumably result in a significant perf regression, but one that we'd expect to recover when we update the build systems and can re-enable PGO.

I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one, and we should accept it at this point as better than keeping m-c closed to most C++ development for an extended period.

I assume PGO is continuing to work as expected on mozilla-beta and mozilla-aurora trees, and so we have a breathing space before this problem hits the release channel. We'll need to disable PGO on aurora when the next m-c merge happens (unless we have overcome the problem by then), but I think we could live with that; we should aim to have a solution deployed before mozilla11 hits beta, however, so that we have the beta period to resolve any unexpected PGO-related failures that might crop up before this version goes to release.

So that gives us until the end of January to get the new compiler deployed, move to 64-bit builders, or whatever solution(s) we're going to use, or about 7 weeks, minus the Christmas and New Year holiday season.

JK

Justin Lebar

Dec 12, 2011, 11:12:37 AM
to Jean-Paul Rosevear, Chris AtLee, dev-pl...@lists.mozilla.org
> I think it's worth looking at what PGO is buying us these days. It costs a lot in terms of build times and therefore
> build machine capacity.

Have a look at the Talos results in https://tbpl.mozilla.org/?rev=5c64fb241d4e

PGO is a large performance win. TP5 goes from 400 to 330, a speedup of 1.2x.

> [PGO is] also non-deterministic, which scares me a lot.

Are you sure non-pgo is deterministic? :) Not that you shouldn't be
scared by the extra non-determinism in PGO.

> If we can determine where PGO is helping us, maybe move just those pieces into a distinct library where we can
> do PGO.

This isn't a bad idea, but we need to be careful. "Where PGO is
helping us" doesn't mean "code we can compile without PGO without
causing a regression on our performance tests." Our performance tests
are hardly comprehensive.

On Mon, Dec 12, 2011 at 10:47 AM, Jean-Paul Rosevear <j...@mozilla.com> wrote:
> Whether or not its the right thing to do long term, turning it off short term should let us quickly re-open the tree, correct?  We can still go down the splitting out of big libxul chunks path and when we feel that's ready based on try results, we can turn PGO on again if need be.
>
> I'll make this the main discussion topic for the engineering meeting tomorrow if people agree.
>
> -JP
>

Marco Bonardo

Dec 12, 2011, 11:25:01 AM
On 12/12/2011 17:12, Jonathan Kew wrote:
> I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one, and we should accept it at this point as better than keeping m-c closed to most C++ development for an extended period.

It's not such a small risk; it happened 3 times in the last 2 months, IIRC.
That's the original reason philor asked to go back to always-on PGO:
with intermittent PGO it was hard to track down the original changeset
causing the problem.
-m

Boris Zbarsky

Dec 12, 2011, 11:30:53 AM
On 12/12/11 11:12 AM, Jonathan Kew wrote:
> I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one

The data seems to show that such a checkin happens about once a week on
average (see the recent "Proposal to switch mozilla-inbound back to
always doing PGO builds" thread in dev.planning).

So either we think that we'll have PGO builds back up within much less
than a week, or the risk is decidedly not small, right?

> So that gives us until the end of January to get the new compiler deployed, move to 64-bit builders, or whatever solution(s) we're going to use, or about 7 weeks, minus the Christmas and New Year holiday season.

At which point we will need to track down the (on average) 7 checkins that
no longer build with PGO that will have landed between now and then...

-Boris

Ehsan Akhgari

Dec 12, 2011, 11:34:32 AM
to Justin Lebar, Chris AtLee, Jean-Paul Rosevear, dev-pl...@lists.mozilla.org
Moving code out of libxul is only a band-aid over the problem. Since we
don't have any reason to believe that the memory usage of the linker is
linear in terms of the code size, we can't be sure that removing 10% of the
code in libxul will give us 10% more breathing space. Also, moving code
out of libxul might break the sorts of optimizations that we've been doing
assuming that most of our code lives inside libxul (for example, libxul
preloading, etc.)

I agree with JP that the shortest path to reopening the trees is disabling
PGO builds. But we should also note that we're pretty close to the cut-off
date, which would mean that we would end up in a situation where we would
need to release Firefox 11 for Windows with PGO disabled, unless RelEng can
deploy 64-bit builders in time.

Moving to 64-bit builders gives us 33% more address space, which should be
enough for a while. But there is ultimately a hard limit on how much code
we can have in libxul before we hit the 4GB address space limit of the
linker. That might take a couple more years, but my pessimistic side
thinks that it's going to happen sooner this time. ;-)

The only real fix is for us to get a 64-bit linker. I remember some folks
mentioning how Microsoft doesn't have plans on shipping one (my memory
might not be serving me well here). But really, we should talk to
Microsoft and find this out. If they're not planning to ship a 64-bit
linker within the next year or so, turning PGO off is just something that
we would have to do at some point in the future.

--
Ehsan
<http://ehsanakhgari.org/>


Mike Hommey

Dec 12, 2011, 11:58:36 AM
to Ehsan Akhgari, Chris AtLee, Jean-Paul Rosevear, Justin Lebar, dev-pl...@lists.mozilla.org
On Mon, Dec 12, 2011 at 11:34:32AM -0500, Ehsan Akhgari wrote:
> [...]
> The only real fix is for us to get a 64-bit linker. I remember some folks
> mentioning how Microsoft doesn't have plans on shipping one (my memory
> might not be serving me well here). But really, we should talk to
> Microsoft and find this out. If they're not planning to ship a 64-bit
> linker within the next year or so, turning PGO off is just something that
> we would have to do at some point in the future.

Note that MSVC 2010's linker uses less memory: it can link libxul with
PGO enabled within 3GB of address space.

Mike

Jonathan Kew

Dec 12, 2011, 12:20:18 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On 12 Dec 2011, at 16:30, Boris Zbarsky wrote:

> On 12/12/11 11:12 AM, Jonathan Kew wrote:
>> I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one
>
> The data seems to show that such a checkin happens about once a week on average (see the recent "Proposal to switch mozilla-inbound back to always doing PGO builds" thread in dev.planning).

Has that always been the case, or is this high frequency a relatively recent phenomenon?

I'm assuming the address-space limit we've hit is not based simply on "raw" codesize (we don't have 3GB of code, do we?) but rather the total of various structures that the compiler/linker builds internally in order to support its optimization and code-gen process. And so it relates somehow to complexity/inter-relationships as well as raw size, and given that we've presumably been fairly close to the breaking point for a while, I'd think it quite possible that some of the "internal compiler error" failures were in fact out-of-address-space failures, due to a checkin modifying code (without necessarily _adding_ much) in a way that happens to be more memory-hungry for the compiler to handle.

So once we raise that ceiling, we may see a reduction in the incidence of PGO failure on apparently-innocent checkins.

> So either we think that we'll have PGO builds back up within much less than a week, or the risk is decidedly not small, right?
>
>> So that gives us until the end of January to get the new compiler deployed, move to 64-bit builders, or whatever solution(s) we're going to use, or about 7 weeks, minus the Christmas and New Year holiday season.
>
> At which point we will need to find the average of 7 checkins that no longer build with pgo that will land between now and then...

I don't doubt that it happens, but I think having to tackle a handful of these on aurora during January and/or beta during February would be better than blocking much C++ development for an extended period - and dealing with the resulting pressure on the tree when it re-opens and everyone wants to land the stuff they've been holding back in the meantime.

And if releng can get us onto VS2010 and/or 64-bit builders more quickly - which I hope is possible, but don't know what's actually involved in making the switch - the number of such problematic checkins will presumably be correspondingly smaller.

JK

Boris Zbarsky

Dec 12, 2011, 12:34:48 PM
On 12/12/11 12:20 PM, Jonathan Kew wrote:
> I don't doubt that it happens, but I think having to tackle a handful of these on aurora during January and/or beta during February would be better than blocking much C++ development for an extended period

I'm not actually sure it is. At that point we'll have to first find the
checkins responsible, then figure out how to fix them, possibly backing
them and other things out.

I suspect the net effect will be similar to holding the tree closed for
several days now, but time-shifted into January/February.

If we think we'll need to have the tree closed for longer than a few
days, I agree that disabling PGO temporarily sounds more palatable.

-Boris

Chris AtLee

Dec 12, 2011, 12:57:23 PM
On 12/12/11 12:20 PM, Jonathan Kew wrote:
> And if releng can get us onto VS2010 and/or 64-bit builders more quickly - which I hope is possible, but don't know what's actually involved in making the switch - the number of such problematic checkins will presumably be correspondingly smaller.

The 32-bit builders currently have VS2010 installed on them in addition
to VS2005. There are other issues preventing a switch to 2010, however;
IIRC, switching to 2010 breaks Firefox on older versions of Windows XP.

Chris AtLee

Dec 12, 2011, 1:16:35 PM
On 12/12/11 11:12 AM, Justin Lebar wrote:
>> I think it's worth looking at what PGO is buying us these days. It costs a lot in terms of build times and therefore
>> build machine capacity.
>
> Have a look at the Talos results in https://tbpl.mozilla.org/?rev=5c64fb241d4e
>
> PGO is a large performance win. TP5 goes from 400 to 330, a speedup of 1.2x.

Sure, but things like Dromaeo tests don't seem to be affected at all:

http://graphs-new.mozilla.org/graph.html#tests=[[75,94,1],[75,1,1]]&sel=none&displayrange=7&datatype=running

http://graphs-new.mozilla.org/graph.html#tests=[[76,94,1],[76,1,1]]&sel=none&displayrange=7&datatype=running

But SVG is:

http://graphs-new.mozilla.org/graph.html#tests=[[57,94,1],[57,1,1]]&sel=none&displayrange=7&datatype=running

Boris Zbarsky

Dec 12, 2011, 1:31:15 PM
On 12/12/11 1:16 PM, Chris AtLee wrote:
> Sure, but things like Dromaeo tests don't seem to be affected at all:
>
> http://graphs-new.mozilla.org/graph.html#tests=[[75,94,1],[75,1,1]]&sel=none&displayrange=7&datatype=running

That's Dromaeo-Sunspider.

> http://graphs-new.mozilla.org/graph.html#tests=[[76,94,1],[76,1,1]]&sel=none&displayrange=7&datatype=running

And that's Dromaeo-V8.

Both are pure JS tests. For pure JS tests, time is either spent in
jitcode (not affected by PGO) or in libmozjs (which is compiled with PGO
disabled already on Windows because as far as we can tell VS 2005 PGO
miscompiles it; see https://bugzilla.mozilla.org/show_bug.cgi?id=673518 ).

Try this graph for Dromaeo-DOM:

http://graphs-new.mozilla.org/graph.html#tests=[[73,1,1],[73,94,1]]&sel=none&displayrange=7&datatype=running

It shows the PGO builds doing about 267 runs/s while the non-PGO ones
are doing about 209 runs/s. So about 25% speedup.

(Amusingly,
http://graphs-new.mozilla.org/graph.html#tests=[[72,94,1],[72,1,1]]&sel=none&displayrange=7&datatype=running
also shows no speedup, because contrary to its name Dromaeo-CSS is
largely a JS test in practice.)

-Boris

Benoit Jacob

Dec 12, 2011, 1:56:47 PM
to Ehsan Akhgari, dev-platform
2011/12/12 Ehsan Akhgari <ehsan....@gmail.com>:
> Moving code out of libxul is only a band-aid over the problem.  Since we
> don't have any reason to believe that the memory usage of the linker is
> linear in terms of the code size, we can't be sure that removing 10% of the
> code in libxul will give us 10% more breathing space.  Also, moving code
> out of libxul might break the sorts of optimizations that we've been doing
> assuming that most of our code lives inside libxul (for example, libxul
> preloading, etc.)

This argument, however, doesn't apply equally well to all parts of
libxul. Some parts are relatively self-contained, have well-identified
critical loops, don't interact with other parts of libxul, and are
already optimized, i.e. coded in such a way that PGO won't make them
faster than -O2. I think that WebGL is such an example.

To put it another way, there's a limit to the scale at which PGO makes
sense, or else we should just link all the software on a computer as a
single file...

Benoit

Ehsan Akhgari

Dec 12, 2011, 2:14:58 PM
to Benoit Jacob, dev-platform
On Mon, Dec 12, 2011 at 1:56 PM, Benoit Jacob <jacob.b...@gmail.com>wrote:

> 2011/12/12 Ehsan Akhgari <ehsan....@gmail.com>:
> > Moving code out of libxul is only a band-aid over the problem. Since we
> > don't have any reason to believe that the memory usage of the linker is
> > linear in terms of the code size, we can't be sure that removing 10% of
> the
> > code in libxul will give us 10% more breathing space. Also, moving code
> > out of libxul might break the sorts of optimizations that we've been
> doing
> > assuming that most of our code lives inside libxul (for example, libxul
> > preloading, etc.)
>
> This argument, however, doesn't apply equally well to all parts of
> libxul. Some parts are relatively self-contained, with critical loops
> that are well-identified, don't interact with other parts of libxul,
> and already optimized i.e. coded in such a way that PGO won't make
> them faster than -O2. I think that WebGL is such an example.
>

There is also the question of which interfaces the code in question can
use. For example, if the code calls a function on an object whose
implementation lives outside of its module, that function needs to
either be virtual or publicly exported.


> To put it another way, there's a limit to the scale at which PGO makes
> sense, or else we should just link all the software on a computer as a
> single file...


I think that's an unfair comparison. Theoretically, if we had linkers
which could use 64-bit address space, we could take advantage of PGO
without needing to put all of the code inside a single source file.
Problem is, we don't have those linkers for now. :(

Cheers,
--
Ehsan
<http://ehsanakhgari.org/>

Kyle Huey

Dec 12, 2011, 2:18:17 PM
to dev-platform
Status update:

We have two patches in hand (Bug 709657 and Bug 709721) to split out a
couple chunks of libxul. I tested one of them last night and it got the
final xul.dll size below the size of mozilla-beta's xul.dll by a couple
hundred kilobytes.

If we're willing to make the assumption that final binary size and peak
linker memory consumption are somewhat correlated then these two bugs
should buy us a fair amount of time (or code size, I suppose).

- Kyle

Jonathan Kew

Dec 12, 2011, 2:25:18 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
But if we expect we'll be able to re-open (with PGO) within a few days anyway, then we'll only be dealing with a few days' worth of non-PGO'd checkins that might have problems that need to be tracked down once PGO is back.

So I don't see much benefit to holding the tree mostly-closed at this point. Either we can get the PGO builds working again soon, in which case the odds are pretty good that they'll "just work" with whatever patches have landed - it's not like we break them on a daily basis - or it's going to take "longer than a few days", in which case we really can't afford to block development while we wait for it.

JK

Ted Mielczarek

Dec 12, 2011, 2:32:14 PM
to Benoit Jacob, Ehsan Akhgari, dev-platform
On Mon, Dec 12, 2011 at 1:56 PM, Benoit Jacob <jacob.b...@gmail.com> wrote:
> This argument, however, doesn't apply equally well to all parts of
> libxul. Some parts are relatively self-contained, with critical loops
> that are well-identified, don't interact with other parts of libxul,
> and already optimized i.e. coded in such a way that PGO won't make
> them faster than -O2. I think that WebGL is such an example.

This is an almost impossible statement to make. Even highly optimized
code can be made faster by the PGO optimizer, because it does
optimizations like:
* massive inlining
* speculative virtual call inlining
* hot+cold function block separation

which are incredibly hard to replicate without hand-crafting unreadable code.

> To put it another way, there's a limit to the scale at which PGO makes
> sense, or else we should just link all the software on a computer as a
> single file...

This is probably false. If the compiler could inline your system
library calls and things like that, your software would likely be
faster. It's only because of API boundaries that things like that
don't happen.

-Ted

Ehsan Akhgari

Dec 12, 2011, 2:58:14 PM
to Kyle Huey, dev-platform
I have an idea which might enable us to use VS2010 to build binaries that
will run on Win2k, XP, and XP SP1. We were going to switch to VS2010 for
Gecko 12 anyway, so if we can get this to work, we can switch to VS2010
today and we wouldn't need to rip anything out either.

I will have useful results in a few hours.

Cheers,
--
Ehsan
<http://ehsanakhgari.org/>


Zack Weinberg

Dec 12, 2011, 3:02:34 PM
On 2011-12-12 7:28 AM, Robert Kaiser wrote:
> Kyle Huey schrieb:
>> 1) Make libxul smaller - Either by removing code entirely or by splitting
>> things into separate shared libraries.
>
> If we're going with this, we should take a look what code is not in the
> hot startup path and split out that. AFAIK, the reason for linking
> everything into libxul was that startup is faster if we only need to
> open one library instead of multiple.

It also allows more aggressive deCOMtamination, although there might be
a way to work around that. I don't remember exactly what the problem
was, but I know I had to postpone some deCOM patches till we went
libxul-only because they caused link errors in a non-libxul build. bz
or bsmedberg probably know why.

zw

Benoit Jacob

Dec 12, 2011, 3:35:27 PM
to Ted Mielczarek, Ehsan Akhgari, dev-platform
2011/12/12 Ted Mielczarek <t...@mielczarek.org>:
> On Mon, Dec 12, 2011 at 1:56 PM, Benoit Jacob <jacob.b...@gmail.com> wrote:
>> This argument, however, doesn't apply equally well to all parts of
>> libxul. Some parts are relatively self-contained, with critical loops
>> that are well-identified, don't interact with other parts of libxul,
>> and already optimized i.e. coded in such a way that PGO won't make
>> them faster than -O2. I think that WebGL is such an example.
>
> This is an almost impossible statement to make. Even highly optimized
> code can be made faster by the PGO optimizer, because it does
> optimizations like:
> * massive inlining

It's not hard to get the compiler to inline the functions that you
specifically know need to be inlined; and for the cases where we know
that compilers are getting it wrong even with inline keywords, we just
need to fix NS_ALWAYS_INLINE (
https://bugzilla.mozilla.org/show_bug.cgi?id=697810 ) and use that
macro. WebGL is already using such a macro (
https://bugzilla.mozilla.org/show_bug.cgi?id=697450 )
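The macro in question lives in the Mozilla tree (see the bugs above); a minimal sketch of such a force-inline macro, under the assumption that only MSVC and GCC-compatible compilers matter, looks something like this (`MY_ALWAYS_INLINE` and `clamp_to_byte` are illustrative names, not Gecko's):

```cpp
#include <cassert>

// Force inlining past the compiler's own heuristics.
#if defined(_MSC_VER)
#  define MY_ALWAYS_INLINE __forceinline
#elif defined(__GNUC__)
#  define MY_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#  define MY_ALWAYS_INLINE inline  // best effort on other compilers
#endif

// A small hot helper we want inlined regardless of optimization level.
MY_ALWAYS_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```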

> * speculative virtual call inlining

The kind of optimized code that I'm talking about doesn't use virtual
functions for stuff that needs to be inlined. I don't deny that there
are cases where you'd really need that, but that's not the case for the
majority of 'hot' code.

> * hot+cold function block separation

If you carefully write hot code such that -O2 will compile it well,
you're fine. PGO is more useful when you have a big codebase and you
either don't know where the hot parts are, or know that you don't have
time to carefully optimize all of them...

>
> which are incredibly hard to replicate without hand-crafting unreadable code.
>
>> To put it another way, there's a limit to the scale at which PGO makes
>> sense, or else we should just link all the software on a computer as a
>> single file...
>
> This is probably false. If the compiler could inline your system
> library calls and things like that, your software would likely be
> faster. It's only because of API boundaries that things like that
> don't happen.

Yes it could be faster, but would that be desirable? Diminishing returns etc.

Benoit

Brian Smith

Dec 12, 2011, 5:43:42 PM
to Ehsan Akhgari, dev-pl...@lists.mozilla.org, necko...@mozilla.org
Ehsan Akhgari wrote:
> The only real fix is for us to get a 64-bit linker. I remember some
> folks mentioning how Microsoft doesn't have plans on shipping one
> (my memory might not be serving me well here). But really, we should
> talk to Microsoft and find this out. If they're not planning to ship
> a 64-bit linker within the next year or so, turning PGO off is just
> something that we would have to do at some point in the future.

We could, and probably will have to, permanently split libxul (differently than we already do).

Microsoft is not going to provide us with a 64-bit linker in any reasonable timeframe. We cannot switch wholesale to any other linker because we need the optimizations (including PGO) in Microsoft's.

Note that we effectively split libxul into multiple pieces already: libxul, mozjs, libssl3, libnss, libnssutil, libsmime, libsoftokn, libfreebl, libnspr4, libplds4, libplc4, etc. We have bugs on file to combine these together, to improve startup time, which would add 4MB (FOUR MEGABYTES) of code to libxul.

We also have WebRTC and other major features that will add, on their own, hundreds of kilobytes of code, at least. I wouldn't be surprised if, this time next year, libxul would be 33-50% larger than it is now if we follow through with these plans. If we assume that the compiler/linker requires AT LEAST a linear increase in memory relative to code size, then even 3GB -> 4GB of address space is going to be insufficient.

So, IMO, short-term fixes that move non-startup code *that does NOT depend on NSPR, NSS, or mozjs* out of libxul are the best solution, because that seems to be the long-term solution too.

Unfortunately, this means that Necko and PSM are actually very poor candidates (long-term) for splitting out of libxul, if we plan ahead for the time where NSPR and NSS are linked into libxul for startup performance reasons.

- Brian

Robert O'Callahan

Dec 12, 2011, 6:04:08 PM
to Ehsan Akhgari, Kyle Huey, dev-platform
On Tue, Dec 13, 2011 at 8:58 AM, Ehsan Akhgari <ehsan....@gmail.com>wrote:

> I have an idea which might enable us to use VS2010 to build binaries that
> will run with Win2k, XP and XP SP1. We were going to switch to VS2010 for
> Gecko 12 anyways, so if we can get this to work, we can switch to VS2010
> today and we wouldn't need to rip out anything either.
>
> I will have useful results in a few hours.
>

What's the idea? I gather from #developers it didn't work, but I'd still
like to know :-).

Rob
--
"If we claim to be without sin, we deceive ourselves and the truth is not
in us. If we confess our sins, he is faithful and just and will forgive us
our sins and purify us from all unrighteousness. If we claim we have not
sinned, we make him out to be a liar and his word is not in us." [1 John
1:8-10]

Robert O'Callahan

Dec 12, 2011, 6:29:32 PM
to Benoit Jacob, dev-platform, Ehsan Akhgari, Ted Mielczarek
On Tue, Dec 13, 2011 at 9:35 AM, Benoit Jacob <jacob.b...@gmail.com>wrote:

> > * hot+cold function block separation
> If you just write carefully hot code such that -O2 will compile it
> well, you're fine.
>

You can't manually separate all the hot from cold code, e.g. error handling
paths within a hot function.
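Roc's point can be sketched in a few lines. The closest manual approximation to PGO's hot/cold block separation is pulling the error path into an out-of-line cold helper so the hot loop stays dense in the instruction cache, but you would have to do this by hand for every error path in every hot function. All names here are illustrative, not from the Gecko tree:

```cpp
#include <cassert>
#include <stdexcept>

#if defined(_MSC_VER)
#  define MY_NOINLINE __declspec(noinline)
#else
#  define MY_NOINLINE __attribute__((noinline))
#endif

// Cold path: kept out of line so it doesn't pollute the hot function's code.
MY_NOINLINE static int handle_bad_index(int i) {
    throw std::out_of_range("index out of range");
}

int sum_checked(const int* data, int len, int upto) {
    if (upto < 0 || upto > len)      // the check stays, but its body is cold
        return handle_bad_index(upto);
    int sum = 0;
    for (int i = 0; i < upto; ++i)   // hot loop
        sum += data[i];
    return sum;
}
```

PGO does this splitting automatically, for every function, based on observed execution counts.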

Robert O'Callahan

Dec 12, 2011, 6:33:48 PM
to Ehsan Akhgari, Benoit Jacob, dev-platform
On Tue, Dec 13, 2011 at 8:14 AM, Ehsan Akhgari <ehsan....@gmail.com>wrote:

> I think that's an unfair comparison. Theoretically, if we had linkers
> which could use 64-bit address space, we could take advantage of PGO
> without needing to put all of the code inside a single source file.
> Problem is, we don't have those linkers for now. :(
>

There's the 64-bit linker that builds 64-bit binaries.

So we should always be able to ship 64-bit single-libxul PGO builds. We
should do that. (Along with the other efforts for 32-bit builds.)

Jean-Paul Rosevear

Dec 12, 2011, 6:53:13 PM
to Kyle Huey, dev-platform
Further summary.

Please check the assertions, actions, and questions below. If you disagree with an assertion, please call it out. If you have an action attributed to you and you aren't the right person, please call it out. If you know the answer to an open question, please call it out.

tl;dr:

Goal is to re-open tree and limit risk to future shipping products.

Slicing stuff out of libxul gets the tree re-opened and gets us through shipping FF11. After that we switch to VS2010, or we get VS2005 onto 64-bit builders.

=====

* Assertions:
1) We can't link PGO Windows builds because we are OOM at 3GB:
https://bugzilla.mozilla.org/show_bug.cgi?id=709193

2) VS2005 is not installed on our 64bit builders. If it was and worked we would buy some time with 4GB of memory available instead of 3GB.

3) VS2010 has a more memory efficient linker on 32bit that buys us time and would allow us to build on 64bit builders only:
https://bugzilla.mozilla.org/show_bug.cgi?id=709480

4) VS2010 builds as currently set up will force us to drop Windows 2000 and XP w/o SP and XP+SP1.

5) We do not want to switch to VS2010 one week before the FF11 migration to Aurora; that limits the time we have to find compiler bugs and/or binary compatibility issues.

6) Product team is not certain that it's viable to drop Windows 2000 and XP w/o SP and XP+SP1. We'll have more OS usage data about two weeks after the FF9 release, once we collect data on service pack versions:
https://bugzilla.mozilla.org/show_bug.cgi?id=668436

7) We backed out Graphite and SPDY to get PGO builds green again. Product team says Graphite is not required for the FF11 release (it was going to be preffed off), so it can land post-migration on Dec 20. SPDY is requested back because it was publicly communicated.

8) We believe turning off PGO has too much of a performance impact to be a viable short-term solution. Slicing out items is a better approach for now.

* Investigations/Actions:
1) Kyle Huey, Mike Hommey and Jeff Gilbert are slicing out items from libxul to reduce the linker memory consumption:
https://bugzilla.mozilla.org/show_bug.cgi?id=709657
https://bugzilla.mozilla.org/show_bug.cgi?id=709721
https://bugzilla.mozilla.org/show_bug.cgi?id=709914

2) Ehsan is testing a workaround for VS2010 not supporting Windows 2000, XP w/o SP, and XP+SP1.

3) RelEng will investigate whether we can get VS2005 (32-bit) running on a Win64 builder machine. If this 32-bit compiler *is* runnable on Win64, does this solve the problem by allowing the linker to use >3GB of address space?

* Open Questions:
1) Can we ship with things sliced out? How does this impact us?

-JP

Ehsan Akhgari

Dec 12, 2011, 8:18:37 PM
to rob...@ocallahan.org, Kyle Huey, dev-platform
On Mon, Dec 12, 2011 at 6:04 PM, Robert O'Callahan <rob...@ocallahan.org>wrote:

> On Tue, Dec 13, 2011 at 8:58 AM, Ehsan Akhgari <ehsan....@gmail.com>wrote:
>
>> I have an idea which might enable us to use VS2010 to build binaries that
>> will run with Win2k, XP and XP SP1. We were going to switch to VS2010 for
>> Gecko 12 anyways, so if we can get this to work, we can switch to VS2010
>> today and we wouldn't need to rip out anything either.
>>
>> I will have useful results in a few hours.
>>
>
> What's the idea? I gather from #developers it didn't work, but I'd still
> like to know :-).
>

This might sound a bit crazy, but please bear with me.

If we had a way to provide msvcr100.dll with our own
EncodePointer/DecodePointer implementation (those functions are easy to
implement), everything would be fine on those platforms. I originally had
a few ideas, but none of them worked because of different reasons (we don't
link to crt statically, we can't ship modified versions of the binary,
etc). But I now have a solution which I think will work (I've just got the
hard part working):

I looked at the CRT source code, and it calls EncodePointer very early
during the startup (in its DllMain function, called _CRT_INIT). So firstly
we need a way to run our code before _CRT_INIT. I have got this part
working by generating a DLL which does not link against the CRT and only
imports functions from kernel32.dll (which itself does not link against the
CRT), and adding my code to the DllMain function inside that DLL. For this
to work correctly, the library for that DLL needs to come before msvcrt.lib
in the import table entries, something which can be achieved by passing
/nodefaultlib to the linker and reordering the libraries on the command
line accordingly.

Once we have code which runs before the CRT initialization, we can look to
see if kernel32.dll includes the two functions we're interested in. If it
does not, we can parse the PE header of kernel32.dll, get to its
IMAGE_EXPORT_DIRECTORY table, copy the table while interleaving our two
entries to point to our own versions of those functions, and overwrite the
kernel32.dll PE header to point to our new IMAGE_EXPORT_DIRECTORY.

This way, when the loader attempts to load msvcr100.dll, it can
successfully find entries for EncodePointer/DecodePointer. This is sort of
similar to how we currently intercept LdrLoadDll on Windows for DLL
blacklisting, so while it's sort of hackish, it relies on documented stuff
from Windows. Therefore, I suspect that it will work just fine.

I'm currently working on the second half of the implementation.
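The "those functions are easy to implement" remark above is worth unpacking: the only contract an EncodePointer/DecodePointer shim must honor is that decoding an encoded pointer returns the original. Windows mixes in a per-process secret so that encoded pointers are hard for an attacker to forge; a minimal fallback shim, sketched here with illustrative names and a made-up cookie constant (not Ehsan's actual code), can simply XOR with a process-local value:

```c
#include <assert.h>
#include <stdint.h>

/* Process-local cookie; the real Windows implementation derives a
 * per-process secret at startup. This constant is purely illustrative. */
static uintptr_t g_cookie = (uintptr_t)0x5bd1e995u;

/* Encode a pointer so that raw function pointers don't sit in memory
 * in the clear. */
void* MyEncodePointer(void* p) {
    return (void*)((uintptr_t)p ^ g_cookie);
}

/* XOR is its own inverse, so decoding is the same operation. */
void* MyDecodePointer(void* p) {
    return (void*)((uintptr_t)p ^ g_cookie);
}
```

The hard part, as the message explains, is not writing these two functions but getting them registered in kernel32.dll's export table before the CRT's DllMain looks them up.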

Robert O'Callahan

Dec 12, 2011, 8:59:38 PM
to Jean-Paul Rosevear, Kyle Huey, dev-platform
On Tue, Dec 13, 2011 at 12:53 PM, Jean-Paul Rosevear <j...@mozilla.com>wrote:

> 3) VS2010 has a more memory efficient linker on 32bit that buys us time
> and would allow us to build on 64bit builders only:
> https://bugzilla.mozilla.org/show_bug.cgi?id=709480
>

https://bugzilla.mozilla.org/show_bug.cgi?id=709193#c44 suggests that
VS2010 alone doesn't buy us very much. Someone should repeat that
experiment.

> 3) RelEng will investigate whether we can get VS2005 (32-bit) running on a
> Win64 builder machine. If this 32-bit compiler *is* runnable on Win64, does
> this solve the problem by allowing the linker to use >3GB of space?
>

It should. I think someone still needs to do that experiment to make sure.

Mike Hommey

Dec 13, 2011, 2:17:46 AM