Gecko Is Too Big (Or, Why the Tree Is Closed)

Kyle Huey

Dec 11, 2011, 9:27:06 PM
to dev-platform, release, dev-tree-management
At the end of last week our Windows PGO builds started failing on
mozilla-inbound (https://bugzilla.mozilla.org/show_bug.cgi?id=709193).
After some investigation we determined that the problem seems to be that
the linker is running out of virtual address space during the optimization
phase.

This is not the first time we've run into this problem (e.g. Bug 543034).
A couple years ago we hit the 2 GB virtual address space limit. The build
machines were changed to use /3GB and that additional GB of address space
bought us some time. This time unfortunately the options aren't as easy as
flipping a switch.

As a temporary measure, we've turned off or ripped out a few new pieces of
code (Graphite, SPDY, libreg) which has brought us back down under the
limit for the moment. We don't really know how much breathing space we
have (but it's probably pretty small).

Our three options at this point:

1) Make libxul smaller - Either by removing code entirely or by splitting
things into separate shared libraries.
2) Move to MSVC 2010 - We know that changesets that reliably failed to link
on MSVC 2005 linked successfully with MSVC 2010. What we don't know is how
much this helps (I expect the answer is somewhere between a lot and a
little). We can't really do this for (at the bare minimum) a couple more
weeks anyways due to product considerations about what OSs we support.
3) Do our 32-bit builds on machines running a 64-bit OS. This will allow
the linker to use 4 GB of address space.

I think we need to pursue a combination of (1) in the short term and (3) in
the slightly less short term. Gal has some ideas on what we can do for (1)
that I'm investigating.

In the meantime, mozilla-inbound is closed, and mozilla-central is
restricted to approvals only. The only things currently allowed to land on
mozilla-central are:

- Test-only/NPOTB changes
- Changes that only touch SpiderMonkey (which is not part of libxul on
Windows, and thus not contributing to the problem).
- Changes that only touch other cpp code that does not end up in libxul
(cpp code in browser/, things like sqlite, angle, nss, nspr, etc).
- JS/XUL/HTML changes.

I'm hopeful that we can hack libxul enough to get the tree open
provisionally soon.

- Kyle

Kyle Huey

Dec 11, 2011, 9:27:52 PM
to dev-platform
Also, I forgot to mention that Ed Morley deserves major thanks for all of
the investigation he's done on this.

- Kyle

Benoit Jacob

Dec 11, 2011, 9:53:33 PM
to Kyle Huey, dev-platform
(Replying only to dev-platform)

If needed, WebGL offers some opportunities for splitting stuff away from libxul:
- the ANGLE shader compiler can easily be split to a separate lib.
- so could probably the WebGL implementation itself.

The ANGLE implementation of OpenGL ES2 on top of D3D9 is already separate DLLs.

Notice that external libs are dlopen'd already when one creates a
WebGL context: libGL.so.1 on Linux, the ANGLE GLES2 and
D3DX/D3DCompiler DLLs on Windows, etc. So it wouldn't make a big
difference. The WebGL impl is 180 K:

$ nm -S --radix=d libxul.so | grep -i WebGL | awk '{ SUM += $2} END { print SUM/1024 }'
181.326

so, adding the ANGLE shader compiler, we'd probably have a library
weighing around 300 K of code (file size would be bigger).
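
For anyone who wants to repeat this measurement for another module without awk, here is a minimal Python sketch of what the one-liner above computes. It assumes `nm -S --radix=d` prints sized symbols as `<address> <size> <type> <name>`; the function name `symbol_kib` and the sample lines are made up for illustration and are not real libxul symbols.

```python
import re

def symbol_kib(nm_output: str, pattern: str) -> float:
    """Sum the sizes (in KiB) of symbols whose nm line matches `pattern`.

    Expects `nm -S --radix=d` output: "<address> <size> <type> <name>".
    Lines without a size column (e.g. undefined symbols) are skipped.
    """
    matcher = re.compile(pattern, re.IGNORECASE)
    total_bytes = 0
    for line in nm_output.splitlines():
        if not matcher.search(line):
            continue
        fields = line.split()
        # A sized symbol has at least 4 fields and a decimal second field.
        if len(fields) >= 4 and fields[1].isdigit():
            total_bytes += int(fields[1])
    return total_bytes / 1024

# Hypothetical sample lines, for illustration only.
sample = """\
0000001000 0000002048 T WebGLContext_Hypothetical_A
0000004000 0000001024 t webgl_hypothetical_helper
0000008000 0000004096 T UnrelatedSymbol
                 U WebGLUndefinedHypothetical
"""
print(symbol_kib(sample, "webgl"))
```

On a real build the input could come from `subprocess.run(["nm", "-S", "--radix=d", "libxul.so"], capture_output=True, text=True).stdout`. Like the grep above, the pattern is matched case-insensitively against the whole line.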

Benoit

Andreas Gal

Dec 11, 2011, 9:59:00 PM
to Benoit Jacob, Kyle Huey, dev-platform
> so, adding the ANGLE shader compiler, we'd probably have a library
> weighing around 300 K of code (file size would be bigger).

This sounds good. Can you please start on this? We aren't sure how much we have to take out to safely reopen the tree until we have a better fix (64-bit linker).

If any other module owners know of large chunks they can split out without affecting startup, please file bugs.

Andreas

Mike Hommey

Dec 12, 2011, 3:01:55 AM
to Benoit Jacob, Kyle Huey, dev-platform
On Sun, Dec 11, 2011 at 09:53:33PM -0500, Benoit Jacob wrote:
> (Replying only to dev-platform)
>
> If needed, WebGL offers some opportunities for splitting stuff away from libxul:
> - the ANGLE shader compiler can easily be split to a separate lib.
> - so could probably the WebGL implementation itself.
>
> The ANGLE implementation of OpenGL ES2 on top of D3D9 is already separate DLLs.
>
> Notice that external libs are dlopen'd already when one creates a
> WebGL context: libGL.so.1 on Linux, the ANGLE GLES2 and
> D3DX/D3DCompiler DLLs on Windows, etc. So it wouldn't make a big
> difference. The WebGL impl is 180 K:
>
> $ nm -S --radix=d libxul.so | grep -i WebGL | awk '{ SUM += $2} END { print SUM/1024 }'
> 181.326
>
> so, adding the ANGLE shader compiler, we'd probably have a library
> weighing around 300 K of code (file size would be bigger).

I'm pretty sure the same could be done with the various media libraries
(vp8, ogg, etc.).

Mike

Robert Kaiser

Dec 12, 2011, 10:28:55 AM
Kyle Huey schrieb:
> 1) Make libxul smaller - Either by removing code entirely or by splitting
> things into separate shared libraries.

If we're going with this, we should take a look at what code is not in the
hot startup path and split that out. AFAIK, the reason for linking
everything into libxul was that startup is faster if we only need to
open one library instead of multiple. If we split off parts we don't
usually need at startup, we probably even make startup faster because
the library to be loaded is smaller - and we work around the Windows PGO
limit as well.

Robert Kaiser

Chris AtLee

Dec 12, 2011, 10:36:11 AM
I'd like to propose

4) Stop doing PGO.

I think it's worth looking at what PGO is buying us these days. It costs
a lot in terms of build times and therefore build machine capacity. It's
also non-deterministic, which scares me a lot.

If we can determine where PGO is helping us, maybe move just those
pieces into a distinct library where we can do PGO.

Cheers,
Chris

Jean-Paul Rosevear

Dec 12, 2011, 10:47:25 AM
to Chris AtLee, dev-pl...@lists.mozilla.org
Whether or not it's the right thing to do long term, turning it off short term should let us quickly re-open the tree, correct? We can still go down the path of splitting out big libxul chunks, and when we feel that's ready based on try results, we can turn PGO on again if need be.

I'll make this the main discussion topic for the engineering meeting tomorrow if people agree.

-JP

Jonathan Kew

Dec 12, 2011, 11:12:31 AM
to Jean-Paul Rosevear, Chris AtLee, dev-pl...@lists.mozilla.org
On 12 Dec 2011, at 15:47, Jean-Paul Rosevear wrote:

> Whether or not it's the right thing to do long term, turning it off short term should let us quickly re-open the tree, correct?

Yes, IMO. This would mean shipping non-PGO nightlies for the time being, which would presumably result in a significant perf regression, but one that we'd expect to recover when we update the build systems and can re-enable PGO.

I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one, and we should accept it at this point as better than keeping m-c closed to most C++ development for an extended period.

I assume PGO is continuing to work as expected on mozilla-beta and mozilla-aurora trees, and so we have a breathing space before this problem hits the release channel. We'll need to disable PGO on aurora when the next m-c merge happens (unless we have overcome the problem by then), but I think we could live with that; we should aim to have a solution deployed before mozilla11 hits beta, however, so that we have the beta period to resolve any unexpected PGO-related failures that might crop up before this version goes to release.

So that gives us until the end of January to get the new compiler deployed, move to 64-bit builders, or whatever solution(s) we're going to use, or about 7 weeks, minus the Christmas and New Year holiday season.

JK

Justin Lebar

Dec 12, 2011, 11:12:37 AM
to Jean-Paul Rosevear, Chris AtLee, dev-pl...@lists.mozilla.org
> I think it's worth looking at what PGO is buying us these days. It costs a lot in terms of build times and therefore
> build machine capacity.

Have a look at the Talos results in https://tbpl.mozilla.org/?rev=5c64fb241d4e

PGO is a large performance win. TP5 goes from 400 to 330, a speedup of 1.2x.
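
The 1.2x figure follows directly from the two numbers quoted above, since Tp5 is a lower-is-better measurement. A quick sketch of the arithmetic (the function name is illustrative):

```python
def speedup_from_times(before: float, after: float) -> float:
    """Speedup factor for a lower-is-better metric such as a page-load time."""
    return before / after

# Numbers quoted above: Tp5 of 400 without PGO versus 330 with PGO.
factor = speedup_from_times(400, 330)
print(f"{factor:.2f}x faster, {100 * (1 - 330 / 400):.1f}% less time")
```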

> [PGO is] also non-deterministic, which scares me a lot.

Are you sure non-pgo is deterministic? :) Not that you shouldn't be
scared by the extra non-determinism in PGO.

> If we can determine where PGO is helping us, maybe move just those pieces into a distinct library where we can
> do PGO.

This isn't a bad idea, but we need to be careful. "Where PGO is
helping us" doesn't mean "code we can compile without PGO without
causing a regression on our performance tests." Our performance tests
are hardly comprehensive.

Marco Bonardo

Dec 12, 2011, 11:25:01 AM
On 12/12/2011 17:12, Jonathan Kew wrote:
> I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one, and we should accept it at this point as better than keeping m-c closed to most C++ development for an extended period.

It's not such a small risk: it happened 3 times in the last 2 months, iirc.
That's the original reason philor asked to go back to always doing PGO,
since with intermittent PGO it was hard to track down the original
changeset causing the problem.
-m

Boris Zbarsky

Dec 12, 2011, 11:30:53 AM
On 12/12/11 11:12 AM, Jonathan Kew wrote:
> I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one

The data seems to show that such a checkin happens about once a week on
average (see the recent "Proposal to switch mozilla-inbound back to
always doing PGO builds" thread in dev.planning).

So either we think that we'll have PGO builds back up within much less
than a week, or the risk is decidedly not small, right?

> So that gives us until the end of January to get the new compiler deployed, move to 64-bit builders, or whatever solution(s) we're going to use, or about 7 weeks, minus the Christmas and New Year holiday season.

At which point we will need to find the average of 7 checkins that no
longer build with pgo that will land between now and then...

-Boris

Ehsan Akhgari

Dec 12, 2011, 11:34:32 AM
to Justin Lebar, Chris AtLee, Jean-Paul Rosevear, dev-pl...@lists.mozilla.org
Moving code out of libxul is only a band-aid over the problem. Since we
don't have any reason to believe that the memory usage of the linker is
linear in terms of the code size, we can't be sure that removing 10% of the
code in libxul will give us 10% more breathing space. Also, moving code
out of libxul might break the sorts of optimizations that we've been doing
assuming that most of our code lives inside libxul (for example, libxul
preloading, etc.)

I agree with JP that the shortest path to reopening the trees is disabling
PGO builds. But we should also note that we're pretty close to the cut-off
date, which would mean that we would end up in a situation where we would
need to release Firefox 11 for Windows with PGO disabled, unless RelEng can
deploy 64-bit builders in time.

Moving to 64-bit builders gives us 33% more address space, which should be
enough for a while. But there is ultimately a hard limit on how much code
we can have in libxul before we hit the 4GB address space limit of the
linker. That might take a couple of more years, but my pessimistic side
thinks that it's going to happen sooner this time. ;-)
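
The 33% figure is the step from the 3 GB that /3GB gives a 32-bit process to the 4 GB the same process can address on a 64-bit OS. A small sketch of that arithmetic follows; the table and names are illustrative, and it assumes the linker executable is large-address-aware (which the earlier /3GB gain suggests).

```python
GIB = 1024 ** 3

# User-mode virtual address space available to a 32-bit Windows process.
# The 3 GiB and 4 GiB rows apply only to large-address-aware executables.
ADDRESS_SPACE = {
    "32-bit OS, default": 2 * GIB,
    "32-bit OS, /3GB": 3 * GIB,
    "64-bit OS": 4 * GIB,
}

def percent_gain(old: int, new: int) -> float:
    """Relative increase from `old` to `new`, in percent."""
    return 100 * (new - old) / old

labels = list(ADDRESS_SPACE)
for prev, cur in zip(labels, labels[1:]):
    gain = percent_gain(ADDRESS_SPACE[prev], ADDRESS_SPACE[cur])
    print(f"{prev} -> {cur}: +{gain:.0f}%")
```

Each step buys proportionally less than the one before (50%, then 33%), and there is no further step for a 32-bit process after 4 GB.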

The only real fix is for us to get a 64-bit linker. I remember some folks
mentioning how Microsoft doesn't have plans on shipping one (my memory
might not be serving me well here). But really, we should talk to
Microsoft and find this out. If they're not planning to ship a 64-bit
linker within the next year or so, turning PGO off is just something that
we would have to do at some point in the future.

--
Ehsan
<http://ehsanakhgari.org/>


Mike Hommey

Dec 12, 2011, 11:58:36 AM
to Ehsan Akhgari, Chris AtLee, Jean-Paul Rosevear, Justin Lebar, dev-pl...@lists.mozilla.org
On Mon, Dec 12, 2011 at 11:34:32AM -0500, Ehsan Akhgari wrote:
> Moving code out of libxul is only a band-aid over the problem. Since we
> don't have any reason to believe that the memory usage of the linker is
> linear in terms of the code size, we can't be sure that removing 10% of the
> code in libxul will give us 10% more breathing space. Also, moving code
> out of libxul might break the sorts of optimizations that we've been doing
> assuming that most of our code lives inside libxul (for example, libxul
> preloading, etc.)
>
> I agree with JP that the shortest path to reopening the trees is disabling
> PGO builds. But we should also note that we're pretty close to the cut-off
> date, which would mean that we would end up in a situation where we would
> need to release Firefox 11 for Windows with PGO disabled, unless RelEng can
> deploy 64-bit builders in time.
>
> Moving to 64-bit builders gives us 33% more address space, which should be
> enough for a while. But there is ultimately a hard limit on how much code
> we can have in libxul before we hit the 4GB address space limit of the
> linker. That might take a couple of more years, but my pessimistic side
> thinks that it's going to happen sooner this time. ;-)
>
> The only real fix is for us to get a 64-bit linker. I remember some folks
> mentioning how Microsoft doesn't have plans on shipping one (my memory
> might not be serving me well here). But really, we should talk to
> Microsoft and find this out. If they're not planning to ship a 64-bit
> linker within the next year or so, turning PGO off is just something that
> we would have to do at some point in the future.

Note that MSVC2010 uses less memory, since it can link with PGO enabled
within 3GB of address space.

Mike

Jonathan Kew

Dec 12, 2011, 12:20:18 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
On 12 Dec 2011, at 16:30, Boris Zbarsky wrote:

> On 12/12/11 11:12 AM, Jonathan Kew wrote:
>> I think some folk are concerned that we might land stuff in the meantime that seems fine on non-PGO builds/tests, but then fails under PGO when we eventually re-enable it. While that's a risk, I think it's a relatively small one
>
> The data seems to show that such a checkin happens about once a week on average (see the recent "Proposal to switch mozilla-inbound back to always doing PGO builds" thread in dev.planning).

Has that always been the case, or is this relatively high frequency a relatively recent phenomenon?

I'm assuming the address-space limit we've hit is not based simply on "raw" codesize (we don't have 3GB of code, do we?) but rather the total of various structures that the compiler/linker builds internally in order to support its optimization and code-gen process. And so it relates somehow to complexity/inter-relationships as well as raw size, and given that we've presumably been fairly close to the breaking point for a while, I'd think it quite possible that some of the "internal compiler error" failures were in fact out-of-address-space failures, due to a checkin modifying code (without necessarily _adding_ much) in a way that happens to be more memory-hungry for the compiler to handle.

So once we raise that ceiling, we may see a reduction in the incidence of PGO failure on apparently-innocent checkins.

> So either we think that we'll have PGO builds back up within much less than a week, or the risk is decidedly not small, right?
>
>> So that gives us until the end of January to get the new compiler deployed, move to 64-bit builders, or whatever solution(s) we're going to use, or about 7 weeks, minus the Christmas and New Year holiday season.
>
> At which point we will need to find the average of 7 checkins that no longer build with pgo that will land between now and then...

I don't doubt that it happens, but I think having to tackle a handful of these on aurora during January and/or beta during February would be better than blocking much C++ development for an extended period - and dealing with the resulting pressure on the tree when it re-opens and everyone wants to land the stuff they've been holding back in the meantime.

And if releng can get us onto VS2010 and/or 64-bit builders more quickly - which I hope is possible, but don't know what's actually involved in making the switch - the number of such problematic checkins will presumably be correspondingly smaller.

JK

Boris Zbarsky

Dec 12, 2011, 12:34:48 PM
On 12/12/11 12:20 PM, Jonathan Kew wrote:
> I don't doubt that it happens, but I think having to tackle a handful of these on aurora during January and/or beta during February would be better than blocking much C++ development for an extended period

I'm not actually sure it is. At that point we'll have to first find the
checkins responsible, then figure out how to fix them, possibly backing
them and other things out.

I suspect the net effect will be similar to holding the tree closed for
several days now, but time-shifted into January/February.

If we think we'll need to have the tree closed for longer than a few
days, I agree that disabling PGO temporarily sounds more palatable.

-Boris

Chris AtLee

Dec 12, 2011, 12:57:23 PM
On 12/12/11 12:20 PM, Jonathan Kew wrote:
> And if releng can get us onto VS2010 and/or 64-bit builders more quickly - which I hope is possible, but don't know what's actually involved in making the switch - the number of such problematic checkins will presumably be correspondingly smaller.

The 32-bit builders currently have VS2010 installed on them in addition
to VS2005. There are other issues preventing a switch to 2010,
however; iirc switching to 2010 breaks Firefox on older versions of
Windows XP.

Chris AtLee

Dec 12, 2011, 1:16:35 PM
On 12/12/11 11:12 AM, Justin Lebar wrote:
>> I think it's worth looking at what PGO is buying us these days. It costs a lot in terms of build times and therefore
>> build machine capacity.
>
> Have a look at the Talos results in https://tbpl.mozilla.org/?rev=5c64fb241d4e
>
> PGO is a large performance win. TP5 goes from 400 to 330, a speedup of 1.2x.

Sure, but things like Dromaeo tests don't seem to be affected at all:

http://graphs-new.mozilla.org/graph.html#tests=[[75,94,1],[75,1,1]]&sel=none&displayrange=7&datatype=running

http://graphs-new.mozilla.org/graph.html#tests=[[76,94,1],[76,1,1]]&sel=none&displayrange=7&datatype=running

But SVG is:

http://graphs-new.mozilla.org/graph.html#tests=[[57,94,1],[57,1,1]]&sel=none&displayrange=7&datatype=running

Boris Zbarsky

Dec 12, 2011, 1:31:15 PM
On 12/12/11 1:16 PM, Chris AtLee wrote:
> Sure, but things like Dromaeo tests don't seem to be affected at all:
>
> http://graphs-new.mozilla.org/graph.html#tests=[[75,94,1],[75,1,1]]&sel=none&displayrange=7&datatype=running

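That's Dromaeo-Sunspider.

> http://graphs-new.mozilla.org/graph.html#tests=[[76,94,1],[76,1,1]]&sel=none&displayrange=7&datatype=running

And that's Dromaeo-V8.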

Both are pure JS tests. For pure JS tests, time is either spent in
jitcode (not affected by PGO) or in libmozjs (which is compiled with PGO
disabled already on Windows because as far as we can tell VS 2005 PGO
miscompiles it; see https://bugzilla.mozilla.org/show_bug.cgi?id=673518 ).

Try this graph for Dromaeo-DOM:

http://graphs-new.mozilla.org/graph.html#tests=[[73,1,1],[73,94,1]]&sel=none&displayrange=7&datatype=running

It shows the PGO builds doing about 267 runs/s while the non-PGO ones
are doing about 209 runs/s. So about 25% speedup.
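
For a higher-is-better metric like runs/s the ratio is taken the other way round from a load time. A quick sketch using the two numbers just quoted (the function name is illustrative):

```python
def percent_faster(baseline_rate: float, new_rate: float) -> float:
    """Percentage improvement for a higher-is-better rate such as runs/s."""
    return 100 * (new_rate / baseline_rate - 1)

# Numbers quoted above: about 209 runs/s without PGO, about 267 runs/s with PGO.
print(f"{percent_faster(209, 267):.0f}% more runs per second")
```

Measured against the non-PGO baseline this comes out nearer 28%; measured the other way (209 is roughly 22% fewer runs than 267) it is lower, so "about 25%" is a fair round figure either way.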

(Amusingly,
http://graphs-new.mozilla.org/graph.html#tests=[[72,94,1],[72,1,1]]&sel=none&displayrange=7&datatype=running
also shows no speedup, because contrary to its name Dromaeo-CSS is
largely a JS test in practice.)

-Boris

Benoit Jacob

Dec 12, 2011, 1:56:47 PM
to Ehsan Akhgari, dev-platform
2011/12/12 Ehsan Akhgari <ehsan....@gmail.com>:
> Moving code out of libxul is only a band-aid over the problem.  Since we
> don't have any reason to believe that the memory usage of the linker is
> linear in terms of the code size, we can't be sure that removing 10% of the
> code in libxul will give us 10% more breathing space.  Also, moving code
> out of libxul might break the sorts of optimizations that we've been doing
> assuming that most of our code lives inside libxul (for example, libxul
> preloading, etc.)

This argument, however, doesn't apply equally well to all parts of
libxul. Some parts are relatively self-contained, with critical loops
that are well-identified, don't interact with other parts of libxul,
and already optimized i.e. coded in such a way that PGO won't make
them faster than -O2. I think that WebGL is such an example.

To put it another way, there's a limit to the scale at which PGO makes
sense, or else we should just link all the software on a computer as a
single file...

Benoit

Ehsan Akhgari

Dec 12, 2011, 2:14:58 PM
to Benoit Jacob, dev-platform
On Mon, Dec 12, 2011 at 1:56 PM, Benoit Jacob <jacob.b...@gmail.com> wrote:

> 2011/12/12 Ehsan Akhgari <ehsan....@gmail.com>:
> > Moving code out of libxul is only a band-aid over the problem. Since we
> > don't have any reason to believe that the memory usage of the linker is
> > linear in terms of the code size, we can't be sure that removing 10% of the
> > code in libxul will give us 10% more breathing space. Also, moving code
> > out of libxul might break the sorts of optimizations that we've been doing
> > assuming that most of our code lives inside libxul (for example, libxul
> > preloading, etc.)
>
> This argument, however, doesn't apply equally well to all parts of
> libxul. Some parts are relatively self-contained, with critical loops
> that are well-identified, don't interact with other parts of libxul,
> and already optimized i.e. coded in such a way that PGO won't make
> them faster than -O2. I think that WebGL is such an example.
>

There is also the question of which interfaces the code in question can
use. For example, if the code in question calls a function on an object,
and the code for the said object lives outside of its module, the function
needs to either be virtual or publicly exported.


> To put it another way, there's a limit to the scale at which PGO makes
> sense, or else we should just link all the software on a computer as a
> single file...


I think that's an unfair comparison. Theoretically, if we had linkers
which could use 64-bit address space, we could take advantage of PGO
without needing to put all of the code inside a single source file.
Problem is, we don't have those linkers for now. :(

Cheers,
--
Ehsan
<http://ehsanakhgari.org/>

Kyle Huey

Dec 12, 2011, 2:18:17 PM
to dev-platform
Status update:

We have two patches in hand (Bug 709657 and Bug 709721) to split out a
couple chunks of libxul. I tested one of them last night and it got the
final xul.dll size below the size of mozilla-beta's xul.dll by a couple
hundred kilobytes.

If we're willing to make the assumption that final binary size and peak
linker memory consumption are somewhat correlated then these two bugs
should buy us a fair amount of time (or code size, I suppose).

- Kyle

Jonathan Kew

Dec 12, 2011, 2:25:18 PM
to Boris Zbarsky, dev-pl...@lists.mozilla.org
But if we expect we'll be able to re-open (with PGO) within a few days anyway, then we'll only be dealing with a few days' worth of non-PGO'd checkins that might have problems that need to be tracked down once PGO is back.

So I don't see much benefit to holding the tree mostly-closed at this point. Either we can get the PGO builds working again soon, in which case the odds are pretty good that they'll "just work" with whatever patches have landed - it's not like we break them on a daily basis - or it's going to take "longer than a few days", in which case we really can't afford to block development while we wait for it.

JK

Ted Mielczarek

Dec 12, 2011, 2:32:14 PM
to Benoit Jacob, Ehsan Akhgari, dev-platform
On Mon, Dec 12, 2011 at 1:56 PM, Benoit Jacob <jacob.b...@gmail.com> wrote:
> This argument, however, doesn't apply equally well to all parts of
> libxul. Some parts are relatively self-contained, with critical loops
> that are well-identified, don't interact with other parts of libxul,
> and already optimized i.e. coded in such a way that PGO won't make
> them faster than -O2. I think that WebGL is such an example.

This is an almost impossible statement to make. Even highly optimized
code can be made faster by the PGO optimizer, because it does
optimizations like:
* massive inlining
* speculative virtual call inlining
* hot+cold function block separation

which are incredibly hard to replicate without hand-crafting unreadable code.
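
To make the second bullet concrete, here is a rough Python illustration of the shape of code that speculative virtual call inlining produces. It is only an analogy: the real transformation is done by the compiler on generated machine code, guided by the profile, and every class and function name below is invented for the example.

```python
import math

class Shape:
    def area(self) -> float:
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, r: float) -> None:
        self.r = r

    def area(self) -> float:
        return math.pi * self.r * self.r

class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side

def total_area_virtual(shapes) -> float:
    """What the source says: one dynamic dispatch per element."""
    return sum(s.area() for s in shapes)

def total_area_speculated(shapes) -> float:
    """Roughly what a profile-guided build might emit if the profile showed
    that almost every receiver at this call site is a Circle."""
    total = 0.0
    for s in shapes:
        if type(s) is Circle:
            # Hot path: cheap type guard, then the inlined body of Circle.area.
            total += math.pi * s.r * s.r
        else:
            # Cold path: fall back to the ordinary virtual call.
            total += s.area()
    return total

shapes = [Circle(1.0)] * 9 + [Square(2.0)]
assert math.isclose(total_area_virtual(shapes), total_area_speculated(shapes))
```

Writing the guarded version by hand at every hot call site, and keeping the guesses in sync with real-world behaviour, is the kind of unreadable hand-crafting the paragraph above refers to.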

> To put it another way, there's a limit to the scale at which PGO makes
> sense, or else we should just link all the software on a computer as a
> single file...

This is probably false. If the compiler could inline your system
library calls and things like that, your software would likely be
faster. It's only because of API boundaries that things like that
don't happen.

-Ted

Ehsan Akhgari

Dec 12, 2011, 2:58:14 PM
to Kyle Huey, dev-platform
I have an idea which might enable us to use VS2010 to build binaries that
will run with Win2k, XP and XP SP1. We were going to switch to VS2010 for
Gecko 12 anyways, so if we can get this to work, we can switch to VS2010
today and we wouldn't need to rip out anything either.

I will have useful results in a few hours.

Cheers,
--
Ehsan
<http://ehsanakhgari.org/>


Zack Weinberg

Dec 12, 2011, 3:02:34 PM
On 2011-12-12 7:28 AM, Robert Kaiser wrote:
> Kyle Huey schrieb:
>> 1) Make libxul smaller - Either by removing code entirely or by splitting
>> things into separate shared libraries.
>
> If we're going with this, we should take a look what code is not in the
> hot startup path and split out that. AFAIK, the reason for linking
> everything into libxul was that startup is faster if we only need to
> open one library instead of multiple.

It also allows more aggressive deCOMtamination, although there might be
a way to work around that. I don't remember exactly what the problem
was, but I know I had to postpone some deCOM patches till we went
libxul-only because they caused link errors in a non-libxul build. bz
or bsmedberg probably know why.

zw

Benoit Jacob

Dec 12, 2011, 3:35:27 PM
to Ted Mielczarek, Ehsan Akhgari, dev-platform
2011/12/12 Ted Mielczarek <t...@mielczarek.org>:
> On Mon, Dec 12, 2011 at 1:56 PM, Benoit Jacob <jacob.b...@gmail.com> wrote:
>> This argument, however, doesn't apply equally well to all parts of
>> libxul. Some parts are relatively self-contained, with critical loops
>> that are well-identified, don't interact with other parts of libxul,
>> and already optimized i.e. coded in such a way that PGO won't make
>> them faster than -O2. I think that WebGL is such an example.
>
> This is an almost impossible statement to make. Even highly optimized
> code can be made faster by the PGO optimizer, because it does
> optimizations like:
> * massive inlining

It's not hard to get the compiler to inline the functions that you
specifically know need to be inlined; and for the cases where we know
that compilers are getting it wrong even with inline keywords, we just
need to fix NS_ALWAYS_INLINE (
https://bugzilla.mozilla.org/show_bug.cgi?id=697810 ) and use that
macro. WebGL is already using such a macro (
https://bugzilla.mozilla.org/show_bug.cgi?id=697450 )

> * speculative virtual call inlining

The kind of optimized code that I'm talking about, doesn't use virtual
functions for stuff that needs to be inlined. I don't deny that there
are cases where you'd really need that, but that's not the case of a
majority of 'hot' code.

> * hot+cold function block separation

If you just write carefully hot code such that -O2 will compile it
well, you're fine. PGO is more useful for when you have a big codebase
and you either don't know where the hot parts are, or know that you
don't have time to carefully optimize all of them...

>
> which are incredibly hard to replicate without hand-crafting unreadable code.
>
>> To put it another way, there's a limit to the scale at which PGO makes
>> sense, or else we should just link all the software on a computer as a
>> single file...
>
> This is probably false. If the compiler could inline your system
> library calls and things like that, your software would likely be
> faster. It's only because of API boundaries that things like that
> don't happen.

Yes it could be faster, but would that be desirable? Diminishing returns etc.

Benoit

Brian Smith

Dec 12, 2011, 5:43:42 PM
to Ehsan Akhgari, dev-pl...@lists.mozilla.org, necko...@mozilla.org
Ehsan Akhgari wrote:
> The only real fix is for us to get a 64-bit linker. I remember some
> folks mentioning how Microsoft doesn't have plans on shipping one
> (my memory might not be serving me well here). But really, we should
> talk to Microsoft and find this out. If they're not planning to ship
> a 64-bit linker within the next year or so, turning PGO off is just
> something that we would have to do at some point in the future.

We could, and probably will have to, permanently split libxul (differently than we already do).

Microsoft is not going to provide us with a 64-bit linker in any reasonable timeframe. We cannot switch wholesale to any other linker because we need the optimizations (including PGO) in Microsoft's.

Note that we effectively split libxul into multiple pieces already: libxul, mozjs, libssl3, libnss, libnssutil, libsmime, libsoftokn, libfreebl, libnspr4, libplds4, libplc4, etc. We have bugs on file to combine these together, to improve startup time, which would add 4MB (FOUR MEGABYTES) of code to libxul. We also have WebRTC and other major features that will add, on their own, hundreds of kilobytes of code, at least. I wouldn't be surprised if, this time next year, libxul would be 33-50% larger than it is now if we follow through with these plans. If we assume that the compiler/linker requires AT LEAST a linear increase in memory relative to code size, then even 3GB -> 4GB address space is going to be insufficient.
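
As a back-of-the-envelope check of the linear-growth argument, here is a minimal sketch. It assumes the linker currently needs about 3 GB (it is failing right at that limit) and that memory grows exactly linearly with code size, which is the optimistic case. The function name is made up for illustration.

```python
from fractions import Fraction

GB = 1024 ** 3


def headroom_after_growth(current_bytes, limit_bytes, code_growth):
    """Address space left over if linker memory grows linearly with code size.

    A negative result means the link would no longer fit under the limit.
    """
    projected = current_bytes * (1 + code_growth)
    return limit_bytes - projected


# Assumption: the linker is sitting right at the current 3 GB limit.
current = 3 * GB
low = headroom_after_growth(current, 4 * GB, Fraction(1, 3))   # libxul a third larger
high = headroom_after_growth(current, 4 * GB, Fraction(1, 2))  # libxul 50% larger

print(f"one-third growth: {float(low) / GB:+.2f} GB of headroom in 4 GB")
print(f"50% growth:       {float(high) / GB:+.2f} GB of headroom in 4 GB")
```

Under these assumptions the low end of the estimate consumes the whole 4 GB and the high end overshoots it by half a gigabyte, and anything worse than linear growth only makes that tighter.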

So, IMO, short-term fixes that move non-startup code out of libxul *that does NOT depend on NSPR, NSS, or mozjs* is the best solution because that would seem to be the long-term solution too.

Unfortunately, this means that Necko and PSM are actually very poor candidates (long-term) for splitting out of libxul, if we plan ahead for the time where NSPR and NSS are linked into libxul for startup performance reasons.

- Brian

Robert O'Callahan

Dec 12, 2011, 6:04:08 PM
to Ehsan Akhgari, Kyle Huey, dev-platform
On Tue, Dec 13, 2011 at 8:58 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> I have an idea which might enable us to use VS2010 to build binaries that
> will run with Win2k, XP and XP SP1. We were going to switch to VS2010 for
> Gecko 12 anyways, so if we can get this to work, we can switch to VS2010
> today and we wouldn't need to rip out anything either.
>
> I will have useful results in a few hours.
>

What's the idea? I gather from #developers it didn't work, but I'd still
like to know :-).

Rob
--
"If we claim to be without sin, we deceive ourselves and the truth is not
in us. If we confess our sins, he is faithful and just and will forgive us
our sins and purify us from all unrighteousness. If we claim we have not
sinned, we make him out to be a liar and his word is not in us." [1 John
1:8-10]

Robert O'Callahan

Dec 12, 2011, 6:29:32 PM
to Benoit Jacob, dev-platform, Ehsan Akhgari, Ted Mielczarek
On Tue, Dec 13, 2011 at 9:35 AM, Benoit Jacob <jacob.b...@gmail.com> wrote:

> > * hot+cold function block separation
> If you just write carefully hot code such that -O2 will compile it
> well, you're fine.
>

You can't manually separate all the hot from cold code, e.g. error handling
paths within a hot function.

Robert O'Callahan

Dec 12, 2011, 6:33:48 PM
to Ehsan Akhgari, Benoit Jacob, dev-platform
On Tue, Dec 13, 2011 at 8:14 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> I think that's an unfair comparison. Theoretically, if we had linkers
> which could use 64-bit address space, we could take advantage of PGO
> without needing to put all of the code inside a single source file.
> Problem is, we don't have those linkers for now. :(
>

There's the 64-bit linker that builds 64-bit binaries.

So we should always be able to ship 64-bit single-libxul PGO builds. We
should do that. (Along with the other efforts for 32-bit builds.)

Jean-Paul Rosevear

Dec 12, 2011, 6:53:13 PM
to Kyle Huey, dev-platform
Further summary.

Please check the assertions, actions and questions below. If you disagree with an assertion, please call it out. If you have an action attributed to you and you aren't the right person, please call it out. If you know the answer to open questions, call out.

tl;dr;

Goal is to re-open tree and limit risk to future shipping products.

Slicing stuff out of libxul gets the tree re-opened and gets us through shipping FF11. After that we switch to VS2010 or we get VS2005 on 64bit builders.

=====

* Assertions:
1) We can't link PGO Windows builds because we are OOM at 3GB:
https://bugzilla.mozilla.org/show_bug.cgi?id=709193

2) VS2005 is not installed on our 64bit builders. If it was and worked we would buy some time with 4GB of memory available instead of 3GB.

3) VS2010 has a more memory efficient linker on 32bit that buys us time and would allow us to build on 64bit builders only:
https://bugzilla.mozilla.org/show_bug.cgi?id=709480

4) VS2010 builds as currently set up will force us to drop Windows 2000 and XP w/o SP and XP+SP1.

5) We do not want to switch to VS2010 1 week before the FF11 migration to Aurora; that would limit the time we have to find compiler bugs and/or binary compatibility issues.

6) Product team is not certain that it's viable to drop Windows 2000 and XP w/o SP and XP+SP1. We'll have more OS usage data about 2 weeks after FF9 release, once we collect data about service pack versions:
https://bugzilla.mozilla.org/show_bug.cgi?id=668436

7) We backed out graphite and SPDY to get PGO builds green again. Product team says graphite not required for FF11 release, it was going to be preffed off, so it can land post migration on Dec 20. SPDY is requested because it was publicly communicated.

8) We believe turning off PGO has too much of a performance impact to consider turning it off as a short term solution. Slicing out items is a better approach for now.

* Investigations/Actions:
1) Kyle Huey, Mike Hommey and Jeff Gilbert are slicing out items from libxul to reduce the linker memory consumption:
https://bugzilla.mozilla.org/show_bug.cgi?id=709657
https://bugzilla.mozilla.org/show_bug.cgi?id=709721
https://bugzilla.mozilla.org/show_bug.cgi?id=709914

2) Ehsan is testing a work around for VS2010 not supporting Windows 2000 and XP w/o SP and XP+SP1.

3) RelEng will investigate whether we can get VS2005 (32bit) running on a Win64 builder machine. If this 32bit compiler *is* runnable on Win64, does this solve the problem by allowing the linker to use >3GB of address space?

* Open Questions:
1) Can we ship with things sliced out? How does this impact us?

-JP

Ehsan Akhgari

Dec 12, 2011, 8:18:37 PM
to rob...@ocallahan.org, Kyle Huey, dev-platform
On Mon, Dec 12, 2011 at 6:04 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:

> On Tue, Dec 13, 2011 at 8:58 AM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>
>> I have an idea which might enable us to use VS2010 to build binaries that
>> will run with Win2k, XP and XP SP1. We were going to switch to VS2010 for
>> Gecko 12 anyways, so if we can get this to work, we can switch to VS2010
>> today and we wouldn't need to rip out anything either.
>>
>> I will have useful results in a few hours.
>>
>
> What's the idea? I gather from #developers it didn't work, but I'd still
> like to know :-).
>

This might sound a bit crazy, but please bear with me.

If we had a way to provide msvcr100.dll with our own
EncodePointer/DecodePointer implementation (those functions are easy to
implement), everything would be fine on those platforms. I originally had
a few ideas, but none of them worked because of different reasons (we don't
link to crt statically, we can't ship modified versions of the binary,
etc). But I now have a solution which I think will work (I've just got the
hard part working):

I looked at the CRT source code, and it calls EncodePointer very early
during the startup (in its DllMain function, called _CRT_INIT). So firstly
we need a way to run our code before _CRT_INIT. I have got this part
working by generating a DLL which does not link against the CRT and only
imports functions from kernel32.dll (which itself does not link against the
CRT), and adding my code to the DllMain function inside that DLL. For this
to work correctly, the library for that DLL needs to come before msvcrt.lib
in the import table entries, something which can be achieved by passing
/nodefaultlib to the linker and reordering the libraries on the command line
accordingly.

Once we have code which runs before the CRT initialization, we can look to
see if kernel32.dll includes the two functions we're interested in. If it
does not, we can parse the PE header of kernel32.dll, get to its
IMAGE_EXPORT_DIRECTORY table, copy the table while interleaving our two
entries to point to our own versions of those functions, and overwrite the
kernel32.dll PE header to point to our new IMAGE_EXPORT_DIRECTORY.

This way, when the loader attempts to load msvcr100.dll, it can
successfully find entries for EncodePointer/DecodePointer. This is sort of
similar to how we currently intercept LdrLoadDll on Windows for DLL
blacklisting, so while it's sort of hackish, it relies on documented stuff
from Windows. Therefore, I suspect that it will work just fine.
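
The "copy the table while interleaving our two entries" step matters because the export name table of a PE image is kept sorted so that lookups by name can use a binary search. Below is a minimal sketch of that merge on an in-memory model: plain Python lists stand in for the name table and the parallel ordinal table. The function name, the sample export names, and the ordinal values are all illustrative assumptions, and nothing here parses or patches a real DLL.

```python
import bisect


def add_exports(names, ordinals, new_entries):
    """Return copies of the name/ordinal tables with new_entries merged in.

    names       -- export names (bytes), sorted, as in a PE export name table
    ordinals    -- parallel list: ordinals[i] belongs to names[i]
    new_entries -- dict mapping a name to the ordinal of our replacement
    """
    names = list(names)
    ordinals = list(ordinals)
    for name, ordinal in new_entries.items():
        i = bisect.bisect_left(names, name)
        if i < len(names) and names[i] == name:
            continue  # already exported by the system; leave it alone
        # Insert at the sorted position so binary search keeps working.
        names.insert(i, name)
        ordinals.insert(i, ordinal)
    return names, ordinals


# Hypothetical slice of an export table that lacks the two functions.
old_names = [b"CreateFileW", b"DeleteFileW", b"ExitProcess", b"GetProcAddress"]
old_ordinals = [0, 1, 2, 3]

new_names, new_ordinals = add_exports(
    old_names, old_ordinals, {b"EncodePointer": 5, b"DecodePointer": 4}
)
print(new_names)
print(new_ordinals)
```

Appending the two names at the end would be simpler, but an unsorted name table would break name lookups, which is presumably why the entries have to be interleaved rather than tacked on.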

I'm currently working on the second half of the implementation.

Robert O'Callahan

Dec 12, 2011, 8:59:38 PM
to Jean-Paul Rosevear, Kyle Huey, dev-platform
On Tue, Dec 13, 2011 at 12:53 PM, Jean-Paul Rosevear <j...@mozilla.com> wrote:

> 3) VS2010 has a more memory efficient linker on 32bit that buys us time
> and would allow us to build on 64bit builders only:
> https://bugzilla.mozilla.org/show_bug.cgi?id=709480
>

https://bugzilla.mozilla.org/show_bug.cgi?id=709193#c44 suggests that
VS2010 alone doesn't buy us very much. Someone should repeat that
experiment.

> 3) RelEng will investigate if we can get a VS2005 (32bit) running on a
> Win64 builder machine? If this 32bit compiler *is* runnable on win64, does
> this solve the problem by allowing the linker to use >3gb space?
>

It should. I think someone still needs to do that experiment to make sure.

Mike Hommey

Dec 13, 2011, 2:17:46 AM
to Brian Smith, Ehsan Akhgari, dev-pl...@lists.mozilla.org, necko...@mozilla.org
I think that putting everything together in libxul is addressing startup
performance by the wrong end. The problem we had with many components
is that they were all opened at startup, thus affecting startup time
significantly. Things changed in the meanwhile, though: we now have
manifests. This means we could actually avoid loading binary components
*at all* until they are actually needed. Meaning that we could and
should separate out all these components that we don't need at startup.
PSM is very much such a candidate. Parts of Necko probably are, though
other parts of Necko are required. Another candidate for being a
component is WebRTC, which is massive and far from needed at startup.
Video/Audio could be a candidate, if that doesn't hurt framerate.
Depending on the frontend perspectives, some other things like SVG or
WebGL could be candidates as well.

As for NSS and NSPR, instead of folding them into libxul, we could merge
them in less libraries than are currently provided: we don't have any
use of NSPR being 3 different libraries, and NSS core being 4 more. On
the long term, I'd be in favour of not using NSPR from Mozilla code at
all, and leave it there only for NSS. Anyways, the current status quo
for these libraries is not so bad: startup performance is not affected
by library loading so much anymore (and less so since FF7 iirc).

Mike

Neil

Dec 13, 2011, 5:04:12 AM
Robert Kaiser wrote:

> Kyle Huey wrote:
>
>> 1) Make libxul smaller - Either by removing code entirely or by
>> splitting things into separate shared libraries.
>
> If we're going with this, we should take a look what code is not in
> the hot startup path and split out that.

Not forgetting that the "cold" code won't be in the profile and
therefore won't benefit from PGO.

> AFAIK, the reason for linking everything into libxul was that startup
> is faster if we only need to open one library instead of multiple. If
> we split off parts we don't usually need at startup, we probably even
> make startup faster because the library to be loaded is smaller

This doesn't help with binary components which need to be loaded for
their module registration. It might help with other libraries that could
be loaded on demand, e.g. font validation, media codecs.

--
Warning: May contain traces of nuts.

Henri Sivonen

Dec 13, 2011, 5:12:31 AM
to dev-pl...@lists.mozilla.org
On Tue, Dec 13, 2011 at 12:04 PM, Neil <ne...@parkwaycc.co.uk> wrote:
> This doesn't help with binary components which need to be loaded for their
> module registration. It might help with other libraries that could be loaded
> on demand, e.g. font validation, media codecs.

Does it help startup perf to take stuff off the XUL startup path if
it's rather likely that users with session restore turned on will have
something in their session that touches the on-demand code?

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/

Mike Hommey

Dec 13, 2011, 5:19:37 AM
to Neil, dev-pl...@lists.mozilla.org
With the use of manifests, we could make module registration
declarative, and thus remove the need for actually running any code from
a binary component until it's required.
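
To make the idea concrete, here is a minimal sketch of declarative registration with on-demand loading. The manifest maps contract IDs to library names, so a lookup needs no code from the library until the first request. The class name, contract IDs, library names, and the fake loader are all hypothetical; this is not how the actual component manager is written.

```python
class LazyComponentRegistry:
    """Maps contract IDs to libraries and loads each library on first use."""

    def __init__(self, manifest, load_library):
        self._manifest = dict(manifest)    # contract ID -> library name
        self._load_library = load_library  # called at most once per library
        self._loaded = {}                  # library name -> loaded handle

    def get_service(self, contract_id):
        library = self._manifest[contract_id]
        if library not in self._loaded:
            self._loaded[library] = self._load_library(library)
        return self._loaded[library]

    def loaded_libraries(self):
        return sorted(self._loaded)


# Hypothetical manifest entries; real contract IDs and library names differ.
manifest = {
    "@example.org/webgl;1": "webgl.dll",
    "@example.org/psm;1": "psm.dll",
}
load_log = []


def fake_loader(name):
    """Stand-in for opening a shared library; records what was loaded."""
    load_log.append(name)
    return f"<handle for {name}>"


registry = LazyComponentRegistry(manifest, fake_loader)
at_startup = registry.loaded_libraries()      # registration alone loads nothing
registry.get_service("@example.org/webgl;1")  # first use triggers the load
registry.get_service("@example.org/webgl;1")  # later uses reuse the handle
print(at_startup, load_log)
```

The point of the sketch is only that registration is data, so the cost of opening a library moves from startup to first use.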

Mike

Mike Hommey

Dec 13, 2011, 5:23:09 AM
to Henri Sivonen, dev-pl...@lists.mozilla.org
On Tue, Dec 13, 2011 at 12:12:31PM +0200, Henri Sivonen wrote:
> On Tue, Dec 13, 2011 at 12:04 PM, Neil <ne...@parkwaycc.co.uk> wrote:
> > This doesn't help with binary components which need to be loaded for their
> > module registration. It might help with other libraries that could be loaded
> > on demand, e.g. font validation, media codecs.
>
> Does it help startup perf to take stuff off the XUL startup path if
> it's rather likely that users with session restore turned on will have
> something in their session that touches the on-demand code?

If we move towards "on-demand session restoration" being the default,
the likelihood goes down significantly. Even if we don't, the likelihood
that a significant number of users will have several different restored
tabs using each and every piece of on-demand code is imho pretty low.

Mike

Axel Hecht

Dec 13, 2011, 8:28:56 AM
On 13.12.11 11:12, Henri Sivonen wrote:
> On Tue, Dec 13, 2011 at 12:04 PM, Neil<ne...@parkwaycc.co.uk> wrote:
>> This doesn't help with binary components which need to be loaded for their
>> module registration. It might help with other libraries that could be loaded
>> on demand, e.g. font validation, media codecs.
>
> Does it help startup perf to take stuff off the XUL startup path if
> it's rather likely that users with session restore turned on will have
> something in their session that touches the on-demand code?
>

Also, there's "snappy" and "usable for browsing" metrics, which will
probably give you different results.

Axel

Rafael Espindola

Dec 13, 2011, 10:41:20 AM
to Jean-Paul Rosevear, Kyle Huey, dev-platform
> * Assertions:
> 1) We can't link PGO Windows builds because we are OOM at 3GB:
> https://bugzilla.mozilla.org/show_bug.cgi?id=709193

One thing that was mentioned on #developers is that while PGO implies
LTO in MSVC, it is possible to do just LTO (which Microsoft calls
Link Time Code Generation).

Has anyone tried just that? Does XUL link with just LTO? Do we still
get most of the performance improvements we get with PGO?

Cheers,
Rafael

Ted Mielczarek

Dec 13, 2011, 11:15:16 AM
to Rafael Espindola, Jean-Paul Rosevear, dev-platform, Kyle Huey
We enabled LTO prior to enabling PGO. It was a perf win (I don't
remember exactly what), but enabling PGO on top of that was an
additional ~10% win on almost everything we measure.

-Ted

Robert Kaiser

Dec 13, 2011, 1:02:22 PM
Henri Sivonen wrote:
> On Tue, Dec 13, 2011 at 12:04 PM, Neil<ne...@parkwaycc.co.uk> wrote:
>> This doesn't help with binary components which need to be loaded for their
>> module registration. It might help with other libraries that could be loaded
>> on demand, e.g. font validation, media codecs.
>
> Does it help startup perf to take stuff off the XUL startup path if
> it's rather likely that users with session restore turned on will have
> something in their session that touches the on-demand code?

Depends on the likelihood of that stuff being used on pages. We don't
restore all pages at once, and a slight delay for background tabs should
never be a real problem if our UI is responsive and people can use the
awesomebar.

Robert Kaiser

--
Note that any statements of mine - no matter how passionate - are never
meant to be offensive but very often as food for thought or possible
arguments that we as a community should think about. And most of the
time, I even appreciate irony and fun! :)

Brian Smith

Dec 13, 2011, 6:59:25 PM
to Mike Hommey, Kai Engert, dev-pl...@lists.mozilla.org, Taras Glek
Mike Hommey wrote:
> On Mon, Dec 12, 2011 at 02:43:42PM -0800, Brian Smith wrote:
> > We could, and probably will have to, permanently split libxul
> > (differently than we already do).

[snip]
PSM's nsNSSComponent is initialized on the main thread because it is actually used during startup. For example, the last time I checked, the first use of PSM during startup is to determine if there is a master password protecting the profile. Plus, nsNSSComponent must be initialized on the main thread before it is used by any other thread, and the only way we can guarantee that now is by loading it during startup on the main thread.

Definitely, we could reduce a LOT of the work that nsNSSComponent::Init is doing, to avoid loading pages of libssl, libsmime, etc. during startup. And, we should do this first, before combining all the NSS libraries into libxul or merging them together.

> As for NSS and NSPR, instead of folding them into libxul, we could
> merge
> them in less libraries than are currently provided: we don't have any
> use of NSPR being 3 different libraries, and NSS core being 4 more. On
> the long term, I'd be in favour of not using NSPR from Mozilla code at
> all, and leave it there only for NSS.

I agree that I would like to try this first, before trying to merge them into libxul.

> Anyways, the current status quo
> for these libraries is not so bad: startup performance is not affected
> by library loading so much anymore (and less so since FF7 iirc).

ORLY? I will investigate this when I have time.

- Brian

Ehsan Akhgari

Dec 13, 2011, 8:11:26 PM
to Brian Smith, Mike Hommey, Kai Engert, dev-pl...@lists.mozilla.org, Taras Glek
I just landed https://hg.mozilla.org/mozilla-central/rev/221eccfa6a3f which
will likely fix the PGO build issue at least for some time. For more
details, please see the thread about simplifying the template usage started
today by bsmedberg. Many thanks to Benjamin for his great suggestion!

--
Ehsan
<http://ehsanakhgari.org/>

Kyle Huey

Dec 14, 2011, 6:31:53 AM
to Ehsan Akhgari, Brian Smith, Mike Hommey, Kai Engert, dev-pl...@lists.mozilla.org, Taras Glek
On Tue, Dec 13, 2011 at 8:11 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:

> I just landed https://hg.mozilla.org/mozilla-central/rev/221eccfa6a3f which
> will likely fix the PGO build issue at least for some time. For more
> details, please see the thread about simplifying the template usage started
> today by bsmedberg. Many thanks to Benjamin for his great suggestion!


Unfortunately that's nowhere near enough. The tree is still restricted.

- Kyle

Dao

Dec 14, 2011, 6:42:18 AM
On 14.12.2011 00:59, Brian Smith wrote:
> PSM's nsNSSComponent is initialized on the main thread because it is actually used during startup. For example, the last time I checked, the first use of PSM during startup is to determine if there is a master password protecting the profile.

I'm not sure what this means. A master password only protects passwords,
not the whole profile, and should only be required when loading a page
which you stored credentials for. This can happen shortly after startup,
of course.

Kai Engert

Dec 14, 2011, 10:56:33 AM
to Brian Smith, Mike Hommey, dev-pl...@lists.mozilla.org, Taras Glek
On 14.12.2011 00:59, Brian Smith wrote:
>> Meaning that we could and should separate out all
>> these components that we don't need at startup.
>> PSM is very much such a candidate.
> PSM's nsNSSComponent is initialized on the main thread because it is actually used during startup. For example, the last time I checked, the first use of PSM during startup is to determine if there is a master password protecting the profile. Plus, nsNSSComponent must be initialized on the main thread before it is used by any other thread, and the only way we can guarantee that now is by loading it during startup on the main thread.

I agree with Brian. I disagree that PSM is a candidate. Firefox requires
SSL quite early. I consider SSL support a core requirement of our
networking.

> Definitely, we could reduce a LOT of the work that nsNSSComponent::Init is doing, to avoid loading pages of libssl, libsmime, etc. during startup. And, we should do this first, before combining all the NSS libraries into libxul or merging them together.

Please remember that NSPR/NSS is used by additional consumers.

The majority of the init code happens inside NSS, so any work should be
done at the NSPR/NSS level.

If you would like to speed this up (e.g. by factorizing init code), then
Mozilla should invest resources to implement such support inside NSS.

>> As for NSS and NSPR, instead of folding them into libxul, we could
>> merge
>> them in less libraries than are currently provided: we don't have any
>> use of NSPR being 3 different libraries, and NSS core being 4 more.

In my understanding this modularization has a reason, it helps
applications that need only a subset.

If you would like to find a way to merge them, instead of finding
solutions at the application level, try to find a solution at the
NSPR/NSS project level.

You could attempt to implement two separate build modes - one that keeps
the current separation, and a new one that combines everything together
- and have Mozilla use the latter.

>> On
>> the long term, I'd be in favour of not using NSPR from Mozilla code at
>> all, and leave it there only for NSS.

This seems unrealistic to me, I don't understand how that could be done
in a reasonable amount of time.

Replacing our core cross platform base (NSPR), including threading,
networking I/O, and changing all code to call new APIs, and dealing with
the fallout caused by implementation differences - that sounds like a
huge amount of work.

Kai

Mike Hommey

Dec 14, 2011, 11:19:32 AM
to Kai Engert, Brian Smith, Taras Glek, dev-pl...@lists.mozilla.org
On Wed, Dec 14, 2011 at 04:56:33PM +0100, Kai Engert wrote:
> On 14.12.2011 00:59, Brian Smith wrote:
> >>Meaning that we could and should separate out all
> >>these components that we don't need at startup.
> >>PSM is very much such a candidate.
> >PSM's nsNSSComponent is initialized on the main thread because it is actually used during startup. For example, the last time I checked, the first use of PSM during startup is to determine if there is a master password protecting the profile. Plus, nsNSSComponent must be initialized on the main thread before it is used by any other thread, and the only way we can guarantee that now is by loading it during startup on the main thread.
>
> I agree with Brian. I disagree that PSM is a candidate. Firefox
> requires SSL quite early. I consider SSL support a core requirement
> of our networking.

There's quite early and quite early. You don't need SSL to display the
main UI yet. And in a vast majority of cases, you don't need SSL to
display the current tab on session restore.

> >Definitely, we could reduce a LOT of the work that nsNSSComponent::Init is doing, to avoid loading pages of libssl, libsmime, etc. during startup. And, we should do this first, before combining all the NSS libraries into libxul or merging them together.
>
> Please remember that NSPR/NSS is used by additional consumers.
>
> The majority of the init code happens inside NSS, so any work should
> be done at the NSPR/NSS level.
>
> If you would like to speed this up (e.g. by factorizing init code),
> then Mozilla should invest resources to implement such support
> inside NSS.

The problem is not a matter of code being run, it's a matter of
libraries being loaded at a time we don't need them. For instance,
somewhere between 5 and 10% (iirc) of the time spent uncompressing
libraries at startup on android is NSS. Not loading softokn and freebl
immediately is already a win (and not so long ago, these libs were
linked to libxul, so it wasn't even possible to load them after early
startup).
For the other libs, even with the upcoming incremental decompression
of libraries, we're still going to decompress their headers and
relocated sections. It would be better if we didn't have to do that at
all.

Mike

Kyle Huey

Dec 14, 2011, 2:15:31 PM
to dev-platform
After disabling some code that doesn't really need to be in the tree (Skia)
and keeping Graphite turned off, I've measured a 32MB drop in the linker's
memory usage (that's about half a release cycle at our current burn rate).
There are patches in progress to get us another 41 MB.
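
The arithmetic behind those numbers can be sketched as follows. The only inputs are the figures quoted above: 32 MB is taken as half a release cycle of growth, which implies a burn rate of about 64 MB per cycle. The function name is made up for illustration, and the result is a rough estimate, not a measurement.

```python
def cycles_of_headroom(freed_mb, burn_mb_per_cycle):
    """How many release cycles of growth a given memory saving buys."""
    return freed_mb / burn_mb_per_cycle


# "32MB ... is about half a release cycle" implies roughly 64 MB per cycle.
burn_rate = 32 / 0.5

measured = cycles_of_headroom(32, burn_rate)           # savings measured so far
with_pending = cycles_of_headroom(32 + 41, burn_rate)  # plus the pending patches

print(f"burn rate: {burn_rate:.0f} MB per cycle")
print(f"measured savings: {measured:.2f} cycles")
print(f"with pending patches: {with_pending:.2f} cycles")
```

So even with the pending patches landed, the combined 73 MB buys only a little more than one release cycle at the current rate.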

At this point we're going to reopen the tree. It will be metered until we
clear the backlog, and m-i will remain closed until we're at a point where
it won't get 20 pushes per hour.

Also, people who plan on landing large things (read, importing brand new
chunks of third party code or massive new C++ features) should coordinate
with me first. I don't think we have anything major in the pipeline before
11 is set to branch, but I could be wrong!

- Kyle

Arthur

Dec 15, 2011, 2:02:48 AM
On Dec 11, 21:27, Kyle Huey <m...@kylehuey.com> wrote:
> At the end of last week our Windows PGO builds started failing on
> mozilla-inbound (https://bugzilla.mozilla.org/show_bug.cgi?id=709193).
> After some investigation we determined that the problem seems to be that
> the linker is running out of virtual address space during the optimization
> phase.
>
> This is not the first time we've run into this problem (e.g. Bug 543034).
> A couple years ago we hit the 2 GB virtual address space limit.  The build
> machines were changed to use /3GB and that additional GB of address space
> bought us some time.  This time unfortunately the options aren't as easy as
> flipping a switch.
>
> As a temporary measure, we've turned off or ripped out a few new pieces of
> code (Graphite, SPDY, libreg) which has brought us back down under the
> limit for the moment.  We don't really know how much breathing space we
> have (but it's probably pretty small).
>
> Our three options at this point:
>
> 1) Make libxul smaller - Either by removing code entirely or by splitting
> things into separate shared libraries.
> 2) Move to MSVC 2010 - We know that changesets that reliably failed to link
> on MSVC 2005 linked successfully with MSVC 2010.  What we don't know is how
> much this helps (I expect the answer is somewhere between a lot and a
> little).  We can't really do this for (at the bare minimum) a couple more
> weeks anyways due to product considerations about what OSs we support.
> 3) Do our 32 bit builds on machines running a 64 bit OS.  This will allow
> the linker to use 4 GB of address space.
>
> I think we need to pursue a combination of (1) in the short term and (3) in
> the slightly less short term.  Gal has some ideas on what we can do for (1)
> that I'm investigating.
>
> In the mean time, mozilla-inbound is closed, and mozilla-central is
> restricted to approvals only.  The only things currently allowed to land on
> mozilla-central are:
>
> - Test-only/NPOTB changes
> - Changes that only touch Spidermonkey (which is not part of libxul on
> Windows, and thus not contributing to the problem).
> - Changes that only touch other cpp code that doesn't end up in libxul
> (cpp code in browser/, things like sqlite, angle, nss, nspr, etc).
> - JS/XUL/HTML changes.
>
> I'm hopeful that we can hack libxul enough to get the tree open
> provisionally soon.
>
> - Kyle

Guys, how about completely rewriting XUL? From the ground up? Come on,
guys, XUL is just HUUUUUGE. So huge that you need to radically shrink
it to use it on modern platforms, and especially on Android.
That was just an idea, I'm not pushing you to do it right now.

Mike Hommey

Dec 15, 2011, 3:39:24 AM
to dev-platform
On Wed, Dec 14, 2011 at 02:15:31PM -0500, Kyle Huey wrote:
> After disabling some code that doesn't really need to be in the tree (Skia)
> and keeping Graphite turned off, I've measured a 32MB drop in the linker's
> memory usage (that's about half a release cycle at our current burn rate).
> There are patches in progress to get us another 41 MB.
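
Read as simple arithmetic (an interpretation, not something stated in the thread), Kyle's figures imply a growth rate of roughly 64 MB of linker memory per release cycle, so the 32 MB already saved plus the 41 MB pending would buy a little over one cycle. A minimal Python sketch of that estimate; the variable and function names are made up for illustration, and "half a release cycle" is taken literally as 0.5:

```python
# Figures quoted in the thread.
SAVED_SO_FAR_MB = 32     # measured drop in linker memory usage
SAVED_PENDING_MB = 41    # further savings from patches in progress
CYCLES_FOR_SAVED = 0.5   # "about half a release cycle" (taken literally)


def burn_rate_mb_per_cycle(saved_mb: float, cycles: float) -> float:
    """Implied growth in linker memory per release cycle."""
    return saved_mb / cycles


def cycles_of_headroom(total_saved_mb: float, rate: float) -> float:
    """How many release cycles the savings buy at a constant burn rate."""
    return total_saved_mb / rate


rate = burn_rate_mb_per_cycle(SAVED_SO_FAR_MB, CYCLES_FOR_SAVED)
total_saved = SAVED_SO_FAR_MB + SAVED_PENDING_MB
headroom = cycles_of_headroom(total_saved, rate)

print(f"implied burn rate: {rate:.0f} MB per cycle")
print(f"total savings: {total_saved} MB, about {headroom:.2f} cycles of headroom")
```

The estimate assumes the burn rate stays constant, which the thread does not claim.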

The patches in question landed and we're now back in business (that is,
we reopened the trees). I'll reiterate Kyle's comment that people who
plan on landing large things (brand new chunks of code or massive new
C++ features) should coordinate with him first.

We should soon be monitoring the link.exe memory usage on the buildbots
(bug 710712) so that we avoid being taken by surprise again.

Thanks to all those who made it possible.

Mike

Nicholas Nethercote

Dec 15, 2011, 4:16:28 AM12/15/11
to Kyle Huey, dev-platform, release, dev-tree-management
I don't want to whine, but if this thread had been called "MSVC's
Linker Falls Short Again (Or, Why the Tree Is Closed)" we might have
avoided some negative headlines like in
http://www.pcworld.com/businesscenter/article/246039/firefox_gains_weight_challenging_its_developers.html.

Nick

xunxun

Dec 15, 2011, 4:56:02 AM12/15/11
to Nicholas Nethercote, Kyle Huey, dev-platform, release, dev-tree-management
Hi, Guys

If you are using VC2005 now, you can use editbin to set LARGEADDRESSAWARE
on link.exe, and then it can exceed 3GB of memory.

--
Best Regards,
xunxun

Kyle Huey

Dec 15, 2011, 6:19:16 AM12/15/11
to Nicholas Nethercote, dev-platform
Yeah, trust me, I know. :-/

- Kyle

Mike Hommey

Dec 15, 2011, 8:17:33 AM12/15/11
to xunxun, dev-platform, Kyle Huey, Nicholas Nethercote, release, dev-tree-management
On Thu, Dec 15, 2011 at 05:56:02PM +0800, xunxun wrote:
> Hi, Guys
>
> If you are using VC2005 now, you can use editbin to set LARGEADDRESSAWARE
> on link.exe, and then it can exceed 3GB of memory.

Nope, that allows it to exceed 2GB of memory and go up to ... 3GB.

Mike

Brian Smith

Dec 15, 2011, 8:32:19 AM12/15/11
to Dao, dev-pl...@lists.mozilla.org
Dao wrote:
> I'm not sure what this means. A master password only protects
> passwords, not the whole profile, and should only be required
> when loading a page which you stored credentials for. This can
> happen shortly after startup, of course.

I do not remember the details. I remember setting a breakpoint in nsNSSComponent::Init() so I could find the magical place where components that must be loaded during startup are loaded. I found that nsNSSComponent::Init() was consistently being loaded by some JS code that was using nsISecretDecoderRing and/or checking to see if a master password was loaded.

It is possible that this code that loads nsNSSComponent shouldn't be executing during startup. But, *some* code does need to load nsNSSComponent during startup. It would be great to have this be well-defined instead of seemingly arbitrary.

- Brian

Robert Kaiser

Dec 15, 2011, 9:55:43 AM12/15/11
to
Arthur schrieb:
> Guys, how about completely rewriting XUL? From the ground up? Come on,
> guys, XUL is just HUUUUUGE. So huge that you need to radically shrink
> it to use it on modern platforms, and especially on Android.

Do you have concrete data to support that? Otherwise this is just
flamebait or might even be seen as an attempt at trolling.
FWIW, I hear that Chrome libraries are actually bigger nowadays than the
sum of Firefox libraries, so I'm a bit unsure of your claims being true,
but without real data, there's no use in discussing this.

> That was just an idea, I'm not pushing you to do it right now.

Then why are you even throwing it in the discussion here?

Neil

Dec 16, 2011, 12:07:36 PM12/16/11
to
xunxun wrote:

> If you are using VC2005 now, you can use editbin to set LARGEADDRESSAWARE
> on link.exe, and then it can exceed 3GB of memory.

Of course link.exe is already large address aware anyway, otherwise we
would have panicked two years ago when we hit the 2GB limit.
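
For anyone who wants to check this on their own toolchain, the large-address-aware setting is a single bit (IMAGE_FILE_LARGE_ADDRESS_AWARE, 0x0020) in the Characteristics field of the executable's COFF header, which is the bit `editbin /LARGEADDRESSAWARE` sets. A minimal Python sketch that reads it from raw bytes; the function name is made up for illustration, and it only does the most basic validation of the file:

```python
import struct

# Bit in the COFF header's Characteristics field (PE/COFF format).
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(data: bytes) -> bool:
    """Return True if the PE image in `data` has the large-address-aware bit set."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The 32-bit little-endian value at 0x3C is the offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the 4-byte signature;
    # Characteristics is its last 2-byte field (offset 18 in the header).
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

Usage would be along the lines of `is_large_address_aware(open(path, "rb").read())`, where `path` points at whichever link.exe the build actually uses.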

Jean-Marc Desperrier

Dec 16, 2011, 12:12:24 PM