Fission MemShrink Newsletter #1: What (it is) and Why (it matters to you)

Kris Maglione

Jul 10, 2018, 2:19:15 PM
to Firefox Dev, dev-pl...@lists.mozilla.org
Welcome to the first edition of the Fission MemShrink newsletter.[1]

In this edition, I'll sum up what the project is, and why it matters to you.
In subsequent editions, I'll give updates on progress that we've made, and
areas that we'll need to focus on next.[2]


The Fission MemShrink project is one of the most easily overlooked aspects of
Project Fission (also known as Site Isolation), but is absolutely critical to
its success. It will require a company- and community-wide effort to meet its
goals.

The problem is thus: In order for site isolation to work, we need to be able
to run *at least* 100 content processes in an average Firefox session. Each of
those processes has its own base memory overhead—memory we use just for
creating the process, regardless of what's running in it. In the post-Fission
world, that overhead needs to be less than 10MB per process in order to keep the
extra overhead from Fission below 1GB. Right now, on our best-case platform,
Windows 10, it is somewhere between 17 and 21MB. Linux and OS X hover between 25
and 35MB. In other words, between 2 and 3.5GB for an ordinary session.

That means that, in the best case, we need to reduce the memory we use in
content processes by *at least* 7MB. The problem, of course, is that there are
only so many places we can cut memory without losing functionality, and even
fewer places where we can make big wins. But, there are lots of places we can
make small and medium-sized wins.
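
The budget in the two paragraphs above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures quoted in this message (100 processes, a 10MB target, and the measured per-platform ranges). The function name is made up for illustration, and 1GB is treated as 1000MB.

```python
def total_overhead_gb(per_process_mb, processes=100):
    """Total base overhead in GB for a session, treating 1GB as 1000MB."""
    return per_process_mb * processes / 1000

# Target and measured ranges, as quoted in the message.
target = total_overhead_gb(10)
windows = (total_overhead_gb(17), total_overhead_gb(21))
linux_mac = (total_overhead_gb(25), total_overhead_gb(35))

# Best case: the gap between the lowest measured figure and the target.
needed_reduction_mb = 17 - 10

print(f"target:        {target:.1f}GB")
print(f"Windows 10:    {windows[0]:.1f}-{windows[1]:.1f}GB")
print(f"Linux / OS X:  {linux_mac[0]:.1f}-{linux_mac[1]:.1f}GB")
print(f"best-case cut: {needed_reduction_mb}MB per process")
```

The computed range of 1.7 to 3.5GB lines up with the rough "between 2 and 3.5GB" figure, and the best-case gap is the 7MB per process mentioned above.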

So, to put the task into perspective, for each amount of overhead that a
single fix can cut, here is the number of fixes of that size we need in
order to reach 1MB:

250KB: 4
100KB: 10
75KB: 13
50KB: 20
20KB: 50
10KB: 100
5KB: 200
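
The counts in this list follow from dividing 1MB by the size of each fix. The sketch below reproduces them, assuming 1MB is treated as 1000KB and that the 75KB row is rounded to the nearest whole number (1000 / 75 is about 13.3). Summing all seven tiers gives roughly the 7MB reduction mentioned earlier; that connection is my reading of the numbers rather than something stated outright.

```python
TIER_SIZES_KB = [250, 100, 75, 50, 20, 10, 5]
ONE_MB_IN_KB = 1000  # assumption: the list treats 1MB as 1000KB

def fixes_needed(size_kb, target_kb=ONE_MB_IN_KB):
    """Number of fixes of a given size needed to save roughly target_kb."""
    return round(target_kb / size_kb)

counts = {size: fixes_needed(size) for size in TIER_SIZES_KB}
total_saved_kb = sum(size * n for size, n in counts.items())

for size, n in counts.items():
    print(f"{size}KB: {n}")
print(f"all tiers together: {total_saved_kb}KB")
```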

Now remember: we need to do *all* of these in order to reach our goal. It's
not a matter of one 250KB improvement or 50 5KB improvements. It's 4 250KB *and*
200 5KB improvements. There just aren't enough places we can cut 250KB. If we
fall short in any of those areas, Project Fission will fail, and Firefox will be
the only major browser without site isolation.

But it won't fail, because all of you are awesome, and this is a totally
achievable goal if we all throw our effort behind it.

Essentially what this means, though, is that if we identify an area of
overhead that's 50KB[3] or larger that can be eliminated, it *has* to be
eliminated. There just aren't that many large chunks to remove. They all need
to go. And if an area of code has a dozen 5KB chunks that can be eliminated,
maybe they don't all have to go, but at least half of them do. The more the
better.


To help us triage these issues, we have a tracking bug (https://bugzil.la/memshrink-content),
and a per-bug whiteboard tag ([overhead:...]) which gives an estimate of how
much per-process overhead we believe fixing that bug would eliminate. Please
feel free to add blockers to the tracking bug if you think they're relevant, and
to add or update [overhead] tags if you have reasonable estimates.


With all of that said, here's a brief update of the progress we've made so far:

In the past month, unique memory per process[4] has dropped 3-4MB[5], and JS
memory usage in particular has dropped 1.1-1.9MB.

Particular credit goes to:

* Eric Rahm added an AWSY test suite to track base content process memory
(https://bugzil.la/1442361). Results:

Resident unique: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684862,1,4&series=mozilla-central,1684846,1,4&series=mozilla-central,1685133,1,4&series=mozilla-central,1685127,1,4
Explicit allocations: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1706218,1,4&series=mozilla-inbound,1706220,1,4&series=mozilla-inbound,1706216,1,4
JS: https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-central,1684866,1,4&series=mozilla-central,1685137,1,4&series=mozilla-central,1685131,1,4

* Andrew McCreight created a tool for tracking JS memory usage, and figuring
out which scripts and objects are responsible for how much of it
(https://bugzil.la/1463569).

* Andrew and Nika Layzell also completely rewrote the way we handle XPIDL type
info so that it's statically compiled into the executable and shared between
all processes (https://bugzil.la/1438688, https://bugzil.la/1444745).

* Felipe Gomes split a bunch of code out of frame scripts so that it could be
lazily loaded only when needed (https://bugzil.la/1467278, ...) and added a
whitelist of JSMs that are allowed to be loaded at content process startup
(https://bugzil.la/1471066)

* I did a bit of this too, and also prevented us from loading some other JSMs
before we need them (https://bugzil.la/1470333, https://bugzil.la/1469719,
...)

* Nick Nethercote made dynamic nsAtoms allocate their string storage inline
rather than use a refcounted StringBuffer (https://bugzil.la/1447951)

* Emilio Álvarez reduced the amount of memory the Gecko Profiler uses in
content processes.

* Nathan Froyd fixed our static nsAtom code so it didn't generate static
initializers (https://bugzil.la/1455178) and reduced the stack size of our
image decoder threads (https://bugzil.la/1443932).

* Doug Thayer reduced the number of hang monitor threads we start in each
process (https://bugzil.la/1448040)

* Boris Zbarsky removed a bunch of useless QueryInterface implementations
(https://bugzil.la/1452862), made our static isInstance methods use less
memory (https://bugzil.la/1452786), and generally deleted a bunch of
useless, legacy nsI* interfaces that required us to add extra vtable
pointers to a lot of DOM object instances.

And your humble author contributed the following:

* Changed our localization string bundles to use shared memory for bundles
which are loaded into content processes (https://bugzil.la/1470365).
This bug also adds some helpers which should make it easier to use shared
memory for more things in the future.

* Made some changes to the script preloader to avoid keeping an unnecessary
encoded copy of scripts in the content process (https://bugzil.la/1470793),
to drop cached single-use scripts (https://bugzil.la/1471091), and to improve
the set of scripts we load in content processes (https://bugzil.la/1471089).

* Made some smaller optimizations to avoid making copies of strings in
preference callbacks (https://bugzil.la/1472523), and to remove the XPC
compilation scope (https://bugzil.la/1442737)

Apologies to anyone I missed.


[1]: Please feel free to read the '.' as a '!' if you're so inclined. I
generally shy away from exclamation marks.
[2]: If this seems like a massive rip-off of Ehsan's Quantum Flow newsletter
format, that's because it is. Thanks, Ehsan :)
[3]: 50KB per process, which is to say 5MB across 100 content processes.
[4]: The total memory mapped by each content process which is not shared by
other processes. Approximately equal to USS.
[5]: It's hard to be precise, since the numbers can be noisy, and are often
bi-modal.

Randell Jesup

Jul 10, 2018, 3:38:37 PM
>Welcome to the first edition of the Fission MemShrink newsletter.[1]

This is awesome and critical.

I'll note (and many of you know this well) that in addition to getting
rid of allocations (or making them lazy), another primary solution is to
move data out of the Content processes, and into the master process (or
some other shared process, if that's advisable for security or other
reasons), and access the data over IPC. Or you can move it to a shared
memory block (with appropriate locking if not static). For example, on
Linux one of our worst offenders is fontconfig; Chrome remotes much of
that to the master process.
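
As a rough illustration of the shared-memory option, here is a minimal sketch using Python's standard multiprocessing.shared_memory module (Python 3.8 or later). It is not how Gecko does it; the payload is a made-up stand-in for read-only data such as a font list, and both handles live in one process only to keep the example short. In practice the reader would be a different process attaching by name.

```python
from multiprocessing import shared_memory

# Hypothetical read-only data that every content process would otherwise copy.
payload = b"font-family-list: DejaVu Sans, Noto Serif"

# The owning process creates the block once and fills it in.
owner = shared_memory.SharedMemory(create=True, size=len(payload))
owner.buf[:len(payload)] = payload

# Another process would attach by name instead of building its own copy.
reader = shared_memory.SharedMemory(name=owner.name)
seen = bytes(reader.buf[:len(payload)])
print(seen.decode())

# Clean up: every handle is closed, and the owner removes the block.
reader.close()
owner.close()
owner.unlink()
```

Since the data here is static once written, no locking is needed; mutable shared data would need the locking mentioned above.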

--
Randell Jesup, Mozilla Corp
remove "news" for personal email

Jean-Yves Avenard

Jul 11, 2018, 7:49:18 AM
to Kris Maglione, dev-pl...@lists.mozilla.org, Firefox Dev
Hi

That’s great info, thank you.

One place where we could gain heaps is the media stack.
Currently, each content process allocates a thread pool with at least 8 threads for use with the media decoders, and each thread has a default stack size of 256kB.
(https://searchfox.org/mozilla-central/source/xpcom/threads/nsIThreadManager.idl#53)

That stack size has been increased over the years due to the growing use of system frameworks (in particular the Mac CoreVideo framework, which uses over 200kB alone), and right now even 256kB isn’t enough for the new AV1 decoder from libaom.

One piece of work the media team has started is to have all those decoders run in a dedicated process. This work was started mostly for security reasons, but there will be side gains memory-wise.

This work is tracked in bug 1471535 (https://bugzilla.mozilla.org/show_bug.cgi?id=1471535)

Once this is done, and we no longer call decoders in the content process, the decoder process could use an increased stack size, while the content process default stack size could be reduced to 128kB (and maybe even 64kB).

That alone may be sufficient to achieve the goals you mentioned.

An immediate intermediate step could be to use two different stack sizes, as we pretty much know which threads need more than others.

JY



David Bruant

Jul 11, 2018, 8:42:22 AM
to Kris Maglione, dev-pl...@lists.mozilla.org, Firefox Dev
Thanks Kris for all this information and the beginning of the first issue
of this newsletter!

2018-07-10 20:19 GMT+02:00 Kris Maglione <kmag...@mozilla.com>:

> The problem is thus: In order for site isolation to work, we need to be
> able to run *at least* 100 content processes in an average Firefox session

I've seen this information of 100 content processes in a couple places but
I haven't been able to find the rationale for it. How was the 100 number
picked? Would 90 prevent a release of project fission?
How will the rollout happen?
Will the rollout happen progressively (like 2 content processes soon, 4
soon after, 10 some time after, etc.) or does it have to be 1 (current
situation IIUC) then 100?


> * Andrew McCreight created a tool for tracking JS memory usage, and figuring
> out which scripts and objects are responsible for how much of it
> (https://bugzil.la/1463569).
>
How often is this code run? Is there a place to find the daily output of
this tool applied to a nightly build for instance?

Thanks again,

David

Boris Zbarsky

Jul 11, 2018, 2:08:13 PM
On 7/11/18 5:42 AM, David Bruant wrote:
> I've seen this information of 100 content processes in a couple places but
> i haven't been able to find the rationale for it. How was the 100 number
> picked?

I believe this is based on telemetry for number of distinct sites
involved in browsing sessions.

> Would 90 prevent a release of project fission?

It would make it harder to ship to users, yes... Whether it "prevents"
would depend on other considerations.

> Will the rollout happen progressively (like 2 content processes soon, 4
> soon after, 10 some time after, etc.) or does it have to be 1 (current
> situation IIUC)

Current situation is 4 processes.

How we scale up from there is TBD.

-Boris

Andrew McCreight

Jul 11, 2018, 2:44:30 PM
to David Bruant, Kris Maglione, dev-platform, Firefox Dev
On Wed, Jul 11, 2018 at 5:42 AM, David Bruant <brua...@gmail.com> wrote:

>
>
>> * Andrew McCreight created a tool for tracking JS memory usage, and
>> figuring out which scripts and objects are responsible for how much of it
>> (https://bugzil.la/1463569).
>>
> How often is this code run? Is there a place to find the daily output of
> this tool applied to a nightly build for instance?
>

You have to manually run this using a special build (hopefully I'll be able
to at least land code so that a special build is not needed). It isn't
clear from that description, but the focus here is on the chrome JS that is
part of the browser, rather than on websites. Reducing content process
chrome JS memory usage is going to have to be a big focus for this effort,
because I believe other browsers don't write their UI in JS, and the way
JIT stuff works it is harder to share code memory between processes than
with AOT compiled code.

If you look at about:memory, there's already a decent breakdown of how much
memory is used in JS for different things, but that doesn't help you figure
out which individual scripts are taking up memory. JSMs and content scripts
are run in only a few globals (to save memory), but that means that looking
up how much memory a global uses doesn't tell you much.


Andrew



Kris Maglione

Jul 11, 2018, 2:45:17 PM
to David Bruant, dev-pl...@lists.mozilla.org, Firefox Dev
On Wed, Jul 11, 2018 at 02:42:11PM +0200, David Bruant wrote:
>2018-07-10 20:19 GMT+02:00 Kris Maglione <kmag...@mozilla.com>:
>
>> The problem is thus: In order for site isolation to work, we need to be
>> able to run *at least* 100 content processes in an average Firefox session
>
>I've seen this information of 100 content processes in a couple places but
>i haven't been able to find the rationale for it. How was the 100 number
>picked?

So, the basic problem here is that we don't get to choose the number of
content processes we'll have. It will depend entirely on the number of
origins that we load documents from at any given time. In practice, the
biggest contributing factor to that number tends to be iframes (mostly
for things like ads and social widgets).

The "100 processes" number was initially chosen based on experimentation
(basically, counting the number of origins loaded by typical pages on
certain popular sites) and our knowledge of typical usage patterns. It's
meant to be a conservative estimate of the number of processes typical
users are likely to hit on a regular basis, though hopefully not all the
time.

For heavy users, we expect the number to be much higher[1]. And while those
users typically have more RAM to spare, they also tend not to be happy
when we waste it.

We also need to add to that number the Activity Stream process that
hosts things like about:newtab and about:home, the system extension
process, processes for any other extensions the user has installed
(which will each likely need their own processes for the same reasons
each content origin will), and the pre-loaded web content process[4].
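
To make the counting concrete, here is a small sketch of how such an estimate could be put together. It assumes the simplified model described above (one content process per distinct origin, plus the fixed extras listed in the previous paragraph); the helper names and the example URLs are invented, and the real grouping logic in the browser may differ.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin_of(url):
    """Return a (scheme, host, port) tuple, filling in default ports."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def estimate_processes(document_urls, extension_count=0):
    """Distinct origins, plus fixed extras, plus one process per extension."""
    content = len({origin_of(u) for u in document_urls})
    # Activity Stream, system extension, and pre-loaded content process.
    fixed = 3
    return content + fixed + extension_count

# Hypothetical page: two same-origin documents and three third-party iframes.
page = [
    "https://news.example/article",
    "https://news.example/comments",
    "https://ads.example/frame?id=1",
    "https://social.example/widget",
    "https://cdn.ads.example/banner",
]
estimate = estimate_processes(page, extension_count=2)
print(estimate)
```

Even this one small page needs four content processes before the fixed extras and extensions are added, which is why iframes dominate the total.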


We've been working on improving our estimates by collecting telemetry on
the number of document groups[2] per tab group[3]:

https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=1&end_date=2018-06-30&keys=__none__!__none__!__none__&max_channel_version=nightly%252F63&measure=TOTAL_HTTP_DOCGROUPS_PER_TABGROUP&min_channel_version=null&processType=*&product=Firefox&sanitize=0&sort_keys=submissions&start_date=2018-06-25&table=0&trim=1&use_submission_date=0

But we don't have enough data to draw conclusions yet.

> Would 90 prevent a release of project fission?

This isn't really something we get to choose. The closest I can come is
something like "would an overhead of 1.1GB prevent a release of project
Fission". And, while the answer may turn out to be "no", I'd prefer not
to speculate, because that's a decision we'd wind up paying for with
user dissatisfaction.

There are some other hacks that we can use to decrease the overall
overhead, like aggressively unloading background tabs, and flushing
their resources. We're almost certainly going to wind up having to do
some of that regardless, but it comes at a performance cost. The more
aggressive we have to be about it, the less responsive the browser is
going to wind up being. So, again, the shorter we fall on our memory
reduction efforts, the more we're going to pay in terms of user
satisfaction.

>How will the rollout happen?
> Will the rollout happen progressively (like 2 content processes soon, 4
>soon after, 10 some time after, etc.) or does it have to be 1 (current
>situation IIUC) then 100?
>
>
>>* Andrew McCreight created a tool for tracking JS memory usage, and figuring
>> out which scripts and objects are responsible for how much of it
>> (https://bugzil.la/1463569).
>>
>How often is this code run? Is there a place to find the daily output of
>this tool applied to a nightly build for instance?

For the moment, it requires a patched build of Firefox, so we've been
running it locally as we try to track down and fix memory issues, and
Andrew has been periodically updating the numbers in the bug.

I believe Andrew has been working on updating the patch to a land-able
state (which is non-trivial), after which we'll hopefully be able to get
up-to-date numbers from automation.


[1]: Particularly readers of TechCrunch, which regularly loads 30
origins on a single page.
[2]: Essentially documents of different origin.
[3]: Essentially sets of tabs that are tied together because they were
opened by things like window.open() calls or link clicks from other
tabs.
[4]: Which we currently have only one of, but may need more of in the
future in order to support loading several iframes in a given page
without noticeable lag or jank.

Randell Jesup

Jul 11, 2018, 4:07:47 PM
>On 7/11/18 5:42 AM, David Bruant wrote:
>> I've seen this information of 100 content processes in a couple places but
>> i haven't been able to find the rationale for it. How was the 100 number
>> picked?
>
>I believe this is based on telemetry for number of distinct sites involved
>in browsing sessions.

As an example, 10 randomly chosen tabs in Chrome site isolation (a few
months ago) yielded ~80 renderers (Content processes). Some sites
generate a lot; that list of 10 included some which likely don't
generate more than 1 or 2: google.com, mozilla.org, facebook login page,
wikipedia (might spawn a few?).

>> Would 90 prevent a release of project fission?
>
>It would make it harder to ship to users, yes... Whether it "prevents"
>would depend on other considerations.

It's a continuum - the more memory we use, the more OOMs, the worse
we'll look (relative to Chrome), the larger impact on system perf, etc.
There's likely no hard line, but there may be a defined "we need to get
at least here" line, and for now that's 100 apparently (I wasn't
directly involved in picking it, so I don't know how "hard" it is).

We'll have to do more than just limit process sizes, but limiting
process sizes is basically table stakes, IMO.

Kris Maglione

Jul 11, 2018, 4:11:11 PM
to Jean-Yves Avenard, dev-pl...@lists.mozilla.org, Firefox Dev
On Wed, Jul 11, 2018 at 01:49:04PM +0200, Jean-Yves Avenard wrote:
>There’s one place where we could gain heaps is in the media stack.
>Currently, each content process allocate a thread-pool with at least 8
>threads for use with the media decoders, each threads a default stack size of
>256kB.
>(https://searchfox.org/mozilla-central/source/xpcom/threads/nsIThreadManager.idl#53)
>
>That stack size has been increased over the years due to the growing use of
>either system frameworks (in particular the mac CoreVideo framework that use
>over 200kB alone), and right now 256kB itself isn’t enough for the new AV1
>decoder from libaom.
>
>One of the work the media team has started, is to have all those decoders run
>in a dedicated process: the reason for this work was mostly done for security
>reasons, but there will be side gains memory-wise.
>
>This work is tracked in bug 1471535
>(https://bugzilla.mozilla.org/show_bug.cgi?id=1471535)
>
>Once this is done, and we no longer calls decoders in the content process,
>the decoder process could use an increase stack size, while reducing the
>content process default stack size to 128kB (and maybe even 64kB)
>
>That alone may be sufficient to achieve your mentioned goals.

Thanks. Boris added this as a blocker.

It looks like it will be helpful, but unfortunately won't give us the 2MB
simple arithmetic would suggest. On Windows, at least, (and probably
elsewhere, but need to confirm) thread stacks are lazily committed, so as long
as the decoders aren't used in a process, the overhead is probably closer to
25KB per thread.

Shrinking the size of the thread pool and lazily spinning up threads when
they're first needed would probably save us 200KB per process, though...
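
The numbers in this reply can be laid out as simple arithmetic. This sketch uses only the figures quoted in the thread (8 threads, 256kB reserved per stack, roughly 25KB committed per idle thread). Reading the 200KB saving as 8 times 25KB is my assumption.

```python
POOL_THREADS = 8          # minimum media thread pool size quoted above
STACK_RESERVED_KB = 256   # default stack size per thread
STACK_COMMITTED_KB = 25   # rough committed overhead per idle thread

# What "simple arithmetic" suggests: every stack fully counted.
reserved_kb = POOL_THREADS * STACK_RESERVED_KB

# What is probably paid in practice while the decoders sit idle.
committed_kb = POOL_THREADS * STACK_COMMITTED_KB

print(f"reserved:  {reserved_kb}KB (about {reserved_kb // 1024}MB)")
print(f"committed: {committed_kb}KB")
```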



--
Kris Maglione
Senior Firefox Add-ons Engineer
Mozilla Corporation

It's always good to take an orthogonal view of something. It develops
ideas.
--Ken Thompson

Jean-Yves Avenard

Jul 11, 2018, 5:42:14 PM
to Kris Maglione, dev-pl...@lists.mozilla.org, Firefox Dev
Hi

> On 11 Jul 2018, at 10:10 pm, Kris Maglione <kmag...@mozilla.com> wrote:
> Thanks. Boris added this as a blocker.
>
> It looks like it will be helpful, but unfortunately won't give us the 2MB simple arithmetic would suggest. On Windows, at least, (and probably elsewhere, but need to confirm) thread stacks are lazily committed, so as long as the decoders aren't used in a process, the overhead is probably closer to 25KB per thread.
>
> Shrinking the size of the thread pool and lazily spinning up threads when they're first needed would probably save us 200KB per process, though...

I haven’t looked at this in much detail, not being an expert on this and having just finished watching the World Cup…

A quick glance at the code gives me:

On mac/linux using pthread:
when a thread is created, the stack size is set using pthread_attr_setstacksize
https://searchfox.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#355

On Linux, the man page is clear:
"The stack size attribute determines the minimum size (in bytes) that will be allocated for threads created using the thread attributes object attr.”

On Mac, less so; I’m not sure what the behaviour is there, whether it’s allocated or not…

On Windows:
https://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95thred.c#151

the thread is created with STACK_SIZE_PARAM_IS_A_RESERVATION flag set. This will allocate the memory immediately.

The saving I was mentioning earlier isn’t just due to the media decoder thread pool's stacks no longer needing to be that big, but also because all other thread pools can be reduced too. Thread pools aren’t used only when playing a video/audio file.

Anyway, this needs further inspection… we’ll know soon :)

I do hope that the 100-process scenario that was given is a worst-case scenario though...
JY

Mike Hommey

Jul 11, 2018, 5:54:17 PM
to Jean-Yves Avenard, Kris Maglione, dev-pl...@lists.mozilla.org, Firefox Dev
On Wed, Jul 11, 2018 at 11:42:01PM +0200, Jean-Yves Avenard wrote:
> Hi
>
> > On 11 Jul 2018, at 10:10 pm, Kris Maglione <kmag...@mozilla.com> wrote:
> > Thanks. Boris added this as a blocker.
> >
> > It looks like it will be helpful, but unfortunately won't give us the 2MB simple arithmetic would suggest. On Windows, at least, (and probably elsewhere, but need to confirm) thread stacks are lazily committed, so as long as the decoders aren't used in a process, the overhead is probably closer to 25KB per thread.
> >
> > Shrinking the size of the thread pool and lazily spinning up threads when they're first needed would probably save us 200KB per process, though...
>
> I haven’t looked much in details, not being an expert on this and having just finished watching the world cup…
>
> A quick glance at the code gives me:
>
> On mac/linux using pthread:
> when a thread is created, the stack size is set using pthread_attr_setstacksize
> https://searchfox.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#355
>
> On Linux, the man page is clear:
> "The stack size attribute determines the minimum size (in bytes) that will be allocated for threads created using the thread attributes object attr.”
>
> On mac, less so, I’m not sure what’s the behaviour there is, if it’s allocated or not…
>
> On Windows:
> https://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95thred.c#151
>
> the thread is created with STACK_SIZE_PARAM_IS_A_RESERVATION flag set. This will allocate the memory immediately.

Allocate in this context means address space being consumed. It doesn't
mean memory being actually committed. Memory is only committed once
used, so only as much as what the code running in the thread actually
uses is committed (rounded to page size).

This means at least 4k per thread, so the more threads we have at
initialization, the more memory is committed. That being said, we're
talking about something akin to NUWA here, and presumably, we're talking
about processes that don't initialize everything.
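
The "rounded to page size" rule can be written down as a tiny helper. This is a sketch under the assumption of 4KB pages, and the function name is hypothetical; it only models the accounting described above and does not query the operating system.

```python
import math

PAGE_SIZE = 4096  # assumption: 4KB pages

def committed_stack_bytes(bytes_used, page_size=PAGE_SIZE):
    """Committed stack memory: bytes used, rounded up to whole pages.

    A running thread always touches at least one page of its stack.
    """
    pages = max(1, math.ceil(bytes_used / page_size))
    return pages * page_size

# Eight mostly idle threads, each using about 100 bytes of stack.
idle_pool = 8 * committed_stack_bytes(100)

# One thread that has used about 25KB of its 256KB reservation.
busy_thread = committed_stack_bytes(25 * 1024)

print(idle_pool, busy_thread)
```

Under these assumptions the idle pool costs 32KB in total, regardless of how large each stack reservation is.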

Mike

Kris Maglione

Jul 11, 2018, 5:57:09 PM
to Jean-Yves Avenard, dev-pl...@lists.mozilla.org, Firefox Dev
On Wed, Jul 11, 2018 at 11:42:01PM +0200, Jean-Yves Avenard wrote:
>> On 11 Jul 2018, at 10:10 pm, Kris Maglione <kmag...@mozilla.com> wrote:
>>It looks like it will be helpful, but unfortunately won't give us the 2MB
>>simple arithmetic would suggest. On Windows, at least, (and probably
>>elsewhere, but need to confirm) thread stacks are lazily committed, so as long
>>as the decoders aren't used in a process, the overhead is probably closer to
>>25KB per thread.
>
>I haven’t looked much in details, not being an expert on this and having just
>finished watching the world cup…
>
>A quick glance at the code gives me:
>
>On mac/linux using pthread:
>when a thread is created, the stack size is set using pthread_attr_setstacksize
>https://searchfox.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptthread.c#355
>
>On Linux, the man page is clear:
>"The stack size attribute determines the minimum size (in bytes) that will be
>allocated for threads created using the thread attributes object attr.”

Right, but allocation size doesn't imply that the memory is committed, just that
it's mapped. In general, anonymous mapped memory isn't actually committed (and
therefore doesn't become part of the process's USS) until it's touched.
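
To make the mapped-versus-touched distinction concrete, here is a small sketch using Python's standard mmap module. It creates an anonymous mapping the size of one 256kB stack and writes to only two pages. On most systems the untouched pages are not backed by physical memory until first use, though the script does not measure that itself. The names touch_pages and STACK_LIKE_SIZE are made up for this example.

```python
import mmap

STACK_LIKE_SIZE = 256 * 1024  # same size as the default thread stack above

def touch_pages(mapping, n_pages, page_size=mmap.PAGESIZE):
    """Write one byte into each of the first n_pages pages.

    Returns the number of bytes that now belong to touched pages.
    """
    for i in range(n_pages):
        mapping[i * page_size] = 1
    return n_pages * page_size

# Anonymous mapping: address space is set aside, contents read as zero.
region = mmap.mmap(-1, STACK_LIKE_SIZE)
touched = touch_pages(region, 2)

print(f"mapped:  {len(region)} bytes")
print(f"touched: {touched} bytes")
```

Comparing the process's resident memory before and after the writes (for example in about:memory or a system monitor) is one way to see the effect on a given platform.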

>On Windows:
>https://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95thred.c#151
>
>the thread is created with STACK_SIZE_PARAM_IS_A_RESERVATION flag set. This
>will allocate the memory immediately.

Allocate, yes, but not commit. That flag is actually what ensures that our
Windows thread stacks don't consume system memory until they're actually
touched.

>The saving I was mentioning earlier isn’t just due to media decoder threadpool
>thread stack no longer needing to be that big, but that all other threadpools
>can be reduced too. Threadpools aren’t used only when playing a video/audio
>file.

Reducing thread pool sizes would certainly be helpful. One unfortunate
side-effect of large thread pools is that, even with lazy commit thread stacks,
the more threads you run code on, the more stacks wind up with committed pages.

Karl Tomlinson

Jul 11, 2018, 7:25:58 PM
Is there a guideline that should be used to evaluate what can
acceptably run in the same process for different sites?

I assume the primary goal is to prevent one site from reading
information that should only be available to another site?

There would also be defense-in-depth value from having each site
sandboxed separately because a security breach from one site could
not compromise another.

I guess a single compositor process is acceptable because there is
essentially no information returning from the compositor?

A font server may be acceptable, because information returned is
of limited power?

Use of system font, graphics, or audio servers is in a similar
bucket I guess.

Would using a single process for network be acceptable, not
because information returned is limited, but because we're willing
to have some compromise because there is a small API surface? Or
would that be acceptable because content JS does not run in that
process?

Would it be acceptable to perform layout in a single process for
multiple sites (if that were practical)?

Would it be easier to answer the opposite question? What should
not run in a shared process? JS is a given. Others?

Robert O'Callahan

Jul 11, 2018, 7:56:17 PM
to Karl Tomlinson, dev-platform
On Thu, Jul 12, 2018 at 11:25 AM, Karl Tomlinson <moz...@karlt.net> wrote:

> Would it be easier to answer the opposite question? What should
> not run in a shared process? JS is a given. Others?
>

Currently when an exploitable bug is found in content process code,
attackers use JS to weaponize it with an arsenal of known techniques (e.g.
heap spraying and shaping). An important question is: assuming a
similar bug were found in a shared non-content process, how difficult would
it be for content JS to apply those techniques remotely across the process
boundary? That would be a pretty interesting problem for security
researchers to work on.

> Use of system font, graphics, or audio servers is in a similar bucket I
> guess.
>

Taking control of an audio server would let you listen in on phone calls,
which seems interesting.

Another question is whether you can exfiltrate cross-origin data by
performing side-channel attacks against those shared processes. You
probably need to assume that Spectre-ish attacks will be blocked at process
boundaries by hardware/OS mitigations, but there could be
browser-implementation-specific timing attacks etc. E.g. do IPDL IDs
exposed to content processes leak useful information about the activities
of other processes? Of course there are cross-origin timing-based
information leaks that are already known and somewhat unfixable :-(.

Rob
--
Su ot deraeppa sah dna Rehtaf eht htiw saw hcihw, efil lanrete eht uoy ot
mialcorp ew dna, ti ot yfitset dna ti nees evah ew; deraeppa efil eht. Efil
fo Drow eht gninrecnoc mialcorp ew siht - dehcuot evah sdnah ruo dna ta
dekool evah ew hcihw, seye ruo htiw nees evah ew hcihw, draeh evah ew
hcihw, gninnigeb eht morf saw hcihw taht.

Emilio Cobos Álvarez

Jul 12, 2018, 6:58:18 AM
to dev-pl...@lists.mozilla.org
Thanks for doing this!

Just curious, is there a bug on file to measure excess capacity on
nsTArrays and hash tables?

WebKit has a bunch of bugs like:

https://bugs.webkit.org/show_bug.cgi?id=186709

Which seem relevant.

-- Emilio

Tom Ritter

Jul 12, 2018, 11:10:59 AM
to Karl Tomlinson, Mozilla
On Wed, Jul 11, 2018 at 6:25 PM, Karl Tomlinson <moz...@karlt.net> wrote:

> Is there a guideline that should be used to evaluate what can
> acceptably run in the same process for different sites?
>


This is on me to write. I have been slow at doing so mainly because there's
a lot of "What does X look like and where do its parts run" investigation I
feel I need to do to write it. (For X in at least { WebExtensions, WebRTC,
Compositing, Filters, ... })



> I assume the primary goal is to prevent one site from reading
> information that should only be available to another site?
>

Yep.



On Wed, Jul 11, 2018 at 6:56 PM, Robert O'Callahan <rob...@ocallahan.org>
wrote:

> On Thu, Jul 12, 2018 at 11:25 AM, Karl Tomlinson <moz...@karlt.net>
> wrote:
>
> > Would it be easier to answer the opposite question? What should
> > not run in a shared process? JS is a given. Others?
> >
>
> Currently when an exploitable bug is found in content process code,
> attackers use JS to weaponize it with an arsenal of known techniques (e.g.
> heap spraying and shaping). An important question is whether, assuming a
> similar bug were found in a shared non-content process, how difficult would
> it be for content JS to apply those techniques remotely across the process
> boundary?


You're completely correct.


> That would be a pretty interesting problem for security
> researchers to work on.
>

It's always illustrative to have exploits that demonstrate this goal in the
target of interest - they may have created generic techniques that we can
address fundamentally (like with Memory Partitioning or Allocator
Hardening). But people have been writing exploits for targets that don't
have a scripting environment for two decades or more, so all of those are
prior art for this sort of exploitation. This isn't a reason not to pursue
this work, and it's not saying this work isn't a net security win though!

I have been pondering (and have brainstormed with a few people) creating
something Google Native Client-like to enforce process-like state
separation between threads in a single process. That might make it safer to
share utility processes between content processes. But it's considerably
less straightforward than I was hoping. Big open research question.


> > Use of system font, graphics, or audio servers is in a similar bucket I
> > guess.
> >
>
> Taking control of an audio server would let you listen into phone calls,
> which seems interesting.
>
> Another question is whether you can exfiltrate cross-origin data by
> performing side-channel attacks against those shared processes. You
> probably need to assume that Spectre-ish attacks will be blocked at process
> boundaries by hardware/OS mitigations, but there could be
> browser-implementation-specific timing attacks etc. E.g. do IPDL IDs
> exposed to content processes leak useful information about the activities
> of other processes? Of course there are cross-origin timing-based
> information leaks that are already known and somewhat unfixable :-(.


Yup!

-tom

Andrew McCreight

Jul 12, 2018, 11:56:42 AM
to Emilio Cobos Álvarez, dev-platform
On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez <emi...@crisal.io>
wrote:

> Thanks for doing this!
>
> Just curious, is there a bug on file to measure excess capacity on
> nsTArrays and hash tables?
>
> WebKit has a bunch of bugs like:
>
> https://bugs.webkit.org/show_bug.cgi?id=186709
>
> Which seem relevant.
>

njn looked at that kind of issue at some point (he changed how arrays grow,
for instance, to reduce overhead), but it has probably been around 5 years,
so there may be room for improvement for things added in the meanwhile.
However, our focus here is really on reducing per-process memory overhead,
rather than generic memory improvements, because we've had a lot of focus
on the latter as part of MemShrink, but not the former, so there's likely
easier improvements to be had.

Andrew

Randell Jesup

Jul 12, 2018, 4:08:56 PM
>I do hope that the 100 process figures scenario that was given is a worse case scenario though...

It's not. Worst case is a LOT worse.

Shutting down threads/threadpools when not needed or off an idle timer
is a Good thing. There may be some perf hit since it may mean starting
a thread instead of just sending a message at times; this may require
some tuning in specific cases, or leaving 1 thread or more running
anyways.

Stylo will be an interesting case here.

We may need to trade first-load time against memory use by lazy-initing
more things than now, though we did quite a bit on that already for
reducing startup time.

Kris Maglione

Jul 12, 2018, 4:19:44 PM
to Emilio Cobos Álvarez, dev-pl...@lists.mozilla.org
On Thu, Jul 12, 2018 at 12:57:35PM +0200, Emilio Cobos Álvarez wrote:
>Thanks for doing this!
>
>Just curious, is there a bug on file to measure excess capacity on
>nsTArrays and hash tables?

I don't think so, but it's a good idea.

I've actually been thinking on filing a bug to do something similar, to
measure cumulative effects of excess padding in certain types since I
began looking into bug 1460674, and Sylvestre mentioned that
clang-analyzer can generate reports on excess padding.

It would probably be a good idea to try to roll this into the same
project.

One nice change coming up on this front is that bug 1402910 will probably
allow us to increase the load factors of most of our hashtables without
losing performance. Having up-to-date numbers for these things would
probably help decide how to prioritize those sorts of bugs.
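
To make the load-factor point concrete, here is a rough sketch of how the maximum load factor translates into table size. It assumes an open-addressed table whose capacity is always a power of two; the function names, the entry size, and the 3/4 and 7/8 load factors are illustrative assumptions, not values taken from our hashtable code or from bug 1402910.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power-of-two capacity that keeps `entries` at or below the
// maximum load factor num/den (e.g. 3/4). Integer math avoids rounding.
inline std::size_t CapacityFor(std::size_t entries, std::size_t num,
                               std::size_t den) {
  assert(num > 0 && num <= den);
  std::size_t capacity = 1;
  while (capacity * num < entries * den) {
    capacity *= 2;
  }
  return capacity;
}

// Bytes of entry storage only, ignoring any per-table header.
inline std::size_t TableBytes(std::size_t entries, std::size_t entrySize,
                              std::size_t num, std::size_t den) {
  return CapacityFor(entries, num, den) * entrySize;
}
```

Under these assumptions, 1,700 entries of 16 bytes need 4,096 slots (64 KiB) at a 3/4 load factor but only 2,048 slots (32 KiB) at 7/8, while 1,000 entries need 2,048 slots either way. With power-of-two growth, a higher load factor only pays off for tables whose entry count sits just above a doubling threshold, which is one reason up-to-date per-table numbers would help.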

--
Kris Maglione
Senior Firefox Add-ons Engineer
Mozilla Corporation

NSS is what you would get if HP Lovecraft wrote crypto code.
--keeler

Gabriele Svelto

Jul 12, 2018, 4:27:27 PM
to Kris Maglione, Emilio Cobos Álvarez, dev-pl...@lists.mozilla.org
On 12/07/2018 22:19, Kris Maglione wrote:
> I've actually been thinking on filing a bug to do something similar, to
> measure cumulative effects of excess padding in certain types since I
> began looking into bug 1460674, and Sylvestre mentioned that
> clang-analyzer can generate reports on excess padding.

I've encountered at least one structure where a boolean flag is 64-bits
in size on 64-bit builds. If we really want to go to the last mile we
might want to also evaluate things like tagged pointers; there's
probably some KiB's to be saved there too.
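
A minimal sketch of the kind of padding being described, assuming a typical 64-bit ABI where pointers are 8 bytes and 8-byte aligned. The struct and member names are made up for illustration and do not correspond to any real Gecko type.

```cpp
#include <cstddef>

// A bool placed before a pointer is followed by padding so that the
// pointer stays aligned; on a typical 64-bit ABI each bool here
// effectively occupies 8 bytes, for 24 bytes in total.
struct Padded {
  bool mIsActive;
  void* mOwner;
  bool mIsDirty;
};

// Same members, largest alignment first: the two bools share the tail
// of the struct, typically 16 bytes in total on a 64-bit ABI.
struct Reordered {
  void* mOwner;
  bool mIsActive;
  bool mIsDirty;
};
```

The saving per object is small (one pointer's worth here), so it only matters for types that are instantiated many times per process, which is why a cumulative report would be more useful than looking at individual structs.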

There's also more than one place where we're using strings to identify
stuff where we could use enums/integers instead. And yeah, my much
delayed refactoring of the observer service got a lot higher on my
priority list after reading this thread.

Gabriele


Kris Maglione

Jul 12, 2018, 4:30:41 PM
to Andrew McCreight, Emilio Cobos Álvarez, dev-platform
On Thu, Jul 12, 2018 at 08:56:28AM -0700, Andrew McCreight wrote:
>On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez <emi...@crisal.io>
>wrote:
>
>> Thanks for doing this!
>>
>> Just curious, is there a bug on file to measure excess capacity on
>> nsTArrays and hash tables?
>
>njn looked at that kind of issue at some point (he changed how arrays grow,
>for instance, to reduce overhead), but it has probably been around 5 years,
>so there may be room for improvement for things added in the meanwhile.
>However, our focus here is really on reducing per-process memory overhead,
>rather than generic memory improvements, because we've had a lot of focus
>on the latter as part of MemShrink, but not the former, so there's likely
>easier improvements to be had.

I kind of suspect that improving the storage efficiency of hashtables (and
probably nsTArrays too) will have an out-sized effect on per-process memory.
Just at startup, for a mostly empty process, we have a huge amount of memory
devoted to hashtables that would otherwise be shared across a bunch of
origins—enough that removing just 4 bytes of padding per entry would save 87K
per process. And that number tends to grow as we populate caches that we need
for things like layout and atoms.

As much as I'd like to be able to share many of those caches between
processes, we are always going to need process-specific hashtables on top
of the shared ones for things that can't be/shouldn't be/aren't yet shared.
And that extra overhead tends to grow proportionally to the number of
processes we have.
Most of the great triumphs and tragedies of history are caused not by
people being fundamentally good or fundamentally evil, but by people
being fundamentally people.
--Terry Pratchett

Kris Maglione

Jul 12, 2018, 4:48:45 PM
to Randell Jesup, dev-pl...@lists.mozilla.org
This is a really important point: Memory usage and performance are deeply
intertwined.

There are hard limits on the amount of memory we can use, and the more
of it we waste needlessly, the less we have available for performance
optimizations that need it. In the worst (performance) case, we wind up
swapping, at which point performance may as well not exist.

We're going to have to make hard decisions about when/how often/how
aggressively we flush caches, spin down threads, unload tabs, ... The
more unnecessary overhead we save, the less extreme we're going to have
to be about this. And the better we get at spinning down unused threads
and evicting low impact cache entries, the less aggressive we're going
to have to be about the high impact ones. Throwing those things away
will have a performance impact, but not throwing them away will, in the
end, have a bigger one.

Kris Maglione

Jul 12, 2018, 4:51:23 PM
to Gabriele Svelto, Emilio Cobos Álvarez, dev-pl...@lists.mozilla.org
On Thu, Jul 12, 2018 at 10:27:13PM +0200, Gabriele Svelto wrote:
>On 12/07/2018 22:19, Kris Maglione wrote:
>> I've actually been thinking on filing a bug to do something similar, to
>> measure cumulative effects of excess padding in certain types since I
>> began looking into bug 1460674, and Sylvestre mentioned that
>> clang-analyzer can generate reports on excess padding.
>
>I've encountered at least one structure where a boolean flag is 64-bits
>in size on 64-bit builds. If we really want to go to the last mile we
>might want to also evaluate things like tagged pointers; there's
>probably some KiB's to be saved there too.

I actually have a patch sitting around with helpers to make it super easy to
use smart pointers as tagged pointers :) I never wound up putting it up for
review, since my original use case went away, but if you can think of any
specific cases where it would be useful, I'd be happy to try and get it
landed.

smaug

Jul 12, 2018, 5:08:21 PM
to Randell Jesup
One thing to remember is that some of the child processes will be more important than others. For example, all the processes used for browsing contexts in
the foreground tab should probably prefer performance over memory (in cases where that is something we can choose), but if a process
is only used for browsing contexts in background tabs and isn't playing any audio or such, it can probably use less memory-hungry approaches.
Like, could stylo use fewer threads when used in background-tabs-only processes, and create more threads once the process becomes foreground?
We have a similar approach in many cases for performance and responsiveness reasons, but less often for memory usage reasons.

Xidorn Quan

Jul 12, 2018, 7:22:52 PM
to dev-pl...@lists.mozilla.org
On Fri, Jul 13, 2018, at 7:08 AM, smaug wrote:
> One thing to remember that some of the child processes will be more
> important than others. For example all the processes used for browsing
> contexts in
> the foreground tab should probably prefer performance over memory (in
> cases that is something we can choose from), but if a process
> is only used for browsing contexts in background tabs and isn't playing
> any audio or such, it can probably use less memory hungry approaches.
> Like, could stylo use fewer threads when used in background-tabs-only-
> processes, and once the process becomes foreground, more threads are
> created.

I've filed a bug for this after I saw this email thread: https://bugzilla.mozilla.org/show_bug.cgi?id=1475091

- Xidorn

Cameron McCormack

Jul 12, 2018, 7:28:20 PM
to Kris Maglione, Gabriele Svelto, Emilio Cobos Álvarez, dev-pl...@lists.mozilla.org
On Fri, Jul 13, 2018, at 6:51 AM, Kris Maglione wrote:
> I actually have a patch sitting around with helpers to make it super easy to
> use smart pointers as tagged pointers :) I never wound up putting it up for
> review, since my original use case went away, but if you can think of any
> specific cases where it would be useful, I'd be happy to try and get it
> landed.

Speaking of tagged pointers, I've used the lower one or two bits for tagging a number of times, but I've never tried packing things into the high bits of a 64-bit pointer. Is that inadvisable for any reason? How many bits can I use, given the 64-bit platforms we need to support?
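
For reference, the low-bit variant can be sketched roughly as below. This is a raw-pointer illustration with a hypothetical `TaggedPtr` name, not the smart-pointer helper patch mentioned above, and it deliberately does not touch the high bits, since how many of those are safe to use is exactly the open question.

```cpp
#include <cassert>
#include <cstdint>

// Stores a small tag in the low bits of a pointer. This relies on T's
// alignment guaranteeing that those bits are zero in any valid T*.
template <typename T, unsigned Bits>
class TaggedPtr {
  static_assert((1u << Bits) <= alignof(T),
                "T's alignment must leave `Bits` low bits free");
  static constexpr std::uintptr_t kMask = (std::uintptr_t{1} << Bits) - 1;

 public:
  TaggedPtr(T* ptr, unsigned tag)
      : mBits(reinterpret_cast<std::uintptr_t>(ptr) | tag) {
    assert((reinterpret_cast<std::uintptr_t>(ptr) & kMask) == 0);
    assert(tag <= kMask);
  }

  // Mask the tag off before handing the pointer back.
  T* Ptr() const { return reinterpret_cast<T*>(mBits & ~kMask); }
  unsigned Tag() const { return static_cast<unsigned>(mBits & kMask); }

 private:
  std::uintptr_t mBits;
};
```

The object is the size of a single pointer, so a flag that would otherwise cost a padded word next to the pointer comes for free, at the price of a mask on every dereference.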

Nicholas Nethercote

Jul 12, 2018, 9:08:37 PM
to Andrew McCreight, Emilio Cobos Álvarez, dev-platform
On Fri, Jul 13, 2018 at 1:56 AM, Andrew McCreight <amccr...@mozilla.com>
wrote:

> >
> > Just curious, is there a bug on file to measure excess capacity on
> > nsTArrays and hash tables?
>
> njn looked at that kind of issue at some point (he changed how arrays grow,
> for instance, to reduce overhead), but it has probably been around 5 years,
> so there may be room for improvement for things added in the meanwhile.
>

For a trip down memory lane, check out
https://blog.mozilla.org/nnethercote/2011/08/05/clownshoes-available-in-sizes-2101-and-up/.
The size classes described in that post are still in use today.

More usefully: if anyone wants to investigate slop -- which is only one
kind of wasted space, but an important one -- it's now really easy with DMD:
- Invoke DMD in "Live" mode (i.e. generic heap profiling mode, rather than
dark matter detection mode).
- Use the `--sort-by slop` flag with dmd.py.

Full instructions are at
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/DMD.

Nick

Randell Jesup

Jul 12, 2018, 10:56:02 PM
>On 07/12/2018 11:08 PM, Randell Jesup wrote:
>> We may need to trade first-load time against memory use by lazy-initing
>> more things than now, though we did quite a bit on that already for
>> reducing startup time.
>
>One thing to remember that some of the child processes will be more
>important than others. For example all the processes used for browsing
>contexts in the foreground tab should probably prefer performance over
>memory (in cases that is something we can choose from), but if a
>process is only used for browsing contexts in background tabs and isn't
>playing any audio or such, it can probably use less memory hungry
>approaches.

Correct - we need to have observers/what-have-you for
background/foreground state (and we may want an intermediate state or
two - foreground-but-not-focused (for example a visible window that
isn't the focused window); recently-in-foreground (switching back and
forth); background-for-longer-than-delta, etc.).

Modules can use these to drop caches, shut down unnecessary threads,
change strategies, force GCs/CCs, etc.

Some of this certainly already exists, but may need to be extended (and
used a lot more).
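
As a very rough sketch of the shape this could take: the `Activity` states, `ActivityNotifier`, and `ExampleCache` below are all hypothetical names invented for illustration, and this is not how the existing observer service or any existing Gecko mechanism is implemented.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical per-process activity states, following the list above.
enum class Activity {
  Foreground,
  VisibleUnfocused,
  RecentlyForeground,
  LongBackground,
};

// Notifies registered callbacks whenever the process state changes.
class ActivityNotifier {
 public:
  using Callback = std::function<void(Activity)>;

  void AddObserver(Callback cb) {
    assert(cb);
    mObservers.push_back(std::move(cb));
  }

  void SetState(Activity next) {
    if (next == mState) {
      return;  // No change, nothing to announce.
    }
    mState = next;
    for (const auto& cb : mObservers) {
      cb(next);
    }
  }

 private:
  Activity mState = Activity::Foreground;
  std::vector<Callback> mObservers;
};

// Example module: keeps its cache until the process has been in the
// background for a long time, then releases the storage entirely.
struct ExampleCache {
  std::vector<int> mEntries;

  void OnActivityChange(Activity state) {
    if (state == Activity::LongBackground) {
      std::vector<int>().swap(mEntries);
    }
  }
};
```

Having more than two states lets each module pick its own threshold: a cheap-to-rebuild cache might drop on the first background notification, while an expensive one waits for the long-background state.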

Gabriele Svelto

Jul 13, 2018, 7:37:17 AM
to Randell Jesup, dev-pl...@lists.mozilla.org
On 13/07/2018 04:55, Randell Jesup wrote:
> Correct - we need to have observers/what-have-you for
> background/foreground state (and we may want an intermediate state or
> two - foreground-but-not-focused (for example a visible window that
> isn't the focused window); recently-in-foreground (switching back and
> forth); background-for-longer-than-delta, etc.
>
> Modules can use these to drop caches, shut down unnecessary threads,
> change strategies, force GCs/CCs, etc.
>
> Some of this certainly already exists, but may need to be extended (and
> used a lot more).

We already had most of this stuff in the ProcessPriorityManager [1],
which has only ever been used in Firefox OS. Since we had
one-process-per-tab there, it was designed that way, so it might need some
reworking to deal with one tab consisting of multiple content processes.

Also note that dealing with the "importance" of a page is not just a
matter of visibility and focus. There are other factors to take into
account such as if the page is playing audio or video (like listening to
music on YouTube), if it's self-updating and so on.

The only mechanism to reduce memory consumption we have now is
memory-pressure events, which, while functional, are still under-used. We
might also need more fine-grained mechanisms than "drop as much memory
as you can".

Gabriele

[1]
https://searchfox.org/mozilla-central/rev/46292b1212d2d61d7b5a7df184406774727085b8/dom/ipc/ProcessPriorityManager.cpp


Gabriele Svelto

Jul 13, 2018, 7:57:43 AM
to dev-pl...@lists.mozilla.org
Just another bit of info to raise awareness on a thorny issue we have to
face if we want to significantly raise the number of content processes.
On 64-bit Windows we often consume significantly more commit space than
physical memory. This consumption is currently unaccounted for in
about:memory though I've seen hints of it being caused by the GPU driver
(or other parts of the graphics pipeline). I've filed bug 1475518 [1] so
that I don't forget and I encourage anybody with Windows experience to
have a look because it's something we _need_ to solve to reduce content
process memory usage.

Gabriele

[1] Commit-space usage investigation
https://bugzilla.mozilla.org/show_bug.cgi?id=1475518


David Major

Jul 13, 2018, 9:19:52 AM
to gsv...@mozilla.com, dev-platform
This touches on a really important point: we're not the only ones
allocating memory.

Just a few that come to mind: GPU drivers, system media codecs, a11y
tools, and especially on Windows we have to deal with "utility"
applications, corporate-mandated gunk, and downright crapware.

When we're measuring progress toward our goals, look at not only your
own pristine dev box but also that one neighbor whose adware you're
always cleaning out.

Randell Jesup

Jul 13, 2018, 11:14:30 AM
>On Thu, Jul 12, 2018 at 08:56:28AM -0700, Andrew McCreight wrote:
>>On Thu, Jul 12, 2018 at 3:57 AM, Emilio Cobos Álvarez <emi...@crisal.io>
>>wrote:
>>
>>> Just curious, is there a bug on file to measure excess capacity on
>>> nsTArrays and hash tables?
[snip]
>I kind of suspect that improving the storage efficiency of hashtables (and
>probably nsTArrays too) will have an out-sized effect on per-process
>memory. Just at startup, for a mostly empty process, we have a huge amount
>of memory devoted to hashtables that would otherwise be shared across a
>bunch of origins—enough that removing just 4 bytes of padding per entry
>would save 87K per process. And that number tends to grow as we populate
>caches that we need for things like layout and atoms.

Hash tables are a big issue. There are a lot of 64K/128K/256K
allocations at the moment for hashtables. When we started looking at
this in bug 1436250, we had a 256K, ~4 128K, and a whole bunch of 64K
hashtable allocs (on linux). Some may be smaller or gone now, but it's
still big.

I wonder if it's worth the perf hit to realloc to exact size hash tables
that are build-once - probably. hashtable->Finalize()? (I wonder if
that would let us make any other memory/speed optimizations if we know
the table is now static.)
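
One possible reading of that idea, sketched with standard containers rather than our own hashtable classes: once a build-once table is complete, copy it into a sorted array and look entries up by binary search. Note that this swaps the hash table for a different structure (O(log n) lookups instead of O(1)) rather than reallocating the table in place, and `FrozenTable` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Read-only snapshot of a finished table: no empty slots and no
// load-factor headroom, just the entries themselves in sorted order.
class FrozenTable {
  using Entry = std::pair<std::string, int>;

 public:
  explicit FrozenTable(const std::unordered_map<std::string, int>& source)
      : mEntries(source.begin(), source.end()) {
    assert(mEntries.size() == source.size());
    std::sort(mEntries.begin(), mEntries.end());
  }

  // Returns a pointer to the value, or nullptr if the key is absent.
  const int* Lookup(const std::string& key) const {
    auto it = std::lower_bound(
        mEntries.begin(), mEntries.end(), key,
        [](const Entry& entry, const std::string& k) { return entry.first < k; });
    return (it != mEntries.end() && it->first == key) ? &it->second : nullptr;
  }

  std::size_t Size() const { return mEntries.size(); }

 private:
  std::vector<Entry> mEntries;
};
```

Whether the slower lookups are acceptable depends on how hot each table is, so this would need measuring per table rather than applying across the board.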

Randell Jesup

Jul 13, 2018, 11:28:40 AM
>On 13/07/2018 04:55, Randell Jesup wrote:
>> Correct - we need to have observers/what-have-you for
>> background/foreground state (and we may want an intermediate state or
>> two - foreground-but-not-focused (for example a visible window that
>> isn't the focused window); recently-in-foreground (switching back and
>> forth); background-for-longer-than-delta, etc.
>>
>> Modules can use these to drop caches, shut down unnecessary threads,
>> change strategies, force GCs/CCs, etc.

>Also note that dealing with the "importance" of a page is not just a
>matter of visibility and focus. There are other factors to take into
>account such as if the page is playing audio or video (like listening to
>music on YouTube), if it's self-updating and so on.

Absolutely

>The only mechanism to reduce memory consumption we have now is
>memory-pressure events which while functional are still under-used. We
>might also need more fine grained mechanisms than "drop as much memory
>as you can".

This is also very important for GeckoView

Felipe G

Jul 13, 2018, 1:59:31 PM
to dev-pl...@lists.mozilla.org
> >Also note that dealing with the "importance" of a page is not just a
> >matter of visibility and focus. There are other factors to take into
> >account such as if the page is playing audio or video (like listening to
> >music on YouTube), if it's self-updating and so on.
>
> Absolutely
>

We should think about how we can make different performance and memory
trade-offs for processes that are hosting top-level frames and processes
hosting 3rd-party subframes.

Kris Maglione

Jul 13, 2018, 2:57:43 PM
to Randell Jesup, dev-pl...@lists.mozilla.org
On Fri, Jul 13, 2018 at 11:14:24AM -0400, Randell Jesup wrote:
>Hash tables are a big issue. There are a lot of 64K/128K/256K
>allocations at the moment for hashtables. When we started looking at
>this in bug 1436250, we had a 256K, ~4 128K, and a whole bunch of 64K
>hashtable allocs (on linux). Some may be smaller or gone now, but it's
>still big.
>
>I wonder if it's worth the perf hit to realloc to exact size hash tables
>that are build-once - probably. hashtable->Finalize()? (I wonder if
>that would let us make any other memory/speed optimizations if we know
>the table is now static.)

I think, as much as possible, we really want static or mostly-static
hash tables to be shared between processes. I've already been working on this
in a few areas, e.g., bug 1470365 for string bundles, which are completely
static, and bug 1471025 for preferences, which are mostly static.

And those patches add helpers which should make it pretty easy to do the same
for more things in the future, so that should probably be our go-to strategy
for reducing per-process overhead, when possible.

gsqu...@mozilla.com

Jul 13, 2018, 8:22:50 PM
On Wednesday, July 11, 2018 at 4:19:15 AM UTC+10, Kris Maglione wrote:
> [...]
> Essentially what this means, though, is that if we identify an area of
> overhead that's 50KB[3] or larger that can be eliminated, it *has* to be
> eliminated. There just aren't that many large chunks to remove. They all need
> to go. And if an area of code has a dozen 5KB chunks that can be eliminated,
> maybe they don't all have to go, but at least half of them do. The more the
> better.

Some questions: -- Sorry if some of this is already common knowledge or has been discussed.

Are there tools available, that could easily track memory usage of specific things?
E.g., could I instrument one class, so that every allocation would be tracked automatically, and I'd get nice stats at the end?
Including wasted space because of larger allocation blocks?

Could I even run what-if scenarios, where I could instrument a class and extract its current size but also provide an alternate size (based on what I think I could make it shrink), and in the end I'll know how much I could save overall?

Do we have Try tests that simulate real-world usage, so we could collect memory-usage data that's relevant to our users, but also reproducible?

Should there be some kind of Talos-like CI tests that focus on memory usage, so we'd get some warning if a particular patch suddenly eats too much memory?

Boris Zbarsky

Jul 13, 2018, 8:43:11 PM
On 7/13/18 5:22 PM, gsqu...@mozilla.com wrote:
> E.g., could I instrument one class, so that every allocation would be tracked automatically, and I'd get nice stats at the end?

You mean apart from just having a memory reporter for it?

> Including wasted space because of larger allocation blocks?

Memory reporters using mallocSizeOf include that space, yes.

> Could I even run what-if scenarios, where I could instrument a class and extract its current size but also provide an alternate size (based on what I think I could make it shrink), and in the end I'll know how much I could save overall?

You could hack the relevant memory reporter, sure.

> Do we have Try tests that simulate real-world usage, so we could collect memory-usage data that's relevant to our users, but also reproducible?

See the "awsy-e10s" test suite, which sort of aims to do that.

> Should there be some kind of Talos-like CI tests that focus on memory usage, so we'd get some warning if a particular patch suddenly eats too much memory?

This is what awsy-e10s aims to do, yes.

-Boris