Changes to MEMORY_TOTAL telemetry on MacOS


Paul Bone

Sep 26, 2022, 2:03:50 AM
to dev-pl...@mozilla.org

G'day.

If you care about changes to memory telemetry then read on.

TLDR: I changed how MEMORY_TOTAL is measured on MacOS and the values we
record are going to increase. This doesn't represent a true increase in
memory usage, just a change in how it's measured, and only on MacOS. Oh, and
it's inaccurate, but it was inaccurate before this change and we don't
have a good option, so we're going with the least-bad one, I hope. If you
want to know how much memory something uses, about:memory is good.

We've had some problems measuring memory usage on MacOS recently. This
started when https://bugzilla.mozilla.org/show_bug.cgi?id=1546442#c41 added
a guard page within blocks of memory managed by jemalloc. The guard page
was added between the block's header and its payload. We noticed that our
"resident unique" memory usage halved. That's not right!

The cause was that munmap()ing a page (or mprotect()ing it) within a memory
region breaks that region into multiple regions. The problem was that the
resulting regions are marked as shared, and our measurement of
"resident unique" memory discounted them, thinking them to be shared.
By using different APIs within MacOS we can check whether they're really
shared memory (between processes) or private memory that has been aliased into
more than one mapping:
https://bugzilla.mozilla.org/show_bug.cgi?id=1743781
BTW, this is (almost) the most accurate way to measure memory.
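
As a rough illustration of the region splitting described above, here is a
toy Python model (not MacOS code). The Region type, the protect_page helper,
the 4 KiB page size and the one-page header are all assumptions made up for
this sketch; it only shows how changing the protection of one page in the
middle of a mapping leaves three regions where there used to be one.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size, for illustration only


@dataclass(frozen=True)
class Region:
    start: int  # inclusive byte offset
    end: int    # exclusive byte offset
    prot: str   # e.g. "rw-" or "---"


def protect_page(regions, addr, prot="---"):
    """Change the protection of the page at `addr` (assumed page-aligned),
    splitting the region that contains it into up to three regions."""
    out = []
    for r in regions:
        if r.start <= addr < r.end:
            if r.start < addr:
                out.append(Region(r.start, addr, r.prot))        # before the page
            out.append(Region(addr, addr + PAGE, prot))          # the guard page
            if addr + PAGE < r.end:
                out.append(Region(addr + PAGE, r.end, r.prot))   # after the page
        else:
            out.append(r)
    return out


# One hypothetical 1 MiB block: a header page, then a guard page, then payload.
block = [Region(0, 256 * PAGE, "rw-")]
after = protect_page(block, 1 * PAGE)
```

In this model `after` holds a header region, a guard region and a payload
region. The model doesn't try to reproduce how MacOS flags the resulting
regions; it only shows where the extra regions come from.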

The problem however is that the new API is slow. And not only is it slow
but it seems to jank any other thread/process that may share memory
mappings. https://bugzilla.mozilla.org/show_bug.cgi?id=1779138
That's okay for something like about:memory's memory report, where we still
use it to calculate the resident-unique measurement. But it's not okay for
telemetry, which may run periodically and affect a user's experience. We
disabled MEMORY_TOTAL telemetry on MacOS temporarily.

I've now clicked LAND on
https://bugzilla.mozilla.org/show_bug.cgi?id=1786860 which re-enables the
MEMORY_TOTAL telemetry, but using a different measurement. It uses the
"physical footprint" figure as calculated by MacOS. This is a nice
measurement when considering a single process, or when you ask MacOS to
calculate it for a set of processes on the command line (too slow for us to
use). Exactly how it's calculated seems to be an "implementation detail",
but you can read XNU sources if you like. The intention is that it
represents memory that is "dirtied"; in other words, memory that there would
be a cost to swapping out if the kernel decided to do so. It also includes
shared memory, and that's the problem for Firefox telemetry: we query the
physical footprint for each process and then add them together, meaning we
over-count shared memory. This is why MEMORY_TOTAL will now be larger on
MacOS and won't be accurate (it over-counts). However, it wasn't accurate
before either (it completely ignored shared memory AND counted a lot of
private aliased memory as shared memory).
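
To make the over-counting concrete, here is a small Python sketch with
made-up numbers (all in MiB). The process names, sizes and the single "ipc"
segment are hypothetical, and it assumes every process's footprint includes
the whole shared segment, which is a simplification of what the real
physical footprint does.

```python
# Hypothetical processes: private memory plus the shared segments they map.
processes = {
    "parent":    {"private": 300, "shared": {"ipc"}},
    "content-1": {"private": 150, "shared": {"ipc"}},
    "content-2": {"private": 120, "shared": {"ipc"}},
}
segment_sizes = {"ipc": 40}  # MiB, hypothetical


def per_process_footprint(proc):
    """Footprint as seen from one process: private plus every mapped segment."""
    return proc["private"] + sum(segment_sizes[s] for s in proc["shared"])


def summed_footprint(procs):
    """What telemetry does in this model: add up per-process footprints."""
    return sum(per_process_footprint(p) for p in procs.values())


def deduplicated_total(procs):
    """What we'd ideally report: each shared segment counted exactly once."""
    private = sum(p["private"] for p in procs.values())
    mapped = set().union(*(p["shared"] for p in procs.values()))
    return private + sum(segment_sizes[s] for s in mapped)


print(summed_footprint(processes))    # 690
print(deduplicated_total(processes))  # 610
```

The 80 MiB difference is the 40 MiB segment counted two extra times, once
for each process beyond the first that maps it.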

All we can really say about MEMORY_TOTAL, before and after these changes, is
that if it's stable over time or trending downwards, that's good. And if it's
trending upwards, that's possibly not good (but maybe we're using the memory
to ship new useful features).

What could we do going forward?

* We could account for the shared memory we know about (e.g. IPC) and
count it once when calculating MEMORY_TOTAL.

* We could do nothing. MEMORY_TOTAL was inaccurate before and the world
didn't end. Maybe it's better now because you read this e-mail and now
*know* that it's inaccurate and won't make false assumptions.

* We could remove this telemetry to avoid it confusing anyone.
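
For the first option above (counting known shared memory once), a minimal
Python sketch of the arithmetic might look like this. The corrected_total
function and its inputs are hypothetical, not anything in the tree, and it
assumes each known shared segment is fully included in the footprint of
every process that maps it, which may not hold in practice.

```python
def corrected_total(footprints, known_shared):
    """Sum per-process footprints, then remove the extra copies of each
    known shared segment so that it is counted only once.

    footprints:   per-process footprint values (any consistent unit)
    known_shared: (size, number_of_processes_mapping_it) pairs
    """
    total = sum(footprints)
    for size, mappers in known_shared:
        if mappers > 1:
            total -= size * (mappers - 1)
    return total


# Three hypothetical processes that all map one 40 MiB IPC segment.
print(corrected_total([340, 190, 160], [(40, 3)]))  # 610
```

Shared memory we don't know about would still be over-counted, so this
would narrow the error rather than remove it.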


Emilio Cobos Álvarez

Sep 26, 2022, 4:54:40 AM
to Paul Bone, dev-pl...@mozilla.org
Hi Paul,

Thanks for working on this, the previous state was clearly very unfortunate!

Do I understand correctly that, if I move something that takes 10MB per process to shared memory, in a way that it uses a total of 20MB of shared memory for all processes, that'd be reported as a MEMORY_TOTAL increase on macOS (even though we improved memory usage)?

If so, it seems like a pretty big caveat... I guess it might be fine, assuming we have precise memory reporting on other platforms? Still, it seems like the kind of thing that's likely to make someone scratch their head for a while until they find the right root cause (and then find there's not much they can do about it).

Do we / should we have some documentation for this telemetry probe we could update / create, to prevent knowledge about this quirk getting lost? Maybe just a note around here or a comment here or in the header could do... Not sure :)

Thanks!

 -- Emilio


Kris Maglione

Sep 26, 2022, 7:05:50 PM
to Emilio Cobos Álvarez, Paul Bone, dev-pl...@mozilla.org
On Sun, Sep 25, 2022, 22:54 Emilio Cobos Álvarez <emi...@mozilla.com> wrote:
> Do I understand correctly that, if I move something that takes 10MB per process to shared memory, in a way that it uses a total of 20MB of shared memory for all processes, that'd be reported as a MEMORY_TOTAL increase on macOS (even though we improved memory usage)?

That is, unfortunately, what it means. But it will only show up that way in telemetry, not in automation or about:memory. It also won't show up that way in telemetry for other platforms, though our MEMORY_TOTAL telemetry isn't perfect anywhere (it's a really hard measurement to do well, and especially to do well cheaply, which telemetry requires).

Paul Bone

Sep 26, 2022, 9:21:07 PM
to Kris Maglione, Emilio Cobos Álvarez, dev-pl...@mozilla.org
On Mon, Sep 26, 2022 at 01:05:35PM -1000, Kris Maglione wrote:
> On Sun, Sep 25, 2022, 22:54 Emilio Cobos Álvarez <emi...@mozilla.com> wrote:
>
> > Do I understand correctly that, if I move something that takes 10MB per
> > process to shared memory, in a way that it uses a total of 20MB of shared
> > memory for all processes, that'd be reported as a MEMORY_TOTAL *increase*
> > on macOS (even though we improved memory usage)?
> >
>
> That is, unfortunately, what it means. But it will only show up that way in
> telemetry, not in automation or about:memory. And it also won't show up
> that way in telemetry for other platforms, though our MEMORY_TOTAL
> telemetry isn't perfect anywhere (it's a really hard measurement to do
> well, and especially to do well cheaply, which telemetry requires)
>

I'd like to add that removing it from telemetry altogether is not the worst
idea. It prevents it from misleading anyone who doesn't find the docs
(which I will add, thanks for the reminder Emilio).
