Effect on tail memory metrics of a stable-sized, long-lived object?


David Van Cleve

Apr 6, 2021, 8:51:11 PM
to memory-dev, Charles Harrison
Hi memory-dev,

I'm working on a feature that introduces new storage implemented in the network service. We currently keep the feature's state in memory, which (perhaps not surprisingly) leads some metrics to tell us we are using more memory than before (dashboard link; sorry, Google-internal). 

We have some quota knobs we can tune to decrease this memory use, or we could consider architectural changes that keep less of the data in memory.

It seems like the biggest user-facing impact comes from tail memory metrics, which we don't see regress on the linked dashboard. I know memory metrics can be noisy, so I want to understand if there is a first-principles reason we would think that tail memory "really is regressing" even if the metrics don't show it.

If a feature reads (say) 20 KiB from disk into a long-lived in-memory object close to process initialization, would we always expect tail memory metrics to increase by a similar number? In this case, we could think through our optimization options without needing to have the regression confirmed by particular metrics.

Alternatively, are the moving pieces complex enough that understanding tail memory impact always requires observing a regression empirically? (In this case, we'd probably hold off and revisit the numbers after a bigger, or longer, rollout.)

Thanks!

Bruce Dawson

Apr 6, 2021, 9:15:40 PM
to David Van Cleve, memory-dev, Charles Harrison
We generally (on Windows for sure, I'm not sure about elsewhere) use memory commit as our way of measuring memory. This means that if you allocate 20 KiB of memory then (ignoring some randomness caused by heap fragmentation) all memory metrics should go up by 20 KiB. This is true even if some memory gets swapped out.

This is a good way to measure memory. If you instead measure memory by looking at the working set of a process, memory metrics become "self-correcting": using more memory can lead to the OS trimming your working set, which then makes you use less memory. That is just confusing and not useful.

So, my understanding is that you should see a 20 KiB increase everywhere. Where you do not, it is due to limited precision and noise (or due to your change causing the very high-percentile users to OOM at a slightly higher rate).

The main disclaimer would be that your change might have additional impacts on memory that you haven't noticed, but that should be the minimum impact.
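
To make the commit versus working-set distinction concrete, here is a toy model in Python. It is only a sketch: the function names, the 10 MiB baseline, and the 8 MiB resident limit are all hypothetical, and real OS trimming is far more complicated than a simple cap.

```python
KIB = 1024
MIB = 1024 * KIB

def committed(allocations):
    # Commit counts every allocated byte, whether or not it is resident.
    return sum(allocations)

def working_set(allocations, resident_limit):
    # Toy trimming rule (an assumption): the OS caps resident memory
    # at a fixed limit and swaps out the rest.
    return min(sum(allocations), resident_limit)

baseline = [10 * MIB]                 # hypothetical process footprint
with_feature = baseline + [20 * KIB]  # the new long-lived object
limit = 8 * MIB                       # hypothetical trimmed working set

commit_delta = committed(with_feature) - committed(baseline)
ws_delta = working_set(with_feature, limit) - working_set(baseline, limit)

print(commit_delta)  # 20480: the full 20 KiB shows up in commit
print(ws_delta)      # 0: the trimmed working set hides the increase
```

Under this toy rule the commit metric moves by exactly the allocated amount, while the working-set metric does not move at all once the process is already being trimmed.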

Somebody else can correct me if I am wrong.

--
Bruce Dawson

Kentaro Hara

Apr 7, 2021, 1:03:15 AM
to Bruce Dawson, David Van Cleve, memory-dev, Charles Harrison
+1 to what Bruce said.

In practice, how much memory are you talking about? If it's up to 20 KiB, it might not be noticeable even in the committed memory metrics. If the network service is using 10 MiB, 20 KiB is only 0.2%.
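
The arithmetic behind that percentage, as a small Python check (the helper name relative_increase is just for illustration):

```python
KIB = 1024
MIB = 1024 * KIB

def relative_increase(added_bytes, baseline_bytes):
    # Fraction of the baseline that the added allocation represents.
    return added_bytes / baseline_bytes

share = relative_increase(20 * KIB, 10 * MIB)
print(f"{share:.2%}")  # 0.20%
```

A shift this small can easily sit below the noise floor of a histogram-based metric.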

--
Kentaro Hara, Tokyo

David Van Cleve

Apr 7, 2021, 4:51:42 PM
to Kentaro Hara, Bruce Dawson, memory-dev, Charles Harrison
Thanks, both! Very helpful: it seems we don't need empirical evaluation of the effect on the memory metrics and can reason about the feature's impact in isolation.

Kentaro: We'd expect to see roughly 50 ± 20 KiB. Unfortunately, each client will take a while to hit the upper bound (the feature's storage fills up gradually), so the increase might not show up in metrics immediately after enabling the feature, even if it would later on.

Bartek Nowierski

Apr 8, 2021, 1:39:06 AM
to David Van Cleve, Erik Chen, Benoit Lize, Siddhartha S, Kentaro Hara, Bruce Dawson, memory-dev, Charles Harrison, Bartek Nowierski
I see the UMA link is for Android. If I understand correctly, what Bruce said doesn't apply to non-Windows platforms. Benoit, Erik or Sidd may know more.


Erik Chen

Apr 8, 2021, 12:19:08 PM
to Bartek Nowierski, David Van Cleve, Benoit Lize, Siddhartha S, Kentaro Hara, Bruce Dawson, memory-dev, Charles Harrison
> If a feature reads (say) 20 KiB from disk into a long-lived in-memory object close to process initialization, would we always expect tail memory metrics to increase by a similar number?

How this gets measured on non-Windows platforms depends on how this statement is implemented. Are you using mmap, or are you allocating memory via malloc/new/partition-alloc? The statement of creating an "object" implies the latter. The latter will show up in memory metrics on all platforms; the former will likely not.

Siddhartha S

Apr 8, 2021, 2:22:39 PM
to Erik Chen, Bartek Nowierski, David Van Cleve, Benoit Lize, Kentaro Hara, Bruce Dawson, memory-dev, Charles Harrison
> It seems like the biggest user-facing impact comes from tail memory metrics, which we don't see regress on the linked dashboard.

My opinion is that a memory regression at the median is still bad. Based on our studies, using more memory at any stage will cause more swaps or more OOMs. If there is a regression that affects only the median and not the high percentiles, it could still impact user experience.

On Thu, Apr 8, 2021 at 9:19 AM Erik Chen <erik...@google.com> wrote:
> > If a feature reads (say) 20 KiB from disk into a long-lived in-memory object close to process initialization, would we always expect tail memory metrics to increase by a similar number?
> How this gets measured on non-Windows platforms depends on how this statement is implemented. Are you using mmap or are you allocating memory via malloc/new/partition-alloc? The statement of creating an "object" implies the latter. The latter will show up in memory metrics on all platforms, the former will likely not.

Memory maps on Android will still show up in memory metrics as long as they are not shared across processes. In this case the browser reads from disk and allocates memory that is not shared, so it will show up in PrivateMemoryFootprint.

Siddhartha S

Apr 8, 2021, 2:46:05 PM
to Erik Chen, Bartek Nowierski, David Van Cleve, Benoit Lize, Kentaro Hara, Bruce Dawson, memory-dev, Charles Harrison
Sorry, I misremembered. An mmap is accounted for only if it is MAP_ANONYMOUS; if it is backed by a file, we would not see it. The allocators mentioned by Erik mmap anonymous segments internally, so they will show up. File-backed pages are not counted, even though they are supposed to be part of the "private footprint".
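
For reference, a minimal Python sketch of the three ways of getting 20 KiB from disk into a process that come up in this thread. Chromium itself is C++; Python is used here only for brevity. The names load_three_ways and demo and the temporary file are hypothetical, and the accounting notes in the comments restate the claims made in this thread rather than anything the code measures.

```python
import mmap
import os
import tempfile

SIZE = 20 * 1024  # 20 KiB, the figure used in the thread

def load_three_ways(path):
    # 1. Heap copy: read() puts the bytes in allocator-owned memory.
    #    Per the thread, allocator memory shows up in the metrics.
    with open(path, "rb") as f:
        heap_copy = f.read()

    # 2. Anonymous mapping (fileno -1): not backed by any file.
    #    Per the thread, anonymous mappings are counted.
    anon = mmap.mmap(-1, len(heap_copy))
    anon.write(heap_copy)

    # 3. File-backed, read-only mapping of the same file.
    #    Per the thread, file-backed pages are not counted.
    with open(path, "rb") as f:
        file_backed = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    return heap_copy, anon, file_backed

def demo():
    # Create a throwaway 20 KiB file to stand in for the feature's data.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * SIZE)
    heap_copy, anon, file_backed = load_three_ways(path)
    try:
        # All three views hold identical contents.
        assert anon[:] == heap_copy == file_backed[:]
        return len(heap_copy), len(anon), len(file_backed)
    finally:
        anon.close()
        file_backed.close()
        os.remove(path)

print(demo())  # (20480, 20480, 20480)
```

All three hold the same bytes; what differs, according to the discussion above, is whether the memory metrics attribute those bytes to the process.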