Intent to change Task Manager to display consistent memory metrics.


Erik Chen

Sep 5, 2017, 10:03:25 PM
to Chromium-dev
tl;dr: The existing metrics vary by platform, but generally fail to report compressed and swapped memory [among other inconsistencies]. Both memory-infra and UMA/Heartbeats have already been updated to support "Private Memory Footprint".

See: https://docs.google.com/document/d/1PZyRzChnvkUNUB85Op46aqkFXuAGUJi751DJuB6O40g/edit#


erik...@chromium.org

Sep 8, 2017, 8:35:19 PM
to Chromium-dev
There's currently some discussion happening on my CL: https://chromium-review.googlesource.com/c/chromium/src/+/646047. I'm going to move that discussion here.

> Patch Set 4:
>
> Okay, that helps. I'd still like to have a bit more information in the CL about the implications.
>
> My concern is that this new number is over-counting. I haven't tested, but my guess would be that the new number is showing the private commit charge for the process. Chrome sometimes commits far more memory than it uses and that leads to over-estimates. Let's discuss more Monday.


There is no single perfect memory measurement. For any stat that is chosen, I can come up with an artificial scenario in which the stat doesn't perform "appropriately".


The concern with committed memory is that it "over-estimates". 


For a more thorough discussion, see the original design doc for "consistent memory metrics". For an in-depth analysis focused on Windows, see Albert's analysis.


Two observations:

1) Let us assume that we want to measure resident + swapped/compressed memory, and don't want to count committed, but otherwise unused, memory on Windows.


First of all, this concept doesn't directly translate to other platforms. Linux has overcommit, which defaults to on. macOS behaves similarly, but I'm not aware of any toggleable flags. On both Linux and macOS, we measure resident + swapped/compressed memory. As I point out in my doc, these measurements [for the renderer] are almost identical between macOS and Windows [within 1%]!


This suggests that there is very little memory that is committed but neither resident nor swapped. And this makes sense. While there are valid use cases for committing large swathes of memory without using it, we happen not to use them within Chrome. Instead, we usually see Linux-based Chromium developers committing large swathes of memory in error, because they don't realize that overcommit doesn't exist on Windows! For example, in a thread from two weeks ago, Chromium developers had accidentally written code that committed 512MB on Windows, which was not their actual intent!


2) This is the same measurement we're using for UMA stats, chirp alerts, heartbeat metrics, memory-infra, etc. We believe that, functionally, this metric allows us to solve problems [catch regressions, observe improvements, etc.].

bruce...@chromium.org

Sep 11, 2017, 2:31:40 PM
to Chromium-dev
If private-committed memory does not significantly over-estimate private-working-set plus private-swapped/trimmed then I agree that this could be a good metric. But is that true on Windows? In the referenced document flipboard uses ~50 MB more by the new measure, wikipedia uses 16 MB more, the GPU process uses 57 MB more, etc. These are significant in absolute terms and as a percentage.

It is true that Windows does not support over-commit, so a badly behaved process that commits significantly more memory than it intends to use can potentially cause "out-of-memory" failures. In practice I think this is very rare. Much more common is for machines to run out of address-space or run out of physical memory. Our move to 64-bit means that running out of address space is no longer an issue, but running out of physical memory is.

Displaying our commit charge instead of our private working set will make us look worse and I don't think it aligns with what we actually want to optimize for. I can see it being advantageous to monitor commit charge so that we can detect abuses of it, but commit charge does not equal memory.

Displaying private-working-set plus swapped/trimmed-pages would be ideal but I don't know a way to get that information on Windows.

erik...@chromium.org

Sep 14, 2017, 6:43:25 PM
to Chromium-dev, Albert J. Wong (王重傑), Nick Carter
+ ajwong, nick

Thank you Bruce! You've brought up several very good points. High-level thoughts first, then responses inline.

I believe that we should move forward with changing the Task Manager to use private memory footprint on Windows, under a new name [Memory Footprint], which is shown by default. We should rename the existing column to "Working Set", and have it be hidden by default.

I considered picking the name "Commit Charge" for the new column, which reflects the current calculation, but we expect to change the calculation at least once [to include shared memory] and possibly twice [to reduce the over-counting of allocators], in the not-too-distant future. 

This decision is primarily based on the fact that the current number [private working set] under-represents Chrome's memory usage [with no lower limit] when memory starts to get compressed/swapped. I couldn't find stats on this, but I believe that most machines have non-zero compressed/swapped memory most of the time.

The new calculation does have flaws. Commit charge overcounts private memory footprint by a relatively small, fixed amount for renderers [0-20MB]. It also overcounts GPU memory usage by an unknown amount [at least 50 MB has been observed]. UMA stats for this calculation of the GPU process suggest that this does not over-count by much more than that. 

On Monday, September 11, 2017 at 11:31:40 AM UTC-7, bruce...@chromium.org wrote:
> If private-committed memory does not significantly over-estimate private-working-set plus private-swapped/trimmed then I agree that this could be a good metric. But is that true on Windows? In the referenced document flipboard uses ~50 MB more by the new measure, wikipedia uses 16 MB more, the GPU process uses 57 MB more, etc. These are significant in absolute terms and as a percentage.
I went through these examples. The large difference in flipboard is due to GC timings. The largest difference in renderer memory usage I observed was ~20MB, in extension processes. Using vmmap and ETW to examine the difference, I found that three factors are responsible for >90% of it:
  • Overcounting of PartitionAlloc regions.
  • Overcounting of v8 regions.
  • Overcounting of HeapAlloc regions.
All three have the same root cause: allocator implementations on Windows commit a region larger than they need [e.g. 2MB for PartitionAlloc], and then dole out blocks from the region. If most blocks have never been used, there will be a large difference between working set and commit charge. In my observations, PartitionAlloc is responsible for >50% of the difference. The difference is particularly large for small Extension processes, which makes sense, since we expect Extension processes to have mostly-empty PartitionAlloc regions, as they don't allocate many objects.
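
To make that concrete, here is a minimal, hypothetical Windows snippet (not Chromium code) that commits a 2MB region the way an allocator would, but touches only two pages of it; the process's private commit charge grows by ~2MB while its working set grows by only ~8KB:

// Hypothetical demo, not Chromium code: commit a 2MB region as an
// allocator would, touch only two pages, then compare the process's
// private commit charge against its working set.
// Build with: cl demo.cc /link psapi.lib
#include <windows.h>
#include <psapi.h>
#include <cstdio>

static void PrintCounters(const char* label) {
  PROCESS_MEMORY_COUNTERS_EX pmc = {};
  GetProcessMemoryInfo(GetCurrentProcess(),
                       reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc),
                       sizeof(pmc));
  // PrivateUsage is the private commit charge; WorkingSetSize is resident.
  printf("%s: commit=%zu KB, working set=%zu KB\n", label,
         pmc.PrivateUsage / 1024, pmc.WorkingSetSize / 1024);
}

int main() {
  PrintCounters("before");
  constexpr size_t kRegionSize = 2 * 1024 * 1024;  // 2MB, like a PA region.
  char* region = static_cast<char*>(VirtualAlloc(
      nullptr, kRegionSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
  if (!region) return 1;
  // Dole out (touch) just two 4KB "blocks"; the rest stays untouched.
  region[0] = 1;
  region[4096] = 1;
  PrintCounters("after");  // Commit grows ~2MB; working set grows ~8KB.
  return 0;
}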

It's possible to update allocators that we control to emit numbers that we can use to create a better estimate of private memory footprint. I've filed a bug for it here. That being said, I view this as an optional optimization that is not necessary to roll out a significant improvement to the numbers reported by the task manager.
 

> It is true that Windows does not support over-commit, so a badly behaved process that commits significantly more memory than it intends to use can potentially cause "out-of-memory" failures. In practice I think this is very rare. Much more common is for machines to run out of address-space or run out of physical memory. Our move to 64-bit means that running out of address space is no longer an issue, but running out of physical memory is.
Fair enough!
 

> Displaying our commit charge instead of our private working set will make us look worse
Absolutely agreed! That being said, using resident set makes us look "artificially good" under high memory-pressure scenarios. I think that this is a complex topic, and I'm happy to discuss it offline and/or with PMs, etc., but don't think that this should sway us one way or another.
 
> and I don't think it aligns with what we actually want to optimize for. I can see it being advantageous to monitor commit charge so that we can detect abuses of it, but commit charge does not equal memory.
I think we disagree on this point. If a developer calls malloc(10MB) and never uses that region, it will have no impact on working set but will add 10MB to commit charge. And this would be a bug that we want to avoid. If a developer calls malloc(10MB), writes to the region, and then never uses it again, causing the region to eventually swap out, this will have no impact on working set but will still add 10MB to commit charge. These two examples seem trivial and silly, but they're representative of real bugs that get written by Chromium developers, and which are otherwise not visible by looking at just resident set.
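
For concreteness, a hedged sketch of those two bug patterns (illustrative only; on Windows, a large malloc is backed by committed pages):

#include <cstdlib>
#include <cstring>

constexpr size_t kTenMB = 10 * 1024 * 1024;

// Bug pattern 1: allocate but never touch.
// Working set impact: ~0. Commit charge impact: +10MB.
void AllocateAndForget() {
  void* p = malloc(kTenMB);
  (void)p;  // Never written or read.
}

// Bug pattern 2: allocate, touch once, never use again. The pages
// eventually get trimmed/swapped, so working set returns to ~0, but
// the 10MB of commit charge remains.
void* AllocateTouchAndAbandon() {
  void* p = malloc(kTenMB);
  memset(p, 0, kTenMB);  // Touched once, then never used again.
  return p;
}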

Or to frame this question differently: when do we want to ignore an increase in commit charge?

I can think of two cases right now. First, GPU drivers: clearly we have no insight into what's going on there, but we don't have that insight with resident set either, and we don't have much control. Second, there's noise due to the implementation of allocators, specifically w.r.t. allocation granularity and block size. But that is a fixed, relatively small quantity, whose noise is proportional to allocation block size [e.g. relatively small for HeapAlloc, much larger for PartitionAlloc].
 

> Displaying private-working-set plus swapped/trimmed-pages would be ideal but I don't know a way to get that information on Windows.
To the best of my knowledge, there are no public APIs that will differentiate between committed memory that has never been touched and committed memory that has been touched and subsequently swapped/trimmed. As such, our options are to start with working set and attempt to "add back in" swapped/trimmed regions, or to start with committed memory and attempt to "subtract out" never-touched regions.
 

Bruce Dawson

Sep 14, 2017, 9:11:42 PM
to Erik Chen, Chromium-dev, Albert J. Wong (王重傑), Nick Carter
I am glad (especially after discussing in person) that we all agree on the technical details and the pros/cons of the different approaches.

I really like the idea of giving the new memory number a new name. I also think we should seriously consider either a help page (F1 on task manager does nothing) or tooltips (hovering over the current memory column does nothing) so that we have an opportunity to explain the meaning of the number. If we start applying adjustments then we will end up with a number that doesn't match anything you can find in procexp or task manager and I think we owe it to people to make the actual meaning discoverable.

Are we going to try to get PartitionAlloc to commit fewer bytes? If we did that then our new memory number would be smaller, our commit numbers would also be smaller in procexp and task manager, and our actual commit charge would be less. This would make it much easier, immediately, to see how much memory had been paged out by Chrome, regardless of what our task manager says. Put another way, if commit is the number that we care about then shouldn't we try to reduce it?

If we can reduce our commit charge then that number becomes so much more trustworthy and useful. We can't do anything about it in the GPU process, but for every other process we should be able to get fairly tightly bounded constraints on committed-pages-that-we-never-touched, which would make having that number as the one-true-number more unambiguously good.


Albert J. Wong (王重傑)

Sep 15, 2017, 1:19:49 PM
to Bruce Dawson, Erik Chen, Chromium-dev, Nick Carter, pal...@chromium.org, Kentaro Hara
[ +palmer and haraken for a heads up on PartitionAlloc ]

Regarding PartitionAlloc, it's on my list for next week to try exactly what you are suggesting: see if we can reduce the commit charge for untouched memory.  I believe this is totally possible by using MEM_RESERVE to grab a contiguous address space extent and then doing MEM_COMMIT as necessary, with a high-water mark per extent. The current goal is to try to make each PA bucket only eat up a few extra pages (1 is ideal, but we will need to look at object size and the cost of multiple syscalls to MEM_COMMIT) at the ragged edge.
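
A minimal sketch of that shape, assuming a simple bump-style extent (illustrative only, not PartitionAlloc's actual code; all names are made up):

// Illustrative sketch, not PartitionAlloc code: reserve a large extent up
// front, then commit pages only as the high-water mark advances.
#include <windows.h>
#include <cstddef>

class LazyExtent {
 public:
  explicit LazyExtent(size_t reserve_bytes) : reserved_(reserve_bytes) {
    // Reserve contiguous address space only; this adds no commit charge.
    base_ = static_cast<char*>(
        VirtualAlloc(nullptr, reserved_, MEM_RESERVE, PAGE_NOACCESS));
  }

  void* Alloc(size_t bytes) {
    size_t new_mark = used_ + bytes;
    if (!base_ || new_mark > reserved_) return nullptr;
    if (new_mark > committed_) {
      // Commit just the pages needed to cover the new high-water mark.
      // This extra syscall is the cost discussed below.
      size_t new_committed = RoundUpToPage(new_mark);
      if (!VirtualAlloc(base_ + committed_, new_committed - committed_,
                        MEM_COMMIT, PAGE_READWRITE)) {
        return nullptr;
      }
      committed_ = new_committed;
    }
    void* result = base_ + used_;
    used_ = new_mark;
    return result;
  }

 private:
  static size_t RoundUpToPage(size_t n) {
    constexpr size_t kPageSize = 4096;
    return (n + kPageSize - 1) & ~(kPageSize - 1);
  }

  char* base_ = nullptr;
  size_t reserved_ = 0;
  size_t used_ = 0;       // High-water mark of bytes handed out.
  size_t committed_ = 0;  // Bytes committed so far (page multiple).
};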

As a caveat though, doing a 2-step MEM_RESERVE/MEM_COMMIT introduces an extra syscall transition on a normal allocation when we step over the high-water mark.  There's some stuff to play with here to try and amortize the cost, but we may end up in a scenario where we're trading actual overall performance for a more understandable/reliable metric and have to reopen this discussion.  So regarding "shouldn't we try to reduce this number?"...yes, we should and will, but I want to explicitly state the (possibly obvious) point that metric reduction is not, by itself, a good thing. The goal should be to understand and reduce waste, not just to universally reduce.

Regarding the metric rename, agreed we need to make it possible to understand our terms. Not gonna wade into the UI quagmire, but at the very least, we need solid documentation publicly in Markdown on what "footprint" means.  @Erikchen: should we bubble up priority on fixing key_concepts.md or similar? We need an executive summary of your CMM doc.

My takeaway from this is that there are items to fix for readability and some exploration to do on our various heaps, but I think we're agreed on making this change, with the blocker being ensuring that people can understand the terms we use?  Does that sound right?

-Albert


Wez

Sep 15, 2017, 1:52:04 PM
to ajw...@chromium.org, Bruce Dawson, Erik Chen, Chromium-dev, Nick Carter, pal...@chromium.org, Kentaro Hara
(and now from the correct account)

On 15 September 2017 at 10:48, Wez <w...@google.com> wrote:
On 15 September 2017 at 10:17, Albert J. Wong (王重傑) <ajw...@chromium.org> wrote:
> [ +palmer and haraken for a heads up on PartitionAlloc ]

> Regarding PartitionAlloc, it's on my list for next week to try exactly what you are suggesting: see if we can reduce the commit charge for untouched memory.  I believe this is totally possible by using MEM_RESERVE to grab a contiguous address space extent and then doing MEM_COMMIT as necessary, with a high-water mark per extent. The current goal is to try to make each PA bucket only eat up a few extra pages (1 is ideal, but we will need to look at object size and the cost of multiple syscalls to MEM_COMMIT) at the ragged edge.

> As a caveat though, doing a 2-step MEM_RESERVE/MEM_COMMIT introduces an extra syscall transition on a normal allocation when we step over the high-water mark.
> There's some stuff to play with here to try and amortize the cost, but we may end up in a scenario where we're trading actual overall performance for a more understandable/reliable metric and have to reopen this discussion.  So regarding "shouldn't we try to reduce this number?"...yes, we should and will, but I want to explicitly state the (possibly obvious) point that metric reduction is not, by itself, a good thing. The goal should be to understand and reduce waste, not just to universally reduce.

This is a critical point; we really need to take care not to over-focus on reducing metrics like the commit-charge, and stay focused on the user-facing impact of our memory usage patterns.

(As an illustrative example, we sometimes focus on line-coverage as a metric for test coverage, but that is typically only worthwhile to improve when it is very poor - reaching 100% line-coverage doesn't necessarily guarantee good test-coverage, so once line-coverage is "good enough", it's usually better to focus efforts elsewhere.)

> Regarding the metric rename, agreed we need to make it possible to understand our terms.

Right; which metric to show, and its meaning, is in part a function of what we are able to query from the platform, but also a function of what question it is that we're aiming to answer.

If the typical questions we're answering are things like

- "what is making my device swap-jank?"
- "why do programs keep OOMing?"
- "is my site's memory usage 'reasonable' given what it is doing?"

then Erik's CMM, in combination with other metrics (e.g. hard fault count for the swap-jank question), is likely good enough, given clarity as to their meaning.

I would argue that our existing crop of metrics makes these sorts of questions hard to answer without a deep understanding of both Chrome and platform memory-management...
 
> Not gonna wade into the UI quagmire, but at the very least, we need solid documentation publicly in Markdown on what "footprint" means.  @Erikchen: should we bubble up priority on fixing key_concepts.md or similar? We need an executive summary of your CMM doc.

Yes please! :)

> My takeaway from this is that there are items to fix for readability and some exploration to do on our various heaps, but I think we're agreed on making this change, with the blocker being ensuring that people can understand the terms we use?  Does that sound right?

SGTM
 

> -Albert




Bruce Dawson

Sep 15, 2017, 3:11:32 PM
to Wez, Albert J. Wong (王重傑), Erik Chen, Chromium-dev, Nick Carter, Chris Palmer, Kentaro Hara
Luckily the cost of VirtualAlloc to commit more pages is quite low. From previous measurements I have found that VirtualAlloc to reserve and commit several MB of RAM takes about 5 μs. I don't know what the cost of committing previously reserved pages would be, but presumably no more. Faulting in each page is about 0.7 μs, so the unavoidable cost of faulting in a dozen pages is greater than the cost of committing them (as a group) on demand. Doing some quick tests to measure the costs would be worthwhile. Note that I do not think we should decommit aggressively; the cost/benefit tradeoff for that seems very poor.
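
A quick-and-dirty sketch of such a test might look like this (hypothetical, using QueryPerformanceCounter; not measured or tuned):

// Hypothetical timing sketch: time committing previously reserved pages,
// then time faulting them in by touching them.
#include <windows.h>
#include <cstdio>

static double NowSeconds() {
  LARGE_INTEGER counts, freq;
  QueryPerformanceCounter(&counts);
  QueryPerformanceFrequency(&freq);
  return static_cast<double>(counts.QuadPart) / freq.QuadPart;
}

int main() {
  constexpr size_t kSize = 4 * 1024 * 1024;  // Several MB.
  constexpr size_t kPage = 4096;
  char* p = static_cast<char*>(
      VirtualAlloc(nullptr, kSize, MEM_RESERVE, PAGE_NOACCESS));
  if (!p) return 1;

  double t0 = NowSeconds();
  VirtualAlloc(p, kSize, MEM_COMMIT, PAGE_READWRITE);  // Commit as a group.
  double t1 = NowSeconds();
  for (size_t i = 0; i < kSize; i += kPage) p[i] = 1;  // Fault in each page.
  double t2 = NowSeconds();

  printf("commit: %.1f us total, fault-in: %.2f us/page\n",
         (t1 - t0) * 1e6, (t2 - t1) * 1e6 / (kSize / kPage));
  return 0;
}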

> we may end up in a scenario where we're trading actual overall performance for more understandable/reliable metric and have to reopen this discussion

Agreed. But it is worth spending some cost to make commit charge better approximate our true memory numbers, partly because reducing commit charge has some modest value, but mostly because there is tremendous value in having numbers that we can trust.

Put another way, by making commit charge our top-line memory summary we are telling developers to optimize for it. So we should optimize for it (within reason).

erik...@chromium.org

Sep 15, 2017, 4:27:05 PM
to Chromium-dev, w...@chromium.org, ajw...@chromium.org, erik...@chromium.org, ni...@chromium.org, pal...@chromium.org, har...@chromium.org
> it is worth spending some cost to make commit-charge better approximate our true memory numbers
Agreed. We'll look into this next week and also update the relevant documentation. 

I cleaned up the Consistent Memory Metrics doc. The previous version had a long-winded theoretical discussion, which has been moved to the appendix. Now, private memory footprint is defined on all platforms as: private, anonymous, non-discardable memory that is resident/compressed/swapped. If compressed, we count the pre-compression size.

Our current estimate on Windows for private memory footprint is to use private committed memory. This appears to be an okay, but not great, estimate. My original thought was that we could update the computation of private memory footprint to explicitly subtract out "untouched", committed regions from allocators under our control. However, based on Bruce's stats, it might even make sense to update the behavior of PartitionAlloc to be lazier about committing memory from super-pages. The added benefit of this is that our "private memory footprint" continues to be exactly private committed memory, which is also reported by other system utilities.
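
The arithmetic of that "subtract out" idea would look roughly like the following (all names are hypothetical, not real Chromium APIs; committed-minus-allocated is only a rough proxy for "untouched"):

// Hypothetical sketch: estimate private memory footprint by subtracting
// committed-but-never-handed-out allocator slack from private commit.
// None of these names are real Chromium APIs.
#include <cstddef>

struct AllocatorStats {
  size_t committed_bytes;  // Bytes the allocator has committed.
  size_t allocated_bytes;  // Bytes actually handed out to callers.
};

size_t EstimatePrivateFootprint(size_t private_committed_bytes,
                                const AllocatorStats& partition_alloc,
                                const AllocatorStats& v8,
                                const AllocatorStats& heap_alloc) {
  // Committed-but-unallocated slack in the allocators we control.
  size_t slack = (partition_alloc.committed_bytes -
                  partition_alloc.allocated_bytes) +
                 (v8.committed_bytes - v8.allocated_bytes) +
                 (heap_alloc.committed_bytes - heap_alloc.allocated_bytes);
  return private_committed_bytes - slack;
}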

To follow along on improvements for private memory footprint on Windows and/or changes to partition alloc, see: https://bugs.chromium.org/p/chromium/issues/detail?id=765406

Kentaro Hara

Sep 17, 2017, 10:20:01 PM
to Erik Chen, Chromium-dev, Wez, Albert J. Wong (王重傑), Nick Carter, Chris Palmer
On 15 September 2017, Albert J. Wong (王重傑) <ajw...@chromium.org> wrote:
> Regarding PartitionAlloc, it's on my list for next week to try exactly what you are suggesting: see if we can reduce the commit charge for untouched memory.

+1 to experiment with what Albert suggested.

I think that reducing the commit charge is worthwhile, assuming that the cost of doing the 2-step MEM_RESERVE/MEM_COMMIT is amortized and it does not regress performance :)



--
Kentaro Hara, Tokyo, Japan

Albert J. Wong (王重傑)

Sep 18, 2017, 2:07:39 PM
to Kentaro Hara, Erik Chen, Chromium-dev, Wez, Nick Carter, Chris Palmer
It sounds like all 5 of us (brucedawson, erikchen, haraken, wez, me) are in violent agreement then?

Clearly try to reduce it. Try the two-step reserve/commit, measure perf (at least system health, but maybe also a microbenchmark), and make it happen unless something tells us otherwise?

On the task manager front though, are there objections to us moving forward with changing the metric name + documentation in parallel? I don't see a reason to block one on the other. The PartitionAlloc tests are literally next in my queue, but I'm also anxious to get the task manager change in, because there's evidence of people using the current number incorrectly.

-Albert


erik...@chromium.org

Sep 19, 2017, 2:12:40 PM
to Chromium-dev, har...@chromium.org, erik...@chromium.org, w...@chromium.org, ni...@chromium.org, pal...@chromium.org
Thanks everyone for the feedback! :)

I intend to move forward with the following:
  1. Create a new column named "Memory Footprint" and show it by default.
  2. The "Memory" column still exists, but is hidden by default.
I've updated the overview with this information, as well as screenshots showing "Memory Footprint" and "Memory" side by side. Albert will look into reducing PA committed memory usage.