Perfetto heap profiling vs Debug.getNativeHeapAllocatedSize()


Diego Espiñeira

Jun 23, 2021, 7:27:48 AM
to Perfetto Development - www.perfetto.dev
Hi, I'm doing heap profiling on a stock (as in non rooted) Samsung Tab S6Lite.
Inside the app being profiled I'm printing the result of Debug.getNativeHeapAllocatedSize() to logcat, and it is almost always larger than the peak memory reported by Perfetto. Sometimes the difference exceeds 40MiB, which is very significant. Have you observed this before? Is there an explanation for it?

The configuration used is
buffers: {
  size_kb: 63488
}
data_sources: {
  config: {
    name: "android.heapprofd"
    heapprofd_config: {
      shmem_size_bytes: 8388608
      sampling_interval_bytes: 4096
      block_client: true
      process_cmdline: "mypackage"
      continuous_dump_config: {
        dump_phase_ms: 0
        dump_interval_ms: 100
      }
    }
  }
}
data_sources: {
  config: {
    name: "android.packages_list"
  }
}
duration_ms: 0
enable_extra_guardrails: false
statsd_metadata: {
}
write_into_file: true
flush_period_ms: 604800000
flush_timeout_ms: 30000

The SQL statement used to get the peak memory consumption is
select a.ts - min(b.ts) as time,
       round(sum(b.size) / 1024.0 / 1024.0, 2) as space_size,
       "MiB" as units
  from (select distinct ts from heap_profile_allocation) a
  join heap_profile_allocation b on (b.ts <= a.ts)
 group by a.ts
 order by 2 desc
 limit 1

Any insight on why this happens would be greatly appreciated.

Cheers,
Diego

Primiano Tucci

Jun 23, 2021, 7:35:13 AM
to Diego Espiñeira, Perfetto Development - www.perfetto.dev
Interestingly, getNativeHeapAllocatedSize seems to call into android_os_Debug_getNativeHeapAllocatedSize, which in turn returns mallinfo().uordblks.
This is realistically allocator-related fragmentation. It would be nice if you could grab snapshots of /proc/$pid/smaps (or showmap -a `pidof com.yourpackage`) to cross-check.
Which version of Android is this, out of curiosity?

Also, if you lower sampling_interval_bytes to 1 (it will be slooooow), does the number change? Just to rule out pathological patterns that diverge too much because of the sampling (40 MB seems like a lot for that TBH, I've never seen that much of a diff).



--
Primiano Tucci
Software Engineer
Google UK Limited


Primiano Tucci

Jun 23, 2021, 8:07:19 AM
to Diego Espiñeira, Christopher Ferris, Perfetto Development - www.perfetto.dev
+Christopher Ferris (note: this is a public ML)

Is there a way to figure out the allocator overhead (fragmentation and whatnot) by looking at mallinfo()?
In other words, is there a sensible subtraction of mallinfo struct fields that would give a sense of "bytes requested by the application" vs "total bytes due to rounding, alignment, arenas, and whatever else the allocator might be doing"?
My question here is: is there a way to tell if the application here is hitting some pathological case?

Diego: out of curiosity, if you put periodic calls to mallopt(M_PURGE) does the difference change/disappear?

On Wed, Jun 23, 2021 at 12:52 PM Diego Espiñeira <despi...@gmail.com> wrote:
It's Android 11.
I tried with sampling_interval_bytes: 1, but there was no change in the end result.
This is what dumpsys meminfo showed:
** MEMINFO in pid 29409 [com.tomtom.navkit2.tests.kpi] **
                      Pss  Private  Private  SwapPss      Rss     Heap     Heap     Heap
                    Total    Dirty    Clean    Dirty    Total     Size    Alloc     Free
                   ------   ------   ------   ------   ------   ------   ------   ------
     Native Heap   203989   203956        0       39   204916   222524   217845     4678
     Dalvik Heap     3819     3744        0       24     4544     5269     2635     2634
    Dalvik Other     2264     1968        0        0     2944
           Stack     1428     1428        0        0     1436
       Other dev       16        4       12        0      308
        .so mmap    54609     2452    50884       42    83640
       .jar mmap     2178        0      860        0    31664
       .apk mmap     1073        0      416        0    37252
       .dex mmap    24370    24288       16        0    24588
       .oat mmap       34        0        0        0     2272
       .art mmap    18130    17264       60      148    30396
      Other mmap       38        8        0        0     2416
         Unknown      873      804        0        3     1160
           TOTAL   313077   255916    52248      256   427536   227793   220480     7312

 App Summary
                       Pss(KB)   Rss(KB)
                        ------    ------
           Java Heap:    21068     34940
         Native Heap:   203956    204916
                Code:    78928    180076
               Stack:     1428      1436
            Graphics:        0         0
       Private Other:     2784
              System:     4913
             Unknown:               6168

           TOTAL PSS:   313077   TOTAL RSS:   427536   TOTAL SWAP PSS:   256

 Objects
               Views:        0          ViewRootImpl:        0
         AppContexts:        5            Activities:        0
              Assets:       19         AssetManagers:        0
       Local Binders:       15         Proxy Binders:       43
       Parcel memory:        5          Parcel count:       23
    Death Recipients:        1       OpenSSL Sockets:        0
            WebViews:        0

 SQL
         MEMORY_USED:        0
  PAGECACHE_OVERFLOW:        0   MALLOC_SIZE:        0

And the peak size was ~171MiB, so in this case the difference was ~30MiB.
Could this be attributed to memory fragmentation?

Diego Espiñeira

Jun 23, 2021, 8:10:13 AM
to Primiano Tucci, Christopher Ferris, Perfetto Development - www.perfetto.dev
> out of curiosity, if you put periodic calls to mallopt(M_PURGE) does the difference change/disappear?
I'll try that.

Ryan Savitski

Jun 23, 2021, 9:18:06 AM
to Diego Espiñeira, Primiano Tucci, Christopher Ferris, Perfetto Development - www.perfetto.dev
Hello Diego. I don't have time for a detailed answer, but have a couple of points that might be relevant.

Firstly, there's an option - dump_at_max - for heapprofd to keep track of the biggest observed heap state (still based on sampled allocations), and produce a single dump at the end for that peak state. It might help if you're tracking just the peak state, and the allocation pattern interacts poorly with periodic dumps. Given that profile, I'd just open it in the perfetto UI and look at the overall size of the active heap in the profile flamegraph (that might also help sanity check your query's results).

As far as I know, Android 11 uses the scudo allocator for native allocs, and we've indeed seen O(10MB) lingering state in the allocator itself after periods of bursty small allocations. The purge mallopt should get rid of that state, so it's a way of ruling that out.

It's also worth pointing out that mallinfo struct's values are really dependent on the allocator in question, and might not be returning the values you're expecting. It's very possible that the java getNativeHeapAllocatedSize has different semantics under scudo. I don't have a good answer here, and would myself go to the implementation to start figuring out how to interpret mallinfo from scudo (you will likely need to go back to android 11 sources, or at least check the file history).

You could also check malloc_info, if that's present, but you'll have to deal with xml :(

If scudo ends up looking like the suspect, then I'll reiterate Primiano's suggestion of using "showmap -a `pidof com.yourpackage`" to get a long dump of parsed smaps, and look for "private dirty" column for the various "[anon:scudo:primary]" mappings. There will be multiple mappings, as it's split on allocation size classes iirc. To clarify, it's those pages that we expect to be cleared with a mallopt purge.

Aside: if you're finding the textproto config to be a pain to edit, tools/heap_profile might come in handy. It's a helper script to drive the heap profiler on device.

-Ryan

Diego Espiñeira

Jun 23, 2021, 9:25:13 AM
to Perfetto Development - www.perfetto.dev
I changed the test to:
1. print the getNativeHeapAllocatedSize() result
2. call mallopt(M_PURGE, 0)
3. print the getNativeHeapAllocatedSize() result again
and the difference turned out to be less than a MiB. So I think the problem is in the platform's way of reporting these values, because if I try this on an x86_64 emulator, the discrepancy is just a couple of KiB.

Diego Espiñeira

Jun 24, 2021, 3:36:00 AM
to Christopher Ferris, Kostya Kortchinsky, Perfetto Development - www.perfetto.dev, Primiano Tucci, Ryan Savitski
Is it possible that this particular implementation of Android 11 (Samsung’s) still uses jemalloc?

On Wed, 23 Jun 2021 at 22:24 Christopher Ferris <cfe...@google.com> wrote:
To answer the original question from Primiano:

Is there a way to figure out the allocator overhead (fragmentation and whatnot) by looking at mallinfo()?

No, there is no method I know of that does this. Or at least there is no exposed method for this. I added Kostya in case there is a method that might not have been exposed.

Also, I wanted to respond to a couple of the points in Ryan's e-mail, which are inline:

On Wed, Jun 23, 2021 at 6:18 AM Ryan Savitski <rsav...@google.com> wrote:
Hello Diego. I don't have time for a detailed answer, but have a couple of points that might be relevant.

Firstly, there's an option - dump_at_max - for heapprofd to keep track of the biggest observed heap state (still based on sampled allocations), and produce a single dump at the end for that peak state. It might help if you're tracking just the peak state, and the allocation pattern interacts poorly with periodic dumps. Given that profile, I'd just open it in the perfetto UI and look at the overall size of the active heap in the profile flamegraph (that might also help sanity check your query's results).

As far as I know, Android 11 uses the scudo allocator for native allocs, and we've indeed seen O(10MB) lingering state in the allocator itself after periods of bursty small allocations. The purge mallopt should get rid of that state, so it's a way of ruling that out.

It's also worth pointing out that mallinfo struct's values are really dependent on the allocator in question, and might not be returning the values you're expecting. It's very possible that the java getNativeHeapAllocatedSize has different semantics under scudo. I don't have a good answer here, and would myself go to the implementation to start figuring out how to interpret mallinfo from scudo (you will likely need to go back to android 11 sources, or at least check the file history).

As much as possible, I tried to keep all of the data returned by mallinfo comparable between scudo and the previous allocator (jemalloc). The java getNativeHeapAllocatedSize should be comparable between the two; the data used is, technically, the same. There are some obscure fields in mallinfo that are not exactly the same, but total bytes allocated should be comparable. The issue might be that scudo tends to produce a much more accurate count of allocated bytes than jemalloc: there are all sorts of ways that jemalloc will return a larger number than was actually allocated, while scudo will return exactly what you allocated. So the gap between allocated and RSS/PSS would look larger on jemalloc than on scudo.

You could also check malloc_info, if that's present, but you'll have to deal with xml :(

malloc_info currently only dumps the primary allocations, so it's probably not going to be the best choice.

If scudo ends up looking like the suspect, then I'll reiterate Primiano's suggestion of using "showmap -a `pidof com.yourpackage`" to get a long dump of parsed smaps, and look for "private dirty" column for the various "[anon:scudo:primary]" mappings. There will be multiple mappings, as it's split on allocation size classes iirc. To clarify, it's those pages that we expect to be cleared with a mallopt purge.

Aside: if you're finding the textproto config to be a pain to edit, tools/heap_profile might come in handy. It's a helper script to drive the heap profiler on device.

-Ryan

Christopher 

Diego Espiñeira

Jun 24, 2021, 2:33:19 PM
to Christopher Ferris, Kostya Kortchinsky, Perfetto Development - www.perfetto.dev, Primiano Tucci, Ryan Savitski
Great, thanks!
We already confirmed that all devices we have that use jemalloc present this discrepancy, and all devices we have that use scudo don't.
We saw this on Samsung devices (Android 11 and older) and on an old Moto G device, all running jemalloc.
We tried this on Pixel devices and custom devices, all running Android 11 with scudo, and the discrepancy is not observed.
So we can conclude jemalloc is to blame: either it has a bug in its reporting (mallinfo), or it has a big overhead that is also reported by mallinfo.

Thanks for your help

Best
Diego

On Thu, Jun 24, 2021 at 8:26 PM Christopher Ferris <cfe...@google.com> wrote:
I guess it's possible. If you call malloc_info, the xml will contain the name of the allocator being used. You can see how we verify the allocator in a bionic test:


It's also an example of how to call the malloc_info function.

Christopher