Perfetto heap profiling vs Debug.getNativeHeapAllocatedSize()


Diego Espiñeira

Jun 23, 2021, 7:27:48 AM
to Perfetto Development - www.perfetto.dev
Hi, I'm doing heap profiling on a stock (as in non rooted) Samsung Tab S6Lite.
Inside the app being profiled I'm printing the result of Debug.getNativeHeapAllocatedSize() to logcat, and it is almost always larger than the peak memory reported by Perfetto. Sometimes the difference exceeds 40MiB, which is very significant. Have you observed this before? Is there an explanation for it?

The configuration used is
buffers: {
  size_kb: 63488
}
data_sources: {
  config: {
    name: "android.heapprofd"
    heapprofd_config: {
      shmem_size_bytes: 8388608
      sampling_interval_bytes: 4096
      block_client: true
      process_cmdline: "mypackage"
      continuous_dump_config: {
        dump_phase_ms: 0
        dump_interval_ms: 100
      }
    }
  }
}
data_sources: {
  config: {
    name: "android.packages_list"
  }
}
duration_ms: 0
enable_extra_guardrails: false
statsd_metadata: {
}
write_into_file: true
flush_period_ms: 604800000
flush_timeout_ms: 30000

The SQL statement used to get the peak memory consumption is
select a.ts - min(b.ts) as time,
       round(sum(b.size) / 1024.0 / 1024.0, 2) as space_size,
       "MiB" as units
  from (select distinct ts from heap_profile_allocation) a
  join heap_profile_allocation b on (b.ts <= a.ts)
 group by a.ts
 order by 2 desc
 limit 1

Any insight on why this happens would be greatly appreciated.

Cheers,
Diego

Primiano Tucci

Jun 23, 2021, 7:35:13 AM
to Diego Espiñeira, Perfetto Development - www.perfetto.dev
Interestingly, getNativeHeapAllocatedSize seems to call into android_os_Debug_getNativeHeapAllocatedSize, which in turn returns mallinfo().uordblks.
This is realistically allocator-related fragmentation. It would be nice if you could grab snapshots of /proc/$pid/smaps (or showmap -a `pidof com.yourpackage`) to cross-check.
Which version of Android is this, out of curiosity?

Also, if you lower sampling_interval_bytes to 1 (it will be slooooow), does the number change? Just to rule out pathological patterns that diverge too much because of the sampling (40 MB seems like a lot for that TBH, I've never seen that much of a diff).



--
Primiano Tucci
Software Engineer
Google UK Limited


Primiano Tucci

Jun 23, 2021, 8:07:19 AM
to Diego Espiñeira, Christopher Ferris, Perfetto Development - www.perfetto.dev
+Christopher Ferris (note: this is a public ML)

Is there a way to figure out the allocator overhead (fragmentation and whatnot) by looking at mallinfo()?
In other words, is there a sensible subtraction of mallinfo struct fields that would give a sense of "bytes requested by the application" vs "total bytes due to rounding, alignment, arenas, and whatever else the allocator might be doing"?
My question here is: is there a way to tell if the application here is hitting some pathological case?

Diego: out of curiosity, if you put periodic calls to mallopt(M_PURGE) does the difference change/disappear?

On Wed, Jun 23, 2021 at 12:52 PM Diego Espiñeira <despi...@gmail.com> wrote:
It's Android 11.
I tried with sampling_interval_bytes: 1, but there was no change in the end result.
This is what dumpsys meminfo showed:
** MEMINFO in pid 29409 [com.tomtom.navkit2.tests.kpi] **
                      Pss  Private  Private  SwapPss      Rss     Heap     Heap     Heap
                    Total    Dirty    Clean    Dirty    Total     Size    Alloc     Free
                   ------   ------   ------   ------   ------   ------   ------   ------
     Native Heap   203989   203956        0       39   204916   222524   217845     4678
     Dalvik Heap     3819     3744        0       24     4544     5269     2635     2634
    Dalvik Other     2264     1968        0        0     2944
           Stack     1428     1428        0        0     1436
       Other dev       16        4       12        0      308
        .so mmap    54609     2452    50884       42    83640
       .jar mmap     2178        0      860        0    31664
       .apk mmap     1073        0      416        0    37252
       .dex mmap    24370    24288       16        0    24588
       .oat mmap       34        0        0        0     2272
       .art mmap    18130    17264       60      148    30396
      Other mmap       38        8        0        0     2416
         Unknown      873      804        0        3     1160
           TOTAL   313077   255916    52248      256   427536   227793   220480     7312

 App Summary
                       Pss(KB)   Rss(KB)
                        ------    ------
           Java Heap:    21068     34940
         Native Heap:   203956    204916
                Code:    78928    180076
               Stack:     1428      1436
            Graphics:        0         0
       Private Other:     2784
              System:     4913
             Unknown:               6168

           TOTAL PSS:   313077   TOTAL RSS:   427536   TOTAL SWAP PSS:   256

 Objects
               Views:        0          ViewRootImpl:        0
         AppContexts:        5            Activities:        0
              Assets:       19         AssetManagers:        0
       Local Binders:       15         Proxy Binders:       43
       Parcel memory:        5          Parcel count:       23
    Death Recipients:        1       OpenSSL Sockets:        0
            WebViews:        0

 SQL
         MEMORY_USED:        0
  PAGECACHE_OVERFLOW:        0   MALLOC_SIZE:        0

And the peak size was ~171MiB, so in this case the difference was ~30MiB.
Could this be attributed to memory fragmentation?

Diego Espiñeira

Jun 23, 2021, 8:10:13 AM
to Primiano Tucci, Christopher Ferris, Perfetto Development - www.perfetto.dev
> out of curiosity, if you put periodic calls to mallopt(M_PURGE) does the difference change/disappear?
I'll try that.

Ryan Savitski

Jun 23, 2021, 9:18:06 AM
to Diego Espiñeira, Primiano Tucci, Christopher Ferris, Perfetto Development - www.perfetto.dev
Hello Diego. I don't have time for a detailed answer, but have a couple of points that might be relevant.

Firstly, there's an option - dump_at_max - for heapprofd to keep track of the biggest observed heap state (still based on sampled allocations), and produce a single dump at the end for that peak state. It might help if you're tracking just the peak state, and the allocation pattern interacts poorly with periodic dumps. Given that profile, I'd just open it in the perfetto UI and look at the overall size of the active heap in the profile flamegraph (that might also help sanity check your query's results).

As far as I know, Android 11 uses the scudo allocator for native allocs, and we've indeed seen O(10MB) lingering state in the allocator itself after periods of bursty small allocations. The purge mallopt should get rid of that state, so it's a way of ruling that out.

It's also worth pointing out that mallinfo struct's values are really dependent on the allocator in question, and might not be returning the values you're expecting. It's very possible that the java getNativeHeapAllocatedSize has different semantics under scudo. I don't have a good answer here, and would myself go to the implementation to start figuring out how to interpret mallinfo from scudo (you will likely need to go back to android 11 sources, or at least check the file history).

You could also check malloc_info, if that's present, but you'll have to deal with xml :(

If scudo ends up looking like the suspect, then I'll reiterate Primiano's suggestion of using "showmap -a `pidof com.yourpackage`" to get a long dump of parsed smaps, and look for "private dirty" column for the various "[anon:scudo:primary]" mappings. There will be multiple mappings, as it's split on allocation size classes iirc. To clarify, it's those pages that we expect to be cleared with a mallopt purge.

Aside: if you're finding the textproto config to be a pain to edit, tools/heap_profile might come in handy. It's a helper script to drive the heap profiler on device.

-Ryan

Diego Espiñeira

Jun 23, 2021, 9:25:13 AM
to Perfetto Development - www.perfetto.dev
I changed the test to:
1. print the getNativeHeapAllocatedSize() result
2. call mallopt(M_PURGE, 0)
3. print the getNativeHeapAllocatedSize() result again
and the difference turned out to be less than a MiB. So I think the problem is in the platform's way of reporting these values, because if I try this on an x86_64 emulator, the discrepancy is just a couple of KiB.

Diego Espiñeira

Jun 24, 2021, 3:36:00 AM
to Christopher Ferris, Kostya Kortchinsky, Perfetto Development - www.perfetto.dev, Primiano Tucci, Ryan Savitski
Is it possible that this particular implementation of Android 11 (Samsung’s) still uses jemalloc?

On Wed, 23 Jun 2021 at 22:24 Christopher Ferris <cfe...@google.com> wrote:
To answer the original question from Primiano:

Is there a way to figure out the allocator overhead (fragmentation and whatnot) by looking at mallinfo()?

No, there is no method I know of that does this. Or at least there is no exposed method for this. I added Kostya in case there is a method that might not have been exposed.

Also, I wanted to respond to a couple of the points in Ryan's e-mail, which are inline:

On Wed, Jun 23, 2021 at 6:18 AM Ryan Savitski <rsav...@google.com> wrote:
Hello Diego. I don't have time for a detailed answer, but have a couple of points that might be relevant.

Firstly, there's an option - dump_at_max - for heapprofd to keep track of the biggest observed heap state (still based on sampled allocations), and produce a single dump at the end for that peak state. It might help if you're tracking just the peak state, and the allocation pattern interacts poorly with periodic dumps. Given that profile, I'd just open it in the perfetto UI and look at the overall size of the active heap in the profile flamegraph (that might also help sanity check your query's results).

As far as I know, Android 11 uses the scudo allocator for native allocs, and we've indeed seen O(10MB) lingering state in the allocator itself after periods of bursty small allocations. The purge mallopt should get rid of that state, so it's a way of ruling that out.

It's also worth pointing out that mallinfo struct's values are really dependent on the allocator in question, and might not be returning the values you're expecting. It's very possible that the java getNativeHeapAllocatedSize has different semantics under scudo. I don't have a good answer here, and would myself go to the implementation to start figuring out how to interpret mallinfo from scudo (you will likely need to go back to android 11 sources, or at least check the file history).

As much as possible, I tried to keep all of the data returned by mallinfo comparable between scudo and the previous allocator (jemalloc). The java getNativeHeapAllocatedSize should be comparable between the two; the data used is, technically, the same. There are some obscure fields in mallinfo that are not exactly the same, but total bytes allocated should be comparable. The issue might be that scudo tends to produce a much more accurate count of allocated bytes than jemalloc: there are all sorts of ways that jemalloc will return a larger number than was actually allocated, while scudo will return exactly what you allocated. So the gap between allocated and RSS/PSS would look larger on jemalloc than on scudo.

You could also check malloc_info, if that's present, but you'll have to deal with xml :(

malloc_info currently only dumps the primary allocations, so it's probably not going to be the best choice.

If scudo ends up looking like the suspect, then I'll reiterate Primiano's suggestion of using "showmap -a `pidof com.yourpackage`" to get a long dump of parsed smaps, and look for "private dirty" column for the various "[anon:scudo:primary]" mappings. There will be multiple mappings, as it's split on allocation size classes iirc. To clarify, it's those pages that we expect to be cleared with a mallopt purge.

Aside: if you're finding the textproto config to be a pain to edit, tools/heap_profile might come in handy. It's a helper script to drive the heap profiler on device.

-Ryan

Christopher 

Diego Espiñeira

Jun 24, 2021, 2:33:19 PM
to Christopher Ferris, Kostya Kortchinsky, Perfetto Development - www.perfetto.dev, Primiano Tucci, Ryan Savitski
Great, thanks!
We already confirmed that all devices we have that use jemalloc present this discrepancy, and all devices we have that use scudo don't.
We saw this on Samsung devices (Android 11 and older) and on an old Moto G device, all running jemalloc.
We tried this on Pixel devices and custom devices, all running Android 11 with scudo, and the discrepancy is not observed.
So we can conclude jemalloc is to blame: either it has a bug in its reporting (mallinfo), or it has a big overhead that is also reported by mallinfo.

Thanks for your help

Best
Diego

On Thu, Jun 24, 2021 at 8:26 PM Christopher Ferris <cfe...@google.com> wrote:
I guess it's possible. If you call malloc_info, the xml will contain the name of the allocator being used. You can see how we verify the allocator in a bionic test:


It's also an example of how to call the malloc_info function.

Christopher