Memory measurement with Async Profiler

Maxim Terletsky

unread,

Feb 5, 2018, 10:20:47 AM2/5/18

to mechanical-sympathy

Hi guys,

I have a need to check memory allocations in our Java server per Thread. For that purpose I modified slightly the Async Profiler tool. As we can see in allocTracer.cpp file, it registers for those two events - send_allocation_outside_tlab_event and send_allocation_outside_tlab_event . Now, in my Java app I did some simple testings with

allocationStart1 = AsyncProfiler.getInstance().getAndZeroTotalAllocationsPerThread()
ArrayList aa = new ArrayList();
for (int i = 0; i < 14000; i++) {
  aa.add(RandomStringUtils.randomAlphanumeric(100));
}
allocationsEnd1 =  AsyncProfiler.getInstance().getAndZeroTotalAllocationsPerThread();

getAndZeroTotalAllocationsPerThread is my adding to AsyncProfiler JNI, approximately returning the memory calculated by
static __thread u64 _total_counter_per_thread;
..
void Profiler::recordSample(void* ucontext, u64 counter, jint event_type, jmethodID event) {
    _total_counter_per_thread += counter;
}
This code replaces original Profiler::recordSample method.

Now, since we are talking about TLAB and not just an event that fires on every memory allocation, this memory is not straightforward. If I make "aa" to be "static" than it's much more predictable since it actually grows and takes more and more TLABs.

So my question is basically whether this approach is feasible in your opinion? Or, because those events will fire only when objects are allocated outside of TLAB/in new TLAB, I won't have anything useful from it?

Andrei Pangin

unread,

Feb 5, 2018, 12:11:56 PM2/5/18

to mechanical-sympathy

Hi Maxim,

It seems like you're trying to make a cumulative per-thread allocation counter. There is already such counter available out-of-the-box, see ThreadMXBean.getThreadAllocatedBytes http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/73d5bcd0585d/src/share/classes/com/sun/management/ThreadMXBean.java#l145

понедельник, 5 февраля 2018 г., 18:20:47 UTC+3 пользователь Maxim Terletsky написал:

Andrei Pangin

unread,

Feb 5, 2018, 12:25:48 PM2/5/18

to mechanical-sympathy

There is a way to do this with async-profiler, too.

send_allocation_in_new_tlab_event accepts 3 arguments: (KlassHandle klass, size_t tlab_size, size_t alloc_size). async-profiler currently does not use tlab_size, but this is exactly what is needed in your case. Total memory allocated by a thread can be estimated as total allocations outside TLAB plus the total size of allocated TLABs.

понедельник, 5 февраля 2018 г., 20:11:56 UTC+3 пользователь Andrei Pangin написал:

Maxim Terletsky

unread,

Feb 5, 2018, 3:03:13 PM2/5/18

to mechanical-sympathy

Thanks Andrei, I will try both of those methods!
Are they supposed to give me ~the same or will they work differently?

Andrei Pangin

unread,

Feb 5, 2018, 5:08:36 PM2/5/18

to mechanical-sympathy

There is a small difference in how these methods treat unused TLAB remainder. ThreadMXBean accumulates only used part of TLAB, reflecting the size of allocated objects. On the other hand, the sum of tlab_size reflects the actual heap pressure. Practically I think the difference is negligible.

понедельник, 5 февраля 2018 г., 23:03:13 UTC+3 пользователь Maxim Terletsky написал:

Maxim Terletsky

unread,

Feb 6, 2018, 6:54:48 AM2/6/18

to mechanical-sympathy

One more question - why the original AsyncProfiler does not use tlab_size in this method we talking about?

https://github.com/jvm-profiling-tools/async-profiler/blob/master/src/allocTracer.cpp#L65

Andrei Pangin

unread,

Feb 6, 2018, 12:11:23 PM2/6/18

to mechanical-sympathy

¯\_(ツ)_/¯

Just don't know how it should be used.

вторник, 6 февраля 2018 г., 14:54:48 UTC+3 пользователь Maxim Terletsky написал:

Maxim Terletsky

unread,

Feb 7, 2018, 4:17:39 AM2/7/18

to mechanical-sympathy

Well, for recording allocations in alloc mode. Why didn't you want this size?

Maxim Terletsky

unread,

Feb 8, 2018, 5:04:12 AM2/8/18

to mechanical-sympathy

And another question - where can I find implementation code for that getThreadAllocatedMemory1 ?

The reason I am asking is that I see that calling this code does take some time (our request time grown under peak load with that function). While the growth is not substantial I am curious to understand the reason why this getThreadAllocatedMemory1 method takes time.

Andrei Pangin

unread,

Feb 8, 2018, 8:52:57 AM2/8/18

to mechanical-sympathy

"Didn't want" is probably not the right word. I just never needed this information.

If you have a suggestion how async-profiler may benefit from this, please create a ticket or preferably a pull request on GitHub.

Thanks.

среда, 7 февраля 2018 г., 12:17:39 UTC+3 пользователь Maxim Terletsky написал:

Andrei Pangin

unread,

Feb 8, 2018, 9:07:58 AM2/8/18

to mechanical-sympathy

jmm_GetThreadAllocatedMemory at http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/a2c8195708cc/src/share/vm/services/management.cpp#l2209

The biggest problem with this method is the synchronization on the global Threads_lock and the iteration over the whole thread list.

For the same reason we could not call this method too often in our production applications.

As a workaround I've re-implemented 'GetThreadAllocatedMemory' for current thread using VMStructs. This is exactly the case I'm planning to cover in my upcoming JPoint talk: https://jpoint.ru/en/talks/7lh7kkir1ye6uamik4w4cs/

четверг, 8 февраля 2018 г., 13:04:12 UTC+3 пользователь Maxim Terletsky написал:

Maxim Terletsky

unread,

Feb 8, 2018, 9:50:57 AM2/8/18

to mechanical-sympathy

Cool, thanks!

Would love to see the presentation/slides/code if it will be available online..

Maxim Terletsky

unread,

Feb 11, 2018, 1:46:33 AM2/11/18

to mechanical-sympathy

@Andrei Pangin

Hi,

Sometimes we saw the following exception thrown https://pastebin.com/kJptx2nA

RDI=0x000000015a4e7110: _ZN11AllocTracer13signalHandlerEiP9__siginfoPv+0 in /Users/....../libasyncProfiler.dylib at 0x000000015a4e6000

The code, probably very familiar to you :) , is https://pastebin.com/CPmysehe - the only diff with the original is

u64 tlab_size = frame.arg1();
Profiler::_instance.recordSample(ucontext, obj_size + tlab_size, BCI_KLASS, alloc_class);

which I added as you suggested.
Do you know what could be the issue?

Andrei Pangin

unread,

Feb 12, 2018, 10:53:25 AM2/12/18

to mechanical-sympathy

Hi Maxim,

Does it also fail with original unmodified libasyncProfiler? I guess you rewrite libasyncProfiler.dylib while the library is already loaded into the target process - can it be the case?

The crash log seems incomplete, I can't tell for sure what's wrong.

воскресенье, 11 февраля 2018 г., 9:46:33 UTC+3 пользователь Maxim Terletsky написал:

Maxim Terletsky

unread,

Feb 12, 2018, 12:20:54 PM2/12/18

to mechanica...@googlegroups.com

I doubt that is the case (rewrite libasyncProfiler.dylib while the library is already loaded into the target process) ... It happens on developer machines (Mac), not production (Linux) - at least we don't see it in Production :)

Are you sure log https://pastebin.com/kJptx2nA is not complete?
I will try to enable it again for developers and ask them to send me more logs..

Andrei Pangin

unread,

Feb 12, 2018, 3:47:19 PM2/12/18

to mechanical-sympathy

OK, if you have more or less reliable way to reproduce the problem, please file a bug at https://github.com/jvm-profiling-tools/async-profiler/issues

BTW, obj_size + tlab_size does not make sense, since tlab_size already includes all objects allocated in TLAB.

Regards,

Andrei

понедельник, 12 февраля 2018 г., 20:20:54 UTC+3 пользователь Maxim Terletsky написал:

Reply all

Reply to author

Forward