On Dec 5, 2017, at 11:53 AM, Gil Tene <g...@azul.com> wrote:
Page faults in mapped file i/o and counted loops are certainly two common causes of long TTSP. But there are many other paths that *could* cause it as well in HotSpot. Without catching it and looking at the stack trace, it's hard to know which ones to blame. Once you knock out one cause, you'll see if there is another.
In the specific stack trace you showed [assuming that trace was taken during a long TTSP], mapped file i/o is the most likely culprit. Your trace seems to be around making the page writable for the first time and updating the file time (which takes a lock), but even without needing the lock, the fault itself could end up waiting for the i/o to complete (read page from disk), and that (when Murphy pays you a visit) can end up waiting behind 100s of other i/o operations (e.g. when your i/o happens at the same time the kernel decides to flush some dirty pages in the cache), leading to TTSPs in the 100s of msec.
As I'm sure you already know, one simple way to get around mapped-file-related TTSP is to not use mapped files. Explicit random i/o calls are always done while at a safepoint, so they can't cause high TTSPs.
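For illustration, a minimal sketch of the contrast (the file name and offsets are invented): with a mapped buffer, a page miss faults in the middle of Java code, between safepoint polls; with an explicit positional read, the potential disk wait happens inside a native call that, as noted above, is already at a safepoint.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedVsExplicitRead {
    public static void main(String[] args) throws IOException {
        // "data.bin" is a placeholder file for illustration.
        try (FileChannel ch = FileChannel.open(Paths.get("data.bin"), StandardOpenOption.READ)) {
            // Mapped access: a page miss here triggers a fault between safepoint
            // polls -- a safepoint requested while the fault waits on disk i/o
            // stalls for the duration (long TTSP).
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte first = mapped.get(0);

            // Explicit positional read: the same potential disk wait happens
            // inside a native call that is already at a safepoint, so it adds
            // latency to this thread but not to time-to-safepoint.
            ByteBuffer buf = ByteBuffer.allocateDirect(4096);
            int n = ch.read(buf, 0);
            System.out.println("read " + n + " bytes; first mapped byte = " + first);
        }
    }
}

Either path can stall on the disk; the difference is only whether the stall happens before or after the thread has reached a safepoint.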
In Zing, we have a built-in TTSP profiler for exactly this reason.
Hi Gil,

Thanks for the response.
I'm fairly sure that interaction with the page-cache is one of the problems. When this is occurring, free memory is already hovering around vm.min_free_kbytes, and the mapped files are a significant fraction of system memory. From what I can see, each process that maps a file gets its own copy in the page-cache (kernel same-page merging doesn't seem to apply to the page-cache, and is otherwise disabled on the test machine), so cached pages are probably approaching total system memory.

I had thought that pages last written to before the vm.dirty_expire_centisecs threshold would be written out to disk by the flusher threads, but I read on lkml that the access times are maintained on a per-inode basis, rather than per-page. If this is the case, then the system in question is making it very difficult for the page-cache to work efficiently.

The system makes use of a "pre-toucher" thread to try to handle page-faults ahead of the thread that is trying to write application data to the mapped pages. However, it is not always successful, so I need to spend a bit of time figuring out why. It's possible that there is just too much memory pressure, and the OS is evicting pages that have been loaded by the pre-toucher before the application gets to them.
That (each process having its own copy) is surprising to me. Unless the mapping is such that private copies are required, I'd expect the processes to share the page cache entries.
I can't recreate this effect locally using FileChannel.map(); the library in use in the application uses a slightly more exotic route to get to mmap, so it could be a bug there; will investigate. I could also have been imagining it.
Is your pre-toucher thread a Java thread doing its pre-touching using mapped i/o in the same process? If so, then the pre-toucher thread itself will be a high-TTSP causer. The trick is to do the pre-touch in a thread that is already at a safepoint (e.g. do your pre-touch using mapped i/o from within a JNI call, use another process, or do the pre-touch with non-mapped i/o).
Yes, just a Java thread in the same process; I hadn't considered that it would also cause long TTSP, but of course it's just as likely (or more likely) to be scheduled off due to a page fault. I could try using pwrite via FileChannel.write() to do the pre-touching, but I think it needs to perform a CAS (i.e. don't overwrite data that is already present), so a JNI method would be the only way to go. Unless just doing a FileChannel.position(writeLimit).read(buffer) would do the job? Presumably that is enough to load the page into the cache, making a write unnecessary.
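Something along these lines, perhaps (a rough sketch only: the class and field names and the 4 KB page size are illustrative, and whether a plain read suffices is exactly the open question above) -- a thread that walks ahead of the writer issuing one explicit positional read per page, so the faults it triggers happen inside native calls rather than between safepoint polls:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch only. The reads populate the page cache via native
// calls (taken at a safepoint), instead of faulting pages in through the
// mapping on the writer's hot path.
public class ReadAheadPreToucher implements Runnable {
    private static final int PAGE_SIZE = 4096;
    private final FileChannel channel;
    private volatile long writeLimit; // advanced by the writing thread

    public ReadAheadPreToucher(FileChannel channel) { this.channel = channel; }

    public void advance(long newLimit) { writeLimit = newLimit; }

    @Override
    public void run() {
        ByteBuffer scratch = ByteBuffer.allocateDirect(PAGE_SIZE);
        long touched = 0;
        try {
            while (!Thread.currentThread().isInterrupted()) {
                long limit = writeLimit;
                while (touched < limit) {
                    scratch.clear();
                    // One read per page should pull it into the page cache; no
                    // write (and hence no CAS concern) is needed. The first write
                    // through the mapping may still take a minor fault, but
                    // shouldn't have to wait for disk.
                    if (channel.read(scratch, touched) < 0) {
                        break; // EOF: the file hasn't grown this far yet
                    }
                    touched += PAGE_SIZE;
                }
                Thread.sleep(1);
            }
        } catch (IOException e) {
            // sketch: give up on i/o errors
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

If a read alone turns out not to be enough, the same loop could sit behind a JNI call, per the suggestion above.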
Hi Kirk,

Thanks for the response.

Happily, this is a test environment, so I can be brave with profiling - what sort of instrumentation do you refer to: kernel jprobes, or JVM hacking?

The environment is indeed toxic: multiple JVMs mmap files on the order of 1/4 of system memory, performing either read or write operations, and the page-cache flusher threads are (not surprisingly) struggling to cope. I don't really want to go down the route of using madvise/mlock.

I've run the same workload with -XX:+KeepSafepointsInCountedLoops, and the problem persists, so I'm fairly certain it's not that (assuming the JVM is respecting the flag...). The application is GC-free, so the safepoints are only going to be coming from (non-exhaustive): guaranteed safepoints, JIT operations?, lock biasing, others?
Cheers,
Mark
% java -Xmx1g -XX:+UseG1GC -cp target/classes -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime BogusPauseReportingDemo -t 100000
...
3.075: [GC cleanup 526M->526M(1024M), 0.0007277 secs]
3.075: Total time for which application threads were stopped: 0.0429391 seconds, Stopping threads took: 0.0421697 seconds
3.138: [GC pause (G1 Humongous Allocation) (young) 850M->514M(1024M), 0.0019372 secs]
3.140: Total time for which application threads were stopped: 0.0142784 seconds, Stopping threads took: 0.0122150 seconds
3.177: [GC pause (G1 Humongous Allocation) (young) (initial-mark) 516M->514M(1024M), 0.0023110 secs]
3.179: Total time for which application threads were stopped: 0.0386066 seconds, Stopping threads took: 0.0361923 seconds
% java -Xmx1g -XX:+UseG1GC -cp target/classes -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime BogusLoopPauseReportingDemo -t 100000
...
34.782: [GC cleanup 524M->524M(1024M), 0.0005976 secs]
34.783: Total time for which application threads were stopped: 1.4176907 seconds, Stopping threads took: 1.4170656 seconds
36.183: [GC pause (G1 Humongous Allocation) (young) 850M->514M(1024M), 0.0009254 secs]
36.184: Total time for which application threads were stopped: 1.3500081 seconds, Stopping threads took: 1.3489825 seconds
37.573: [GC pause (G1 Humongous Allocation) (young) (initial-mark) 516M->514M(1024M), 0.0008025 secs]
37.574: Total time for which application threads were stopped: 1.3897568 seconds, Stopping threads took: 1.3888889 seconds
...
% java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
Cheers,
Mark
Alex, capturing the stack trace as the thread actually reaches a taken safepoint (via SafepointSynchronize::block) is definitely a cool trick. As a "where was I when I finally reached a safepoint after a long TTSP" tracer, it may well help in investigating the problematic code paths if you are able to deduce what they were by walking backwards from the captured trace point. But often this is not sufficient to figure out what causes the long TTSP; it helps raise suspicions, but does not catch the code "red-handed".
java 17998 2446326.506252: 250000 cpu-clock:
7c38b6 acl_CopyRight (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
7f788d108c2a Ljava/lang/System;::arraycopy (/tmp/perf-17855.map)
7f788d007ffd Interpreter (/tmp/perf-17855.map)
7f788d0004e7 call_stub (/tmp/perf-17855.map)
6941e3 JavaCalls::call_helper (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
691747 JavaCalls::call_virtual (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
691d27 JavaCalls::call_virtual (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
6d5864 thread_entry (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
acdfbb JavaThread::thread_main_inner (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
ace49f JavaThread::run (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
8f2f62 java_start (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
773a start_thread (/usr/lib64/libpthread-2.24.so)
E.g. when your long TTSP is caused by a page fault incurred on a mapped file access (which is a common case), the tracepoint for SafepointSynchronize::block will not show the faulting operation on the stack. Instead, the SafepointSynchronize::block trace will likely end up showing the stack trace for the back edge of a loop that contained (potentially deeply inlined) ByteBuffer.get() operations. E.g. when parsing the contents of a mapped file and taking a page fault, the trace would likely be in the body of a Base64.Decoder.decode(ByteBuffer buffer) call, with no direct hint that the cause of the TTSP is a mapped file, or a page fault. If you examined the stack trace you might be able to guess (from the overall context) that the decode call was made on a ByteBuffer that was actually a MappedByteBuffer, and then suspect page faults and i/o as the culprit, but it won't be staring you in the face with a "this is what's going on" smile. And if the code where this occurred is not yours, but instead buried in some useful combination of 3rd party Maven Central things you may be using (like a JSON parser lib being handed a mapped byte buffer by a library that for some reason uses mapped files to read its config), the hunt may take a while...
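A toy version of that scenario (the file name is invented for illustration) shows how little of the mmap context survives on the stack -- the decoder is handed a plain ByteBuffer, and any page fault surfaces inside decode():

import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Base64;

public class HiddenMappedFault {
    public static void main(String[] args) throws Exception {
        // "config.b64" stands in for some library's mapped config file.
        try (FileChannel ch = FileChannel.open(Paths.get("config.b64"), StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // decode() sees only a ByteBuffer; the page fault (and any disk
            // wait behind it) happens inside the decode loop, with nothing on
            // the stack pointing at mmap, the file, or the fault.
            ByteBuffer decoded = Base64.getDecoder().decode(mapped);
            System.out.println("decoded " + decoded.remaining() + " bytes");
        }
    }
}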
IMO the key to identifying the long-TTSP-causing code paths, in a way that is legible enough for people to identify or usefully speculate about the TTSP cause, is to capture the stack trace during the offending operation (rather than at the safepoint polling opportunity that immediately followed it). An example technique for doing that would be to implement some TTSP-timeout notification that would capture stack traces when a safepoint has been requested but not yet arrived at within a timeout period. E.g. if HotSpot sent a signal to offending (not-yet-reached-safepoint-after-too-long-a-time) threads, and the signal handler for that signal called a specific function (e.g. "report_long_ttsp"), tracing THAT function would give you the stack trace you want, and catch the TTSP cause red-handed. And the signal handler itself could obviously log the traces in the JVM.
BTW, my BogusPauseReportingDemo and BogusLoopPauseReportingDemo were originally written to demonstrate TTSP behavior and the discrepancy in reporting it as pause time in GC logs.
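(Not the actual demo source, but a minimal sketch of the mechanism the second demo illustrates: a hot int-counted loop is compiled without safepoint polls on its back edge, so a safepoint requested mid-loop -- forced here with System.gc() from the main thread -- has to wait out the remainder of the loop, and the wait shows up in the "Stopping threads took" component rather than in the GC work itself.)

public class CountedLoopTtspSketch {
    static volatile long sink; // keep the loop's result live

    static long spin(int iterations) {
        long sum = 0;
        // int-counted loop: C2 omits the safepoint poll on the back edge,
        // so this thread cannot reach a safepoint until the loop exits.
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            while (true) {
                sink += spin(2_000_000_000);
            }
        });
        spinner.setDaemon(true);
        spinner.start();
        Thread.sleep(1000); // let spin() get JIT-compiled
        for (int i = 0; i < 10; i++) {
            System.gc(); // each induced safepoint waits for the counted loop
            Thread.sleep(500);
        }
    }
}

Run with the same -XX:+PrintGCApplicationStoppedTime flag as above and the long "Stopping threads took" lines should reproduce.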
Cheers,
Mark