On Dec 5, 2017, at 11:53 AM, Gil Tene <g...@azul.com> wrote:
Page faults in mapped file i/o and counted loops are certainly two common causes of long TTSP. But there are many other paths that *could* cause it as well in HotSpot. Without catching it and looking at the stack trace, it's hard to know which ones to blame. Once you knock out one cause, you'll see if there is another.
In the specific stack trace you showed [assuming that trace was taken during a long TTSP], mapped file i/o is the most likely culprit. Your trace seems to be around making the page writable for the first time and updating the file time (which takes a lock), but even without needing the lock, the fault itself could end up waiting for the i/o to complete (read page from disk), and that (when Murphy pays you a visit) can end up waiting behind 100s of other i/o operations (e.g. when your i/o happens at the same time the kernel decides to flush some dirty pages in the cache), leading to TTSPs in the 100s of msec.
As I'm sure you already know, one simple way to get around mapped-file-related TTSP is to not use mapped files. Explicit random i/o calls are always done while at a safepoint, so they can't cause high TTSPs.
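For illustration, a minimal sketch of the contrast (the file name and offsets are invented): with a mapped buffer, a page miss faults in the middle of Java code, between safepoint polls; with an explicit positional read, the potential disk wait happens inside a native call that, as noted above, is already at a safepoint.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedVsExplicitRead {
    public static void main(String[] args) throws IOException {
        // "data.bin" is a placeholder file for illustration.
        try (FileChannel ch = FileChannel.open(Paths.get("data.bin"), StandardOpenOption.READ)) {
            // Mapped access: a page miss here triggers a fault between safepoint
            // polls -- a safepoint requested while the fault waits on disk i/o
            // stalls for the duration (long TTSP).
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte first = mapped.get(0);

            // Explicit positional read: the same potential disk wait happens
            // inside a native call that is already at a safepoint, so it adds
            // latency to this thread but not to time-to-safepoint.
            ByteBuffer buf = ByteBuffer.allocateDirect(4096);
            int n = ch.read(buf, 0);
            System.out.println("read " + n + " bytes; first mapped byte = " + first);
        }
    }
}

Either path can stall on the disk; the difference is only whether the stall happens before or after the thread has reached a safepoint.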
In Zing, we have a built-in TTSP profiler for exactly this reason.
Hi Gil,

Thanks for the response.
I'm fairly sure that interaction with the page-cache is one of the problems. When this is occurring, free memory is already hovering around vm.min_free_kbytes, and the mapped files are a significant fraction of system memory. From what I can see, each process that maps a file gets its own copy in the page-cache (kernel same-page merging doesn't seem to apply to the page-cache, and is otherwise disabled on the test machine), so cached pages are probably approaching total system memory.

I had thought that pages last written to before the vm.dirty_expire_centisecs threshold would be written out to disk by the flusher threads, but I read on lkml that the access times are maintained on a per-inode basis, rather than per-page. If this is the case, then the system in question is making it very difficult for the page-cache to work efficiently.

The system makes use of a "pre-toucher" thread to try to handle page-faults ahead of the thread that is trying to write application data to the mapped pages. However, it is not always successful, so I need to spend a bit of time figuring out why. It's possible that there is just too much memory pressure, and the OS is evicting pages that have been loaded by the pre-toucher before the application gets to them.
That (each process having its own copy) is surprising to me. Unless the mapping is such that private copies are required, I'd expect the processes to share the page cache entries.
I can't recreate this effect locally using FileChannel.map(); the library in use in the application uses a slightly more exotic route to get to mmap, so it could be a bug there; will investigate. I could also have been imagining it.
Is your pre-toucher thread a Java thread doing its pre-touching using mapped i/o in the same process? If so, then the pre-toucher thread itself will be a high-TTSP causer. The trick is to do the pre-touch in a thread that is already at a safepoint (e.g. do your pre-touch using mapped i/o from within a JNI call, use another process, or do the pre-touch with non-mapped i/o).
Yes, just a Java thread in the same process; I hadn't considered that it would also cause long TTSP, but of course it's just as likely (or more likely) to be scheduled off due to a page fault. I could try using pwrite via FileChannel.write() to do the pre-touching, but I think it needs to perform a CAS (i.e. don't overwrite data that is already present), so a JNI method would be the only way to go. Unless just doing a FileChannel.position(writeLimit).read(buffer) would do the job? Presumably that is enough to load the page into the cache, making a write unnecessary.
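Something along these lines, perhaps (a rough sketch only: the class and field names and the 4 KB page size are illustrative, and whether a plain read suffices is exactly the open question above) -- a thread that walks ahead of the writer issuing one explicit positional read per page, so the faults it triggers happen inside native calls rather than between safepoint polls:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch only. The reads populate the page cache via native
// calls (taken at a safepoint), instead of faulting pages in through the
// mapping on the writer's hot path.
public class ReadAheadPreToucher implements Runnable {
    private static final int PAGE_SIZE = 4096;
    private final FileChannel channel;
    private volatile long writeLimit; // advanced by the writing thread

    public ReadAheadPreToucher(FileChannel channel) { this.channel = channel; }

    public void advance(long newLimit) { writeLimit = newLimit; }

    @Override
    public void run() {
        ByteBuffer scratch = ByteBuffer.allocateDirect(PAGE_SIZE);
        long touched = 0;
        try {
            while (!Thread.currentThread().isInterrupted()) {
                long limit = writeLimit;
                while (touched < limit) {
                    scratch.clear();
                    // One read per page should pull it into the page cache; no
                    // write (and hence no CAS concern) is needed. The first write
                    // through the mapping may still take a minor fault, but
                    // shouldn't have to wait for disk.
                    if (channel.read(scratch, touched) < 0) {
                        break; // EOF: the file hasn't grown this far yet
                    }
                    touched += PAGE_SIZE;
                }
                Thread.sleep(1);
            }
        } catch (IOException e) {
            // sketch: give up on i/o errors
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

If a read alone turns out not to be enough, the same loop could sit behind a JNI call, per the suggestion above.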
Hi Kirk,

Thanks for the response.

Happily, this is a test environment, so I can be brave with profiling - what sort of instrumentation do you refer to: kernel jprobes, or JVM hacking?

The environment is indeed toxic: multiple JVMs mmap files on the order of 1/4 of system memory, performing either read or write operations, and the page-cache flusher threads are (not surprisingly) struggling to cope. I don't really want to go down the route of using madvise/mlock.

I've run the same workload with -XX:+KeepSafepointsInCountedLoops, and the problem persists, so I'm fairly certain it's not that (assuming the JVM is respecting the flag...). The application is GC-free, so the safepoints are only going to be coming from (non-exhaustive): guaranteed safepoints, JIT operations?, lock biasing, others?
Cheers,
Mark
% java -Xmx1g -XX:+UseG1GC -cp target/classes -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime BogusPauseReportingDemo -t 100000
...
3.075: [GC cleanup 526M->526M(1024M), 0.0007277 secs]
3.075: Total time for which application threads were stopped: 0.0429391 seconds, Stopping threads took: 0.0421697 seconds
3.138: [GC pause (G1 Humongous Allocation) (young) 850M->514M(1024M), 0.0019372 secs]
3.140: Total time for which application threads were stopped: 0.0142784 seconds, Stopping threads took: 0.0122150 seconds
3.177: [GC pause (G1 Humongous Allocation) (young) (initial-mark) 516M->514M(1024M), 0.0023110 secs]
3.179: Total time for which application threads were stopped: 0.0386066 seconds, Stopping threads took: 0.0361923 seconds
% java -Xmx1g -XX:+UseG1GC -cp target/classes -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime BogusLoopPauseReportingDemo -t 100000
...
34.782: [GC cleanup 524M->524M(1024M), 0.0005976 secs]
34.783: Total time for which application threads were stopped: 1.4176907 seconds, Stopping threads took: 1.4170656 seconds
36.183: [GC pause (G1 Humongous Allocation) (young) 850M->514M(1024M), 0.0009254 secs]
36.184: Total time for which application threads were stopped: 1.3500081 seconds, Stopping threads took: 1.3489825 seconds
37.573: [GC pause (G1 Humongous Allocation) (young) (initial-mark) 516M->514M(1024M), 0.0008025 secs]
37.574: Total time for which application threads were stopped: 1.3897568 seconds, Stopping threads took: 1.3888889 seconds
...
% java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
Cheers,
Mark
Alex, capturing the stack trace as the thread actually reaches a taken safepoint (via SafepointSynchronize::block) is definitely a cool trick. As a "where was I when I finally reached a safepoint after a long TTSP" tracer, it may well help in investigating the problematic code paths if you are able to deduce what they were by walking backwards from the captured trace point. But often this is not sufficient to figure out what causes the long TTSP; it helps raise suspicions, but does not catch the code "red-handed".
java 17998 2446326.506252: 250000 cpu-clock:
7c38b6 acl_CopyRight (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
7f788d108c2a Ljava/lang/System;::arraycopy (/tmp/perf-17855.map)
7f788d007ffd Interpreter (/tmp/perf-17855.map)
7f788d0004e7 call_stub (/tmp/perf-17855.map)
6941e3 JavaCalls::call_helper (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
691747 JavaCalls::call_virtual (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
691d27 JavaCalls::call_virtual (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
6d5864 thread_entry (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
acdfbb JavaThread::thread_main_inner (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
ace49f JavaThread::run (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
8f2f62 java_start (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.fc25.x86_64/jre/lib/amd64/server/libjvm.so)
773a start_thread (/usr/lib64/libpthread-2.24.so)
E.g. when your long TTSP is caused by a page fault incurred on a mapped file access (which is a common case), the tracepoint for SafepointSynchronize::block will not show the faulting operation on the stack. Instead, the SafepointSynchronize::block trace will likely end up showing the stack trace for the back edge of a loop that contained (potentially deeply inlined) ByteBuffer.get() operations. E.g. when parsing the contents of a mapped file and taking a page fault, the trace would likely be in the body of a Base64.Decoder.decode(ByteBuffer buffer) call, with no direct hint that the cause of the TTSP is a mapped file, or a page fault. If you examined the stack trace you might be able to guess (from the overall context) that the decode call was made on a ByteBuffer that was actually a MappedByteBuffer, and then suspect page faults and i/o as the culprit, but it won't be staring you in the face with a "this is what's going on" smile. And if the code where this occurred is not yours, but instead buried in some useful combination of 3rd party Maven Central things you may be using (like a JSON parser lib being handed a mapped byte buffer by a library that for some reason uses mapped files to read its config), the hunt may take a while...
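A toy version of that scenario (the file name is invented for illustration) shows how little of the mmap context survives on the stack -- the decoder is handed a plain ByteBuffer, and any page fault surfaces inside decode():

import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Base64;

public class HiddenMappedFault {
    public static void main(String[] args) throws Exception {
        // "config.b64" stands in for some library's mapped config file.
        try (FileChannel ch = FileChannel.open(Paths.get("config.b64"), StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // decode() sees only a ByteBuffer; the page fault (and any disk
            // wait behind it) happens inside the decode loop, with nothing on
            // the stack pointing at mmap, the file, or the fault.
            ByteBuffer decoded = Base64.getDecoder().decode(mapped);
            System.out.println("decoded " + decoded.remaining() + " bytes");
        }
    }
}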
IMO the key to identifying the long-TTSP-causing code paths, in a way that is legible enough for people to identify or usefully speculate about the TTSP cause, is to capture the stack trace during the offending operation (rather than at the safepoint polling opportunity that immediately followed it). An example technique for doing that would be to implement some TTSP-timeout notification that would capture stack traces when a safepoint has been requested but not yet arrived at within a timeout period. E.g. if HotSpot sent a signal to offending (not-yet-reached-safepoint-after-too-long-a-time) threads, and the signal handler for that signal called a specific function (e.g. "report_long_ttsp"), tracing THAT function would give you the stack trace you want, and catch the TTSP cause red-handed. And the signal handler itself could obviously log the traces in the JVM.
BTW, my BogusPauseReportingDemo and BogusLoopPauseReportingDemo were originally written to demonstrate TTSP behavior and the discrepancy in reporting it as pause time in GC logs.
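(Not the actual demo source, but a minimal sketch of the mechanism the second demo illustrates: a hot int-counted loop is compiled without safepoint polls on its back edge, so a safepoint requested mid-loop -- forced here with System.gc() from the main thread -- has to wait out the remainder of the loop, and the wait shows up in the "Stopping threads took" component rather than in the GC work itself.)

public class CountedLoopTtspSketch {
    static volatile long sink; // keep the loop's result live

    static long spin(int iterations) {
        long sum = 0;
        // int-counted loop: C2 omits the safepoint poll on the back edge,
        // so this thread cannot reach a safepoint until the loop exits.
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> {
            while (true) {
                sink += spin(2_000_000_000);
            }
        });
        spinner.setDaemon(true);
        spinner.start();
        Thread.sleep(1000); // let spin() get JIT-compiled
        for (int i = 0; i < 10; i++) {
            System.gc(); // each induced safepoint waits for the counted loop
            Thread.sleep(500);
        }
    }
}

Run with the same -XX:+PrintGCApplicationStoppedTime flag as above and the long "Stopping threads took" lines should reproduce.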
Cheers,
Mark