
Yes, it's black magic, particularly when jitter is rare and can't be predictably caught in the act.
What Linux kernel version is this? What hardware, including the disk? What filesystem? When you say this happens during GC reference processing, are you referring specifically to the Reference (e.g. weak, soft) processing phase, or to a GC cycle in general? If the latter, is this a Full GC? Which GC are you using? How many GC worker threads are configured? How much disk writeback is occurring?
You can get fairly long system-wide stalls when the kernel is performing lots of dirty page writebacks to a slow disk and you happen to hit a code path that dirties a file-backed memory page.
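A quick way to check whether that's what is happening is to watch the kernel's dirty/writeback counters around the time of a stall. A minimal sketch, assuming a Linux /proc filesystem (it just samples the "Dirty:" and "Writeback:" lines of /proc/meminfo, reported in kB):

    import java.nio.file.*;
    import java.util.List;

    public class WritebackWatch {
        public static void main(String[] args) throws Exception {
            // Sample the kernel's dirty/writeback page counters once per second.
            // Assumes Linux with a readable /proc/meminfo; values are in kB.
            while (true) {
                List<String> lines = Files.readAllLines(Paths.get("/proc/meminfo"));
                for (String line : lines) {
                    if (line.startsWith("Dirty:") || line.startsWith("Writeback:")) {
                        System.out.println(System.currentTimeMillis() + " " + line);
                    }
                }
                Thread.sleep(1000);
            }
        }
    }

If those numbers jump sharply right before a hiccup, writeback pressure to the disk is a likely suspect.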
Gil,
Thanks for your answers.
On Fri, Aug 7, 2015 at 5:48 PM, Gil Tene <g...@azulsystems.com> wrote:
>
> Simon, assuming the jHiccup log you plot above is from a jHiccup run within the application's JVM (e.g. by adding it as an agent to the command line for that application's JVM launch), then the output is most likely indicative of your GC behavior within that JVM. While it *may* be caused by things outside of that JVM (like Linux-related artifacts), that's not something you can tell without comparing your in-application jHiccup log with the output from another jHiccup running at the same time in another (idle) process.
Right.
> Luckily, jHiccup has a "-c" flag that launches just such an idle control process to compare logs with, since this is a very common need. If you launch jHiccup as an agent in your main application and give it the "-c" flag, you will end up with two separate log files: one for your application's process, and one for a co-started, co-terminating idle control process. Looking at both of these logs side by side will tell you a lot about where to look for the causes of externally observed latency spikes (in the JVM, in the system, or in your application code or network):
>
> - If both logs show spikes that happen at the same time, and are of a similar magnitude, your issue is not your code or the network, and it's not inside the JVM. It's your entire system that is glitching, and you'll want to look for the usual suspects: swapping, scheduling conflicts, paging, power settings, blame the hypervisor, etc. That's where the black magic part usually comes in.
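For reference, and only as a sketch to be checked against the jHiccup README for the exact option syntax, enabling the control process from the agent would look roughly like this (MyApplication.jar standing in for the real application):

    java -javaagent:jHiccup.jar="-c" -jar MyApplication.jar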
I was thinking that if, for whatever reason, the main JVM burns all the cores and has a large number of threads, the control JVM may not get a chance to run, and would therefore report a spike of similar magnitude that is directly caused by the main JVM. If that were true, I would end up blaming the OS when it's actually the main JVM at fault. That's why I asked about isolation and pinning of the control JVM. Am I off track here?
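For example, would pinning the whole launch with something like taskset help, reusing the illustrative launch line from above (core numbers made up):

    taskset -c 0-6 java -javaagent:jHiccup.jar="-c" -jar MyApplication.jar

or would the control process forked by "-c" just inherit the same affinity mask and be starved on the same cores?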
The system is certainly writing *a lot* of application logs to disk, but I'm not sure how to get the writeback information you asked for. Can you help?