So, what do '-XX:+PrintGCApplicationStoppedTime' and '-XX:+PrintGCApplicationConcurrentTime' imply?


Darach Ennis

Jul 1, 2013, 8:21:31 AM
to mechanica...@googlegroups.com
Hi guys,

It used to be the case that "application stopped" implied the time from reaching a global safepoint until GC activity finished, yet the meaning or semantics of "application stopped" seem to have changed more than once, and depending on your interest a meaningful interpretation may be closer to or further from what you'd prefer. The following bugs don't inspire much confidence in the validity of my current interpretation (which seems consistent with Mr T's understanding in his recent GC distilled article on InfoQ).


Has anyone here got a reasonably accurate and current (JDK7) understanding of what this means today, and/or of the concerns with using this flag on its own (for example, the lack of timestamps on "application stopped" log entries prevents correlating them with VM operations that may have been causal) or in conjunction with other flags such as PrintGCApplicationConcurrentTime?

Exactly what do these two flags mean? My search for a formal or normative description is coming up blank, so if anyone has recently reviewed the relevant code in HotSpot, could they share their understanding of what is currently implemented, warts and all?

Cheers,

Darach.



Jean-Philippe BEMPEL

Jul 1, 2013, 11:40:34 AM
to mechanica...@googlegroups.com
Basically, PrintGCApplicationStoppedTime gives you the time spent in safepoints (= all threads stopped).

PrintGCApplicationConcurrentTime should give you the time spent not in safepoints (i.e. between two safepoints).
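For reference, a command line combining the two flags might look like the sketch below. `MyApp` is a placeholder; these flags existed through JDK 8 and were removed with JDK 9's unified logging, where `-Xlog:safepoint` covers the same ground:

```shell
# JDK 7/8-era HotSpot flags (removed in JDK 9+; use -Xlog:safepoint there):
java -verbose:gc \
     -XX:+PrintGCTimeStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintGCApplicationConcurrentTime \
     MyApp
```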

Regards
Jean-Philippe

Gil Tene

Jul 2, 2013, 1:19:37 AM
I find no use for PrintGCApplicationConcurrentTime, but PrintGCApplicationStoppedTime is probably the only number in GC logs that you should sort-of-semi-believe.

I usually start by telling people that they should not blindly believe anything a JVM claims about stopped times (and that includes Zing), and that everyone should use observation-based measurements to keep the logs "honest". This is [partly] what I built jHiccup for. If the GC (and other pause time) logs seem to correlate with your observation-based numbers, then the report is probably fairly accurate. But if/when they don't, believe the observations and not the "I think this is how long I was doing something while I had your application paused" point of view of the JVM.

As to what PrintGCApplicationStoppedTime means: It is supposed to report the time the JVM spent in a stop-the-world safepoint of any kind. In Zing we work hard to make this number encompass the entire safepoint time, end-to-end, inclusively. Specifically, this means the time from when the very first thread was asked to get to a safepoint and stay there, until the very last thread was allowed to leave the safepoint. I believe that OpenJDK and Oracle HotSpot do a decent job of reporting with the same rules for this specific metric.

However, when it comes to other "stop time" reports, you will see potentially huge gaps in how "honest" JVMs are. Our GC times in Zing are as inclusive as PrintGCApplicationStoppedTime, but some others (e.g. the Oracle HotSpot and OpenJDK GCs) often report only the time the GC actually did work while at a safepoint, without including the time it took to get to the safepoint or the time it took to leave it (during which the GC was stalled waiting for all the mutators to arrive at the safepoint).

This may seem like nit-picking, until you realize that in many cases Time To SafePoint (a term we use at Azul so much we abbreviate it as TTSP) can dominate actual observed pause times when those drop into the handful of milliseconds. Note that this is not a "usually", "mostly", or "on the average" statement. It's a "sometimes" statement. While most of your GC-reported pause times may tend to come very close to the associated PrintGCApplicationStoppedTime output, "every once in a while" you can see a huge gap. Those are times when GC dramatically under-reported pause times. It's typically a situation where all threads but one are stalled, waiting for a laggard thread to arrive at the safepoint.

There are many causes for long TTSP in JVMs, and I consider TTSP to be "your next worst enemy" once GC pause work has been eliminated or addressed (as it is with Zing). At Azul, we've spent the past 5+ years beating TTSP times down in the Zing JVM, so we know a lot about them. The simplest example I can give is an array copy. Array copies (in regular HotSpot) do not have a safepoint in the middle, so a copy of a 200MB array represents a period of ~50-300msec (depending on HW) during which a global, JVM-wide safepoint cannot be completed. If you are unlucky enough to need to get to a safepoint while one of those is going on (e.g. when GC is triggered, or if you load a class, or if a de-optimization happens), the TTSP can end up being 10-100x as big as the GC work itself, with practically all of your application seeing that length as an observed pause. In such a case, guess what the GC will report, and what PrintGCApplicationStoppedTime will report?

To take the guessing out of it, here is a small demo I happen to have laying around:

/**
 * BogusPauseReportingDemo.java
 * Written by Gil Tene of Azul Systems, and released to the public domain,
 * as explained at http://creativecommons.org/publicdomain/zero/1.0/
 */

public class BogusPauseReportingDemo {

    static final int COPYING_ARRAY_LENGTH = 256 * 1024 * 1024;
    static final int ALLOCATING_ARRAY_LENGTH = 2 * 1024 * 1024;

    static volatile boolean doRun = true;

    static class Allocator extends Thread {
        volatile byte[] array;

        public void run() {
            while (doRun) {
                array = new byte[ALLOCATING_ARRAY_LENGTH];
            }
        }
    }

    static class Copier extends Thread {
        volatile byte[] array1 = new byte[COPYING_ARRAY_LENGTH];
        volatile byte[] array2 = new byte[COPYING_ARRAY_LENGTH];

        public void run() {
            while (doRun) {
                System.arraycopy(array1, 0, array2, 0, COPYING_ARRAY_LENGTH);
            }
        }
    }

    public static void main(String[] args) {
        Allocator allocator = new Allocator();
        Copier copier = new Copier();

        long runTimeMsec = 10000;
        for (int i = 0; i < args.length; ++i) {
            if (args[i].equals("-t")) {
                runTimeMsec = Integer.parseInt(args[++i]);
            } else {
                System.out.println("Usage: java BogusPauseReportingDemo [-t runTimeMsec]");
                System.exit(1);
            }
        }

        System.out.println("BogusPauseReportingDemo started, will run for " + runTimeMsec + " msec");
        allocator.start();
        copier.start();
        try {
            Thread.sleep(runTimeMsec);
        } catch (InterruptedException ex) {
        }

        doRun = false;

        System.out.println("BogusPauseReportingDemo done...");
    }
}


Here is sample output from a run with Oracle HotSpot on a 2 socket X5690 server:
(Note the occasional >10:1 under-reporting of pause time in the GC lines, compared to the application stopped lines)

qalperf2.azulsystems.com-9% ${JAVA_HOME}/bin/java -Xmx1g -verbosegc -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime BogusPauseReportingDemo -t 5000
BogusPauseReportingDemo started, will run for 5000 msec
0.853: [GC 784634K->526624K(1004928K), 0.0046340 secs]
Total time for which application threads were stopped: 0.0129990 seconds
1.172: [GC 787597K->526656K(1004928K), 0.0040280 secs]
Total time for which application threads were stopped: 0.0084650 seconds
1.492: [GC 788511K->526656K(1004928K), 0.0029970 secs]
Total time for which application threads were stopped: 0.0149710 seconds
1.822: [GC 787908K->526736K(1004928K), 0.0051360 secs]
Total time for which application threads were stopped: 0.0053990 seconds
2.136: [GC 787592K->526720K(957248K), 0.0024930 secs]
Total time for which application threads were stopped: 0.0028780 seconds
2.446: [GC 781160K->526640K(1002496K), 0.0051180 secs]
Total time for which application threads were stopped: 0.0123830 seconds
2.852: [GC 826026K->526592K(1002496K), 0.0027390 secs]
Total time for which application threads were stopped: 0.0214070 seconds
3.250: [GC 825849K->526592K(994112K), 0.0031760 secs]
Total time for which application threads were stopped: 0.0416690 seconds
3.648: [GC 817568K->526592K(986432K), 0.0035210 secs]
Total time for which application threads were stopped: 0.0472290 seconds
4.051: [GC 811367K->526592K(976960K), 0.0047570 secs]
Total time for which application threads were stopped: 0.0552170 seconds
Total time for which application threads were stopped: 0.0548840 seconds
4.452: [GC 803138K->528640K(1042112K), 0.0048630 secs]
Total time for which application threads were stopped: 0.0277950 seconds
4.932: [GC 868662K->528640K(1031680K), 0.0043900 secs]
Total time for which application threads were stopped: 0.0375160 seconds
BogusPauseReportingDemo done...


Peter Lawrey

Jul 2, 2013, 2:34:53 AM
to Gil Tene, mechanica...@googlegroups.com
The JVM can be interrupted for long periods of time without pausing in this way.  


This test repeatedly calls System.nanoTime() and counts how often a running program is interrupted/loses context for at least 2 microseconds. Without tuning the scheduler, the busy-looping application can see delays into the milliseconds (up to 5 ms). Note: this program doesn't use multiple threads or create significant garbage, and cache misses can't explain delays in the thousands or millions of clock cycles.
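A minimal sketch of such a jitter-counting loop might look like this (class name, threshold, and run length are illustrative, not Peter's actual test):

```java
// Sketch of a nanoTime-based jitter counter: spin calling System.nanoTime()
// and count gaps of at least 2 microseconds between consecutive calls.
// Such gaps suggest the thread was interrupted or lost its CPU.
public class JitterCounter {
    static final long THRESHOLD_NANOS = 2_000; // 2 microseconds

    // Spins for roughly runNanos and returns the number of observed gaps
    // >= THRESHOLD_NANOS between consecutive nanoTime() calls.
    static long countInterruptions(long runNanos) {
        long interruptions = 0;
        long start = System.nanoTime();
        long prev = start;
        while (prev - start < runNanos) {
            long now = System.nanoTime();
            if (now - prev >= THRESHOLD_NANOS) {
                interruptions++;
            }
            prev = now;
        }
        return interruptions;
    }

    public static void main(String[] args) {
        // Spin for ~1 second; the count will vary with OS scheduling noise.
        long count = countInterruptions(1_000_000_000L);
        System.out.println("Gaps >= 2us in ~1s: " + count);
    }
}
```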

I suspect if you measure the real pauses in the system this way you will see more than the JVM is aware of.

Peter.




--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Gil Tene

Jul 2, 2013, 11:37:54 AM
That's a cool set of measurements there. Nice blog entry.

There is a key difference between your application being susceptible to System jitter (as Peter's blog entry shows), and the same application being susceptible to JVM jitter. PrintGCApplicationStoppedTime gives you a feel for JVM jitter that the JVM actually admits to causing, even on a completely non-jittering system. 

System jitter is usually (not always) a thread-local thing. JVM jitter is usually (not always) a stop-the-world event. Since JVM jitter tends to involve JVM-wide synchronization events, it is dominated by JVM work that needs to be done in a stop-the-world situation (e.g. GC), and by program code that ends up having a long TTSP (where one laggard thread can cause a JVM-wide stall). These types of JVM jitter will happen even on a system with no significant System jitter, and with all the best scheduling isolation in the world turned on. In my experience, JVM jitter (on regular HotSpot JVMs) can easily reach levels that are orders of magnitude above System jitter, and therefore represents the low-hanging watermelons to go after in jitter-reduction efforts. System jitter that is measured in the milliseconds is usually worth chasing only after GC and TTSP have been beaten down, either through careful coding and tuning to avoid triggering them in a JVM, or by using a JVM like Zing that keeps JVM jitter down to the millisecond (and often sub-millisecond) level.

The most common form of System jitter usually comes in the form of stealing a CPU from a single thread in some way. Sometimes it's plain scheduling contention (as I suspect the isolcpus difference exhibits), and sometimes it's more nefarious. E.g. the CPU going into deep power-saving states, the BIOS doing periodic memory scrubbing using your CPU, interrupts being handled by your CPU, and hypervisors time-slicing your CPU away are all things the scheduler is normally unaware of. The key issue with things the scheduler is unaware of is this: if the scheduler knows a vcore is busy (i.e. it is the one that made it busy by putting another thread on it), and another vcore is available and otherwise idle, the scheduler will tend to aggressively move runnable threads there (with only sub-millisecond delays), usually as a result of the idle cores aggressively stealing threads off of busy run queues. However, when CPU time is stolen without scheduler awareness (your thread is "running" as far as the run queues are concerned), the impacted thread will not be moved to other idle vcores, and will feel the full brunt of the theft. E.g. a 2+ msec memory-scrub loop increment, managed by the BIOS, will not make a runnable thread move to another currently-idle core, and a 50 msec hypervisor time-slice steal will similarly mess things up.

With that said, there are other ugly forms of System jitter that can reach into the 100s of milliseconds. File cache pressure and Transparent Huge Pages are the two bigger ones I've met in the wild so far. File cache pressure can do silly things like page your seemingly "cold" code pages out to disk because it wanted to avoid writing dirty log pages to disk, leading to many-millisecond cascaded paging delays when that cold code suddenly needs to run. Transparent Huge Pages have this nasty habit of occasionally deciding to defragment an entire zone of memory using an unfortunate calling thread that just tried to do a simple malloc. In each of these cases, the System jitter can become large enough to dominate observable jitter in otherwise low-jitter environments.

ling ma

Mar 9, 2014, 1:05:32 PM
to mechanica...@googlegroups.com
Very nice explanation of TTSP. Any reason the array variables are declared volatile in this BogusPauseReportingDemo?


Kirk Pepperdine

Mar 9, 2014, 2:21:37 PM
to mechanica...@googlegroups.com
I have to agree, TTSP is a performance killer. I once got a bench to consume 99% of CPU in user space (as it should have been able to do) simply by turning off biased locking (which is causal for many calls to safepoint). Before that, the best I could get was about 70%. Quite some time ago I suggested that anyone putting a call to safepoint in the JVM code should have to pay a penalty equal to about one month's salary. I don't think I was harsh enough.
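For anyone who wants to experiment with this, a command line along these lines would disable biased locking and surface per-safepoint timing on HotSpot of that era (`MyApp` is a placeholder; the flags are real JDK 7/8-era HotSpot flags, and biased locking itself was later deprecated in JDK 15):

```shell
# Disable biased locking (its revocation is a common safepoint trigger)
# and print statistics for every safepoint:
java -XX:-UseBiasedLocking \
     -XX:+PrintSafepointStatistics \
     -XX:PrintSafepointStatisticsCount=1 \
     MyApp
```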

Regards,
Kirk


Gil Tene

Mar 9, 2014, 6:31:35 PM
to mechanica...@googlegroups.com
Re: the volatile question. It may not be necessary in this specific case, but my default in micro-benchmark demos is to use volatile for all state that I don't *need* to be non-volatile (rather than the other way around). This is mostly a means of keeping compiler optimizations from doing strange stuff (like eliminating the allocation in the Allocator thread), but it's a default action for me rather than a well-considered site-by-site thing. Taking volatile off is the thing I spend more time thinking about.

And before people jump all over this with warnings about how volatile is not enough - I fully agree, but since this is meant as a short standalone demo, and doesn't get to use a framework like JMH, I just follow some ground rules that [somewhat] help. Storing into a volatile is effectively an "observable side effect" from current JIT compilers' point of view, since today's JITs don't try to prove away multi-threaded access to a volatile. The same is true for reading from a volatile - it defeats many state-caching optimizations (that would otherwise allow hoisting reads out of loops and such).

In the specific case of the Copier, the arrays may not have to be volatile. But in the specific case of the Allocator thread in this code, the JIT compiler could legitimately have eliminated the allocation and hoisted it out of the doRun loop if the array field it was publishing the allocations to were not volatile.
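The pattern can be illustrated with a minimal sketch (class and field names are illustrative): with a plain field, the JIT is within its rights to treat the repeated stores as dead and elide the per-iteration allocation, while the volatile store remains an observable effect it must preserve.

```java
// Sketch: publishing each allocation through a volatile field keeps the
// JIT from treating the stores as dead and eliding the allocation loop.
public class VolatilePublish {
    static byte[] plainSink;             // stores here could legally be optimized away
    static volatile byte[] volatileSink; // volatile stores must be preserved

    static void allocateLoop(int iterations) {
        for (int i = 0; i < iterations; i++) {
            // If we published via plainSink instead, a JIT could keep only
            // the final store (or none, if the field is provably unread)
            // and drop the per-iteration allocations entirely.
            volatileSink = new byte[1024];
        }
    }

    public static void main(String[] args) {
        allocateLoop(1000);
        System.out.println("last array length: " + volatileSink.length);
    }
}
```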

ling ma

Mar 9, 2014, 6:57:30 PM
to mechanica...@googlegroups.com
Nice trick. I felt like I was watching the old movie A.I. One more point to add to the next C++ vs. Java debate. ;)