Why does the G1 GC shrink the number of regions?


Alexandr Nikitin

Jun 2, 2017, 2:39:01 PM
to mechanical-sympathy
Hi all,

I encountered strange behavior in the JVM G1 GC. After a couple of hours of normal work it shrinks the number of regions by a factor of 10. That causes a lot of copying and promotion, and GC times jump dramatically. After a dozen of those unsuccessful collections it brings the number of regions back to the previous, normal number. Please help me understand its motivation.


Here's the transition part only. As you can see, the first collection went almost as usual: 180ms (usually 40-50ms). Then G1 decides to reduce the number of regions from 3762 to 310. The GC after that took 1.80 secs.
2017-06-02T18:46:56.010+0200: 10178.264: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 7902068736 bytes, new threshold 15 (max 15)
- age   1:    3862296 bytes,    3862296 total
- age   2:     459784 bytes,    4322080 total
- age   3:     587472 bytes,    4909552 total
- age   4:     383864 bytes,    5293416 total
- age   5:     399064 bytes,    5692480 total
- age   6:     363896 bytes,    6056376 total
- age   7:     323632 bytes,    6380008 total
- age   8:     357584 bytes,    6737592 total
- age   9:     323312 bytes,    7060904 total
- age  10:     328672 bytes,    7389576 total
- age  11:     986184 bytes,    8375760 total
- age  12:     222176 bytes,    8597936 total
- age  13:     188536 bytes,    8786472 total
- age  14:     207232 bytes,    8993704 total
- age  15:     120168 bytes,    9113872 total
 10178.275: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 834913, predicted base time: 89.27 ms, remaining time: 110.73 ms, target pause time: 200.00 ms]
 10178.275: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 3762 regions, survivors: 1 regions, predicted young region time: 8.28 ms]
 10178.275: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 3762 regions, survivors: 1 regions, old: 0 regions, predicted pause time: 97.55 ms, target pause time: 200.00 ms]
2017-06-02T18:46:56.121+0200: 10178.375: [SoftReference, 0 refs, 0.0011474 secs]2017-06-02T18:46:56.122+0200: 10178.376: [WeakReference, 62 refs, 0.0005960 secs]2017-06-02T18:46:56.123+0200: 10178.377: [FinalReference, 536 refs, 0.0023264 secs]2017-06-02T18:46:56.125+0200: 10178.379: [PhantomReference, 0 refs, 0 refs, 0.0014977 secs]2017-06-02T18:46:56.127+0200: 10178.381: [JNI Weak Reference, 0.0000471 secs], 0.1337364 secs]
   [Parallel Time: 97.7 ms, GC Workers: 28]
      [GC Worker Start (ms): Min: 10178276.8, Avg: 10178277.6, Max: 10178278.2, Diff: 1.4]
      [Ext Root Scanning (ms): Min: 0.0, Avg: 0.8, Max: 6.4, Diff: 6.4, Sum: 23.4]
      [Update RS (ms): Min: 86.6, Avg: 91.4, Max: 91.7, Diff: 5.2, Sum: 2559.4]
         [Processed Buffers: Min: 104, Avg: 118.4, Max: 143, Diff: 39, Sum: 3315]
      [Scan RS (ms): Min: 0.7, Avg: 1.0, Max: 1.1, Diff: 0.3, Sum: 26.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.3, Diff: 0.3, Sum: 0.7]
      [Object Copy (ms): Min: 2.3, Avg: 2.5, Max: 3.0, Diff: 0.6, Sum: 71.2]
      [Termination (ms): Min: 0.0, Avg: 0.4, Max: 0.6, Diff: 0.6, Sum: 12.1]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 28]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.3, Max: 0.5, Diff: 0.5, Sum: 7.6]
      [GC Worker Total (ms): Min: 95.7, Avg: 96.5, Max: 97.3, Diff: 1.6, Sum: 2701.4]
      [GC Worker End (ms): Min: 10178373.9, Avg: 10178374.1, Max: 10178374.3, Diff: 0.4]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 9.0 ms]
   [Other: 26.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 6.1 ms]
      [Ref Enq: 0.2 ms]
      [Redirty Cards: 0.3 ms]
      [Humongous Register: 1.7 ms]
      [Humongous Reclaim: 0.7 ms]
      [Free CSet: 6.5 ms]
   [Eden: 117.6G(117.6G)->0.0B(117.5G) Survivors: 32.0M->96.0M Heap: 138.2G(196.0G)->20.6G(196.0G)]
 [Times: user=2.99 sys=0.02, real=0.13 secs] 
2017-06-02T18:46:56.144+0200: 10178.398: Total time for which application threads were stopped: 0.2457828 seconds, Stopping threads took: 0.1110667 seconds
2017-06-02T18:46:56.145+0200: 10178.399: Total time for which application threads were stopped: 0.0006996 seconds, Stopping threads took: 0.0001565 seconds
2017-06-02T18:46:56.145+0200: 10178.399: Total time for which application threads were stopped: 0.0003017 seconds, Stopping threads took: 0.0000817 seconds
2017-06-02T18:46:56.146+0200: 10178.400: Total time for which application threads were stopped: 0.0002389 seconds, Stopping threads took: 0.0000372 seconds
2017-06-02T18:46:56.146+0200: 10178.400: Total time for which application threads were stopped: 0.0002567 seconds, Stopping threads took: 0.0000476 seconds
2017-06-02T18:46:59.820+0200: 10182.074: Total time for which application threads were stopped: 0.0006631 seconds, Stopping threads took: 0.0001533 seconds
2017-06-02T18:46:59.820+0200: 10182.074: Total time for which application threads were stopped: 0.0002674 seconds, Stopping threads took: 0.0000527 seconds
2017-06-02T18:47:06.354+0200: 10188.608: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 671088640 bytes, new threshold 15 (max 15)
- age   1:   49132824 bytes,   49132824 total
- age   2:    1745176 bytes,   50878000 total
- age   3:     417632 bytes,   51295632 total
- age   4:     411672 bytes,   51707304 total
- age   5:     371712 bytes,   52079016 total
- age   6:     377456 bytes,   52456472 total
- age   7:     344840 bytes,   52801312 total
- age   8:     317216 bytes,   53118528 total
- age   9:     331216 bytes,   53449744 total
- age  10:     302656 bytes,   53752400 total
- age  11:     282424 bytes,   54034824 total
- age  12:     733032 bytes,   54767856 total
- age  13:     221808 bytes,   54989664 total
- age  14:     170104 bytes,   55159768 total
- age  15:     179872 bytes,   55339640 total
 10188.608: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 898670, predicted base time: 105.76 ms, remaining time: 94.24 ms, target pause time: 200.00 ms]
 10188.608: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 310 regions, survivors: 3 regions, predicted young region time: 2434.15 ms]
 10188.608: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 310 regions, survivors: 3 regions, old: 0 regions, predicted pause time: 2539.91 ms, target pause time: 200.00 ms]
2017-06-02T18:47:08.146+0200: 10190.400: [SoftReference, 0 refs, 0.0010616 secs]2017-06-02T18:47:08.147+0200: 10190.401: [WeakReference, 24 refs, 0.0005880 secs]2017-06-02T18:47:08.148+0200: 10190.402: [FinalReference, 49 refs, 0.0006509 secs]2017-06-02T18:47:08.149+0200: 10190.402: [PhantomReference, 0 refs, 1 refs, 0.0010624 secs]2017-06-02T18:47:08.150+0200: 10190.404: [JNI Weak Reference, 0.0000308 secs], 1.8028711 secs]
   [Parallel Time: 1790.9 ms, GC Workers: 28]
      [GC Worker Start (ms): Min: 10188608.9, Avg: 10188609.3, Max: 10188609.6, Diff: 0.7]
      [Ext Root Scanning (ms): Min: 0.3, Avg: 0.7, Max: 2.6, Diff: 2.3, Sum: 19.7]
      [Update RS (ms): Min: 127.3, Avg: 128.9, Max: 129.7, Diff: 2.4, Sum: 3608.7]
         [Processed Buffers: Min: 89, Avg: 127.7, Max: 180, Diff: 91, Sum: 3576]
      [Scan RS (ms): Min: 582.3, Avg: 582.9, Max: 583.1, Diff: 0.9, Sum: 16322.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.4, Diff: 0.4, Sum: 0.5]
      [Object Copy (ms): Min: 1073.9, Avg: 1074.5, Max: 1077.6, Diff: 3.7, Sum: 30085.2]
      [Termination (ms): Min: 0.0, Avg: 3.2, Max: 3.7, Diff: 3.7, Sum: 90.0]
         [Termination Attempts: Min: 1, Avg: 780.6, Max: 959, Diff: 958, Sum: 21856]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.1, Sum: 2.6]
      [GC Worker Total (ms): Min: 1789.9, Avg: 1790.3, Max: 1790.7, Diff: 0.7, Sum: 50128.8]
      [GC Worker End (ms): Min: 10190399.5, Avg: 10190399.6, Max: 10190399.7, Diff: 0.2]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.7 ms]
   [Other: 11.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 3.7 ms]
      [Ref Enq: 0.2 ms]
      [Redirty Cards: 4.2 ms]
      [Humongous Register: 0.4 ms]
      [Humongous Reclaim: 0.1 ms]
      [Free CSet: 0.8 ms]
   [Eden: 9920.0M(9920.0M)->0.0B(8736.0M) Survivors: 96.0M->1280.0M Heap: 33.2G(196.0G)->24.9G(196.0G)]
 [Times: user=35.11 sys=3.42, real=1.80 secs] 


Thank you!

Kirk Pepperdine

Jun 2, 2017, 3:51:35 PM
to mechanica...@googlegroups.com
Hi,

This is normal behavior: metrics are built up to arrive at a sizing decision that balances overheads against pause time. It's based on the premise that a smaller space will be quicker to collect… and the weak generational hypothesis suggests that this is a false premise. Another factor is allocation rate. The number of regions allocated to young will rise and fall as allocation rates rise and fall. This can lead to some interesting pathologies.

Kind regards,
Kirk


Gil Tene

Jun 3, 2017, 6:31:56 PM
to mechanica...@googlegroups.com
I think that the cause and effect are backwards here... The application caused this behavior directly, and what it does is "normal" and "typical". Given the Stop-the-world nature of G1's newgen collector, you actually "got off easy", and saw a pause of "only" 1.8 seconds...

The significant thing about the first collection with the lower region count [the 2nd collection in your post] is the fact that a large amount of recently allocated objects are alive at the time of collection. Specifically, ~1421MB [9920MB - (33.2GB - 24.9GB)] out of the 9920MB allocated in the ~10 seconds preceding the 1.8 second pause stayed alive, of which 1280MB was relocated to survivors, and the rest (~140MB) was presumably promoted. This is dramatically different behavior compared to the previous collection cycles (where only ~64MB out of ~117GB were alive at the time of collection). You can see this application behavior continue through the next ~130 seconds (looking at the log file link): The next cycle after the "different" behavior started covered an allocation period of ~30 seconds, during which 8760MB were newly allocated, and of which ~1158MB stayed alive [8760MB - (33.4GB - 26.0GB)]. The next cycle after that covered an allocation period of ~100 seconds, during which 8736MB were newly allocated, and of which ~646MB stayed alive [8736MB - (34.5GB - 26.6GB)].

We are looking at a period of time in the application execution that spans ~140 seconds, during which a total of ~27416MB were newly allocated, and of which ~3225MB stuck around long enough to be copied around and promoted. This ~3.15GB of new [non-ephemerally-dying] state is very real, it's just not "typical" of other times in the log for the same application.
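To make that arithmetic explicit, here is the back-of-the-envelope calculation for the 1.8-second pause as a small Java sketch (the figures are read straight off the log above; 1GB is treated as 1024MB, so everything is approximate):

// Survival estimate for the 1.8s pause, from the log line:
//   [Eden: 9920.0M(9920.0M)->0.0B(8736.0M) Survivors: 96.0M->1280.0M Heap: 33.2G(196.0G)->24.9G(196.0G)]
public class SurvivalEstimate {
    public static void main(String[] args) {
        double edenAllocatedMB = 9920.0;                      // Eden went from 9920.0M to 0
        double heapBeforeMB = 33.2 * 1024;                    // heap occupancy before the pause
        double heapAfterMB  = 24.9 * 1024;                    // heap occupancy after the pause
        double reclaimedMB  = heapBeforeMB - heapAfterMB;     // ~8499MB actually freed
        double survivedMB   = edenAllocatedMB - reclaimedMB;  // ~1421MB of the new allocation stayed alive
        double survivorsMB  = 1280.0;                         // survivor space after the pause
        double promotedMB   = survivedMB - survivorsMB;       // ~141MB presumably promoted to oldgen
        System.out.printf("survived ~%.0fMB, promoted ~%.0fMB%n", survivedMB, promotedMB);
    }
}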

The explanation for this is usually quite simple: it suggests a phase change in application behavior, with a large amount of new state that is being kept for a while. This is not "wrong" in any way. In fact, it is extremely common and typical to see such phase change behaviors across many application types. E.g. a refresh of a cache, a reloading of a product catalog, a reindexing, a loading of data for an analytics computation, a rejoining of a cluster, the reaction to SSTable compaction, or nodes coming and going in Cassandra, Elastic, Zookeeper, etc. etc. etc.

The earlier GC trigger with the smaller region count may very well be a reaction to a dramatic change in allocation rate, rather than the cause of the large pause. It actually "got lucky" here. In cases where such a phase change occurs but the allocation rate change is not "violent enough" to heuristically trigger an early newgen collection, the newgen pause on the phase change would have been much bigger [this is a common problem with large STW newgens, since some phase changes only affect data/state but may not change allocation rate dramatically]. In fact, the allocation rate going into the 2nd and 3rd collections in this ~140 second period is much more relaxed than the one leading to the first collection, and may not have been high enough to trigger more aggressive collections on their own. If the collector had waited for the whole 140 seconds before triggering a collection in this case, the pause may have taken >2x longer, because it would have had to deal with moving 3.15GB around in a single pause rather than "just" 1.4GB.

At this point I often get questions from people that go something like this: "Isn't G1 supposed to break up these large pauses into smaller "increment" pauses?". While G1 does incremental STW compaction in *oldgen* (spreading the pause across increments), its newgen collector is effectively monolithic: the whole of newgen must be collected in a single STW pause. Newgens are shorter primarily (only?) because they do not have to trace all live objects in the heap, and focus on tracing only the newgen objects and all roots to them. But this also means that newgens have no way to determine the amount of live matter that will be found in the newgen until after a monolithic cycle has completed, which fundamentally means that the set of pages in the newgen must be selected based on a "guess", and without actual knowledge. If the "guess" is right (e.g. "they are mostly dead" and "last time only X MB stayed alive"), you get a very efficient newgen with a relatively small pause. But if the "guess" was wrong (1421MB is found alive in 9.9GB of allocation, even though last time only 64MB was found alive in 117GB of allocation, as was the case here), it's too late. You're in for a big pause.

There is a problem/dilemma that you see here, which is inherent in all monolithic stop-the-world relocating newgen collectors. For all such collectors [at least currently], the only way to reduce the newgen pause in the presence of a high amount of newly allocated surviving objects is to significantly reduce the newgen size (e.g. trigger at ~1GB of allocation rather than after 10GB, in the hope that only 1/10th of the stuff would need to be relocated in a single pause). Unfortunately the collector has no idea how much newgen material will survive before it starts the collection, and its guesses based on prior collections are both inherently and dramatically wrong at a phase change. It is stuck with a choice between running very inefficient collections (keep newgen at a few hundred MB, even though you have e.g. a 196GB heap, just in case a large amount of relatively-long-lasting state is allocated during a phase change) and accepting a huge pause on a phase change. The only known way out of this dilemma is to change how the collector deals with newgen: either break the newgen into several (more than 1 or 2) incremental-STW sub-parts (something no JVM has done to date, AFAIK), or use a concurrent relocating newgen (which by now is a well-proven technique).
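To make the dilemma concrete, here is a toy model in Java (emphatically not G1's actual ergonomics code; the 1ms-per-MB copy cost and the numbers are illustrative) of a collector that sizes the next newgen from the previous cycle's survival, and only discovers the real survival once it is already inside the pause:

// Toy model of a monolithic STW newgen: newgen size is chosen from last cycle's
// survival (the "guess"), while the pause paid is set by what actually survives.
final class StwNewgenModel {
    static final double COPY_COST_MS_PER_MB = 1.0;   // assumed relocation cost, roughly 1 sec per GB
    static final double TARGET_PAUSE_MS = 200.0;     // target pause, as in the log above

    double predictedSurvivedMB = 64;                 // learned from the "mostly dead" prior cycles

    // Pick how much allocation to allow before the next young GC, based purely on the guess.
    double chooseNewgenSizeMB(double maxNewgenMB) {
        double predictedPause = predictedSurvivedMB * COPY_COST_MS_PER_MB;
        return predictedPause <= TARGET_PAUSE_MS
                ? maxNewgenMB
                : maxNewgenMB * (TARGET_PAUSE_MS / predictedPause);
    }

    // The pause actually paid depends on what *really* survived, known only afterwards.
    double collect(double actuallySurvivedMB) {
        double pauseMs = actuallySurvivedMB * COPY_COST_MS_PER_MB;
        predictedSurvivedMB = actuallySurvivedMB;    // learn, but one cycle too late
        return pauseMs;
    }

    public static void main(String[] args) {
        StwNewgenModel gc = new StwNewgenModel();
        System.out.println(gc.chooseNewgenSizeMB(120_000)); // optimistic guess: allow a huge newgen
        System.out.println(gc.collect(1421));               // phase change: ~1.4s pause, discovered mid-pause
        System.out.println(gc.chooseNewgenSizeMB(120_000)); // now it over-corrects and shrinks newgen hard
    }
}

Run the three calls in main in order and you see the shape of the posted log: a huge newgen allowed on the strength of an optimistic guess, a pause of over a second when the guess turns out wrong, and then a sharply shrunken newgen afterwards.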

Note that even during this different phase of operation the weak generational hypothesis is holding strong (85%+ of newly allocated objects have died. "Most objects die young" is still true). It's just that outside of such phase shifts, most applications demonstrate the weak generational hypothesis in a much more dramatic way (with 99%+ death rates, and with the rate growing with larger and larger newgens). So the newgen efficiency is still there (compared to dealing with the same workload with a non-generational or oldgen collection), but now it comes with huge newgen pauses...

To summarize your situation: Basically, the large amount of copying (either to survivors within newgen or via promotion to oldgen) is a result of program behavior. The collector is just reacting to that behavior. And unfortunately for all monolithic STW relocating newgen collectors, there isn't much you can do about that. Since several newly-allocated-but-surviving GBs need to be relocated in a relatively short period of time (10s of seconds), you are going to see pause times that amount to ~1-2sec per relocated GB over that period of time. This means that you will see significant pause artifacts in your run no matter what (as long as you are using a STW newgen).

And the actual bottom line is very simple: You need a concurrent newgen to handle this workload, with these phase changes, without any of those "hiccups". We can discuss that in other posts if you want ;-).

Kirk Pepperdine

Jun 4, 2017, 5:01:52 AM
to mechanica...@googlegroups.com

On Jun 4, 2017, at 12:31 AM, Gil Tene <g...@azul.com> wrote:

I think that the cause and effect are backwards here... The application caused this behavior directly, and what it does is "normal" and "typical". Given the Stop-the-world nature of G1's newgen collector, you actually "got off easy", and saw a pause of "only" 1.8 seconds…


But of course there is a feedback loop, in that GC behavior *does* affect application behavior. The overall behavior is emergent, which is what makes tuning GC so difficult. In this instance I don’t think you’ve done too badly. The pause time picture maps perfectly onto the apparent phase shift that Gil has described here. I’ve attached some screenshots to illustrate.

During the times of trouble, object copy and RSet refinement costs dominate, as one would hope.

Two odd things to note here. First, I can’t find a reason *why* Eden is maintained at the size it is. It should push towards 5% of total heap, but in this case it’s not, and there isn’t anything set that I can see that would prevent it from shrinking. You have neither a pause time goal set that might affect this, nor a minimum Eden size set. The allocation rates are high but not exceptionally high, in that they are only about 3x what I’d consider acceptable. Second, during times when the pause times bloat, the kernel times are also abnormally high. I see that you’re using 131, so maybe recent changes have affected the sizing ergonomics… not sure.

Regards,
Kirk

Alexandr Nikitin

Jun 5, 2017, 9:58:40 AM
to mechanical-sympathy
Gil, Kirk, Thank you for the great and thorough explanation! It is an awesome read! Almost everything fell into place.

Indeed, Gil is right about the application pattern change. It's a high-load API. There's a background thread that changes an internal state. The state is ~3.5G and requires a similar amount of memory to build itself. It happens in one background thread with lower priority. The change doesn't happen often, once per hour or so.

G1 is new to me and I couldn't intuitively understand why it reacts to the allocation pattern change like that. There's a main allocation/collection pattern that lasts for hours, and it takes just one low-priority background thread (which allocates less memory than the worker threads) to change it :) That region-count change leads to more copying and promotion, and to longer pauses as a consequence (it drops the max age threshold to 1, and collections happen more often).
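For illustration, a hypothetical sketch of what such a background refresh can look like (made-up names and sizes; not the actual code, just the shape of the allocation pattern):

import java.util.concurrent.atomic.AtomicReference;

// A low-priority background thread rebuilds a multi-GB state roughly once an hour
// and swaps it in atomically. During the rebuild, several GB of newly allocated but
// surviving objects appear in newgen all at once -- the phase change described above.
final class StateRefresher {
    private final AtomicReference<Object> currentState = new AtomicReference<>();

    void start() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                Object fresh = rebuildState();   // allocates ~3.5G that stays alive
                currentState.set(fresh);         // old state becomes garbage, new state sticks around
                sleepQuietly(60 * 60 * 1000L);   // roughly once per hour
            }
        }, "state-refresher");
        t.setPriority(Thread.MIN_PRIORITY);      // lower priority than the worker threads
        t.setDaemon(true);
        t.start();
    }

    Object current() { return currentState.get(); }   // what the worker threads read

    private Object rebuildState() { /* build the ~3.5G structure */ return new Object(); }

    private static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}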

And the actual bottom line is very simple: You need a concurrent newgen to handle this workload, with these phase changes, without any of those "hiccups". We can discuss that in other posts if you want ;-).

Do you mean Azul Zing by that?

Kirk Pepperdine

Jun 5, 2017, 2:47:56 PM
to mechanica...@googlegroups.com
On Jun 5, 2017, at 3:58 PM, Alexandr Nikitin <nikitin.a...@gmail.com> wrote:

Gil, Kirk, Thank you for the great and thorough explanation! It is an awesome read! Almost everything fell into place.

Indeed, Gil is right about the application pattern change. It's a high-load API. There's a background thread that changes an internal state. The state is ~3.5G and requires a similar amount of memory to build itself. It happens in one background thread with lower priority. The change doesn't happen often, once per hour or so.

G1 is new to me and I couldn't intuitively understand why it reacts to the allocation pattern change like that.

All collectors in the JVM will react in a similar way. I’ve just spent 2 hours with a client explaining to them that in their case GC wasn’t at fault. It was simply shouting very loudly that something bad was happening in the application that needed to be investigated.

There's a main allocation/collection pattern that lasts for hours, and it takes just one low-priority background thread (which allocates less memory than the worker threads) to change it :) That region-count change leads to more copying and promotion, and to longer pauses as a consequence (it drops the max age threshold to 1, and collections happen more often).

And the actual bottom line is very simple: You need a concurrent newgen to handle this workload, with these phase changes, without any of those "hiccups". We can discuss that in other posts if you want ;-).

Do you mean Azul Zing by that?

Of course he does :-)

And he’s not wrong.

— Kirk

PS, Gil, I’ll collect my commission on this lead once you close it. ;-)

Gil Tene

Jun 5, 2017, 2:53:37 PM
to mechanica...@googlegroups.com
Yup. 

Zing's newgen would happily copy and promote those 3.5GB concurrently, and wouldn't break a sweat doing it. It will take a couple of seconds, but not stop-the-world seconds. Since all the collection work is done in the background and is concurrent, you won't feel it. And since Zing doesn't need to worry about a "bigger pause" happening if it lets newgen grow, it doesn't have to hurry up and collect too early, so collections would still be ~110GB of allocation apart given your heap size and apparent live set. At your allocation rate, that means newgens can stay several hundred seconds apart, keeping them very efficient. Newgen will end up copying those 3.5GB once (to promote), or maybe twice (if the collection happened less than 2 seconds after the new 3.5GB of state was allocated). But not several times. Zing doesn't need to keep things in newgen for multiple cycles to age them. We use time instead.

In fact, you could probably run this workload (with this behavior) on Zing with about half to 2/3 of the heap you are using for HotSpot, and still not break a sweat.

We'll probably also end up running an oldgen every 10 minutes or so. Just for fun. Not because of any pressure, but simply because the collector gets bored after a while of not doing oldgens when other stuff gets to have fun. Since oldgen is concurrent too (including compaction), it's ok to do it lazily to make sure we don't delay some strange work to later. We like to make sure that unless you are idle, you've seen the full behavior of the system if you've been running for an hour or so. Our rule is: if a newgen happens, and no oldgen has run in the past ten minutes, do one.
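In pseudo-code terms, that rule amounts to something like this (a minimal sketch of the stated rule only, not Zing's actual implementation):

// After each newgen cycle, kick off a concurrent oldgen cycle if none has run
// in the past ten minutes.
final class OldgenCadence {
    private static final long TEN_MINUTES_MS = 10 * 60 * 1000L;
    private long lastOldgenMillis = System.currentTimeMillis();

    void onNewgenCompleted() {
        long now = System.currentTimeMillis();
        if (now - lastOldgenMillis >= TEN_MINUTES_MS) {
            startConcurrentOldgen();   // hypothetical hook: the concurrent mark/compact runs elsewhere
            lastOldgenMillis = now;
        }
    }

    private void startConcurrentOldgen() { /* trigger the concurrent oldgen collector */ }
}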

Gil Tene

Jun 5, 2017, 3:02:59 PM
to mechanical-sympathy


On Monday, June 5, 2017 at 9:47:56 PM UTC+3, Kirk Pepperdine wrote:

All collectors in the JVM will react in a similar way. I’ve just spent 2 hours with a client explaining to them that in their case GC wasn’t at fault. It was simply shouting very loudly that something bad was happening in the application that needed to be investigated.

All collectors in the *HotSpot* JVM (Oracle or OpenJDK) will react in a similar way. All collectors in the Zing JVM (we only have one) won't. It takes a lot more than a few GB of promotion and a few GB/sec of allocation rate to make the C4 collector lose its temper and complain that the programmer is doing it wrong ;-).

Kirk Pepperdine

Jun 5, 2017, 3:32:39 PM
to mechanica...@googlegroups.com
On Jun 5, 2017, at 9:02 PM, Gil Tene <g...@azul.com> wrote:



All collectors in the *HotSpot* JVM (Oracle or OpenJDK) will react in a similar way. All collectors in the Zing JVM (we only have one) won't. It takes a lot more than a few GB of promotion and a few GB/sec of allocation rate to make the C4 collector lose its temper and complain that the programmer is doing it wrong ;-).

Not an allocation issue… more of a workload backup… not sure even C4 would react to this very well ;-)

Regards,
Kirk


 

