HeapDump on OOM shows Garbage


Sunitha Beeram

Jun 3, 2018, 3:18:38 AM
to mechanical-sympathy
Hi,

We have a Java server process running with a 20 GB heap managed by G1 (default settings, except for enabling parallel reference processing). The JDK is 1.8.0.5 (quite old given the number of fixes for G1 that have gone into later versions).
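For reference, the setup described (20 GB heap, G1, parallel reference processing, heap dump on OOM) corresponds to a command line roughly like this; the jar name and dump path are placeholders, not taken from the thread:

```shell
# Sketch of the JVM options described above (all standard JDK 8 flags).
java -Xmx20g \
     -XX:+UseG1GC \
     -XX:+ParallelRefProcEnabled \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heapdump.hprof \
     -jar app.jar
```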

We are seeing frequent OOMs ("java.lang.OutOfMemoryError: Java heap space"), and we have enabled heap dumps on OOM. The GC log entry just before the OOM indicates that ~16.7 GB was in use, and based on the stack trace and the app state, it appears there was an attempt to allocate a very small hash map. The heap-dump analysis shows about 9 GB of live objects and about 7.5 GB of garbage.

- Shouldn't a GC cycle have been triggered if the allocation request could not be satisfied? In that case, the heap dump shouldn't have shown garbage, right?
- It *appears* that there was still some memory available even so, which makes the heap-space OOM a bit confusing. Are there any other indicators I should look for to debug this further?

Thanks!
Sunitha

Ben Evans

Jun 3, 2018, 8:51:41 AM
to mechanica...@googlegroups.com
Upgrade your JDK.

There is literally no point in trying to debug this - G1 on any JDK
prior to 8u40 should not be considered fit for production use.

Upgrade your JDK, and if the problem persists, download a trial copy
of Censum (https://www.jclarity.com/) and see what it can show you in
the GC logs.

Thanks,

Ben

Jason Tedor

Jun 3, 2018, 9:13:44 AM
to mechanica...@googlegroups.com
These look like the symptoms of the problems that G1 has with what are
termed humongous allocations. Briefly: G1 divides the heap into
equal-sized regions, and any allocation that is larger than half the
size of a region will end up consuming an integer number of full
regions (i.e., if an allocation is f regions' worth of bytes and
f > 0.5, then the allocation will actually consume ceiling(f) regions).
You can think of this as a form of fragmentation. You can verify this
is the case by investigating the GC logs, where humongous allocations
are logged (with -XX:+PrintAdaptiveSizePolicy, or for more detail add
-XX:+UnlockExperimentalVMOptions
-XX:+G1ReclaimDeadHumongousObjectsAtYoungGC -XX:G1LogLevel=finest
-XX:+G1TraceReclaimDeadHumongousObjectsAtYoungGC). One way to address
this, if you verify it is indeed the case, is to increase the region
size so that the humongous allocations are no longer humongous.
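The ceiling(f) arithmetic above can be sketched as follows; the region and allocation sizes are illustrative, not taken from this thread:

```java
// Sketch: how many whole G1 regions a single allocation consumes once it
// is "humongous" (larger than half a region). Sizes are illustrative.
public class HumongousRegions {
    // Whole regions consumed by an allocation of `bytes` given `regionSize`.
    // Returns 0 for non-humongous allocations (they share a normal region).
    static long regionsConsumed(long bytes, long regionSize) {
        if (bytes <= regionSize / 2) {
            return 0; // not humongous
        }
        // Humongous objects occupy a whole number of contiguous regions.
        return (bytes + regionSize - 1) / regionSize; // ceiling division
    }

    public static void main(String[] args) {
        long regionSize = 8L * 1024 * 1024; // 8 MB regions (illustrative)
        // A 9 MB allocation takes two full regions, wasting most of the second:
        System.out.println(regionsConsumed(9L * 1024 * 1024, regionSize)); // 2
        // A 4.5 MB allocation is just over half a region -> one full region:
        System.out.println(regionsConsumed(4_718_592L, regionSize));       // 1
    }
}
```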

Sunitha Beeram

Jun 3, 2018, 9:28:02 AM
to mechanical-sympathy
Thanks Ben and Jason for the responses. We are looking to upgrade the JDK, specifically because of the humongous-allocation issues Jason noted. Also, quite a few of our allocations are slightly over 32 MB, so region-size tuning won't help; we plan to chase that on the app side once we upgrade the JDK.
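For context on why region-size tuning tops out here: on JDK 8, the G1 region size is capped at 32 MB (a power of two between 1 MB and 32 MB), so a command line like the following sketch is as far as that knob goes; the jar name is a placeholder:

```shell
# Sketch: maximum G1 region size on JDK 8 (power of two, 1m through 32m).
# Even with 32 MB regions, any allocation over 16 MB is still humongous,
# and one slightly over 32 MB spans two full regions regardless.
java -Xmx20g -XX:+UseG1GC -XX:G1HeapRegionSize=32m -jar app.jar
```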

I just wanted to make sure my understanding/expectation is right: bugs aside, before the JVM decides it's really out of memory, it should have gone through a full GC and decided, based on the allocation size, that there is no way the allocation can be fulfilled. Hence, in such cases, the heap dump should not have had garbage in it. Is that correct?

Kirk Pepperdine

Jun 3, 2018, 1:40:40 PM
to mechanica...@googlegroups.com
Hi,

I’ve noted this behavior in a number of applications. You get a quick build-up of occupancy in tenured that cannot be cleaned up by mixed collections, which eventually leads to either an OOME or a full collection. The full collection is a precise collection, and thus the occupancy after it tends to be slightly less than the observed live data set size. The work-around is to increase the size of the heap. I’ve tried tons of different experiments, and increasing heap size is the only one that has helped mitigate the problem. In other words, you still get it, but you’ll survive it.

Ultimately this is a bug (Oracle claims it isn’t) that I’ve been trying to get to the bottom of for some time. I believe it has to do with reference processing and/or array processing. I believe that regions are, for some reason, simply not being included in the CSet when they should be. All I do know is that I see it in many applications running in many different environments. For me this isn’t a top-priority issue at the moment, but I am interested in solving it some time soon.

Kind regards,
Kirk


Sunitha Beeram

Jun 4, 2018, 9:00:34 AM
to mechanica...@googlegroups.com
Thanks Kirk.

It looks like the older JDK is making matters worse, but based on your observations, it appears we could still see this behavior even after an upgrade. Will chase this further after the upgrade.

