vert.x OOM Error


Ron Turner

Sep 4, 2017, 11:08:25 PM
to vert.x
I'd like to ask the community for assistance in solving an Out Of Memory error that our vert.x-based application suffers on a periodic basis.  We believe the error is netty-related, as analysis of the .hprof dumps created when the application suffers an out-of-memory condition reveals the following:

One instance of "java.util.PriorityQueue" loaded by "<system class loader>" occupies 1,175,276,432 (56.65%) bytes. The instance is referenced by io.netty.util.concurrent.ScheduledFutureTask @ 0x80587790, loaded by "sun.misc.Launcher$AppClassLoader @ 0xaaad67b8". The memory is accumulated in one instance of "java.lang.Object[]" loaded by "<system class loader>".
 

The application uses relatively few vert.x features, essentially just handling GET and POST requests from web clients and shuttling data back and forth from a MongoDB database configured with a 3 member replica set (primary, secondary and arbiter servers).  The following is a typical heap usage summary (from jmap) after failure:

Debugger attached successfully.

Server compiler detected.

JVM version is 25.144-b01


using thread-local object allocation.

Mark Sweep Compact GC


Heap Configuration:

   MinHeapFreeRatio         = 40

   MaxHeapFreeRatio         = 70

   MaxHeapSize              = 2147483648 (2048.0MB)

   NewSize                  = 715784192 (682.625MB)

   MaxNewSize               = 715784192 (682.625MB)

   OldSize                  = 1431699456 (1365.375MB)

   NewRatio                 = 2

   SurvivorRatio            = 8

   MetaspaceSize            = 21807104 (20.796875MB)

   CompressedClassSpaceSize = 1073741824 (1024.0MB)

   MaxMetaspaceSize         = 17592186044415 MB

   G1HeapRegionSize         = 0 (0.0MB)


Heap Usage:

New Generation (Eden + 1 Survivor Space):

   capacity = 644218880 (614.375MB)

   used     = 644213392 (614.3697662353516MB)

   free     = 5488 (0.0052337646484375MB)

   99.99914811562182% used

Eden Space:

   capacity = 572653568 (546.125MB)

   used     = 572653560 (546.1249923706055MB)

   free     = 8 (7.62939453125E-6MB)

   99.99999860299482% used

From Space:

   capacity = 71565312 (68.25MB)

   used     = 71559832 (68.2447738647461MB)

   free     = 5480 (0.00522613525390625MB)

   99.99234265896864% used

To Space:

   capacity = 71565312 (68.25MB)

   used     = 0 (0.0MB)

   free     = 71565312 (68.25MB)

   0.0% used

tenured generation:

   capacity = 1431699456 (1365.375MB)

   used     = 1431699456 (1365.375MB)

   free     = 0 (0.0MB)

   100.0% used


9914 interned Strings occupying 946920 bytes.

As well (and I don't know if these are related or only coincidental), the application log will always show entries like the following:


[2017-09-04T14:34:02.553+0000] Exception in monitor thread during notification of server description state change

java.lang.OutOfMemoryError: Java heap space



For historical reasons, the application uses older versions of quite a few components, e.g.:


mongo-java-driver:   3.4.2

vertx-core:          2.1.4      

vertx-platform:      2.1.4

vertx-hazelcast:     2.1.4

hazelcast:           3.2.3 

netty-all:           4.0.21 

jackson-core:        2.2.2  

jackson-annotations: 2.2.2

jackson-databind:    2.2.2             

commons-codec:       1.9

  


Any insight or suggestions regarding this issue would be most welcome!


Ron



Thomas SEGISMONT

Sep 5, 2017, 3:54:30 AM
to ve...@googlegroups.com
Can you inspect a heap dump with Eclipse MAT or a similar tool? Perhaps you can get more information about the queue and the elements it contains.

--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/3d076f52-0b34-495f-9cfd-b10dfc246381%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Martijn Verburg

Sep 5, 2017, 11:13:09 AM
to ve...@googlegroups.com
And send in your GC log :-)

Cheers,
Martijn

Ron Turner

Sep 5, 2017, 2:49:32 PM
to vert.x
Thanks for the suggestions!  I've enabled GC logging and will post here after the next OOM event.  As well, I'll examine one of our heap dumps to see if I can discover anything more about the nature of this problem.

Ron


Ron Turner

Sep 6, 2017, 12:50:13 PM
to vert.x
While I wait for another OOM event, I've enabled GC logging on the server where we're having issues - I'll post that log after the next event.

Meanwhile, I took a look at a heap dump using Eclipse MAT.  The most ominous thing in there was the 10,030,333 instances of io.netty.util.concurrent.ScheduledFutureTask, each of which was ultimately related to an instance of io.netty.handler.ssl.SslHandler.  Our web clients do communicate over SSL, so that makes sense.  As well, OOM events appear to be related to the number of calls handled by the server, so that makes sense too.

So I'm left wondering: what ScheduledFutureTask is being created by each instance of SslHandler, and why is it never executed?  And why are so many SslHandler instances created?  I don't have the application log to corroborate what I'm seeing in the heap dump, but I'd find it hard to believe that the server handled over 10 million calls from the time it was started to the time it suffered the OOM event (unless there was some sort of nefarious attack being mounted against it...).  How do these SslHandler instances relate to the traffic received by the server?  I assume that there's one instance of SslHandler created for each call handled by the server, but I don't know that for a fact.  The digging continues...
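One pattern that can produce exactly this heap picture is a scheduler whose cancel() only marks a task and never removes it from the queue. The following is a plain-Java sketch of that pattern (class names are hypothetical, illustrative only, and not netty's actual code): a long-dated timeout task is queued per "connection", cancelling it only flips a flag, and the queue entry survives until its far-off deadline.

```java
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a per-connection timeout task whose cancel() only
// sets a flag -- the entry is never removed from the scheduler's queue.
public class LeakSketch {
    static class TimeoutTask implements Comparable<TimeoutTask> {
        final long deadlineNanos;
        boolean cancelled;

        TimeoutTask(long deadlineNanos) { this.deadlineNanos = deadlineNanos; }

        void cancel() { cancelled = true; }  // flag only; no queue removal

        public int compareTo(TimeoutTask o) {
            return Long.compare(deadlineNanos, o.deadlineNanos);
        }
    }

    public static void main(String[] args) {
        PriorityQueue<TimeoutTask> queue = new PriorityQueue<>();
        long deadline = System.nanoTime() + TimeUnit.HOURS.toNanos(1);

        // Every "connection" schedules a long-dated timeout, then closes.
        for (int i = 0; i < 100_000; i++) {
            TimeoutTask t = new TimeoutTask(deadline + i);
            queue.add(t);
            t.cancel();  // connection closed; task cancelled but still queued
        }
        // The queue keeps all 100,000 cancelled tasks (and whatever they
        // reference) strongly reachable.
        System.out.println(queue.size());  // prints 100000
    }
}
```

If something like this is in play, the dominator tree in MAT would show exactly one huge PriorityQueue retaining millions of task instances, each pointing back at a closed connection's handler.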

Ron

Julien Viet

Sep 6, 2017, 2:29:08 PM
to ve...@googlegroups.com
Hi,

normally there is one SslHandler per Netty Channel.

Each SslHandler is a Netty handler and is therefore attached to a Netty pipeline.

Each Channel has other handlers in the pipeline; usually the final handler is a Vert.x handler.

So in short:

SslHandler -[ctx]-> ChannelHandlerContext -[pipeline]-> DefaultChannelPipeline -[tail]-> last AbstractChannelHandlerContext, which can be a Vert.x handler,

for instance io.vertx.core.http.impl.ServerHandler, etc…

I hope this can be useful.

Julien


Ron Turner

Sep 6, 2017, 2:44:19 PM
to vert.x
Thanks for your explanation, Julien - definitely helps fill in the picture of what's happening :))

Ron

Ron Turner

Sep 7, 2017, 1:01:27 PM
to vert.x
Well, I thought I'd made some progress on this yesterday but, as of this morning, our server had once again suffered an out of memory error.  I've attached 2 GC logs: 
  • gc.log0 - captured before I made a minor change in our code
  • gc.log1 - captured after I made the change
gc.log1 starts out quite differently and, as far as I can tell, indicated that the situation had definitely improved.  However, the situation eventually changes and we see the same kind of progression to the OOM state that we see (almost immediately) in gc.log0.

The change that I made was based on a comment I'd seen from Tim Fox regarding memory leaks caused by closing over a variable marked as 'final'.  We didn't have exactly that situation, but we were specifying a handler whose handle() method took an instance of HttpServerRequest that was marked as final, e.g.

myServer.requestHandler(new Handler<HttpServerRequest>() {
    public void handle(final HttpServerRequest req) {
        // ... request handler code ...
    }
});


None of the vert.x examples I found did this, so I thought it would be worth a try to remove the 'final' designation...

The digging continues...

Ron
gc.log0
gc.log1

Martijn Verburg

Sep 8, 2017, 4:28:50 AM
to ve...@googlegroups.com
Hi Ron,

So sorry, I should have mentioned this earlier, but are you able to produce GC logs with:

-XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintReferenceGC -Xloggc:<location of log>

That might give me a little more insight.


Cheers,
Martijn


Ron Turner

Sep 8, 2017, 12:30:24 PM
to vert.x
Hi Martijn,

Thanks for the clarification - I will produce another GC log using those options :)

Ron


Jez P

Sep 8, 2017, 2:17:07 PM
to vert.x
On the assumption that we shouldn't be creating a new SslHandler for each connection, you could run locally in debug mode with a breakpoint in the instantiation of SslHandler and see if that sheds any light on why they are proliferating. If you hit it every time you create a new request/connection to your application, then that on its own might tell you something - at the very least it should point you through the stack to the code which triggers it. Look to see if your application classes are in that stack; if so, that may imply your app is doing something unintended.
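A lightweight variant of the same idea, for when attaching a debugger to the affected box isn't practical, is to capture a sampled stack trace at the suspect construction site and log it. This is only a sketch with hypothetical names; in practice you would call the probe from whatever constructor or factory you suspect.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: record who is allocating the suspect object, without a debugger.
// "recordAllocation" is a hypothetical hook; call it from the constructor
// or factory method you suspect is being hit too often.
public class AllocationProbe {
    private static final AtomicLong COUNT = new AtomicLong();

    public static void recordAllocation() {
        long n = COUNT.incrementAndGet();
        if (n % 10_000 == 0) {  // sample: don't pay for a stack on every call
            // Printing a Throwable shows the full call stack to stderr,
            // revealing the code path that triggered the allocation.
            new Throwable("allocation #" + n).printStackTrace();
        }
    }

    public static long count() { return COUNT.get(); }

    public static void main(String[] args) {
        // Simulate 25,000 allocations; stacks are printed at #10000 and #20000.
        for (int i = 0; i < 25_000; i++) {
            recordAllocation();
        }
        System.out.println(count());  // prints 25000
    }
}
```

The printed stacks answer the same question the breakpoint would: whether application classes appear on the path, or whether it's vertx/netty all the way down.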

Ron Turner

Sep 8, 2017, 2:27:12 PM
to vert.x
Hi Martijn,

I've attached a more informative GC log :)

Ron
gc.log2

Ron Turner

Sep 8, 2017, 2:30:18 PM
to vert.x
Hi Jez,

That's a good suggestion, thanks!  I'll set it up locally this afternoon and see what I can learn :)

Ron

Jez P

Sep 8, 2017, 4:02:54 PM
to vert.x
I hope it proves informative (if it's vertx/netty all the way down the stack view it won't be, but if you see your code in there somewhere you may get at least a clue that something in your code needs a second look). I meant to learn netty a while back but never got round to it, so I don't know whether the need for an SSL handler is per connection or not. If it is, then something would appear not to be cleaning up; if it isn't, then some setup might be being run far more times than it should be. If it proves uninformative, sorry for the bad suggestion, but that approach has helped me on occasion in the past.

Jez P

Sep 8, 2017, 4:08:10 PM
to vert.x
Looking at this https://maxrohde.com/2013/09/07/setting-up-ssl-with-netty/ (old link, I know), it would appear that we wouldn't expect lots of SslHandlers to be created.

Ron Turner

Sep 8, 2017, 5:20:01 PM
to vert.x
Thanks, Jez!  Just finished a debugging session where I ran the server both with SSL and without.  In both cases, hitting a breakpoint in my handler code shows only vertx and netty code down the stack.  Also, since we typically run 4 instances of our verticle, I debugged with '-instances 4' but didn't notice any difference in the stack.

Jez P

Sep 8, 2017, 6:04:23 PM
to vert.x
Was worth a try, but sorry it didn't get you anything. I hope the GC logs turn up something more fruitful. 

Ron Turner

Sep 9, 2017, 2:58:23 PM
to vert.x
Agreed!  Definitely worth a try :)

Martijn Verburg

Sep 11, 2017, 4:20:16 AM
to ve...@googlegroups.com
Hi Ron,

I'm afraid I don't have much more insight.  The Full GCs are picking up an enormous amount of Real CPU time (which indicates something outside of the JVM is eating up the CPU, so the JVM can't complete the Full GC as quickly as it would like).  However, that's all really just a knock-on effect of your JVM throwing Full GCs constantly and then eventually dying.

Have you tried recording the Allocation Stack Trace when performing the memory profiling?  Usually that will give you the call stack back to 'your' code, showing who is creating those millions of Future Tasks.

Given your JVM will be thrashing badly as you try to memory profile, you may wish to increase -Xmx to about 4GB, which *might* give you enough headroom to catch the problem in flight as you profile it.


Cheers,
Martijn


Ron Turner

Sep 11, 2017, 12:03:51 PM
to vert.x
Thanks for the analysis and the suggestion, Martijn - I'll give the allocation stack trace approach a try :)  I'm guessing that the path to the creation of those FutureTasks is somewhat indirect as I don't see them anywhere in the stack when I set a breakpoint in our handler code.  Offhand, do you know under what conditions they're created?

Thanks!

Ron


Martijn Verburg

Sep 12, 2017, 4:39:42 AM
to ve...@googlegroups.com
Hi Ron,

Hmm, it may be the case that the creation of those is being handed off to another thread, which would make the allocation stack trace profiling difficult to say the least.  Still worth a go!  I'm not sure where the ScheduledFutureTasks get created (I'm not that knowledgeable about the internals of Vert.x).

Another suggestion: start up a toy Vert.x program (maybe follow the getting started guide) and see if you can profile that (it might be easier to spot the common path to the creation of those futures).

Cheers,
Martijn


Ron Turner

Sep 12, 2017, 11:55:40 AM
to vert.x
Thanks, Martijn - (another) good suggestion :))

Ron


Martijn Verburg

Sep 13, 2017, 4:12:25 AM
to ve...@googlegroups.com
I'm actually meeting Clement from the Vert.x team at Red Hat today - I'll see if we can do some extra digging.

Cheers,
Martijn


Ron Turner

Sep 13, 2017, 11:48:01 AM
to vert.x
That would be wonderful, Martijn - thanks!!!

Ron


Ron Turner

Sep 15, 2017, 12:29:24 PM
to vert.x
Just an update...  It's been a busy week and I haven't had a chance to do any memory profiling while running the app, but...  I did have time to complete a monitor program I'd been working on and put it into service.  Among other things, this program periodically sends a ping to the server and, if it doesn't receive a response in a timely manner, kills the existing server instance and restarts it.  Well, since the monitor program went into service a couple of days ago, we haven't seen an OOM condition on the server.  Makes me wonder if the more or less consistent activity on the server is somehow preventing those ScheduledFutureTasks from accumulating?  Just a thought... <g>
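The liveness-check half of such a monitor might look roughly like the sketch below (the URL, port, and timeouts are made up for illustration, and the kill/restart step via ProcessBuilder is elided):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of a ping-based liveness check; the restart logic is omitted.
public class PingMonitor {
    // Returns true if the server answers the ping within timeoutMillis
    // with a 2xx status; any I/O failure (refused, timed out) counts as dead.
    public static boolean isAlive(String pingUrl, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(pingUrl).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status >= 200 && status < 300;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Nothing normally listens on port 9 of localhost, so this reports
        // false almost immediately (connection refused).
        System.out.println(isAlive("http://127.0.0.1:9/ping", 1000));
    }
}
```

On a failed check, the monitor would kill the server process and launch a fresh one; the interesting side effect Ron observed is that the periodic ping traffic itself may be changing the server's behavior.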

Ron

Julien Viet

Sep 17, 2017, 4:30:11 AM
to ve...@googlegroups.com
Hi,

have you been able to correlate this behavior with CPU and load on your system, or the network activity (number of connections)?

Ron Turner

Sep 18, 2017, 12:13:49 PM
to vert.x
Hi Julien,

The monitor was installed to ensure availability of the server application - we haven't had a chance to correlate the OOM problem with CPU or other measures of load yet.  In reviewing the monitor log this morning, it looks like it did restart the server on Saturday, so it's not definitively preventing accumulation of the ScheduledFutureTasks.  More analysis required :))

Ron  

