Resolving memory leak


Arik Bron

Jun 14, 2017, 3:09:09 PM
to vert.x
Hi,

We're fully committed to Vert.x and run its latest version, 3.4.1. We use Hazelcast 3.6.3 (not the latest).

In our solution we have 2 types of VMs:
1) Those that process GET requests and send messages to the second type of VMs over the shared EventBus,
2) Those that only process event messages received over the shared EventBus and send messages to the first type of VMs over the shared EventBus.


We have no issues with the 1st type of VMs - they process transactions (GET requests) and work like a charm.

We do have a huge issue with the 2nd type of VMs - they exhibit a memory leak within the JVM.

We use NewRelic to monitor the status of the VMs, and it clearly shows that the memory usage of the Java process increases over time until the JVM runs out of heap memory and throws an exception.

We tried using periodic memory-status printouts to locate the source of the problem, but it looks like the problem is not caused by the code we wrote. We therefore suspect that one of the Vert.x libraries is causing it.

----

We'd appreciate any suggestions on how to approach this problem:
- What tools to use to identify the source of the leak,
- Recommendation on libraries versions,
- Anything else that may be relevant.

Arik

Jez P

Jun 14, 2017, 3:28:37 PM
to vert.x
Can you replicate the problem outside of production? If so, you should be able to run JVisualVM or similar and attach it to your running process. You can then get a very clear view of what your heap is doing. In addition, if you think it's Vert.x-related, a reproducer which just fires a bunch of messages into the application should replicate the problem.

Finally, how are you deploying your applications? What heap settings are you using on them? This could be important too, especially if using containers.

Arik Bron

Jun 14, 2017, 3:52:39 PM
to vert.x
Thanks Jez!

- "Can you replicate the problem outside of production"
I believe it will be extremely hard, since in production we work with high volumes of transactions per minute.

- "you should be able to run JVisualVM"
We'll look into JVisualVM. Do you think it can be run in production without degrading latency too much?

- how are you deploying your applications? What heap settings are you using on them?
We don't run the Vert.x classes as a fat jar - we simply run MainVerticle.java, which in turn automatically deploys all the rest of the verticles (workers). We don't use any explicit JVM heap settings - both types of VMs run with the following command:
vertx run MainVerticle.java -cp classes -conf conf/my-conf.json -cluster

Jez P

Jun 14, 2017, 4:54:02 PM
to vert.x
The other thing you could do is set your JVM flags to give you a heap dump on OOM, which you can then analyse using standard Java tools (I'm not sure what the current best tools for that are). I'm not sure you would normally want to expose your JMX stuff in prod. A better way would be to reduce your heap settings in another environment and fire fewer messages at it. It is just possible that the volume of processing you're doing is outpacing the volume you can handle, in which case you would presumably fill up message queues.
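For reference, a minimal sketch of the heap-dump setup being described, using standard HotSpot flags (the jar name, dump path, and PID are placeholders):

```shell
# Dump the heap automatically when an OutOfMemoryError is thrown
java -Xmx2g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/log/myapp/heapdump.hprof \
     -jar my-app.jar

# Or take a dump on demand from a running process (12345 is a placeholder PID)
jmap -dump:live,format=b,file=/tmp/heapdump.hprof 12345
```

The resulting .hprof file can be opened in JVisualVM or Eclipse MAT to see which objects are retaining the heap.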

Honestly though, production isn't the first place you should be hitting volume. You need to be running throughput tests in other environments to get a feel for what the heap requirements are, and also to give yourself the capability to simulate failures under load. You're going to find it very hard to troubleshoot in production.

Arik Bron

Jun 14, 2017, 4:54:08 PM
to vert.x
After extensive digging into the code we suspect that the leak is caused by Vert.x WebClient. We use it to communicate with external resources.

We switched from periodically creating a new class instance to reusing the same WebClient instance, and at the moment the memory usage seems to be stable.

I will update the post later with more observations.

Jez P

Jun 14, 2017, 4:57:51 PM
to vert.x
If you keep creating WebClients, then maybe that's the leak. Are you disposing of them properly? Could you be leaving stuff lying around? I thought normal practice would be one per verticle instance (because you need it on the same context as the calling verticle presumably). So it doesn't seem to make sense to create them periodically.

Arik Bron

Jun 14, 2017, 5:06:56 PM
to vert.x
You're probably right, and that is why we switched to a singleton per JVM that holds one instance of the WebClient and constantly reuses it.

Yet, one would expect that even if WebClients are created at the "speed of light", the GC would be smart enough to free the dead ones. Unfortunately, this is not the case. I suspect that somewhere in the WebClient itself something is not released properly, independent of our own code.

Jez P

Jun 14, 2017, 6:20:07 PM
to vert.x

Were you calling close() on the client when done with it? That's what you're supposed to do. If not, then you're almost certainly preventing GC from doing its job properly. Under the WebClient is an HttpClient, and you have to call close() on those when done with them.

From the HttpClient docs:-

void close()
Close the client. Closing will close down any pooled connections. Clients should always be closed after use.

Note the closing down of pooled connections. By not explicitly calling close() you probably leave live references to the connections (possibly because close() enables breaking of cyclic references). Given that WebClient also has a close() method, I suspect very strongly that just assuming that letting a WebClient go out of scope will clean up all its connections is a serious mistake.
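A minimal sketch of the explicit-close pattern being described, assuming vertx-web-client is on the classpath (the host and path are placeholders):

```java
import io.vertx.core.Vertx;
import io.vertx.ext.web.client.WebClient;

public class CloseExample {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();
    WebClient client = WebClient.create(vertx);

    client.get(80, "example.com", "/")
      .send(ar -> {
        // Whatever the outcome, release the pooled connections
        // once the client is no longer needed.
        client.close();
        vertx.close();
      });
  }
}
```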

Arik Bron

Jun 14, 2017, 6:44:10 PM
to vert.x
Thank you so much Jez! I wasn't aware that it has to be closed.

Added a close() call to both the HttpClient and WebClient instances used in the project.

I think this topic can be closed :)

Jez P

Jun 15, 2017, 4:12:41 AM
to vert.x
I think the WebClient main docs (rather than the JavaDoc) are missing a clear instruction to close() WebClients when you want to dispose of them. It might be worth submitting a PR to make it clear that close() should be called before disposal (assuming this was the problem you were running into - let's ensure it's resolved now, before assuming my shot in the dark was right).

Julien Viet

Jun 15, 2017, 5:39:13 AM
to ve...@googlegroups.com
Beyond memory leaks, a client can hold file descriptors, and it's always important to close the client to release them.


Arik Bron

Jun 15, 2017, 6:21:58 AM
to vert.x
Thanks Julien.

And Jez is right - someone needs to add the close() method call to the WebClient documentation - http://vertx.io/docs/vertx-web-client/java/. It's simply not there.

Tim Fox

Jun 15, 2017, 6:47:50 AM
to vert.x
I think the reason it's not there, is that a client instance is intended to be reused between requests, so typically it will be created in your verticle start() method. If that is the case, it will be automatically closed when your verticle is undeployed.
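A sketch of the one-client-per-verticle pattern Tim describes, assuming vertx-web-client is on the classpath (the class name is illustrative):

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.ext.web.client.WebClient;

public class MyVerticle extends AbstractVerticle {

  private WebClient client;

  @Override
  public void start() {
    // One client per verticle instance, reused for every request;
    // it is closed automatically when the verticle is undeployed.
    client = WebClient.create(vertx);
  }

  // Handlers elsewhere in the verticle reuse the same client, e.g.:
  // client.patch(443, "api.example.com", "/resource").send(ar -> { ... });
}
```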

You only have to explicitly close it if you're doing something like creating the client, firing one or more requests then closing it again, which would be an anti-pattern.

Arik Bron

Jun 15, 2017, 6:53:27 AM
to vert.x
If that's the case, WebClient's implementation should be changed to a singleton within Vert.x. Otherwise each developer will have to wrap it in their own singleton implementation.

With the workers approach, when you have a number of them running in parallel, it doesn't make much sense to have only one WebClient, as this can slow down each worker's performance and in a way jeopardize their independence.

Jez P

Jun 15, 2017, 7:10:14 AM
to vert.x
I'm not sure that works either. Don't you want one per verticle instance that survives for the lifetime of that instance? Which is exactly what Tim suggested. In your case each "worker" is presumably a verticle instance, though if you're making non-blocking web calls they don't need to be worker verticles unless you're following up with blocking I/O (in which case the logical approach is to pass the result of the web call over the event bus to a worker verticle whose job is to manage the blocking I/O).

Tim Fox

Jun 15, 2017, 12:47:14 PM
to vert.x
I said one per instance not a singleton, the two things are different :)

ad...@cs.miami.edu

Jun 15, 2017, 1:11:35 PM
to vert.x
>> WebClient's implementation should be changed to a singleton within Vert.x

As I understand it, each client keeps its own pool.  So there could be situations where we want more than one client instance because we want more than one pool, which would be impossible if it were a true singleton.

However, it seems that this may not have been as intuitive as it could have been in the docs. Perhaps we could add a finalize() method to WebClient that cleans up the pool data and other resources when it is set to be GC'ed. I know it is not good practice to rely on finalize methods, but in this case it does not seem like it would harm anything. It might be a nice fallback that could prevent a severe memory leak if the user has misunderstood the necessity of closing the client.

-Adam

Tim Fox

Jun 15, 2017, 2:16:58 PM
to vert.x


On Thursday, 15 June 2017 18:11:35 UTC+1, ad...@cs.miami.edu wrote:
> >> WebClient's implementation should be changed to a singleton within Vert.x
>
> As I understand it, each client keeps its own pool. So there could be situations where we want more than one client instance because we want more than one pool, which would be impossible if it were a true singleton.
>
> However, it seems that this may not have been as intuitive as it could have been in the docs. Perhaps we could add a finalize() method to WebClient that cleans up the pool data and other resources when it is set to be GC'ed.

That already exists, so references to the client must have been maintained in the user's code.

ad...@cs.miami.edu

Jun 15, 2017, 3:38:31 PM
to vert.x
>> That already exists, so references to the client must have been maintained in the user's code

Doh!  I missed that in the docs!

Arik Bron

Jun 15, 2017, 3:47:04 PM
to vert.x
It might be the case that HttpClient is properly documented.

I was referring to WebClient - we can't use HttpClient since we need the patch() method. If I'm not mistaken, the part that describes how HttpClient should be handled is not present in the WebClient doc...

There is a short section called "Re-cap on Vert.x core HTTP client" that explicitly says it's worthwhile to dive into HttpClient's doc as well. But due to lack of time I skipped it, as someone else might :)

Tim Fox

Jun 16, 2017, 4:21:53 AM
to vert.x
The web client document does explain that it is based on the http client and inherits its features, but I do agree it could be made clearer :)

Tim Fox

Jun 16, 2017, 4:26:20 AM
to vert.x
I should add... there is nothing _illegal_ about using the client the way you are doing (create, send request, get response, close) - I just wouldn't recommend it for most users, as you can't take advantage of pooling and keep-alive, so performance can suffer.