Default worker pool size to minimum if there are no blocking operations in a Vert.x application


ankur gupta

Apr 6, 2018, 2:37:12 AM
to vert.x
Hi ,

I have developed a Vert.x application and deployed it as a standard verticle with 4 instances in production, and there are no blocking operations in our application.
But in the thread dumps I can see exactly 20 worker threads in WAITING state, which is probably because the default worker pool size is 20. Can we reduce the default worker pool size if there are no blocking operations in the application?

marouane gazanayi

Apr 6, 2018, 3:37:19 AM
to vert.x
I think this thread could interest you: https://groups.google.com/forum/#!topic/vertx/oo5LvNWGts4

Julien Viet

Apr 6, 2018, 4:02:41 AM
to ve...@googlegroups.com
yes you can.

that being said, it's not really a problem to have these threads, especially if they are in a waiting state.
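For reference, both pools can be sized down when the Vertx instance is created. A minimal sketch, assuming the Vert.x 3.x core API:

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

// Shrink the pools if the application never calls executeBlocking or
// deploys worker verticles; both default to 20 threads.
VertxOptions options = new VertxOptions()
    .setWorkerPoolSize(1)            // the vert.x-worker-thread pool
    .setInternalBlockingPoolSize(1); // the vert.x-internal-blocking pool

Vertx vertx = Vertx.vertx(options);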




ankur gupta

Apr 6, 2018, 6:30:59 AM
to vert.x

Let me share more details of the problem we are facing.

We are consistently seeing high CPU utilisation on our production machines.
The configuration is 4 app servers, each with 8 CPU cores, with 5 instances of the Vert.x verticle running on each server.

Below are the observations from the thread dump:
1. 20 vert.x-worker-thread threads in WAITING state
2. 20 vert.x-internal-blocking threads in WAITING state
3. 16 vert.x-eventloop-thread threads, all in RUNNABLE state
4. 1 vertx-blocked-thread-checker thread in TIMED_WAITING state

Please suggest what could be causing the high CPU utilization and what improvements could be made to address it.

Jez P

Apr 6, 2018, 6:44:49 AM
to vert.x
Have you done any heap monitoring or GC examination? You only have 5 verticles, so you shouldn't get to high total CPU usage on an 8-core machine. Your worker/internal-blocking threads are all waiting, so they are doing no work. This implies something else is using the CPU, and the obvious guess (under load) is GC.

Do you know how much GC is going on?
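If GC logging isn't already enabled, these HotSpot 8-era flags (this thread predates JDK 9's unified -Xlog:gc syntax) are one way to capture it; the log path is just a placeholder:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/app/gc.log

The resulting log can be fed to the same kind of online analysers mentioned later in this thread.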

Julien Viet

Apr 6, 2018, 8:02:41 AM
to ve...@googlegroups.com
what's your current Vert.x version?

ankur gupta

Apr 6, 2018, 8:11:53 AM
to vert.x
vert.x version is 3.5.0

ankur gupta

Apr 6, 2018, 8:24:11 AM
to vert.x

Attaching the thread dump here as well.

We are also using an Aerospike server as the database in this application. There are 2 Aerospike servers in the cluster; however, all the app servers connect to just one of them.

Please let me know if you find anything.

ankur gupta

Apr 9, 2018, 7:24:57 AM
to vert.x
Did anyone find anything in the thread dumps? Please share any suggestions.

Grey Seal

Apr 18, 2018, 3:03:34 AM
to vert.x
Did you get a chance to look at GC, as Jez suggested?

ankur gupta

Apr 18, 2018, 5:03:22 AM
to vert.x

arindam goswami

Apr 19, 2018, 12:19:38 PM
to vert.x
To add to Ankur's post, analysis of some of the GC logs using online tools flagged continuous full GCs. Although we didn't get any OutOfMemory errors, we considered that this could be causing the high CPU, so we used the following arguments when starting our application to set the heap and metaspace sizes:

-Xms1024m -Xmx5120m -XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=1024M

But even with these, CPU utilisation is still high; within 12 hours it reaches up to 60%.

Another point to note is that system CPU usage and user CPU usage are the same in our case. And we are seeing a lot of context switches - almost 10K on an 8-core server.

Also, the memory distribution seems skewed: almost 6 GB each for cached and used memory, and just 2 GB free - this on a 16 GB RAM server.

Available memory drops rapidly after an application restart, reaching around 50% within 1 hour.

Our system processes around 150-200 requests per minute, and even that is split across 4 servers, so the traffic load is not that high.

We use Aerospike for caching, and also for some persistent storage. For now we have stopped using Aerospike for persistent storage, to see if that is in any way causing the high CPU utilisation, but that has not reduced it either.

Any pointers would be really helpful.

@Julien, even with your patch, it didn't reduce the CPU util.

arindam goswami

Apr 19, 2018, 12:52:00 PM
to ve...@googlegroups.com
Another thing:

In our Aerospike Java client, we use the following in our AsyncClient and AsyncClientPolicy:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// asyncPolicy is our AsyncClientPolicy instance.
// Core pool = asyncMaxCommands / 2, max pool = asyncMaxCommands;
// idle threads time out after 60 milliseconds (note the unit), and
// allowCoreThreadTimeOut(true) lets even core threads die when idle.
ThreadPoolExecutor threadPoolExecutor = new ThreadPoolExecutor(
    asyncPolicy.asyncMaxCommands / 2, asyncPolicy.asyncMaxCommands,
    60L, TimeUnit.MILLISECONDS,
    new ArrayBlockingQueue<Runnable>(asyncPolicy.asyncMaxCommands + 50));
threadPoolExecutor.allowCoreThreadTimeOut(true);

asyncPolicy.asyncTaskThreadPool = threadPoolExecutor;


asyncPolicy.asyncMaxCommands is set to 2000, and asyncPolicy.asyncSelectorThreads is set to 5, on an 8-core server.


We tried multiple variants of this; even using a cached thread pool - to no avail though. 

Can this be causing unnecessary spawning of threads, and hence more context switches?


And are these context switches causing more CPU usage, or are we debugging down the wrong path?




arindam goswami

Apr 24, 2018, 9:53:53 AM
to vert.x
Any insights anybody?

Julien Viet

Apr 24, 2018, 11:42:48 AM
to ve...@googlegroups.com
my patch was a blind guess against an existing bug that will be fixed in 3.5.2.

are you able to do some profiling and sample the JVM?

Jez P

Apr 24, 2018, 11:49:50 AM
to vert.x
Continuous full GCs will use a lot of CPU. It might be worth triggering a heap dump at some point so you can see what your Java heap looks like.
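Assuming a HotSpot JDK is installed on the servers, either of these standard commands will capture a dump (the pid and output path are placeholders):

jcmd <pid> GC.heap_dump /tmp/app-heap.hprof
jmap -dump:live,format=b,file=/tmp/app-heap.hprof <pid>

The resulting .hprof file can then be opened in a tool such as Eclipse MAT or VisualVM to look for the dominating objects.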

arindam goswami

Apr 26, 2018, 9:33:24 AM
to vert.x
We realised it was the CircuitBreaker that was causing the CPU hike, so we stopped using it. For now, we are using the default HTTP client timeouts (which are quite high - 60 seconds).

With that, though, we see API calls timing out at regular intervals for the same API.

Julien Viet

Apr 27, 2018, 6:40:22 AM
to ve...@googlegroups.com
Hi,

can you elaborate on the circuit breaker CPU issue?




ankur gupta

May 29, 2018, 8:41:02 AM
to vert.x

Hi Julien ,

We were using the circuit breaker framework for HTTP REST client calls, to manage connection pooling and timeouts.
After disabling the circuit breaker framework, the CPU utilization issue was largely resolved for our application: earlier our servers needed a restart twice a day, but after this change CPU utilization stayed below 1% throughout the day.
After one week, thinking the issue was resolved, we downsized our instance types from 8 cores / 16 GB to 2 cores / 8 GB.
With this reduced infrastructure CPU utilization was also normal (3-4%), but after 4-5 days it started increasing again and we had to restart the servers.

So even with the reduced infrastructure we were restarting our application every 4-5 days, which is still far better than restarting twice a day.

We then found from this link that a fix had been released for a memory leak issue in Vert.x, and we updated our Vert.x version from 3.5.1 to 3.5.2.CR2.
After this change went live, CPU utilization started increasing continuously and we had to restart the server within 12 hours; after reverting to the previous version it is normal now. Please find the screenshot below for reference.

[screenshot: CPU utilization graph]

The version update went live at 7:00 and we reverted back to 3.5.1 at 16:00. I believe there is a memory leak issue again in this release; please check and let me know when a stable version is released.

Julien Viet

May 29, 2018, 8:50:31 AM
to ve...@googlegroups.com
Hi Ankur,

it is very hard to know without more details; can you provide a heap dump?

can you try with 3.5.2.CR3?

Julien

ankur gupta

Jun 14, 2018, 5:10:12 AM
to vert.x
Hi Julien ,

We updated to version 3.5.2, assuming the memory leak was fixed in it, and released to production. But CPU usage started increasing a few hours after the release.
We have found 2 memory leaks in the heap dump analysis; see the attached screenshots.

Julien Viet

Jun 14, 2018, 5:12:26 AM
to ve...@googlegroups.com
Hi,

I can see in your screenshots that you have more than 100,000 HttpClient instances created.

is that expected?

I'm happy to analyse the heap dump if you can provide it.

Julien

ankur gupta

Jun 14, 2018, 5:17:58 AM
to vert.x

Julien Viet

Jun 14, 2018, 5:18:43 AM
to ve...@googlegroups.com
you haven't answered my other question.

ankur gupta

Jun 14, 2018, 5:32:52 AM
to vert.x
Hi Julien,

Yes, we are creating a new HttpClient for each request.
Can we create a singleton HttpClient and use it across the application?

Julien Viet

Jun 14, 2018, 5:37:37 AM
to ve...@googlegroups.com
On 14 Jun 2018, at 11:32, ankur gupta <ankurgu...@gmail.com> wrote:

Hi Julien,

Yes, we are creating a new HttpClient for each request.

you should avoid this in the first place.

or if you do, you should close the client afterwards.

Can we create a singleton HttpClient and use it across the application?

you should create one client per verticle; that would give the best performance.

are you using verticles?
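A minimal sketch of the one-client-per-verticle pattern, assuming the Vert.x 3.x API (the verticle name is hypothetical):

import io.vertx.core.AbstractVerticle;
import io.vertx.core.http.HttpClient;

public class ApiClientVerticle extends AbstractVerticle {

  // One client per verticle instance, created once in start() and
  // reused for every request instead of creating a client per call.
  private HttpClient client;

  @Override
  public void start() {
    client = vertx.createHttpClient();
  }

  @Override
  public void stop() {
    // releases the pooled connections when the verticle is undeployed
    client.close();
  }
}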

ankur gupta

Jun 14, 2018, 5:42:46 AM
to vert.x
yes, we have a single verticle in our application.

Julien Viet

Jun 14, 2018, 5:43:45 AM
to ve...@googlegroups.com
can you use one HttpClient per verticle and then test again?

ankur gupta

Jun 14, 2018, 5:46:19 AM
to ve...@googlegroups.com
Yes, we will put that code fix in the next release and update the version to 3.5.2.





Julien Viet

Jun 14, 2018, 5:49:41 AM
to ve...@googlegroups.com
thanks, it should also improve your performance, because creating one HttpClient per request implies you are not using connection keep-alive at all, and that's a very important point for web performance with HTTP/1.1.

the HttpClient in 3.5.2 improved the handling of keep-alive HTTP responses, which is why each client now has a periodic timer associated with it.

ankur gupta

Jun 14, 2018, 5:54:46 AM
to vert.x
If we are creating a single HttpClient, is there any handling required for connection keep-alive, or is it handled by Vert.x itself?





Julien Viet

Jun 14, 2018, 8:04:43 AM
to ve...@googlegroups.com
vertx will handle keep-alive at the client level, with the maximum number of connections per host defined by HttpClientOptions#maxPoolSize.
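For instance, a sketch with the Vert.x 3.x API (the value here is just a placeholder):

import io.vertx.core.http.HttpClient;
import io.vertx.core.http.HttpClientOptions;

HttpClientOptions options = new HttpClientOptions()
    .setKeepAlive(true)   // the default: reuse connections across requests
    .setMaxPoolSize(10);  // max concurrent connections per host (default is 5)

HttpClient client = vertx.createHttpClient(options);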

ankur gupta

Jun 14, 2018, 8:13:41 AM
to vert.x
What value should we ideally keep for maxPoolSize? We are not defining it right now, and the default in HttpClientOptions is 5.

Julien Viet

Jun 14, 2018, 8:20:50 AM
to ve...@googlegroups.com
right now you effectively have no limit, since you create a client per request.

determining the correct value is important, and it depends on your resources and the latency of the backend.

so it depends on the concurrency you want to achieve (performance), the limits your application can handle (ports, file descriptors), and also what the target server will allow you. Some servers might restrict the number of concurrent connections you can open.

right now, if you want the same behavior, just use a large value

Julien


ankur gupta

Jun 14, 2018, 8:30:43 AM
to vert.x
Hi,

I just wanted to understand what difference a small versus a large value for maxPoolSize will make.

And by a large value, do you mean hundreds or thousands?

Julien Viet

Jun 14, 2018, 9:08:58 AM
to ve...@googlegroups.com
here is how this parameter influences your application (in a simplified manner):

let's suppose the server has a response latency of 10 ms from the client's perspective (that means the client sends a request and 10 ms later it has finished receiving the response).

it means that a single HTTP connection can process at most 100 requests/second

so with the default client you would get 5 × 100 = 500 requests/second

if your server takes more time, then you need to increase the number of concurrent connections to get the same throughput and latencies.

my opinion is that you should set a reasonable value for it (let's say 10) and then see how your application behaves.

you can use the vertx-dropwizard-metrics project to get client information about whether or not your client is queuing for each endpoint. For instance, if your host is "foo" then you can look at the metric endpoint.foo:80.queue-delay, which tells you if the clients are queuing. If the queue-delay is small, you are fine and can try a lower value; if you see a bigger value, it means your HTTP client requests are waiting and being delayed, so you can try increasing the max pool size to reduce the queue delay.

HTH

Julien
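If it helps, enabling those metrics looks roughly like this, assuming the vertx-dropwizard-metrics artifact is on the classpath:

import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
import io.vertx.ext.dropwizard.DropwizardMetricsOptions;

Vertx vertx = Vertx.vertx(new VertxOptions()
    .setMetricsOptions(new DropwizardMetricsOptions()
        .setEnabled(true)
        .setJmxEnabled(true))); // also expose the metrics over JMX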

ankur gupta

Jun 28, 2018, 11:31:56 AM
to vert.x
Hi Julien,

Thanks for the help and suggestions; our CPU utilization issue is now resolved after updating to 3.5.2 and making the HttpClient a singleton.

But after this release we observed from the logs that around 200-300 requests fail daily because of connection timeouts. Although this number is small compared to the total traffic, we still need to solve this issue.

We have configured the connection timeout as 20 seconds, the idle timeout as 45 seconds, and the max pool size as 10 in HttpClientOptions. Could you help me figure out whether there is any issue with our HttpClient implementation or configuration?
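For reference, that configuration expressed in the Vert.x 3.x API would look something like this; note the unit mismatch between the two setters, which is easy to get wrong:

import io.vertx.core.http.HttpClientOptions;

HttpClientOptions options = new HttpClientOptions()
    .setConnectTimeout(20_000) // connection timeout, in milliseconds
    .setIdleTimeout(45)        // idle timeout, in seconds
    .setMaxPoolSize(10);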

Julien Viet

Jun 28, 2018, 11:41:49 AM
to ve...@googlegroups.com
Hi,

it looks like you need a larger timeout value?
