What is the "recipe" for deploying multiple verticle instances?


Alex Z.

Sep 27, 2017, 1:49:57 PM
to vert.x
Hello!

The documentation regarding specifying number of verticle instances says 
"...This is useful for scaling easily across multiple cores. For example you might have a web-server verticle to deploy and multiple cores on your machine, so you want to deploy multiple instances to take utilise all the cores..."

1. What are the "best practices" or guidelines to deploy multiple instances of a verticle? 
2. What factors should be taken into account in the decision? 
3. Is there a formula that considers the number of event loops and the number of cores in order to deduce the number of verticle instances?
4. Can you have "too many" instances?

For example, if I have a machine with 8 cores and the default number of event loops configured (2 * Runtime.getRuntime().availableProcessors()), 
how many instances should I be deploying?


Regards,
Alex

Julien Viet

Sep 27, 2017, 5:14:28 PM
to ve...@googlegroups.com
On Sep 27, 2017, at 7:49 PM, Alex Z. <azagn...@gmail.com> wrote:

Hello!

The documentation regarding specifying number of verticle instances says 
"...This is useful for scaling easily across multiple cores. For example you might have a web-server verticle to deploy and multiple cores on your machine, so you want to deploy multiple instances to take utilise all the cores..."

1. What are the "best practices" or guidelines to deploy multiple instances of a verticle? 

it depends on whether we consider event loop (the default) or worker (blocking) verticles

for event loops, this is mainly CPU driven: if you deploy one verticle on a single event loop and you see that its core activity saturates, or that the response time increases, you should deploy this verticle on more than one event loop.

for workers, you can basically consider that deploying N workers creates a processing queue with at most N concurrent executions. Here the size depends on the concurrency you want to throw at it and on the underlying blocking resource's service time. For instance, if you need the capacity to serve 1000 requests per second and the average service time is 10 milliseconds, you need 10 instances. Of course that's an ideal figure and in reality it's a bit different, but you get the idea.
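As a sketch of what this looks like with the Vert.x 3 deployment API (the verticle name is hypothetical, and vertx-core is assumed to be on the classpath):

```java
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class WorkerDeployment {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // 1000 req/s * 10 ms average service time => ~10 requests in flight,
        // so deploy 10 worker instances (the ideal-case estimate above).
        DeploymentOptions options = new DeploymentOptions()
            .setWorker(true)
            .setInstances(10);
        vertx.deployVerticle("com.example.BlockingDbVerticle", options);
    }
}
```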

2. What factors should be taken into account in the decision? 

for workers: think of a queue, so the concurrency and the service time.
for event loops: the load and the response-time latency

3. Is there a formula that considers the number of event loops and the number of cores in order to deduce the number of verticle instances?

not sure what you mean; however, you can determine the number of event loops created by Vert.x using VertxOptions#DEFAULT_EVENT_LOOP_POOL_SIZE
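For reference, the default can be read, and overridden, like this (a sketch assuming vertx-core is available):

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class EventLoopCount {
    public static void main(String[] args) {
        // Default event loop pool size: 2 * number of available processors.
        System.out.println(VertxOptions.DEFAULT_EVENT_LOOP_POOL_SIZE);

        // Overriding it, e.g. to one event loop per core on an 8-core box:
        Vertx vertx = Vertx.vertx(new VertxOptions().setEventLoopPoolSize(8));
    }
}
```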


4. Can you have "too many" instances?

yes you can. Having too many instances of a worker can potentially starve other worker instances by depleting the worker pool (that said, you can allocate one worker pool per deployment to avoid this)
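A sketch of the per-deployment worker pool mentioned here (the pool name and verticle name are hypothetical):

```java
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class DedicatedWorkerPool {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // A named pool isolates this deployment's blocking work so it
        // cannot starve workers belonging to other deployments.
        DeploymentOptions options = new DeploymentOptions()
            .setWorker(true)
            .setInstances(20)
            .setWorkerPoolName("blocking-db-pool")
            .setWorkerPoolSize(20);
        vertx.deployVerticle("com.example.BlockingDbVerticle", options);
    }
}
```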

for the event loop it's usually less of a problem; however, again it depends on what you want to do. You might want to deploy one HTTP server on two event loops and another HTTP server on four event loops, depending on the service you have.
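Sketching the two differently sized HTTP server deployments just described (verticle names are hypothetical, vertx-core assumed):

```java
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class TwoServers {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();
        // Low-traffic endpoint: two instances on two event loops are plenty.
        vertx.deployVerticle("com.example.AdminServerVerticle",
            new DeploymentOptions().setInstances(2));
        // High-traffic endpoint: four instances spread over four event loops.
        vertx.deployVerticle("com.example.ApiServerVerticle",
            new DeploymentOptions().setInstances(4));
    }
}
```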


For example, if I have a machine with 8 cores and the default number of event loops configured (2 * Runtime.getRuntime().availableProcessors()), 
how many instances should I be deploying?


Regards,
Alex


Alex Z.

Sep 28, 2017, 3:06:56 PM
to vert.x
Thanks, Julien.

To talk a little more about your example: "For instance if need the capacity of serving 1000 requests per second and the average service time is 10 milliseconds, you need 10 instances". 

According to my understanding, when I deploy X instances of a non-blocking verticle on a single event loop, each instance runs on its own thread, even if it is a non-blocking verticle, right? Otherwise, how do the 10 verticle instances get deployed?

So, back to your example: do I configure a single event loop with 10 instances of a non-blocking verticle, or 10 event loops with 10 instances of a non-blocking verticle?

Regards,
Alex

Jez P

Sep 28, 2017, 3:55:49 PM
to vert.x
You are wrong. Multiple instances can share event loop threads.

You will have n event loop threads in a vert.x instance (where n defaults to 2 * cores).

If you deploy 10 instances and have 2 event loop threads, they will be distributed at roughly 5 per event loop thread. Multiple instances can and do bind to a single event loop thread; you do not get one thread per verticle instance.

See http://vertx.io/blog/an-introduction-to-the-vert-x-context-object/ and the links therein for further information.
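A quick experiment to see this sharing in action (a sketch assuming vertx-core on the classpath): with 10 instances and 2 event loop threads, only two distinct thread names appear.

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;

public class ThreadSharingDemo extends AbstractVerticle {
    @Override
    public void start() {
        // Several instances print the same event-loop thread name,
        // showing that instances share event loop threads.
        System.out.println("started on " + Thread.currentThread().getName());
    }

    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx(new VertxOptions().setEventLoopPoolSize(2));
        vertx.deployVerticle(ThreadSharingDemo.class.getName(),
            new DeploymentOptions().setInstances(10));
    }
}
```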

Jez P

Sep 28, 2017, 3:56:58 PM
to vert.x
In Julien's example, assuming you had 10 event loop threads configured, you would deploy at least 10 verticle instances. You would hope to end up with at least one on each thread (multiple instances bound to the same thread isn't fundamentally a problem, however). 

Alex Z.

Sep 28, 2017, 5:07:34 PM
to vert.x
Jez,

This is great, thank you, and I appreciate you pointing out that I was wrong.

In a perfect scenario we want to achieve thread affinity, so if we have 8 cores, we want 8 event loops (one event loop thread per core) and, as you pointed out, at least one (non-blocking) verticle instance per event loop thread. 

What are the cases where we would want to have 4 (or N) instances per core? Does that even make sense, and what benefits do we get from it? 

Jez P

Sep 29, 2017, 4:48:44 AM
to vert.x
I'm not sure I agree with your concept of a perfect scenario. For example, if you have 8 different verticles with different load requirements, why would each instance of those verticles need a dedicated thread? Your hypothetical only appears to be relevant when all verticle instances carry the same load, which is unlikely in any real-world scenario except the trivial "only one web server verticle" case. You might want one verticle to live on only one event loop thread (so you deploy only one instance of it) because you know full well its likely load will not saturate a core, and deploy more instances of other verticles with higher throughput requirements to maintain overall throughput. 

Alex Z.

Sep 29, 2017, 3:36:40 PM
to vert.x
Hi Jez,

Thank you, and please allow me to add additional information to clarify:

I am talking about the trivial "only one web server verticle" scenario, which is actually my case. My non-worker verticle reaches out to its downstream provider (a database) using a NIO client and returns some data to the caller. So, in this scenario, all verticle instances would carry the same load. My verticle does no CPU-intensive work.

I deploy N instances of the web server verticle, which all start an HTTP server on some port, say 8080. According to my understanding, under the covers Vert.x only creates a single HTTP server that actually listens on the address and round-robins connections between the different instances. Since I have an 8-core machine, I am configuring one event loop per core and at least one verticle instance per event loop. 
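The setup described here looks roughly like this as a sketch (port and handler are illustrative; vertx-core assumed):

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class WebServerVerticle extends AbstractVerticle {
    @Override
    public void start() {
        // Every instance calls listen(8080); Vert.x notices the port is
        // already bound and round-robins connections across the instances.
        vertx.createHttpServer()
            .requestHandler(req -> req.response().end("hello"))
            .listen(8080);
    }

    public static void main(String[] args) {
        Vertx.vertx().deployVerticle(WebServerVerticle.class.getName(),
            new DeploymentOptions().setInstances(8));
    }
}
```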

Using an example where we need to serve 2000 RPS with an average service time of 10 milliseconds, I would need to deploy 20 instances of my verticle. In that case, since I have 8 event loops, I deploy 3 verticle instances per event loop (which gives me 24 instances in total), right?
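The sizing arithmetic here is Little's law (requests in flight = arrival rate * service time), applied worker-queue style as Julien described earlier. A minimal sketch of the numbers in plain Java (the helper names are my own, not a Vert.x API):

```java
public class InstanceSizing {
    // Little's law: requests in flight = arrival rate * service time.
    static int requiredInstances(double requestsPerSecond, double serviceTimeMillis) {
        return (int) Math.ceil(requestsPerSecond * serviceTimeMillis / 1000.0);
    }

    // Round up to a multiple of the event loop count so instances divide evenly.
    static int roundedToEventLoops(int instances, int eventLoops) {
        return (int) Math.ceil(instances / (double) eventLoops) * eventLoops;
    }

    public static void main(String[] args) {
        int needed = requiredInstances(2000, 10);      // 2000 * 0.010 s = 20
        int deployed = roundedToEventLoops(needed, 8); // 3 per loop => 24 total
        System.out.println(needed + " -> " + deployed);
    }
}
```

Note that this estimate assumes each instance handles one request at a time; as discussed below in the thread, non-blocking verticles can interleave many requests, so fewer instances may suffice.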

Regards

Jez P

Sep 29, 2017, 4:32:04 PM
to vert.x
I think I'd expect 8 instances to suffice in that case, if we could be sure that each would bind to a different event loop thread (I'm not sure we can be). I'm not sure you need three per event loop: during your 10 ms latency on the response, an instance could service another request, unless I myself have misunderstood things. However, I'm not sure how the instances auto-distribute between threads (i.e. how a new instance determines which event loop to bind to). Julien et al. should be able to comment on that. Since your I/O is non-blocking, if you could guarantee an even distribution, then surely one per thread would be sufficient.

Alex Z.

Sep 29, 2017, 5:47:45 PM
to vert.x
Thanks Jez.

In the current scenario and based on your explanation, do you think that 8 loops and 8 instances (where each instance is bound to one loop) will have less throughput than 8 loops and 16 instances (where two instances of the same verticle are bound to each loop)? 
I guess what I am asking is: is there a use case where more instances of the same non-worker verticle on 8 loops would allow higher throughput?

I really appreciate your patience and thank you for your responses!

Jez P

Sep 29, 2017, 6:05:09 PM
to vert.x
As long as all I/O is non-blocking, then no. But I'm not sure there's a guarantee that each instance will bind to a different event loop thread.

Expect one of the core team to arrive soon and explain that there's something I'm misunderstanding!

Alex Z.

Sep 30, 2017, 12:21:48 AM
to vert.x
No worries, Jez... Thank you for having this discussion with me!

Jez P

Sep 30, 2017, 6:12:09 AM
to vert.x
Of course it might be instructive just to try an experiment and see what happens :)

Alex Z.

Oct 1, 2017, 7:07:16 PM
to vert.x
Yeah, I am running a few experiments these days to see what makes sense. 

Cheers

Che Yonggang

Jan 15, 2019, 4:26:57 PM
to vert.x
Any findings?

This thread is a good example of a discussion on how to maximize Vert.x performance when deploying non-worker verticle instances. The documentation on this topic is insufficient.
I'd like to hear more insights.

hygin...@gmail.com

Feb 26, 2019, 7:07:09 PM
to vert.x
Hi Alex,

Did you complete your experiments? I would be interested in the correct configuration for your requirement below:

Will 8 loops and 8 instances (where each instance is bound to one loop) have less throughput than 8 loops and 16 instances (where two instances of the same verticle are bound to each loop)?