Event bus send/reply timeouts?


Chris Micali

Aug 13, 2013, 5:29:33 PM
to ve...@googlegroups.com
Is there a standard/expected way to have an event bus send/reply timeout?

My case is: I have an AccountsAPI HTTP service that sends an "account.get" message on the event bus to be handled by an Account verticle. If the Account verticle(s) are unreachable or fail for some reason, the eventbus.send() in AccountsAPI currently never calls the completion Handler (and as such never calls the request's end() method, so the HTTP request hangs).

Or - is there something more fundamental I am not understanding?

Thanks!
-c

petermd

Aug 15, 2013, 6:46:02 AM
to ve...@googlegroups.com
Hi Chris,

I don't think there's a standard way, but it is a very common issue for me at least.

There are three cases I know of where an event bus verticle will not respond in a timely fashion:
  • no one listening on that address, due to a config issue or the verticle dying (no response)
  • handling verticle throws an exception while processing (no response)
  • handling verticle never responds, or responds too late for the sender (no response, or response too late)
As you say, the outcome for an HTTP server is that the connection hangs (consuming resources/connections, which may trigger a cascading failure) and the server becomes generally unresponsive (great fun when debugging).

For 1.3.1 I've made a patch to EventBus (https://github.com/meez/vert.x) that addresses these issues by allowing the sender to provide an additional failure handler and modifying EventBus to:
  • return a failure signal if there is no handler at an address
  • return a failure signal if the handler throws an exception
  • allow the sender to specify a timeout that guarantees one response (either success, a failure signal, or a timeout signal) inside a given time period
This ensures that Vert.x will remain responsive, and also aids debugging by providing detailed info on all failures to the calling verticle/client (e.g. when an exception is thrown). It would also provide the framework for building other, more sophisticated handling (e.g. supervisor hierarchies).
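To make the contract concrete, here is a rough sketch of the intended semantics in plain JDK code. This is illustrative only - it is neither the actual patch nor the Vert.x API, and all class and method names are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical bus whose send() always delivers exactly one outcome to the
// caller: success, a failure signal (no handler / handler threw), or timeout.
class SketchBus {
    enum Status { OK, NO_HANDLER, HANDLER_FAILED, TIMED_OUT }

    private final Map<String, Function<String, String>> handlers = new ConcurrentHashMap<>();
    // daemon thread so the timer never keeps the JVM alive
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    void registerHandler(String address, Function<String, String> handler) {
        handlers.put(address, handler);
    }

    void send(String address, String msg, long timeoutMs, Consumer<Status> onDone) {
        AtomicBoolean done = new AtomicBoolean();  // guards exactly-one-callback
        ScheduledFuture<?> t = timer.schedule(() -> {
            if (done.compareAndSet(false, true)) onDone.accept(Status.TIMED_OUT);
        }, timeoutMs, TimeUnit.MILLISECONDS);

        Function<String, String> handler = handlers.get(address);
        if (handler == null) {                     // failure signal: no handler at address
            if (done.compareAndSet(false, true)) { t.cancel(false); onDone.accept(Status.NO_HANDLER); }
            return;
        }
        try {
            handler.apply(msg);                    // deliver; the reply itself is ignored in this sketch
            if (done.compareAndSet(false, true)) { t.cancel(false); onDone.accept(Status.OK); }
        } catch (RuntimeException e) {             // failure signal: handler threw while processing
            if (done.compareAndSet(false, true)) { t.cancel(false); onDone.accept(Status.HANDLER_FAILED); }
        }
    }
}
```

The key point is the AtomicBoolean: whichever of the four outcomes happens first wins, so the sender's callback is invoked exactly once and the HTTP connection can always be closed.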

I've not applied this patch to V2 because it requires some core API changes, and maintaining them away from master is not practical. It is something I would definitely like to see addressed though, so perhaps it's something that could be discussed at the planning meeting today?

Regards,
Peter

Chris Micali

Aug 15, 2013, 7:54:28 AM
to ve...@googlegroups.com
Peter,

Thanks for the reply - we are evaluating stacks that can be alternatives to standard Java tech (Dropwizard, HTTP, etc.) in our server infrastructure. Vert.x looks really promising, but the combination of our inexperience and lack of familiarity with Hazelcast, uncertainty about the event bus performance/failure modes, and a few omissions like these has ruled out using Vert.x for us.

I would definitely +1 this for inclusion, and a few fixes like this plus a few examples/testimonials of people who have relied upon the event bus in production systems would go a long way to making Vert.x an easier choice.

That said, we're going to trial it for one part of our infrastructure that requires many long-running connections and see how it fares.  

-c

Tim Fox

Aug 15, 2013, 9:41:48 AM
to ve...@googlegroups.com
Chris,

On 15/08/13 12:54, Chris Micali wrote:
> Peter,
>
> Thanks for the reply - we are evaluating stacks that can be alternatives to standard Java tech (Dropwizard, HTTP, etc.) in our server infrastructure. Vert.x looks really promising, but the combination of our inexperience and lack of familiarity with Hazelcast, uncertainty about the event bus performance/failure modes, and a few omissions like these has ruled out using Vert.x for us.

Other than event bus timeouts (which is a fairly simple feature to add)
can you be a bit more specific here about your other concerns?

With these kinds of evaluations I find it often makes sense to
engage with the development team at the beginning (this is the first
I've heard of this) so we can help you work through issues, some of
which might turn out to be non-issues.
>
> I would definitely +1 this for inclusion, and a few fixes like this plus a few examples/testimonials of people who have relied upon the event bus in production systems would go a long way to making Vert.x an easier choice.
>
> That said, we're going to trial it for one part of our infrastructure that requires many long-running connections and see how it fares.

I would like to hear how you get on here. Tuning for many connections
requires tuning both the OS (file handles and TCP settings) and Vert.x -
there is some info in the main manual on this. But again, if you keep in
touch with us we can help you out.

petermd

Aug 15, 2013, 11:10:49 AM
to ve...@googlegroups.com
Hey Chris,

It wasn't my intention to put you off Vert.x!

I'm one of those people deploying Vert.x in a production system (today, as it happens), and I've had a very positive experience with the code and the community. Of course there are things that could be improved, but that is a reality for every tech platform.

I'd suggest that if you do have concerns, you raise them with the group and see if they can be addressed before you discount it completely.

Thanks,
Peter

javajosh

Aug 15, 2013, 11:44:23 AM
to ve...@googlegroups.com
Hi Chris,

Yes, this is an issue we've run into as well.

Short answer: the solution I've used is to wrap the event bus in an application-specific wrapper that handles dispatching to the event bus and firing off a (generally short-lived) timer that gets cancelled if the response comes back soon enough. This works great for point-to-point communication. I will also add that it's really nice having a place in your application code that centralizes event bus access, because it gives you all kinds of flexibility and visibility into what your application is doing.
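A rough sketch of that wrapper, using plain JDK classes in place of the real event bus (the `BusGateway` and `Bus` names are made up for illustration, not Vert.x API):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical single choke-point for all event bus access: every request
// arms a timeout task, and the reply handler cancels it if the response
// comes back in time, so exactly one of onReply/onTimeout fires.
class BusGateway {
    interface Bus { void send(String address, String msg, Consumer<String> replyHandler); }

    private final Bus bus;
    // daemon thread so pending timers never keep the JVM alive
    private final ScheduledExecutorService timers =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    BusGateway(Bus bus) { this.bus = bus; }

    void request(String address, String msg, long timeoutMs,
                 Consumer<String> onReply, Runnable onTimeout) {
        ScheduledFuture<?> timeout =
                timers.schedule(onTimeout, timeoutMs, TimeUnit.MILLISECONDS);
        bus.send(address, msg, reply -> {
            // reply arrived in time: cancel() succeeds only if the timeout
            // task has not already run, so late replies are dropped
            if (timeout.cancel(false)) onReply.accept(reply);
        });
    }
}
```

Because every send goes through `request()`, this is also the natural place to hang logging, metrics, or retry policy.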

Long answer: in the non-point-to-point case, detecting "success" is more difficult. Consider a publish. Success means that every verticle listening on an address received the message, which would require each verticle acknowledging receipt of the message... over the event bus. At that point you must throw away eventBus.publish and only use eventBus.send, iterating over recipient verticles in application code.

Another solution, which isn't so extreme, is to adopt an application-wide convention that the publisher of an event will always receive some sort of response to the publish. In our case, we adopted the policy that we send() to a verticle which then does a publish() - and our sending verticle is listening on the publish address. We consider the message to be successfully published if we hear our own message back within a certain amount of time. This is not perfect, since other verticles might still not have received the published message, but it does give us some assurance of delivery without incurring too much overhead. For very sensitive communication that cannot fail, you really only want to do wrapped, point-to-point messaging with a timeout timer.

I'll see if I can publish some code on this later.
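In the meantime, here is a rough sketch of the echo-back idea, using a toy in-memory pub/sub instead of the real event bus (all names are hypothetical, and the intermediate send()-then-publish() hop is collapsed into a direct publish for brevity):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Toy pub/sub illustrating the convention: the publisher also subscribes to
// the publish address, and treats the publish as successful only if it hears
// its own message back within the timeout.
class MiniPubSub {
    private final Map<String, List<Consumer<String>>> subs = new ConcurrentHashMap<>();

    void subscribe(String address, Consumer<String> handler) {
        subs.computeIfAbsent(address, a -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(String address, String msg) {
        subs.getOrDefault(address, Collections.emptyList()).forEach(h -> h.accept(msg));
    }

    // true if we heard our own message echoed on the publish address in time;
    // note this does NOT prove every other subscriber received it
    boolean publishConfirmed(String address, String msg, long timeoutMs) {
        CountDownLatch heard = new CountDownLatch(1);
        // a real implementation would unregister this handler afterwards
        subscribe(address, m -> { if (m.equals(msg)) heard.countDown(); });
        publish(address, msg);  // in the real setup: send() to a verticle that publishes
        try {
            return heard.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```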

HTH

javajosh

Aug 15, 2013, 11:47:35 AM
to ve...@googlegroups.com
On Thursday, August 15, 2013 6:41:48 AM UTC-7, Tim Fox wrote:
> (this is the first I've heard of this) so we can help you work through issues.
That is very surprising, as reliability of communication between nodes in a distributed computing environment is a fundamental concern. I had always assumed that you'd decided to punt on the issue and provide a bare-bones framework, simply leaving the solution to application developers.

Chris Micali

Aug 19, 2013, 9:44:16 AM
to ve...@googlegroups.com
Tim,

    Please see inline:


On Thursday, August 15, 2013 9:41:48 AM UTC-4, Tim Fox wrote:
> Chris,
>
> On 15/08/13 12:54, Chris Micali wrote:
> > Peter,
> >
> > Thanks for the reply - we are evaluating stacks that can be alternatives to standard Java tech (Dropwizard, HTTP, etc.) in our server infrastructure. Vert.x looks really promising, but the combination of our inexperience and lack of familiarity with Hazelcast, uncertainty about the event bus performance/failure modes, and a few omissions like these has ruled out using Vert.x for us.
>
> Other than event bus timeouts (which is a fairly simple feature to add)
> can you be a bit more specific here about your other concerns?

Mainly just lack of experience with Hazelcast. It's not something we had heard of or used before, though that said, it looks fairly mature from the web site.

We are planning an internal infrastructure of loosely-coupled services (10-15 services), and one of the things that drew us to Vert.x was the event bus. Building services using that as the transport meant scaling (adding nodes), service discovery, and load balancing would get handled automatically. The concerns that have come up are error handling, failure modes, and monitoring. On error handling/failure modes I have the fewest answers, but on monitoring it looked like Hazelcast has some commercial products around that?

Either way, it would be fantastic to learn about examples of people deploying services like this and hear about their experience dealing with timeouts, network issues, what happens when queues fill up, etc. - the real stuff that always ends up happening in production - and how they dealt with it.


> With these kinds of evaluations I find it often makes sense to
> engage with the development team at the beginning (this is the first
> I've heard of this) so we can help you work through issues, some of
> which might turn out to be non-issues.

Really appreciate this!
 
>
> I would definitely +1 this for inclusion, and a few fixes like this plus a few examples/testimonials of people who have relied upon the event bus in production systems would go a long way to making Vert.x an easier choice.
>
> That said, we're going to trial it for one part of our infrastructure that requires many long-running connections and see how it fares.

> I would like to hear how you get on here. Tuning for many connections
> requires tuning both the OS (file handles and TCP settings) and Vert.x -
> there is some info in the main manual on this. But again, if you keep in
> touch with us we can help you out.


Absolutely. I expect we'll get to implementing this part in about a month, so I will definitely keep the communication open as we make progress and provide feedback (and contribute where we can, of course).

Chris Micali

Aug 19, 2013, 9:45:29 AM
to ve...@googlegroups.com
Thanks - would love to see how you attacked this. This was one of our options for addressing the issue, but since it felt more fundamental I figured I would ask here first.