EventBus "No Pong on server" when lots of messages are being sent.


Julien P

Jan 3, 2014, 8:28:38 AM
to ve...@googlegroups.com
Hi,

I'm a new user of Vert.x, currently testing a way to send lots of messages over the event bus.

My test application has 2 modules:
  • One of them is an HTTP server: it receives a URL like http://IP:PORT/message/test/[numberOfMessages]/[numberOfMessageByPaquet] and dispatches JSON messages in a loop.
  • The other one listens on the event bus and counts the received messages in a static field.
I read lots of Google Groups messages, such as
  • https://groups.google.com/forum/#!topic/vertx/SPRV0Yd4WQU
  • https://groups.google.com/forum/#!msg/vertx/4X2ylZqxCa4/Qr-j4nwMj7cJ
and followed the advice there, but when I push more than 2 million messages through my program, a warning appears:

No pong from server IP:60074 - will consider it dead, timerID: 391 holder org.vertx.java.core.eventbus.impl.DefaultEventBus$ConnectionHolder@fd660b5

I currently have 2 clusters on 2 computers and run my application with: vertx runzip messenger-0.0.1-SNAPSHOT-mod.zip -ha (-cluster behaves the same way).

I suppose my code blocks the event loop, but I don't know how to do it differently.

My source code is available at:
https://gist.github.com/anonymous/bac4b7f5249e9b161b6f

Here is the end of my logging file on my node 1:

 
2014-01-03 11:54:42,093 [vert.x-eventloop-thread-11] INFO   - Message number 1800000 and plus
2014-01-03 11:54:42,539 [vert.x-eventloop-thread-11] INFO   - Message number 1900000 and plus
2014-01-03 11:54:42,982 [vert.x-eventloop-thread-11] INFO   - Message number 2000000 and plus
2014-01-03 11:54:42,982 [vert.x-worker-thread-3] INFO   - Messenger end send: 00-00-020(20950)
2014-01-03 11:55:11,166 [vert.x-worker-thread-1] WARN   - No pong from server 192.168.1.33:61176 - will consider it dead, timerID: 199 holder org.vertx.java.core.eventbus.impl.DefaultEventBus$ConnectionHolder@b3c80aa



Does anybody have an idea about this?

Thank you.

Best regards,

Julien

Julien P

Jan 3, 2014, 8:49:50 AM
to ve...@googlegroups.com
Sorry, a small mistake in my post: there is only 1 cluster across the 2 computers.

Ryan Chazen

Jan 4, 2014, 3:58:14 AM
to ve...@googlegroups.com
You are blocking the event loop, and I don't think you'd ever want to do something like that in practice (you'd more likely build 1 message and include all the data inside it).

However, if you do want to send like this without blocking the event loop, you would change this part:

for (int i = count; i < count + step; i++) {
    final JsonObject jsonMessage = new JsonObject();
    jsonMessage.putString("token", TOKEN_ROOT + i);
    vertx.eventBus().send("com.eogile.akka.core.Notifier.stat.plus", jsonMessage);
}

to something like this

private int mCount, mStep; // set to count and step before the first call

void sendNext(final int i) {
    JsonObject jsonMessage = new JsonObject();
    jsonMessage.putString("token", TOKEN_ROOT + i);
    vertx.eventBus().send("com.eogile.akka.core.Notifier.stat.plus", jsonMessage,
            new Handler<Message>() {
        @Override
        public void handle(Message event) {
            if (i + 1 < mCount + mStep) {
                sendNext(i + 1); // send the next message only after this one is handled
            }
        }
    });
}
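Ryan's first suggestion (packing many records into a single message rather than doing one send per record) can be sketched in isolation like this. Note that Batcher and its method names are invented here for illustration; they are not part of any Vert.x API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a large list of records into batches so that
// each event-bus send carries one batch (e.g. serialized as a JsonArray)
// instead of one send per record.
public class Batcher {

    public static <T> List<List<T>> batches(List<T> items, int batchSize) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // copy the slice so each batch is an independent list
            result.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> tokens = new ArrayList<>();
        for (int i = 0; i < 10; i++) tokens.add(i);
        // 10 records in batches of 4 -> 3 sends instead of 10
        System.out.println(Batcher.batches(tokens, 4).size()); // prints 3
    }
}
```

With 2 million records and a batch size of 1000, this would turn 2 million sends into 2000, which by itself removes most of the pressure on the event bus.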

Julien P

Jan 6, 2014, 3:18:19 AM
to ve...@googlegroups.com
Hi Ryan,

Thank you for your help, I will try that this morning.

Tim Fox

Jan 6, 2014, 8:16:54 AM
to ve...@googlegroups.com
Basically you are flooding the event bus and blocking the event loop, so it's not surprising that things don't work.

I recommend taking a look at the eb_perf example to see how to implement flow control on the event bus.


Tim Fox

Jan 6, 2014, 8:20:03 AM
to ve...@googlegroups.com
On 04/01/14 08:58, Ryan Chazen wrote:
You are blocking the event loop and I don't think you'd ever want to do something like that in practice (you'd rather make 1 message and include all the data inside it?)

However if you do want to get something like this and not block the runloop, you would change this part here

for (int i = count; i < count + step; i++) {
    final JsonObject jsonMessage = new JsonObject();
    jsonMessage.putString("token", TOKEN_ROOT + i);
    vertx.eventBus().send("com.eogile.akka.core.Notifier.stat.plus", jsonMessage);
}

to something like this

private int mCount, mStep; // set to count and step before the first call

void sendNext(final int i) {
    JsonObject jsonMessage = new JsonObject();
    jsonMessage.putString("token", TOKEN_ROOT + i);
    vertx.eventBus().send("com.eogile.akka.core.Notifier.stat.plus", jsonMessage,
            new Handler<Message>() {
        @Override
        public void handle(Message event) {
            if (i + 1 < mCount + mStep) {
                sendNext(i + 1);
            }
        }
    });
}

Yes, this is correct, but by itself it won't prevent the event bus being overwhelmed.

He'd also need to implement some kind of flow control e.g. by using tokens as in the eb_perf example.




Julien P

Jan 6, 2014, 8:55:11 AM
to ve...@googlegroups.com
Hi Tim,

Thank you for your help.

I was about to run some tests with Ryan's solution. I will take a look at the eb_perf example ASAP.

Julien

Julien P

Jan 6, 2014, 10:18:35 AM
to ve...@googlegroups.com
@Ryan, 

I tried your solution, but the pong error is still there and it seems to be slower.

@Tim

I looked at your example at https://github.com/vert-x/vertx-examples/tree/master/src/raw/javascript/eb_perf, but I do the same thing as you: I use a recursive method via the runOnContext method.
Maybe 20 seconds is too short for my case? Maybe I could extend DefaultEventBus?

Tim Fox

Jan 6, 2014, 10:21:37 AM
to ve...@googlegroups.com
Can you show me your code?

Julien P

Jan 6, 2014, 10:22:53 AM
to ve...@googlegroups.com

Tim Fox

Jan 6, 2014, 10:44:39 AM
to ve...@googlegroups.com
You also need to do your own flow control or you will overwhelm the event bus. Take a look at how it's done in the eb_perf example :)

Julien P

Jan 6, 2014, 11:05:04 AM
to ve...@googlegroups.com
Thank you Tim.

I agree that my flow control is a little bit simple, but why did you choose 20 seconds to decide that the event bus is down? Is it an arbitrary number?
Do you think it's a good idea to extend DefaultEventBus for customization?

Julien

Iván Zahoránszky

Dec 1, 2014, 11:12:18 AM
to ve...@googlegroups.com
Hi Guys,

I think sending a huge number of messages from a verticle is a valid use case, e.g. if I would like to process the result of an SQL query.

Anyhow, just to make sure: in Java, the pattern that should be followed (in case of bulk message sending) is something like this (using a Java "closure"):

public class BulkSend2 extends Verticle {

    @Override
    public void start() {
        vertx.eventBus().registerHandler("start", (Message<JsonObject> msg) -> {
            JsonArray employees = msg.body().getArray("result");
            final Iterator it = employees.iterator();
            vertx.runOnContext(new Handler<Void>() {
                
                @Override
                public void handle(Void e) {
                    if (!it.hasNext()) {
                        return;
                    }

                    JsonObject employee = (JsonObject) it.next();
                    container.logger().info("Emp: " + employee);
                    vertx.eventBus().send("process_it", employee);
                    
                    vertx.runOnContext(this);
                }
            });
        });
    }

}

Thx,
Ivan

Tim Fox

Dec 1, 2014, 11:14:31 AM
to ve...@googlegroups.com
No, you need to do some flow control, as in the eb_perf example that I mentioned in my previous reply :)

Iván Zahoránszky

Dec 1, 2014, 3:22:21 PM
to ve...@googlegroups.com
Of course I understand that flow control is needed, but in the end, in your example (sender.js), the sendMessage() function does the same thing as my handler. The difference is that you send up to (batchSize / 2, but at most the available credits) messages in one run; in my example I do it one by one, just to keep it simple.
The flow is that some verticle sends a message to the "start" address (the message contains the elements to process). Then the BulkSend2 verticle sends a message (for each element) to a 3rd verticle, which listens on the "process_it" address and processes the element (employee).

Sorry if I am missing something, but I don't know why you wrote it was not the pattern to follow :|

Thx,
Ivan

Tim Fox

Dec 1, 2014, 3:29:45 PM
to ve...@googlegroups.com
There's no flow control in your example; it simply sends each message on a different traversal of the event loop. If you do this enough, you will probably run out of memory.

In the eb_perf example, messages are only sent when there are available credits - this is the flow control part. Credits are sent from the receiver back to the sender.
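The credit scheme can be modelled outside Vert.x in a few lines. This is only a sketch of the idea under simplifying assumptions (the Sender class and the queue standing in for the event bus are invented here; eb_perf exchanges credits as real event-bus messages): the sender spends one credit per message, stops when it runs out, and resumes when the receiver grants credits back.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal, single-threaded model of credit-based flow control.
// In eb_perf the receiver periodically sends credits back over the event
// bus; here the "bus" is just a queue and the refill is a direct call.
public class CreditFlowDemo {

    static class Sender {
        int credits;
        int next = 0;
        final int total;
        final Deque<Integer> bus; // stands in for the event bus

        Sender(int total, int initialCredits, Deque<Integer> bus) {
            this.total = total;
            this.credits = initialCredits;
            this.bus = bus;
        }

        // Send while we hold credits; otherwise stop and wait for a refill.
        void pump() {
            while (credits > 0 && next < total) {
                bus.add(next++);
                credits--;
            }
        }

        void addCredits(int n) {
            credits += n;
            pump(); // resume sending
        }
    }

    // Drive a full run: the receiver drains the queue and grants one
    // credit back per message consumed. Returns messages delivered.
    public static int run(int total, int initialCredits) {
        Deque<Integer> bus = new ArrayDeque<>();
        Sender sender = new Sender(total, initialCredits, bus);
        int delivered = 0;
        sender.pump();
        while (!bus.isEmpty()) {
            bus.poll();
            delivered++;
            // Invariant: the queue never holds more than initialCredits
            // messages, so the sender can never flood the receiver.
            sender.addCredits(1);
        }
        return delivered;
    }

    public static void main(String[] args) {
        // all 1000 delivered, with at most 10 ever in flight
        System.out.println(run(1000, 10)); // prints 1000
    }
}
```

The key property is that the number of in-flight messages is bounded by the credit count, no matter how many messages the sender wants to push overall.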

Iván Zahoránszky

Dec 3, 2014, 7:20:50 AM
to ve...@googlegroups.com
I have just realized that when you say flow control, you mean some kind of (over)load control/protection; I thought it was about message/workflow control. In that case I understand what you wrote, and I accept that there is no flow control in my example. However, I only wanted to be sure about the pattern I should follow when using the runOnContext method.

thx,
Ivan

Tim Fox

Dec 3, 2014, 7:25:03 AM
to ve...@googlegroups.com
This is what I mean by flow control: http://en.wikipedia.org/wiki/Flow_control_%28data%29


On 03/12/14 12:20, Iván Zahoránszky wrote:
I have just realized that when you say flow control you mean some kind of (over)load control/protection. And I thought that it was about message/work flow control.

Same thing really.

Iván Zahoránszky

Dec 30, 2014, 2:08:21 PM
to ve...@googlegroups.com
Hi All,

Is there anything other than "overwhelming the event bus" that can cause the "No pong" issue? Have you ever encountered this problem because of some network issue, deadlock, or any other reason? We are not able to get rid of it in our project, even though we have already introduced flow control, so in theory flooding of the event bus cannot happen. Maybe somebody has an idea of what else it could be.

Thx

Tim Fox

Dec 30, 2014, 2:34:17 PM
to ve...@googlegroups.com
If you can create a reproducer, someone will investigate.

Iván Zahoránszky

Jan 4, 2015, 11:45:57 AM
to ve...@googlegroups.com
This is the problem: the error is quite rare and very hard to reproduce. That's why I asked about any potential root causes. As soon as I have a reproducer, I will send it.



Iván Zahoránszky

Jan 6, 2015, 10:07:37 AM
to ve...@googlegroups.com
I did some debugging, and it seems that sending the ping messages and the pong responses is executed by the same event loop or worker threads that are used by normal (non-worker) or worker verticles. A worker is allowed to be slow. If all the worker threads on one node are executing some slow operation, it may happen that no pong is sent back because there is no available worker thread, even though the node should not be considered dead. Is this intentional? And if it is, how should I handle this situation?


Iván Zahoránszky

Jan 7, 2015, 4:25:59 AM
to ve...@googlegroups.com
More precisely:
If I have a worker module:

{
"main": "nopong.Sen",
"worker": true,
"multi-threaded": true
}

with one worker verticle in it then sending the ping message in DefaultEventBus:

new PingMessage(serverID).write(holder.socket);

and canceling the pong timeout timer:

// Got a pong back
vertx.cancelTimer(timeoutID);

are executed by one of the available worker threads. If all of the threads are busy (which is acceptable in the case of worker verticles), then is it possible that the timer (responsible for the pong timeout) is not cancelled in time and the other node is considered dead despite having sent the pong in time? Is it possible that the "cancelling the timer" job is starving this way?
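The starvation scenario described above can be demonstrated in isolation with plain java.util.concurrent. This is a simplified model built for illustration, not Vert.x's actual threading (PongStarvationDemo and its parameters are invented): a pong "arrives" almost immediately, but the task that would cancel the timeout is queued behind a long-running job on the same single-threaded executor, so the timeout fires anyway.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified model of the hypothesis: the "cancel the pong timeout" work
// is queued on a busy single-threaded executor, so the timeout fires
// even though the pong arrived well within the deadline.
public class PongStarvationDemo {

    public static boolean peerDeclaredDead(long busyMillis, long timeoutMillis)
            throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean pongProcessed = new AtomicBoolean(false);
        AtomicBoolean declaredDead = new AtomicBoolean(false);

        // The worker is busy with a slow job (fine for worker verticles).
        worker.submit(() -> sleep(busyMillis));

        // The pong "arrives" immediately, but processing it (i.e. cancelling
        // the timeout) has to wait behind the slow job on the same thread.
        worker.submit(() -> pongProcessed.set(true));

        // Pong-timeout timer: if the pong wasn't processed in time,
        // declare the peer dead (the "No pong from server" case).
        timer.schedule(() -> {
            if (!pongProcessed.get()) declaredDead.set(true);
        }, timeoutMillis, TimeUnit.MILLISECONDS);

        timer.shutdown();
        timer.awaitTermination(5, TimeUnit.SECONDS);
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
        return declaredDead.get();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        // Slow job (500 ms) outlives the 100 ms pong deadline -> "dead".
        System.out.println(peerDeclaredDead(500, 100)); // prints true
        // Idle worker processes the pong instantly -> peer stays alive.
        System.out.println(peerDeclaredDead(0, 100));   // prints false
    }
}
```

Whether this matches what Vert.x actually does internally is a separate question; the sketch only shows that the queuing effect Ivan describes is real for a single-threaded executor.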





Iván Zahoránszky

Jan 8, 2015, 4:38:27 AM
to ve...@googlegroups.com
Let me rephrase my question, since there has been no answer so far.

Is it possible that the "ping sending", "pong receiving", or "ping timeout cancelling" code is starving, and that this causes the problem? I had supposed that these tasks have a higher priority in the event loops and worker threads than event bus messages.


Tim Fox

Jan 8, 2015, 4:45:55 AM
to ve...@googlegroups.com
I had a quick look at your code...

It looks like you are dumping a load of messages onto the event bus very quickly. If you do this, you should expect things not to work very well: you could run out of memory, or the messages may just back up waiting to be delivered, which prevents ping messages getting through and causes Vert.x to think other nodes are dead.

In short: Don't do this!

The event bus doesn't have flow control built in, so you will need to handle that yourself. I recommend searching this group where that has been discussed before :)

HTH

Iván Zahoránszky

Jan 30, 2015, 2:16:39 AM
to ve...@googlegroups.com
Hi,

Can you describe the purpose of this ping-pong mechanism? Is it for checking whether the event bus is healthy? Cluster node crashes are detected at the cluster manager (Hazelcast) level, I suppose, so it must exist for some other reason. It seems to me that Vert.x could work properly without this ping-pong mechanism.

Thx,
Ivan

Julien Viet

Jan 30, 2015, 2:35:04 AM
to ve...@googlegroups.com, Iván Zahoránszky
The event bus uses Hazelcast to learn the cluster topology, but then manages its own NetSocket connections to peers for sending events. The ping/pong is used to close those sockets when no pong response is received.
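The logic Julien describes (send a ping, and close the socket if no pong arrives before the deadline) can be sketched as a small state machine, independent of Vert.x. ConnectionMonitor and its method names are invented for illustration; timestamps are passed in explicitly so the behaviour is deterministic:

```java
// Minimal model of heartbeat-based connection liveness: after a ping is
// sent, a pong must arrive before the timeout or the connection is
// treated as dead and closed.
public class ConnectionMonitor {

    private long pingSentAt = -1;   // -1 means no ping outstanding
    private boolean closed = false;
    private final long pongTimeoutMillis;

    public ConnectionMonitor(long pongTimeoutMillis) {
        this.pongTimeoutMillis = pongTimeoutMillis;
    }

    public void onPingSent(long now) { pingSentAt = now; }

    // A pong clears the outstanding ping.
    public void onPongReceived() { pingSentAt = -1; }

    // Called when the timeout timer fires: close only if the ping is
    // still unanswered (the "No pong from server ... will consider it
    // dead" case from this thread).
    public void onTimerFired(long now) {
        if (pingSentAt >= 0 && now - pingSentAt >= pongTimeoutMillis) {
            closed = true;
        }
    }

    public boolean isClosed() { return closed; }

    public static void main(String[] args) {
        ConnectionMonitor ok = new ConnectionMonitor(20_000);
        ok.onPingSent(0);
        ok.onPongReceived();
        ok.onTimerFired(20_000);
        System.out.println(ok.isClosed()); // prints false: pong arrived in time

        ConnectionMonitor dead = new ConnectionMonitor(20_000);
        dead.onPingSent(0);
        dead.onTimerFired(20_000);
        System.out.println(dead.isClosed()); // prints true: peer considered dead
    }
}
```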

--
Julien Viet
www.julienviet.com



Iván Zahoránszky

Jan 30, 2015, 4:14:19 AM
to ve...@googlegroups.com
Thx Julien. So it basically prevents the system from keeping a lot of open but unresponsive TCP connections, doesn't it? But why doesn't it try to establish a new connection after closing the old one? Why does the whole node stop?



Tim Fox

Jan 30, 2015, 12:47:41 PM
to ve...@googlegroups.com
On 08/01/15 09:38, Iván Zahoránszky wrote:
I rephrase my question since there was no answer till now.

Is it possible that a "ping sending", "pong receiving" or "ping timeout canceling" code is starving and it causes the problem?

No. Most probably you are overwhelming the event bus with messages and not doing flow control. Do you have a reproducer?


Iván Zahoránszky

Feb 3, 2015, 3:18:54 AM
to ve...@googlegroups.com
Sorry for the slow reaction. The problem is that it is really hard to reproduce, and I cannot send you our whole application. However, I will try to make a reproducer that emulates our app.