websocket : event bus locking up


Yves Peckstadt

Jul 11, 2016, 9:42:02 PM
to vert.x
Hi,

Currently I am creating a game server with Vert.x and WebSockets.

Under normal load all is well and everything runs as it should. I am now using my stress test client to test mid to high loads, and this is where things start going wrong.

I am still working on a reproducer, so for now all I can give is the scenario:

5000 clients connect to the server (5 clients per room, 1000 rooms), join a room, and once joined start sending STREAM messages (player position etc.) at a 100 ms interval.

WebsocketVerticle:
  • step 1: creates the http server and the serverWebSocket.handler()
  • step 2: in the handler, converts the message to a JsonObject and sends it to the MessageProcessingVerticle via eventBus.sendObservable. It is sent to a different verticle so that in cluster mode any server can process any message
MessageProcessingVerticle:
  • step 3: vertx.eventBus().consumer(PROCESS_MESSAGE, this::processMessage);
  • step 4: in processMessage, gets the player's room and the room's player list from Redis (with vertx-redis-client in an RxJava chain)
  • step 5: vertx.eventBus().sendObservable(nodeId + MessageOutServiceVerticle.MESSAGE_OUT_GROUP, message.toString()); sends the message to the MessageOutServiceVerticle instance of the node the player is connected to
  • step 6: message.reply(...) after sendObservable
MessageOutServiceVerticle:
  • responsible for sending the final message out to the client/player
if (vertx.isClustered()) {
    vertx.eventBus().consumer(GameClusterManager.getInstance().getNodeID() + MESSAGE_OUT_GROUP, this::sendMessage);
} else {
    vertx.eventBus().consumer(DEFAULT_NODE_ADDRESS + MESSAGE_OUT_GROUP, this::sendMessage);
}


  • is deployed on every server, but as you can see the consumer address is node based
  • step 7: sends the final message to the websocket client via vertx.eventBus().send(playerHandlerId, json); I use the websocket textHandlerId as the identifier for the player.
  • step 8: message reply
Step 2 receives the reply from step 6; it does not wait for step 8.
  • All verticles are deployed in normal mode with only one instance.
  • Everything can run in cluster mode, but currently I am testing on one node only to reduce complexity.
  • The HttpServer is deployed (for testing) with setKeepAlive(true) and setIdleTimeoutInSeconds(<high value ms>)
The performance hit comes from the STREAM messages, as for every incoming STREAM message n messages go out, where n = room player count - 1 (self). If I disable sending the message to everyone in the room, everything goes well (even without disabling the Redis data retrieval): I can connect thousands of websocket clients without a hitch, join rooms, and not one disconnects through a timeout or any other problem.
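To put rough numbers on that fan-out (a back-of-envelope sketch using the figures from the scenario above; the class and variable names are mine, not project code):

```java
public class FanOutEstimate {
    public static void main(String[] args) {
        int clients = 5000;
        int playersPerRoom = 5;
        // One STREAM message per client every 100 ms.
        int messagesPerClientPerSecond = 1000 / 100;

        // Each incoming STREAM message fans out to n = room player count - 1 recipients.
        int fanOut = playersPerRoom - 1;

        int inboundPerSecond = clients * messagesPerClientPerSecond;
        int outboundPerSecond = inboundPerSecond * fanOut;

        System.out.println("inbound msg/s:  " + inboundPerSecond);   // 50000
        System.out.println("outbound msg/s: " + outboundPerSecond);  // 200000
    }
}
```

So every inbound event bus send triggers four more, plus the Redis lookups and the reply timers, which is in the order of a quarter million event bus operations per second at full load.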

However, if I do send the message out to everyone in the room, then at around 1200-1400 clients I start getting main thread blocked errors, websocket connections start dropping, and Vert.x throws the following thread blocked error after a few seconds:

io.vertx.core.VertxException: Thread blocked
        at java.lang.Object.hashCode(Native Method) ~[?:1.8.0_92]
        at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1012) ~[?:1.8.0_92]
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006) ~[?:1.8.0_92]
        at io.vertx.core.impl.ConcurrentHashSet.add(ConcurrentHashSet.java:74) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.impl.ContextImpl.addCloseHook(ContextImpl.java:116) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.impl.VertxImpl.scheduleTimeout(VertxImpl.java:390) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.impl.VertxImpl.setTimer(VertxImpl.java:267) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.eventbus.impl.HandlerRegistration.<init>(HandlerRegistration.java:62) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.eventbus.impl.EventBusImpl.createReplyHandlerRegistration(EventBusImpl.java:401) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.eventbus.impl.EventBusImpl.sendOrPubInternal(EventBusImpl.java:413) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.eventbus.impl.EventBusImpl.send(EventBusImpl.java:94) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.eventbus.impl.EventBusImpl.send(EventBusImpl.java:84) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.rxjava.core.eventbus.EventBus.send(EventBus.java:84) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.rxjava.core.eventbus.EventBus.sendObservable(EventBus.java:107) ~[demo-1.0-SNAPSHOT.jar:?]
        at ***.***.***.game.websocket.WebsocketVerticle.onMessageReceived(WebsocketVerticle.java:133) ~[demo-1.0-SNAPSHOT.jar:?]
        at ***.***.***.game.websocket.WebsocketVerticle.lambda$null$487(WebsocketVerticle.java:91) ~[demo-1.0-SNAPSHOT.jar:?]
        at ***.***.***.game.websocket.WebsocketVerticle$$Lambda$226/2124547560.handle(Unknown Source) ~[?:?]
        at io.vertx.rxjava.core.http.ServerWebSocket$1.handle(ServerWebSocket.java:153) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.rxjava.core.http.ServerWebSocket$1.handle(ServerWebSocket.java:151) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.WebSocketImplBase.handleFrame(WebSocketImplBase.java:166) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.ServerConnection.handleWsFrame(ServerConnection.java:327) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.ServerConnection.processMessage(ServerConnection.java:412) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.ServerConnection.handleMessage(ServerConnection.java:134) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.HttpServerImpl$ServerHandler.doMessageReceived(HttpServerImpl.java:482) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.HttpServerImpl$ServerHandler.doMessageReceived(HttpServerImpl.java:420) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.VertxHttpHandler.lambda$channelRead$18(VertxHttpHandler.java:80) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.VertxHttpHandler$$Lambda$128/298235518.run(Unknown Source) ~[?:?]
        at io.vertx.core.impl.ContextImpl.lambda$wrapTask$16(ContextImpl.java:333) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.impl.ContextImpl$$Lambda$12/1048855692.run(Unknown Source) ~[?:?]
        at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:225) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.http.impl.VertxHttpHandler.channelRead(VertxHttpHandler.java:80) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:124) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:368) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) ~[demo-1.0-SNAPSHOT.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) ~[demo-1.0-SNAPSHOT.jar:?]
        at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_92]

My thread dump shows some blocked threads when sending messages on the event bus:

"vert.x-eventloop-thread-1" #19 prio=5 os_prio=0 tid=0x00007f7c80717000 nid=0x2854 waiting for monitor entry [0x00007f7c6049b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:205)
 - locked <0x0000000085d8f360> (a java.lang.Object)
 at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:643)
 at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:679)
 at io.vertx.core.impl.EventLoopContext.executeAsync(EventLoopContext.java:38)
 at io.vertx.core.impl.ContextImpl.runOnContext(ContextImpl.java:243)
 at io.vertx.core.eventbus.impl.EventBusImpl.deliverToHandler(EventBusImpl.java:494)
 at io.vertx.core.eventbus.impl.EventBusImpl.deliverMessageLocally(EventBusImpl.java:368)
 at io.vertx.core.eventbus.impl.EventBusImpl.deliverMessageLocally(EventBusImpl.java:345)
 at io.vertx.core.eventbus.impl.EventBusImpl.sendOrPub(EventBusImpl.java:320)
 at io.vertx.core.eventbus.impl.EventBusImpl$SendContextImpl.next(EventBusImpl.java:448)
 at io.vertx.core.eventbus.impl.EventBusImpl.sendOrPubInternal(EventBusImpl.java:416)
 at io.vertx.core.eventbus.impl.EventBusImpl.send(EventBusImpl.java:94)
 at io.vertx.core.eventbus.impl.EventBusImpl.send(EventBusImpl.java:84)
 at io.vertx.rxjava.core.eventbus.EventBus.send(EventBus.java:83)
 at io.vertx.rxjava.core.eventbus.EventBus.sendObservable(EventBus.java:104)
 at ***.***.***.game.services.message.processor.impl.MessageProcessorImpl.messageForEveryoneInRoom(MessageProcessorImpl.java:83)
 at ***.***.***.game.services.message.processor.StreamMessageProcessor.processFuture(StreamMessageProcessor.java:21)
 at ***.***.***.game.services.MessageProcessingServiceVerticle.processMessage(MessageProcessingServiceVerticle.java:107)
 at ***.***.***.game.services.MessageProcessingServiceVerticle$$Lambda$39/1127943851.handle(Unknown Source)
 at io.vertx.rxjava.core.eventbus.EventBus$3.handle(EventBus.java:203)
 at io.vertx.rxjava.core.eventbus.EventBus$3.handle(EventBus.java:201)
 at io.vertx.core.eventbus.impl.HandlerRegistration.deliver(HandlerRegistration.java:213)
 at io.vertx.core.eventbus.impl.HandlerRegistration.handle(HandlerRegistration.java:192)
 at io.vertx.core.eventbus.impl.EventBusImpl.lambda$deliverToHandler$3(EventBusImpl.java:499)
 at io.vertx.core.eventbus.impl.EventBusImpl$$Lambda$235/179655894.handle(Unknown Source)
 at io.vertx.core.impl.ContextImpl.lambda$wrapTask$3(ContextImpl.java:359)
 at io.vertx.core.impl.ContextImpl$$Lambda$15/2074820378.run(Unknown Source)
 at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:339)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:393)
 at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
 at java.lang.Thread.run(Thread.java:745)


At first I thought it might be this issue https://github.com/eclipse/vert.x/issues/1343, but it doesn't seem to be a problem during the handshake. The event bus basically seems to lock up.

Therefore I am wondering whether the event bus (in general, or specifically for websockets) can be blocked by sheer load. Obviously it can be, and this probably is a coding error, but as the code is relatively simple I can't seem to pinpoint the problem.

Has anyone run into the same problem by chance? Any feedback or tips would be helpful.

Thanks.

Yves Peckstadt

Jul 11, 2016, 9:55:11 PM
to vert.x
Forgot to mention that I am using Vert.x 3.3.2.

Julien Viet

Jul 12, 2016, 3:20:19 AM
to ve...@googlegroups.com
Hi,

can you provide a full thread dump?

I also notice that the stack traces show code that does not correspond to Vert.x 3.3.2. For example:

io.vertx.core.impl.VertxImpl.setTimer(VertxImpl.java:267)

corresponds to this line:


which is rather Vert.x 3.2.1:


--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
Visit this group at https://groups.google.com/group/vertx.
To view this discussion on the web, visit https://groups.google.com/d/msgid/vertx/b551e89c-3a79-4c71-83f1-611f8aec856b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yves Peckstadt

Jul 12, 2016, 4:18:15 AM
to vert.x
Thanks. I will provide a full thread dump as soon as possible.

Today I tested with some code changes that reduce the event bus load, and while the problem didn't go away, it only appeared at much higher traffic. This really shows that in some way I am overloading the event bus.

As for the 3.2.1 code: I checked all the dependencies and everything seems to be set to 3.3.2. The only thing I can think of is the self-built vertx-redis-client version that I built after Paulo fixed the following problem: https://github.com/vert-x3/vertx-redis-client/issues/61 . At that time the fix wasn't merged yet, although I think that was already snapshot 3.3.0, and obviously the stack trace has nothing to do with the Redis client.

I will re-check this; maybe it is a deployment issue.


Paulo Lopes

Jul 12, 2016, 4:24:09 AM
to vert.x
The Redis fixes are available as of 3.3.1.

Yves Peckstadt

Jul 13, 2016, 12:35:42 AM
to vert.x
Thanks Paulo; I removed my own Redis client build and am using the latest version now.

As requested, here is the full thread dump, from the start of the stress test until Vert.x throws its Thread blocked error.

As the file is too big to attach here, I put it on my Google Drive.


Thanks.

Julien Viet

Jul 13, 2016, 3:41:56 AM
to ve...@googlegroups.com
I see many dumps in this file; can you give just one (taken with kill -3, or Ctrl+Alt+Break on Windows) from when the issue happens?

It makes troubleshooting easier.


Yves Peckstadt

Jul 13, 2016, 3:53:32 AM
to vert.x
My apologies. I appended a new thread dump to the same file every second while running the test, hence the large file.

This file only includes the last thread dump, where the problem occurs (taken with jstack, but the kill output should be the same).


jstack.log

Julien Viet

Jul 13, 2016, 4:06:55 AM
to ve...@googlegroups.com
are you running two instances of Vert.x in the same VM?



Yves Peckstadt

Jul 13, 2016, 4:32:43 AM
to vert.x
Yes.

This is our project structure:

- Core
- Auth server
- Game server (websocket server and services)
- Demo project

Currently we are deploying and running one instance of the demo jar on the VM. The demo project creates the auth and game server in one project; however, both are intended to be able to run on separate servers, so each creates a new Vertx instance. For testing purposes we are deploying them on one server for now.

Could this cause problems?

Julien Viet

Jul 13, 2016, 4:34:08 AM
to ve...@googlegroups.com
if you use it correctly, no; however, it makes debugging harder as the threads have the same names.

could you give a dump with just one Vertx instance per VM?


Yves Peckstadt

Jul 13, 2016, 4:46:25 AM
to vert.x
OK, I will provide this ASAP (as I am in Japan, this will be tomorrow for you guys).

Thank you for the fast feedback, as always.

Julien Viet

Jul 13, 2016, 4:53:07 AM
to ve...@googlegroups.com
I will be away for a few days, but I'll try to have a look.


Yves Peckstadt

Jul 14, 2016, 6:26:00 AM
to vert.x
I haven't had the time yet to create the thread dump. However, today I deployed both projects to separate VMs.

Doing that, I immediately got hit with lots of JWT authentication errors: "iat > now ***". This happens when I validate the websocket connection before doing request.upgrade.

I then changed my settings to JWTOptions.setNoTimeStamp(true) and suddenly all problems were gone. My stress test scenario, 5000 clients, runs without any problems or random disconnects.


This tells me the following:

* the iat claim errors were causing everything to lock up. A bit strange, as before these errors just didn't happen, or at least did not show up in any error logging
* it is just a performance issue, and creating two VMs might seem to fix it but merely masks the problem because of the increased resources

Tomorrow I will test on a single VM again to pinpoint the problem.

Paulo Lopes

Jul 15, 2016, 5:48:25 AM
to vert.x
the JWT as the root cause of the lockup is also strange, since that specific part of the code just does this:

verify whether a key is present in a map; if true, fetch it and compare it against a config value read at start time. The comparison is between 2 long values, so it should not be too hard on the CPU either.

Now that you say you're doing load testing and seeing iat > now, I think you may be facing a clock synchronization issue (if using multiple machines), since JWT times are in seconds, so all times are truncated, which forces this issue to appear.

Could that be the case? (You could test it by forcing the issuer machine to have its clock a few seconds behind the consumer.)
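The truncation point above can be illustrated in isolation (a minimal sketch, not the vertx-auth code; all names and timestamps here are made up for the example):

```java
public class IatSkewDemo {
    // JWT times ('iat', 'exp', ...) are expressed in whole seconds,
    // so millisecond precision is truncated away.
    static long toJwtSeconds(long epochMillis) {
        return epochMillis / 1000;
    }

    // The check that fails with "iat > now": the token appears issued in the future.
    static boolean iatValid(long iatSeconds, long consumerNowMillis) {
        return iatSeconds <= toJwtSeconds(consumerNowMillis);
    }

    public static void main(String[] args) {
        long consumerNow = 1_468_000_000_500L;      // consumer clock, in ms
        long issuerClock = consumerNow + 2_000;     // issuer clock 2 s ahead

        // Token issued by a perfectly synchronized clock: passes.
        System.out.println(iatValid(toJwtSeconds(consumerNow), consumerNow));  // true

        // Same nominal instant, but the issuer's skew pushes iat past "now": fails.
        System.out.println(iatValid(toJwtSeconds(issuerClock), consumerNow));  // false
    }
}
```

Even sub-second skew can flip the result when the two clocks straddle a second boundary, which is why the errors would appear intermittently under load.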

Yves Peckstadt

Jul 18, 2016, 10:00:47 AM
to vert.x
I made a euphoric post before, thinking all problems were resolved. However, after some more testing on one VM, the same problems start to appear after multiple tests. Tomorrow I will return to work and take the thread dump.

I will test your suggestion as well, and also use an NTP daemon.

Yves Peckstadt

Jul 19, 2016, 3:38:47 AM
to vert.x
As promised, a new thread dump, with both projects running on one VM. After about 1200 connections everything locks up again.

Thanks.
jstack.16591.log

Julien Viet

Jul 19, 2016, 5:25:50 PM
to ve...@googlegroups.com
can you tell me your JDK version and OS, and provide the log file as well?


Yves Peckstadt

Jul 20, 2016, 3:56:08 AM
to vert.x
CentOS Linux release 7.1.1503 (Core)

java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)

Attached is the server log file (implementation). In the log file a ReplyException for CreateRoom is sometimes thrown; this is by (bad) design and is not causing any problems.

game-server.log

Julien Viet

Jul 20, 2016, 5:36:47 PM
to ve...@googlegroups.com
can you try setting the system property "io.netty.selectorAutoRebuildThreshold" to different values lower than 512 (the default)?
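For reference, the property can be passed on the command line (e.g. -Dio.netty.selectorAutoRebuildThreshold=128) or set programmatically, as long as that happens before any Netty event loop is created, since Netty reads it in a static initializer. A sketch, with 128 as an arbitrary trial value:

```java
public class SelectorThresholdConfig {
    public static void main(String[] args) {
        // Must run before the first NioEventLoop is created (i.e. before
        // Vertx.vertx() is called), because Netty caches the value statically.
        System.setProperty("io.netty.selectorAutoRebuildThreshold", "128");

        // Verify what Netty will see when it reads the property.
        System.out.println(System.getProperty("io.netty.selectorAutoRebuildThreshold"));
    }
}
```

The threshold controls after how many consecutive premature selector wakeups Netty rebuilds the selector to work around the epoll 100% CPU spin bug; lowering it makes the rebuild trigger sooner.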



Yves Peckstadt

Jul 20, 2016, 9:10:10 PM
to vert.x
I tested with multiple values lower than 512, but it does not seem to make a difference.

Julien Viet

Jul 21, 2016, 6:25:38 AM
to ve...@googlegroups.com
can you make a Flight Recorder session of the server?


Yves Peckstadt

Jul 25, 2016, 12:22:14 AM
to vert.x

Julien Viet

Jul 25, 2016, 5:28:58 PM
to ve...@googlegroups.com
Hi,

I inspected the report, and it looks like you are doing your benchmark with the server and the client on the same host, which is not recommended.

It seems to be a CPU issue, with Vert.x spending a lot of time cancelling timers (event bus reply handlers); also, your 8 event loop threads seem to process events in an unfair manner (2 are very loaded), which seems suspicious.

Your client might also take a lot of CPU, and you are probably benchmarking on localhost, which is not realistic.

Could you do that with two different machines?


Yves Peckstadt

Jul 25, 2016, 8:59:24 PM
to vert.x
Hi,

Thank you for looking at the report.

The test client and the server are running on separate VMs, and no testing on localhost is being done. Of course, unless you mean that both VMs should be running on different physical servers.

VM1: running one JVM, which is the stress test client
VM2: currently running 2 JVMs: the authentication server and the game server (of which I sent the report)
VM3: reserved to run the game server JVM on a separate VM so that VM2 only runs the authentication server JVM. Currently not using this VM for testing purposes. However, in that scenario all tests run fine.

A CPU issue is also what I fear is the cause of the problem. It is correct that two event loops are doing most of the work.

For example, when a client sends a message to join a game room, the following happens:

WebsocketVerticle (receives the message on the httpServer)
--> sends the message to the MessageProcessingVerticle, so the WebsocketVerticle is only responsible for receiving the message and is kept as free as possible.
--> The MessageProcessingVerticle then checks all messages by type and processes them. For a join room message, for example, this means that separate verticles acting as services, like PlayerServiceVerticle and RoomServiceVerticle, are called (for example for getting player and room data). These services are also called over the event bus.
--> Once the message is processed, it is sent to the MessageOutVerticle and a message.reply is done so the WebsocketVerticle gets a reply message. I can then use a message queue system if needed.
--> The MessageOutVerticle is the one responsible for finally sending the message to the client by way of eventbus.send(websocket.textHandlerId, <data>)


I first tried to skip the last step, the MessageOutVerticle, and just send the reply websocket message (or a new one) to the client in the WebsocketVerticle's reply handler for processing the message. While this reduces the load on the event bus for sure, as expected it resulted in all messages being processed quite slowly, as the load on the WebsocketVerticle increased.

I could create a new MessageProcessingVerticle for every message type, instead of the single one that now does all the work.

So the current chain for, let's say, a join room message

WebsocketVerticle --> MessageProcessingVerticle (which by type creates a JoinRoomMessageProcessor class instance that actually does the work) --> MessageOutVerticle

would become

WebsocketVerticle --> JoinRoomMessageProcessingVerticle --> MessageOutVerticle

This would spread the work more; however, it would not really fix the high event bus load and the time needed to cancel timers, as the same number of timers would still need to be cancelled.

Another option would be to convert my service verticles (like PlayerServiceVerticle) into normal classes.

Thanks.


Julien Viet

Jul 26, 2016, 3:37:14 AM
to ve...@googlegroups.com
in the report I also see the GC doing longer VM pauses toward the end: during the last 30 seconds there are 6 old collections with a total pause time of 13 seconds, and one of these pauses takes more than 2 seconds.

so I would also look at this and try to fix this GC issue, which may be triggering the problem, i.e. queuing events on the bus.


Yves Peckstadt

Jul 26, 2016, 4:32:00 AM
to vert.x
Thanks for the feedback, I am taking a look at this. Hopefully I can pinpoint the GC problem further. It has been a pretty annoying experience so far finding this problem, lol. Not the fault of Vert.x, of course.


A few maybe stupid questions:

* I guess the only way to get the java.util.PriorityQueue usage down is to use the event bus less?
* Regarding spreading the load over more event loops: this would require me to create more verticles and probably over-engineer everything a bit. Would creating too many verticles be a cause for concern performance-wise? I know that each verticle is assigned one event loop and that event loops share a fixed pool of threads, so I am wondering whether by doing this I would cause a slowdown instead of a performance improvement, as too many event loops would be using the same threads at the same time.
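For what it's worth, a standard verticle instance is pinned to one event loop from a fixed pool, which in Vert.x 3.x defaults to twice the number of available cores; deploying more verticles therefore reuses those threads rather than creating new ones. A small sketch of that default (my reading of the VertxOptions defaults, not project code):

```java
public class EventLoopPoolDefault {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();

        // Vert.x 3.x default: eventLoopPoolSize = 2 * available cores.
        // Verticle instances are distributed over this fixed pool, so
        // deploying more verticles does not create more threads.
        int defaultEventLoopPoolSize = 2 * cores;

        System.out.println("cores: " + cores);
        System.out.println("default event loop pool size: " + defaultEventLoopPoolSize);
    }
}
```

So the concern about "too many event loops fighting over the same threads" does not really arise: the thread count stays constant, and extra verticle instances only change how work is distributed across the existing pool.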

Thanks.


Julien Viet

Jul 26, 2016, 5:35:08 AM
to ve...@googlegroups.com
On Jul 26, 2016, at 10:32 AM, Yves Peckstadt <pecksta...@gmail.com> wrote:

Thanks for the feedback. I am taking a look at this.   Hopefully I can pinpoint the GC problem more. Has been a pretty annoying experience so far finding this problem lol.  Not the fault of Vertx of course.


A few maybe stupid questions:

* I guess the only way to get the java.util.PriorityQueue usage down is to use the eventBus less?

it's hard to say for now where it comes from, given that you have 2 overloaded threads while the others are quiet.

* Regarding spreading the load more over multiple event loops.  This would require me to create more Verticles and probably over engineer everything a bit. Would creating too many verticles be cause of concern performance wise?  I know that one verticle = 1 thread and event loop and same threads are used by multiple event loops so I am wondering that by doing this I would cause for slowdown instead of performance improvements as too many event loops would be using the same threads at the same time.



Yves Peckstadt

Jul 26, 2016, 9:05:27 PM
to vert.x
Ah yes, instead of splitting the tasks I can just increase the number of instances. I will give that a try.