Performance tuning/setup for high throughput/concurrent connections


Daum

Feb 13, 2016, 3:57:01 PM
to vert.x
Hi Guys -

I'm working on building an application that will have between 5k and 6k concurrent connections. Across all these connections, roughly 150k requests a second will be sent. I wanted to test Vert.x (coming from Netty) to see how it would perform. Right now I follow the hello world example, which sets the number of verticle instances to 2x the number of processors. However, whenever I start the app, I immediately get non-stop "connection reset by peer" errors:

Feb 13, 2016 3:53:11 PM io.vertx.core.net.impl.ConnectionBase
SEVERE: java.io.IOException: Connection reset by peer


They just continue non-stop. In my app I also query a database via an async call. Since I need to get the record back to respond to the request, I understand I should use the executeBlocking call. I just wanted to make sure that is the proper way to do it. I use a future for the result, with a timeout.


I was curious which parameters you'd recommend tweaking. Also, if you could give me pointers on the reset-by-peer issue I'm seeing, that'd be appreciated. Any other general pointers are always helpful!

Thanks,
Daum

Melvin Ross

Feb 13, 2016, 4:56:04 PM
to vert.x
What are you using to simulate clients? I get similar behaviour (connection reset) when using wrk, though upping the timeout gets rid of most of the errors. I've yet to see the behaviour with actual client connections.

Daum

Feb 13, 2016, 5:06:02 PM
to vert.x
It's actually another company that connects to our app. We just put the app up in their test environment, whose throughput is very close to production. Locally, with far fewer clients/qps, we're not getting the error.

Daum

Melvin Ross

Feb 13, 2016, 10:05:20 PM
to vert.x
If I were you, I'd catch the exception and log how long it took for the request to get served before it disconnected. I'd bet dollars to doughnuts it's an exceptionally long time. Even if you aren't blocking the event loop, there's only so much work a thread can do. 

-Melvin
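
For anyone wanting to try the timing idea, here is a minimal sketch of the bookkeeping only, using plain JDK classes. `InFlightTimer` is a hypothetical helper, not part of Vert.x; the assumption is that you call `start` when a request arrives and `finish` from whatever close or exception callback your server exposes, then log the result.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not a Vert.x class): remembers when each in-flight
// request started so the elapsed time can be logged if the peer disconnects.
public class InFlightTimer {
    private final ConcurrentMap<Object, Long> startedAt = new ConcurrentHashMap<>();

    // Record the start time for a request; pass System.nanoTime() as nowNanos.
    public void start(Object requestKey, long nowNanos) {
        startedAt.put(requestKey, nowNanos);
    }

    // Forget the request and return how long it was in flight, in milliseconds.
    // Returns -1 if the key was never started or has already been finished.
    public long finish(Object requestKey, long nowNanos) {
        Long start = startedAt.remove(requestKey);
        return start == null ? -1 : TimeUnit.NANOSECONDS.toMillis(nowNanos - start);
    }

    public static void main(String[] args) throws InterruptedException {
        InFlightTimer timer = new InFlightTimer();
        Object request = new Object(); // stands in for a request or connection object
        timer.start(request, System.nanoTime());
        Thread.sleep(20); // simulated work
        System.out.println("in flight for ~" + timer.finish(request, System.nanoTime()) + " ms");
    }
}
```

Passing the clock value in, rather than reading it inside the helper, keeps the class trivial to test and lets the caller decide which timestamp counts as "request start".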

Tim Fox

Feb 14, 2016, 4:17:28 AM
to ve...@googlegroups.com
I recommend you look at the TechEmpower Vert.x PRs for how to set up Vert.x for high throughput. 5-6k connections is not very many, btw :)

"connection reset by peer" just means the other side (in your case the client) terminated the connection.

Daum

Feb 14, 2016, 7:41:06 AM
to vert.x
Tim -

I actually even tried the benchmark code (https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Java/vertx/src/main/java/vertx/WebServer.java) to see how it performed. The only modification I made was to have it return slightly different text so that the test environment would keep getting back "valid" responses. I saw the same issue there. My application code, simplified, is this: http://dpaste.com/3BS8JP3 . I even tried flipping it over so that instead of querying the DB it just always sends back the response "none", and I was still getting the connection reset error.


Melvin - what's the best way to catch the exception? I tried req.response().exceptionHandler(), but it doesn't appear to be getting called.

Thanks for the help!
Daum

Arnaud Estève

Feb 14, 2016, 7:52:17 AM
to vert.x
I'm really interested in this exception catching, too.

I'm wondering whether such exception logging could lead to issues / vulnerabilities from a malicious client (deliberately closing connections in a "harsh" way to generate stack traces / logging). Maybe I'm off topic, but if you find a proper interception point to catch these exceptions in a clean and efficient way, please let us know.

The same goes for WebSocket frame exceptions (a client sending a huge number of malformed frames).

Thanks.

Tim Fox

Feb 14, 2016, 8:28:19 AM
to ve...@googlegroups.com
http://stackoverflow.com/questions/1434451/what-does-connection-reset-by-peer-mean

You can always disable logging of the error in your log config if it bothers you :)

Daum

Feb 14, 2016, 8:38:52 AM
to vert.x
Tim - 

How would I do that? I'm a little worried that I'm seeing so many of those, but I can try disabling the logging just to see what the requests/sec are. That'll help me confirm how well we're doing, as the clients will back off if they think the server isn't able to keep up.

Also any other tips on parameters to tweak?
Daum

Tim Fox

Feb 14, 2016, 8:46:50 AM
to ve...@googlegroups.com
On 14/02/16 13:38, Daum wrote:
> Tim -
>
> How would I do that?

That depends on what logging library you are using; e.g. JUL and Log4j do this differently. But in either case, the relevant docs should show you how.

> I'm a little worried that I'm seeing so many of those,

These are happening because your client (I've no idea what that is) is sending RST packets. That's not really something Vert.x has any control over.

Daum

Feb 14, 2016, 8:54:31 AM
to vert.x
I'm not actually outputting the error messages; they appear to come from the Vert.x core, which appears to use JUL. Looking at the docs (http://vertx.io/docs/vertx-core/java/#_logging), I'm not sure what the proper way is to turn off just one of the messages. I would want to disable that message but still be able to view other messages in case there is something else going on. Also, do you know why the exceptionHandler on the response itself isn't getting triggered?

Daum

Tim Fox

Feb 14, 2016, 8:58:35 AM
to ve...@googlegroups.com
On 14/02/16 13:54, Daum wrote:
> I'm not actually outputting the error messages, it appears it is coming from the Vertx core, which appears to be JUL.

Default is JUL, but you can configure it to use Log4j/SLF4J or whatever if you prefer.

> Looking at the docs: http://vertx.io/docs/vertx-core/java/#_logging not sure what the proper way is to turn off just one of the messages?

Take a look at the JUL documentation ;)

> I would want to disable that message but be able to view other messages in case there is something else going on. Also do you know why the exceptionHandler on the response itself isn't getting triggered?

Probably because the response is already complete by that point.
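
On the JUL side, one way to silence just this message while keeping everything else is a `java.util.logging.Filter` on the logger that emits it. A minimal sketch, assuming the logger name matches the class shown in the log output earlier in the thread (`io.vertx.core.net.impl.ConnectionBase`) and that the text "Connection reset by peer" appears either in the record's message or in the attached exception; `ResetByPeerFilter` is a made-up name, not something Vert.x ships:

```java
import java.util.logging.Filter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical filter: drops only "Connection reset by peer" records and
// lets every other record through unchanged.
public class ResetByPeerFilter implements Filter {
    // Assumption: logger name taken from the log line quoted in this thread.
    static final String CONNECTION_LOGGER = "io.vertx.core.net.impl.ConnectionBase";

    private static boolean isReset(String text) {
        return text != null && text.contains("Connection reset by peer");
    }

    @Override
    public boolean isLoggable(LogRecord record) {
        Throwable thrown = record.getThrown();
        boolean reset = isReset(record.getMessage())
                || (thrown != null && isReset(thrown.getMessage()));
        return !reset;
    }

    // Installs the filter and returns the logger. Keep the returned reference
    // alive: JUL holds loggers weakly, and a collected logger loses its filter.
    public static Logger install() {
        Logger logger = Logger.getLogger(CONNECTION_LOGGER);
        logger.setFilter(new ResetByPeerFilter());
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = install();
        logger.severe("java.io.IOException: Connection reset by peer"); // suppressed
        logger.severe("something else went wrong");                     // still logged
    }
}
```

A filter set on a logger only applies to records logged through that exact logger, so other Vert.x messages are unaffected. The same effect can likely be had from a logging properties file by pointing a handler's filter at such a class, but the details depend on your setup.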

Tim Fox

Feb 14, 2016, 9:04:24 AM
to ve...@googlegroups.com

I'd guess your client is opening a connection, sending a request, getting a response, then "hard" closing the connection (sending RST packet). This would cause a "connection reset by peer" exception on the server.

If this is the case, it's a pretty inefficient way for a client to interact. TCP connection setup requires a three-way handshake and is pretty slow. If that's happening for each request, you're not going to get great performance.

All this is pure guesswork as I've no idea what your client is actually doing.
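
To see what a "hard" close looks like from the server side, here is a small sketch using plain JDK sockets rather than Vert.x. It assumes the usual behaviour that closing a socket with `SO_LINGER` set to 0 aborts the connection with an RST instead of a normal FIN; `HardCloseDemo` and its method name are invented for illustration, and nothing here is known about what the real client does.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustration only: a client that aborts its connection, and what the
// server-side read observes as a result.
public class HardCloseDemo {

    // Returns true if the server-side read failed with an IOException
    // after the client closed with linger 0.
    public static boolean serverSeesReset() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(5000);  // never hang forever in read()
            client.setSoLinger(true, 0);  // linger 0: close() aborts instead of a graceful shutdown
            client.close();
            try {
                accepted.getInputStream().read();
                return false;             // graceful close: read() just returns -1
            } catch (SocketTimeoutException e) {
                return false;             // nothing arrived at all
            } catch (IOException e) {
                return true;              // abortive close surfaces as an I/O error
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("server saw reset: " + serverSeesReset());
    }
}
```

With a normal `close()` and no linger setting, the same read would simply report end of stream, which is why a graceful client never shows up in the server log this way.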

Daum

Feb 14, 2016, 9:09:24 AM
to vert.x
I'm looking into the logging docs. The client should be doing just regular keep-alive queries. It'll do around 300-500 requests (if not more) per connection before disconnecting. We don't have access to the client code itself, so it's a bit of a black box, unfortunately.

Daum

Daum

Feb 14, 2016, 9:19:30 AM
to vert.x
Ah, it appears that we're seeing similar messages on our Netty implementation; we just didn't output them. The different part here, though, is that we're seeing about a 4x slower average response time (0.25 ms on Netty versus 1 ms on Vert.x). Should I be changing the blockedThreadCheckInterval or something similar? The load on these machines is only 6 (they are 32-core machines), so we should be able to get much lower response times. It seems as if the blocked threads aren't being tended to often enough, maybe?

Thanks again for the help on this,
Daum

Tim Fox

Feb 14, 2016, 9:31:25 AM
to ve...@googlegroups.com
These kinds of issues are extremely hard to diagnose over email when the amount of information I have to go on is so limited.

I wouldn't expect to see a 400% difference between Netty and Vert.x; the TechEmpower benchmarks show about a 25% difference:

https://www.techempower.com/benchmarks/#section=data-r8&hw=ec2&test=plaintext

So most probably it's some kind of experimental/environment setup issue but really I'm just guessing here.

radoslaw busz

Feb 15, 2016, 4:57:14 AM
to vert.x
Hi guys,
It might be a long shot, but I would double-check that everything is fine with the number of open connections. I mean checking whether the Vert.x application is properly closing connections (maybe there's some error that leaves dangling connections), or even tweaking the operating system to raise the open-file limit and allow more open connections.

Radek
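
A quick way to check the descriptor side of this from inside the JVM is sketched below. It relies on `com.sun.management.UnixOperatingSystemMXBean`, which is only available on Unix-like platforms, so the helper returns `null` elsewhere; `FdUsage` is a hypothetical name used for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Reports how many file descriptors (sockets included) this JVM has open
// compared with the per-process limit.
public class FdUsage {

    // Returns {open, max}, or null where the Unix-specific MXBean is unavailable.
    public static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("descriptor counts not available on this platform");
        } else {
            System.out.println("open descriptors: " + counts[0] + " of " + counts[1]);
        }
    }
}
```

Logging these two numbers periodically under load shows whether the open count keeps climbing (dangling connections) or sits close to the limit (the OS setting needs raising).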

Anil

Feb 16, 2016, 6:47:27 AM
to vert.x
Hi Daum,

Could you please check whether you are seeing a reply timeout exception in the logs?

I noticed connection reset by peer exceptions when there was an event bus timeout exception.

Regards,
Anil

Daum

Feb 16, 2016, 6:50:53 AM
to vert.x
It was just the client connecting to us closing the connection after a certain number of requests.  We were not using the event bus.