Performance issue with vertx.setTimer()


赵普明

Jun 20, 2012, 10:29:54 PM6/20/12
to ve...@googlegroups.com
Hi :

A little background first:

Our company is starting a new project and needs a server that is fast, highly concurrent (the design goal is 1+ billion requests per day), and robust.

The server logic is not complicated:

1. get a request
2. dispatch a message out to n remote recipients
3. wait for them to process it
4. then
    - when all the results are received 
        ->  4.1. calculate all results and sum up
    - when a timeout occurs
        ->  4.2. make an empty response
5. response back

We have been using a cluster of lighttpd servers for a similar service, but this time the company is willing to see whether it can be done on the JVM (we have other projects on the JVM).

I heard that vert.x is very performant, and after looking into the docs I felt that it really has a simple and elegant design, so I hope we can adopt vert.x. I told our team about vert.x and they would like to see some benchmarks that cover our basic scenarios.

I'm using ab (ApacheBench) as the benchmark tool because our team has been using it and it is simple. (Though it is not very good at concurrency benchmarking.)

The first benchmark is the same as Tim's benchmark: return 200 OK immediately.

public class ServerVerticle extends Verticle {

    @Override
    public void start() throws Exception {
        HttpServer server = vertx.createHttpServer();

        RouteMatcher route = new RouteMatcher();
        route.get("/", new Handler<HttpServerRequest>() {
            @Override
            public void handle(HttpServerRequest req) {
                req.response.statusCode = 200;
                req.response.end();
            }
        });
        server.requestHandler(route);
        server.listen(8080);
    }
}

This one looks good even on my old computer.

ab -n 100000 -c 400 http://localhost:8080/


Requests per second:    10960.33 [#/sec] (mean)
Time per request:       36.495 [ms] (mean)

The second benchmark adds a simple timeout: for each request, I set a timer that waits 100ms and then sends the response.


public class ServerVerticle extends Verticle {

    @Override
    public void start() throws Exception {
        HttpServer server = vertx.createHttpServer();

        RouteMatcher route = new RouteMatcher();
        route.get("/", new Handler<HttpServerRequest>() {
            @Override
            public void handle(final HttpServerRequest req) {
                vertx.setTimer(100, new Handler<Long>() {
                    @Override
                    public void handle(Long timerId) {
                        req.response.statusCode = 200;
                        req.response.end();
                    }
                });
            }
        });
        server.requestHandler(route);
        server.listen(8080);
    }
}

And the results:

Requests per second:    2246.05 [#/sec] (mean)
Time per request:       178.090 [ms] (mean)

Problems with this timeout scenario:

1. Throughput is drastically reduced.
2. Request time is higher: 178 - 100 = 78ms of overhead per request, while the first benchmark was about 36ms per request.
3. Many of the requests are slowed down:

Percentage of the requests served within a certain time (ms)
  50%    123
  66%    128
  75%    133
  80%    137
  90%    152
  95%    341
  98%   1122
  99%   1130
 100%   3133 (longest request)

Only 90% of the requests are served within a reasonable time. In the first benchmark, 99% were below 50ms, which is OK.

And this problem gets worse when the concurrency level is higher. (I know ab is not really concurrent, but I tried running several ab processes at the same time; the problem is still there, and the total throughput is somewhat better, up to about 4500 req/s.)

4. Sometimes there are errors:

apr_socket_recv: Connection reset by peer (104)

#3 and #4 are showstoppers. I can't even bring this to my team if these are real problems.

More benchmarks will be needed after these problems are solved.

I'm not quite familiar with the internals of NIO. So I came here for advice.

I'll try a similar benchmark with raw Netty, but I guess it will not be much different, as vert.x is using Netty.

I'd really like to see vert.x used in our project. So please help me make it happen :)

Ben Kelly

Jun 20, 2012, 11:39:48 PM6/20/12
to ve...@googlegroups.com
It looks like vert.x uses Netty's HashedWheelTimer under the hood. In addition, it uses the default 512 ticks per wheel setting. Just a shot in the dark, but perhaps this is poorly tuned for such a large number of timer tasks.
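To make the concern concrete, here is a toy standalone sketch of the hashed-wheel idea (my own illustration, not Netty's actual code; the 100 ms tick duration below is an assumed default). The point is that timeouts due the same number of ticks away all hash to the same bucket, so a burst of identical 100 ms timeouts piles into one bucket that a single worker thread must drain:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a hashed wheel. The wheel is a ring of buckets, one per
// tick; a timeout due in `delayMs` lands in bucket
// (currentTick + delayMs / tickMs) % wheelSize. When thousands of requests
// all set identical 100 ms timers within one tick, every one of them
// hashes to the SAME bucket.
public class WheelSketch {
    final int wheelSize;                 // e.g. the default of 512 ticks
    final long tickMs;                   // duration of one tick
    final List<List<Runnable>> buckets = new ArrayList<>();
    long currentTick = 0;                // advanced by the worker thread

    WheelSketch(int wheelSize, long tickMs) {
        this.wheelSize = wheelSize;
        this.tickMs = tickMs;
        for (int i = 0; i < wheelSize; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    int bucketFor(long delayMs) {
        return (int) ((currentTick + delayMs / tickMs) % wheelSize);
    }

    void schedule(long delayMs, Runnable task) {
        buckets.get(bucketFor(delayMs)).add(task);
    }
}
```

If that is what's happening, a different wheel size or tick duration would spread unrelated delays differently, but a burst of identical delays will still cluster into one bucket.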

- Ben
> --
> You received this message because you are subscribed to the Google Groups "vert.x" group.
> To view this discussion on the web, visit https://groups.google.com/d/msg/vertx/-/gnEN68il-1gJ.
> To post to this group, send an email to ve...@googlegroups.com.
> To unsubscribe from this group, send email to vertx+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/vertx?hl=en-GB.

John Reece

Jun 21, 2012, 2:27:27 AM6/21/12
to ve...@googlegroups.com

You're not measuring what you want to measure by using setTimer like this - as noted, this essentially tests the implementation of setTimer, not the performance of vert.x itself.

Instead, why don't you introduce a 'wait' in a more 'vert.x-like' way ... for example, issue a request to a worker, or even one of the supplied busmods (like the Mongo persistor) ...

-JR

Tim Fox

Jun 21, 2012, 3:41:38 AM6/21/12
to ve...@googlegroups.com
I'll take a look tomorrow (I'm out today)

Tim Fox

Jun 21, 2012, 3:51:31 AM6/21/12
to ve...@googlegroups.com
A quick question - what's the reason you are setting so many timers - are you trying to simulate the processing time of the request?

Is this something you would be doing in real-life?

Cheers

赵普明

Jun 21, 2012, 4:34:42 AM6/21/12
to ve...@googlegroups.com
We have a timing requirement for each request:

Every request must return before a certain time, say, 200ms.

If by that time no recipient has returned, we need to return some default value.

I have no idea how else I can meet this requirement.
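For context, the shape of what I want is a race between the gathered result and a deadline. A plain-Java sketch (using CompletableFuture and a ScheduledExecutorService purely to illustrate the shape; this is not the vert.x API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DeadlineGather {
    static final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    // Completes with the gathered result if it arrives before the deadline,
    // otherwise with the default value once the deadline fires. Whichever
    // side completes first wins; the losing complete() call is a no-op.
    static <T> CompletableFuture<T> withDeadline(
            CompletableFuture<T> gathered, long deadlineMs, T defaultValue) {
        CompletableFuture<T> result = new CompletableFuture<>();
        gathered.thenAccept(result::complete);
        timer.schedule(() -> { result.complete(defaultValue); },
                       deadlineMs, TimeUnit.MILLISECONDS);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Simulate recipients that never reply in time.
        CompletableFuture<String> slow = new CompletableFuture<>();
        System.out.println(withDeadline(slow, 200, "empty response").get());
        timer.shutdown();
    }
}
```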


On Thursday, June 21, 2012 at 3:51:31 PM UTC+8, Tim Fox wrote:

赵普明

Jun 21, 2012, 4:44:51 AM6/21/12
to ve...@googlegroups.com
You're right, I'm not too worried about the raw performance of vert.x; I think it meets our requirements.

The problem is this: we have to somehow restrict the time of each request-response cycle; that is a must-meet requirement of our system. We send messages to remote recipients and wait for their replies, but we can't control how long they will take to return. I could write the timer in the busmod that communicates with the recipients, but that is still one or more timers for each request.

Can you think of any way to deal with this? Thanks



On Thursday, June 21, 2012 at 2:27:27 PM UTC+8, John Reece wrote:

赵普明

Jun 21, 2012, 4:52:10 AM6/21/12
to ve...@googlegroups.com

Another thing to note,

our process:



1. get a request
2. dispatch a message out to n remote recipients
3. wait for them to process it
4. then
    - when all the results are received 
        ->  4.1. calculate all results and sum up
    - when a timeout occurs
        ->  4.2. make an empty response
5. response back

Steps #2 and #3 are not under our control; we cannot know how long the remote recipients will take to return their calculations. But all the other computation is local and could be tuned to meet the time requirement without the aid of a timer.

In step #2, we are sending messages to remote machines with the vert.x HttpClient. As my other post suggested, if HttpClientRequest.setTimeout is efficiently implemented (I have no idea whether it would use a timer), then I could leave out the timer in step #4.

Thanks :)


On Thursday, June 21, 2012 at 3:51:31 PM UTC+8, Tim Fox wrote:

赵普明

Jun 21, 2012, 5:09:28 AM6/21/12
to ve...@googlegroups.com
Thanks for the tip. I'll try Netty with this timer problem later :)

If that is the case, then I think vert.x should provide some API for tuning the timer :)

- Puming

On Thursday, June 21, 2012 at 11:39:48 AM UTC+8, Ben Kelly wrote:

Tim Fox

Jun 21, 2012, 5:10:09 AM6/21/12
to ve...@googlegroups.com
On 21/06/2012 09:34, 赵普明 wrote:
We have a timing requirement for each request:

Every request must return before a certain time, say, 200ms,

if at that time no recipient have returned, we need to return some default value.

Can you do the timeout on the client side?

I have no idea how else can I meet this requirement


On Thursday, June 21, 2012 at 3:51:31 PM UTC+8, Tim Fox wrote:
A quick question - what's the reason you are setting so many timers - are you trying to simulate the processing time of the request?

Is this something you would be doing in real-life?

Cheers

On 21/06/2012 08:41, Tim Fox wrote:
I'll take a look tomorrow (I'm out today)


赵普明

Jun 21, 2012, 6:10:38 AM6/21/12
to ve...@googlegroups.com
You mean the remote recipients in step #2? No. Those are the servers of our business clients. We cannot control their behavior. All I can do is set a timeout on my HttpClientRequest, which communicates with them.

My problem is that I don't know the internals of vert.x, Netty and NIO. I'm a newbie on these things.

Another thing I found today is that even with the simplest server (the one that returns 200 OK), when I set ApacheBench's concurrency level higher, the average request time increases:


ab -n 100000 -c 500 http://localhost:8070/


Requests per second:    7989.97 [#/sec] (mean)
Time per request:       62.578 [ms] (mean)

 ab -n 100000 -c 1000 http://localhost:8070/

Requests per second:    7145.44 [#/sec] (mean)
Time per request:       139.949 [ms] (mean)


 ab -n 100000 -c 2000 http://localhost:8070/

Requests per second:    6398.20 [#/sec] (mean)
Time per request:       312.588 [ms] (mean)

I don't know whether the concurrency level in ab reflects actual situations, but I definitely can't pass the team review with these numbers. Lighttpd on the same machine doesn't have this problem.

Maybe it requires some other performance tuning?


On Thursday, June 21, 2012 at 5:10:09 PM UTC+8, Tim Fox wrote:

Pid

Jun 21, 2012, 8:57:52 AM6/21/12
to ve...@googlegroups.com
On 21/06/2012 11:10, 赵普明 wrote:
> you mean the remote recipient in step #2 ? No. Those are the servers of
> our business clients. We can not control their behavior. All I can do is
> set a timeout on my HttpClientRequest, which communicates with them.
>
> My problem is that I don't have the knowledge of the internals of
> vert.x, netty and NIO. I'm a newbie on these things.
>
> Another thing that I found today is that even in the simplest server
> (the one that returns 200 OK), when I set ApacheBench's concurrent level
> higher, the average request time will increase

It's not even close to a reasonable test if the load generator is using
the resources of the same machine as vert.x to generate the load.


p




Tim Fox

Jun 21, 2012, 1:52:58 PM6/21/12
to ve...@googlegroups.com
Like I said, I'll take a look at this tomorrow, but I need a little more information... When you're running the server, what value are you providing to -instances?

Tim Fox

Jun 22, 2012, 6:24:52 AM6/22/12
to ve...@googlegroups.com
Some initial observations (more will follow):

1) I've been playing with ab this morning. AIUI it doesn't support pipelining (bad), and, by default, it creates and tears down a new TCP connection for every message sent (bad). Are your clients really doing this? Connection setup and teardown is clearly going to be much slower than re-using the same connection. 

2) Regarding timeouts: If you set a timeout of 100 ms, after which you send a response, then the next request won't be read on the server until the previous response has been written. If we did this otherwise it would mean you could write responses in the wrong order which would break the HTTP protocol.

So for one connection, that means you can never get more than 1000 / timeout = 10 requests per second for a timeout of 100 ms.

If you have 400 connections you could get a maximum of 400 * 10 = 4000 req/s which is pretty close to what I see when I run it here.
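Spelling out that arithmetic (the numbers are the ones from this thread):

```java
public class ThroughputCeiling {
    // One connection that carries one request at a time, each taking at
    // least `timeoutMs`, can serve at most 1000 / timeoutMs requests per
    // second; multiply by the number of concurrently open connections.
    static double ceiling(int connections, int timeoutMs) {
        return connections * (1000.0 / timeoutMs);
    }

    public static void main(String[] args) {
        // ab -c 400 with a 100 ms timer: at most 4000 req/s, before the
        // per-request TCP handshake cost is even counted.
        System.out.println(ceiling(400, 100)); // prints 4000.0
    }
}
```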

Tim Fox

Jun 22, 2012, 7:23:47 AM6/22/12
to ve...@googlegroups.com


On Thursday, June 21, 2012 3:29:54 AM UTC+1, 赵普明 wrote:
This is most likely because ab is creating so many connections in a short space of time that the TCP accept backlog is exceeded. When this occurs, connections will be rejected.

You can increase this as follows:

HttpServer server = vertx.createHttpServer();
server.setAcceptBacklog(1000000);

Tim Fox

Jun 22, 2012, 7:52:21 AM6/22/12
to ve...@googlegroups.com
If you can repeat your results using httperf I will take a deeper look. To be honest, I don't really trust Apache Bench.

Tim Fox

Jun 22, 2012, 8:05:46 AM6/22/12
to ve...@googlegroups.com


On Thursday, June 21, 2012 11:10:38 AM UTC+1, 赵普明 wrote:

I don't know whether the concurrency level in ab reflects actual situations, but I definitely can't pass the team review with these numbers. Lighttpd on the same machine doesn't have this problem.


I don't know much about lighttpd, but AIUI it is a web server. How can it perform your use case of firing off a request to N members and waiting for all responses to come back, or timing out?

Tim Fox

Jun 24, 2012, 4:23:07 AM6/24/12
to ve...@googlegroups.com
Please see below


On Thursday, June 21, 2012 3:29:54 AM UTC+1, 赵普明 wrote:
I can also see that when a lot of connections are made in a short amount of time, most connect quickly, but some take significantly longer to connect (usually just over 3 seconds).

I have replicated it in a simple Netty program with no Vert.x involved (although I don't think the issue is with Netty), and have been trying to figure out what is going on at a low level (with Wireshark).

Still investigating, but I will post my results, along with answers and explanations to all your points within the next few days.

赵普明

Jun 24, 2012, 10:52:17 PM6/24/12
to ve...@googlegroups.com
Sorry for the late response, I was on vacation the last three days.

First, regarding -instances: I've tried 2, 4, 8 and 16; there seems to be not much difference. I figured the bottleneck in the benchmark is not CPU, so more cores do not help here.



Some initial observations (more will follow):

1) I've been playing with ab this morning. AIUI it doesn't support pipelining (bad), and, by default, it creates and tears down a new TCP connection for every message sent (bad). Are your clients really doing this? Connection setup and teardown is clearly going to be much slower than re-using the same connection. 

As for pipelining, I have tested the 200 OK benchmark with HttpPerfClient, and the results are good. So pipelining is going to help here. But in our real production environment there won't be pipelining, with all requests coming from browsers. And most browsers don't have pipelining enabled by default.

As for keep-alive, I'll try that later and report back. I think it really is a factor here :) Thanks for the tip.



2) Regarding timeouts: If you set a timeout of 100 ms, after which you send a response, then the next request won't be read on the server until the previous response has been written. If we did this otherwise it would mean you could write responses in the wrong order which would break the HTTP protocol.

So for one connection, that means you can never get more than 1000 / timeout requests second = 10 req/s for a timeout of 100 ms.

If you have 400 connections you could get a maximum of 400 * 10 = 4000 req/s which is pretty close to what I see when I run it here.

 
That is the real meat here. That explains the drastic drop in throughput. So this means that if we set a timer in the process, then it is not a really async process anymore? What if I do not use a timer, but send a message via the event bus, and send the response after I receive a reply from the event bus? Is that also blocked?

My question is: is there a way I can do this otherwise? Our requests are independent and do not care about the order. And most of them are very small, so there is no need for HTTP chunking (no multiple sends for each real request). Can I somehow disable the ordering for this scenario?

So if I don't use setTimer, but use the event bus and HttpClientRequest.setTimeout (which is not there yet), can I make it really async?


赵普明

Jun 24, 2012, 10:56:36 PM6/24/12
to ve...@googlegroups.com
I'll use httperf and get some numbers later :)

On Friday, June 22, 2012 at 7:52:21 PM UTC+8, Tim Fox wrote:

Tim Fox

Jun 25, 2012, 2:02:36 AM6/25/12
to ve...@googlegroups.com


On Monday, June 25, 2012 3:52:17 AM UTC+1, 赵普明 wrote:
Sorry for the late response, I was on vacation the last three days.

First, regards to -instances, I've tried with 2, 4, 8 and 16, seems not much difference. I conceived that the bottleneck in the benchmark is not CPU, so more cores does not help here.


Some initial observations (more will follow):

1) I've been playing with ab this morning. AIUI it doesn't support pipelining (bad), and, by default, it creates and tears down a new TCP connection for every message sent (bad). Are your clients really doing this? Connection setup and teardown is clearly going to be much slower than re-using the same connection. 

As for pipelining, I have tested 200 OK benchmark with HttpPerfClient, and results is good. So pipelining is going to help here. But in our really production environment there won't be pipelining, with all requests coming from the browsers.  And most browsers don't have pipelining enabled by default.

As for keep-alive, I'll try that later and report back. I think it is really a reason here :)


Indeed
 
 
Thanks for the tip.

 


2) Regarding timeouts: If you set a timeout of 100 ms, after which you send a response, then the next request won't be read on the server until the previous response has been written. If we did this otherwise it would mean you could write responses in the wrong order which would break the HTTP protocol.

So for one connection, that means you can never get more than 1000 / timeout requests second = 10 req/s for a timeout of 100 ms.

If you have 400 connections you could get a maximum of 400 * 10 = 4000 req/s which is pretty close to what I see when I run it here.

 
That is the real meat here. That explains the drastic drop in throughput. So this means that if we set a timer in the process, then it is not a really async process anymore?

It is perfectly async.

 
What if I do not use a timer, but send a message via the event bus, and send the response after I receive a message from the event bus? Is that also blocked?

My question is : Is there a way that I can do this otherwise? Our requests are independent and do not care about the order.

Yes, but the HTTP protocol does care about the order.
 
And most of them is very small, so no need to do http chunking (no multiple request send for each real request). Can I somehow disable the ordering for this scenario?

So if I don't use setTimer, but use the event bus and HttpClientRequest.setTimeout (which is not there yet), can I make it really async?



I will explain in another post
 

Tim Fox

Jun 25, 2012, 2:42:32 AM6/25/12
to ve...@googlegroups.com
As promised here is a summary of my findings / observations

1. Connection per request.

You are creating a new TCP connection for every request that is sent. This is going to be slow - setting up a TCP connection requires a 3 way handshake at minimum (that's three trips across the network). By doing this you're really benchmarking the speed of your network (or loopback), not the server ;)

2. Slow connection time.

You observed that connection times are sometimes slow (around 3000 ms), I replicated this too using a raw Netty program with no Vert.x involved. I spent a few days looking into this at a low level with Wireshark.

What's happening on the TCP level is something like this. With the TCP connect handshake the client first sends a SYN packet to the server, the server replies with a SYN-ACK packet, and then the client replies to that with an ACK packet. I observed that some packets weren't being replied to by the server, so the client eventually sent them after a timeout. The default SYN resend timeout on Linux happens to be 3000 ms.

This appears to be because the accept queue for TCP connections in the process of being setup has been exceeded. This occurs because ab is a slow single threaded client and can't handle that many setups fast enough.

To fix this you need to do 3 things (10000 is just an arbitrarily high number I chose)

a) sudo sysctl -w net.core.somaxconn=10000

b) sudo sysctl -w net.ipv4.tcp_max_syn_backlog=10000

c) 

HttpServer server = vertx.createHttpServer();
server.setAcceptBacklog(10000);

Just increasing the backlog in Java is *not* sufficient on Linux systems (and probably on other *nixes)

Once you've done that you should find that the longest connection setup times reported are much shorter.

3) Adding a timeout of 100ms for each request reduces throughput.

This is to be expected.

You are creating a new connection for each request. And Apache bench with a concurrency level of 400 will have no more than 400 connections open at any one time.

What's happening is this:

a) Create a connection
b) Send a request
c) Request received on server, timeout is set
d) 100ms later timeout fires, and response is written.
e) Connection is closed.

Since you have at most 400 connections at any one time, and each connection handles a single request, and each request-response takes *at least* 100ms, then that means you will have a maximum theoretical throughput of:

T = 400 * (1000/100) = 4000 requests / per sec

Going faster than that would break the laws of physics.

This does not take into account the time for the three way handshake so the actual throughput will be somewhat less than this.

The actual figure I observe on my desktop is:

Requests per second:    3265.87 [#/sec] (mean)




On Thursday, June 21, 2012 3:29:54 AM UTC+1, 赵普明 wrote:

Pid *

Jun 25, 2012, 3:10:09 AM6/25/12
to ve...@googlegroups.com
I've used 'siege' to load test apps before. Might be worth us giving that a go at some point.


赵普明

Jun 25, 2012, 5:15:14 AM6/25/12
to ve...@googlegroups.com
Thanks Tim for such a thorough and detailed explanation :)

With all the effort and explanation you gave me, our team is now pretty impressed with vert.x, and using it as our server platform has been generally agreed :)

Now what I'm going to do is try to find ways to improve our system's performance and robustness. I'll report what we find along the way.


On Monday, June 25, 2012 at 2:42:32 PM UTC+8, Tim Fox wrote:
As promised here is a summary of my findings / observations

1. Connection per request.

You are creating a new TCP connection for every request that is sent. This is going to be slow - setting up a TCP connection requires a 3 way handshake at minimum (that's three trips across the network). By doing this you're really benchmarking the speed of your network (or loopback), not the server ;)


Yes, I think you're right. We actually have two ends to the process:

1. Millions of small, unrelated request from browsers to our front end server.
2. Our server will dispatch a message to multiple downstream recipients.

Unfortunately, in the first step, both pipelining and keep-alive won't work. The requests from the browsers are not consecutive by nature, and for each user there won't be many requests at a time. So the requests are scattered across many, many users on the internet. That part is hard to deal with. I'm wondering whether we can tune the server to further improve performance for this scenario.

For the second step, I can apply both pipelining and keep-alive, as our downstream recipient servers are stable. But in this case we are the HttpClient, not the server. I'll try to make the client requests as fast as I can. And thanks to the great design of vert.x, that won't be much more complicated than the httpperf example :)

An efficient HttpClientRequest.setTimeout() would certainly help here :)


2. Slow connection time.

 

Thanks for this; without your investigation, we could not have found a solution to this problem.

After applying the system settings you advised above, I've successfully eliminated those 5% slow connections !! :-)



3) Adding a timeout of 100ms for each request reduces throughput.



OK I finally get it :-)


I tried with httperf, and found that when I set a timeout, the performance is also drastically reduced, which leads to the conclusion you mentioned in another post:

"if you set a timeout of 100 ms, after which you send a response, then the next request won't be read on the server until the previous response has been written".

So that means the connection is in a blocking mode when a setTimer occurs. Is it just setTimer, or is it the same with all other async processes, such as event-bus communication?

In that case, if my request handler sends an event-bus message and waits until the reply is received to make the response, is the connection still blocked until all this is done?

What if I use keep-alive and pipelining ?

Can the connection somehow be switched to accept another request, and then switch back when the response is ready?

Now I get that async in our context means the async nature of the I/O part, not anything to do with blocking/non-blocking, right?

Conclusion: if I want better throughput and lower latency, I should get rid of the timer, because it blocks the request.

But I need somehow to limit the time of the whole process.

What about HttpClientRequest.setTimeout? If that is more efficient, and if event-bus communication is NOT blocking, then it will be OK :)

Tim Fox

Jun 25, 2012, 6:38:03 AM6/25/12
to ve...@googlegroups.com
On 25/06/12 10:15, 赵普明 wrote:
> Thanks Tim for such a thorough and detailed explaination :)
>
> With all the efforts and explanation you gave me, our team is now
> pretty impressed with vert.x and using it as our server platform have
> been generally agreed :)

Great :)
Agreed.
setTimer doesn't influence the connection at all. What happens is that
Vert.x will not read the next request (i.e. call your request handler)
until the response of the previous request has been ended.

>
> In that case, if my request handler sends an event-bus message and waits
> until the reply is received to make the response, is the connection
> still blocked until all this is done?

Yes, if you are calling response.end when you get a reply, then the
connection will be "blocked" (actually nothing is blocked, but nothing
will be read from the connection) until you call response.end.
>
> What if I use keep-alive and pipelining ?

It's the same. An HTTP request on a connection won't be read (i.e. the
request handler won't be called) until the previous response has been
written (ended).

>
> Can the connection somehow be switched to accept another request, and
> then switch back when the response is ready?

This is problematic. If we allowed you to read more requests before you
had written previous ones then you could write responses in the wrong order.

E.g. if the client sent request 1, then request 2, it might receive
response 2 followed by response 1. This would break the HTTP protocol and
put your client in a mess.

Theoretically we could queue pending responses on the server side and
only send them back to the client when there are no "holes" in the
stream of responses, but this is complicated.
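That "no holes" queueing can be sketched as a small reorder buffer. This is a plain-Java illustration of ours showing the complication, not something vert.x implements:

```java
import java.util.*;

public class ResponseReorderBuffer {
    private final Map<Integer, String> pending = new HashMap<>();
    private final List<String> wire = new ArrayList<>(); // what gets written to the client
    private int next = 1; // next request number whose response may be sent

    // Called whenever a response becomes ready, in any order.
    void onResponseReady(int requestNo, String body) {
        pending.put(requestNo, body);
        // Flush only while there are no holes in the sequence.
        while (pending.containsKey(next)) {
            wire.add(pending.remove(next));
            next++;
        }
    }

    public static void main(String[] args) {
        ResponseReorderBuffer buf = new ResponseReorderBuffer();
        buf.onResponseReady(2, "response2"); // ready first, but must wait for a hole to fill
        System.out.println(buf.wire);        // []
        buf.onResponseReady(1, "response1"); // fills the hole; both flush in order
        System.out.println(buf.wire);        // [response1, response2]
    }
}
```

The cost is visible even in the sketch: every out-of-order response must be buffered in memory until its predecessors finish, which is why a slow request can still stall the whole pipeline.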

>
> Now I get that async in our context means the async nature of the I/O
> part, not anything to do with blocking/non-blocking, right?

Right, nothing actually blocks (by that I mean no thread blocks).
>
> Conclusion: if I want better throughput and lower latency, I should
> get rid of the timer, because it blocks the request.
>
> But I still need some way to limit the time of the whole process.
>
> What about HttpClientRequest.setTimeout? If that is more efficient, and
> if event-bus communication is NOT blocking, then it will be OK :)

If you can do the timeout on the client side, that would be ideal.

>
> --
> You received this message because you are subscribed to the Google
> Groups "vert.x" group.
> To view this discussion on the web, visit
> https://groups.google.com/d/msg/vertx/-/YODp7xC1Pd8J.
> To post to this group, send an email to ve...@googlegroups.com.
> To unsubscribe from this group, send email to
> vertx+un...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/vertx?hl=en-GB.


--
Tim Fox

Vert.x - effortless polyglot asynchronous application development
http://vertx.io
twitter:@timfox

赵普明

unread,
Jun 25, 2012, 10:34:15 PM
to ve...@googlegroups.com
Yes, if you are calling response.end when you get a reply, then the
connection will be "blocked" (actually nothing is blocked, but nothing
will be read from the connection) until you call response.end.

Thanks Tim :)

Now I get it. The thread and computing resources are not blocked; they can do other things. It is only the current connection that is temporarily waiting for the response.

I assume that one connection is bound to a pair of client and server, so waiting for a reply does not affect requests from other clients (browsers), because they use or will create different connections, right? For our scenario, with a lot of small requests coming from browsers all over the internet and not consecutively, this should not be a problem.

So the bottleneck is how many connections we can manage at the same time.

In my tests, if I increase the concurrency level (-c in AB), the average response time increases steadily. When -c is more than 1500, it deteriorates to more than 200 ms per request. So far the best -c number is 1000. I think it's something in the TCP implementation: the more connections to manage, the longer each one has to wait in some queue.

I'm not sure my tests are right. Have you tested with a LOT of small and short-lived connections concurrently? With httperf?
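For reference, the kind of httperf run meant here might look like this. The flags are standard httperf options; the rates and counts are illustrative only, not values anyone in the thread actually used:

```shell
# Illustrative httperf run: open 10,000 fresh connections at a rate of
# 1,000 connects/sec, one request per connection, 100 ms reply timeout.
# --hog lets httperf use the full ephemeral port range.
httperf --hog --server localhost --port 8080 --uri / \
        --num-conns 10000 --rate 1000 --num-calls 1 --timeout 0.1
```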

Tim Fox

unread,
Jun 26, 2012, 3:43:16 AM
to ve...@googlegroups.com
Is most of the time being spent connecting? I would expect connection time to degrade linearly with the number of connections using ab, since it is single-threaded, i.e. that single thread has to handle more connections, so each will take longer. Using a multi-threaded client might help, but testing this properly is going to be hard: you will need many clients on machines separate from the server. Putting clients on the same machine as the server is going to steal CPU cycles from the server and so not give you fair results.

But fundamentally a single server will only be able to handle X connection attempts per second. So increasing the number of connects per second is going to increase the accept queue size and therefore the average connection time. This will be true for any server.

The way to mitigate that would be to calculate your maximum acceptable connect time, and then given the maximum number of connections you expect, you can work out how many servers you will need to satisfy that.
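That sizing calculation can be sketched in a few lines. All the numbers below are illustrative assumptions of ours (the peak factor and per-server connect rate would come from your own measurements), with only the 1 billion requests/day figure taken from the original post:

```java
public class CapacityPlan {
    // Number of servers needed so that the expected peak connect rate
    // stays within each server's measured capacity (ceiling division).
    static long serversNeeded(long peakConnectsPerSec, long perServerConnectsPerSec) {
        return (peakConnectsPerSec + perServerConnectsPerSec - 1) / perServerConnectsPerSec;
    }

    public static void main(String[] args) {
        // 1 billion requests/day over 86,400 seconds ~= 11,574 req/s average.
        // Assume a 3x peak-to-average factor and, say, 5,000 connects/s per
        // server within the acceptable connect time (both assumptions).
        long avg = 1_000_000_000L / 86_400;             // ~= 11574
        long peak = avg * 3;                            // ~= 34722
        System.out.println(serversNeeded(peak, 5_000)); // 7
    }
}
```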

赵普明

unread,
Jun 26, 2012, 3:58:16 AM
to ve...@googlegroups.com


Is most of the time being spent connecting?

Yes, I think so. 
 
I would expect connection time to degrade linearly with the number of connections using ab, since it is single-threaded, i.e. that single thread has to handle more connections, so each will take longer. Using a multi-threaded client might help, but testing this properly is going to be hard: you will need many clients on machines separate from the server. Putting clients on the same machine as the server is going to steal CPU cycles from the server and so not give you fair results.

I'll try benchmarking from a different machine later. I think you're right.
 
But fundamentally a single server will only be able to handle X connection attempts per second. So increasing the number of connects per second is going to increase the accept queue size and therefore the average connection time. This will be true for any server.

The way to mitigate that would be to calculate your maximum acceptable connect time, and then given the maximum number of connections you expect, you can work out how many servers you will need to satisfy that.
 

It is around 1000 connections on my machine. I'm wondering what numbers other people would get.




Tim Fox

unread,
Jun 26, 2012, 4:11:03 AM
to ve...@googlegroups.com
What you are measuring here is not the number of connections, but the
number of *connects* per second (i.e. a rate). Vert.x can handle many
thousands of concurrent connections, but can only handle (like any
server) a lower number of connects per second.

If you disable timeouts on the server, and put your server on a real
production machine with several instances, tune TCP then fire many
clients (on different machines) at it, you should be able to measure how
many connects per second the server can handle within your allowable time.

> I'm wondering what numbers other people would get.
>
>
>
>

赵普明

unread,
Jun 26, 2012, 4:27:11 AM
to ve...@googlegroups.com



What you are measuring here is not the number of connections, but the
number of *connects* per second (i.e. a rate). Vert.x can handle many
thousands of concurrent connections, but can only handle (like any
server) a lower number of connects per second.

If you disable timeouts on the server, and put your server on a real
production machine with several instances, tune TCP then fire many
clients (on different machines) at it, you should be able to measure how
many connects per second the server can handle within your allowable time.

 

OK, I misused the term 'connection'; from now on I'll use connects per second (CPS?)

What I meant by 1000 was: if I use AB with -c 1000, in which case there are about 1000 connections active (because AB starts a new connection only after the last one finishes), then the average response time is good. If I set -c to more than 1000, it becomes bad.

I now know AB does not reflect the real situation. I'll get the environment set up later and try the method you mentioned :)

Tim Fox

unread,
Jun 26, 2012, 4:43:33 AM
to ve...@googlegroups.com
On 26/06/2012 09:27, 赵普明 wrote:



What you are measuring here is not the number of connections, but the
number of *connects* per second (i.e. a rate). Vert.x can handle many
thousands of concurrent connections, but can only handle (like any
server) a lower number of connects per second.

If you disable timeouts on the server, and put your server on a real
production machine with several instances, tune TCP then fire many
clients (on different machines) at it, you should be able to measure how
many connects per second the server can handle within your allowable time.

 

OK, I misused the unit 'connection', from now on I'll use connects per second (CPS?)

What I meant by 1000 was: if I use AB with -c 1000, in which case there are about 1000 connections active (because AB starts a new connection only after the last one finishes),
1000 connections active at _any one time_. Over the test run there are many more than 1000 connections.


then the average response time is good. If I set -c to more than 1000, it becomes bad.

Yes, like I mentioned before, I don't think measuring this with a single ab instance on the same machine as the server is a good idea ;)

I now know AB does not reflect the real situation. I'll get the environment set up later and try the method you mentioned :)


Bruno Bonacci

unread,
Jun 26, 2012, 5:58:48 AM
to ve...@googlegroups.com

Hi,
I would suggest following the guidelines and settings outlined here: http://wiki.eclipse.org/Jetty/Howto/High_Load

A stock Linux distribution is not tuned for such a big load; you need to first increase buffers, file handles, backlogs, etc.
Additionally, since you have a very large number of very short connections, you might want to enable TCP SO_REUSEADDR or port reuse to reduce the number of ports in TIME_WAIT.
Once you are sure that the TCP stack is not your bottleneck, you should keep an eye on the JVM GC to make sure that you don't have too many pauses.
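For illustration, sysctl knobs of the kind mentioned here might include the following. The values are examples only, not recommendations; check each key against your kernel's documentation before using it:

```shell
# Illustrative Linux TCP tuning for many short-lived connections.
sysctl -w net.core.somaxconn=4096            # larger accept backlog
sysctl -w net.core.netdev_max_backlog=16384  # NIC -> kernel packet queue
sysctl -w net.ipv4.tcp_max_syn_backlog=8192  # pending SYN (half-open) queue
sysctl -w net.ipv4.tcp_tw_reuse=1            # reuse TIME_WAIT sockets for new outbound connects
ulimit -n 100000                             # more file descriptors per process
```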

Finally, I agree with Tim that you should run the load tool from another box for more realistic results.

Bruno

赵普明

unread,
Jun 26, 2012, 6:01:38 AM
to ve...@googlegroups.com
Thanks for the tips, Bruno. I'll look at that :) I'm not quite familiar with OS tweaking yet, but hopefully we'll get there.

On Tuesday, June 26, 2012 at 5:58:48 PM UTC+8, Bruno Bonacci wrote:

Tim Fox

unread,
Jun 27, 2012, 11:48:39 AM
to ve...@googlegroups.com
On 26/06/12 10:58, Bruno Bonacci wrote:
>
> Hi,
> I would suggest to follow guidelines and settings outlined here:
> http://wiki.eclipse.org/Jetty/Howto/High_Load

Great resource :) Thanks Bruno

>
> A stock Linux distribution is not tuned for such a big load; you
> need to first increase buffers, file handles, backlogs, etc.
> Additionally, since you have a very large number of very short
> connections, you might want to enable TCP SO_REUSEADDR or port reuse
> to reduce the number of ports in TIME_WAIT.
> Once you are sure that the TCP stack is not your bottleneck, you
> should keep an eye on the JVM GC to make sure that you don't have too
> many pauses.
>
> Finally, I agree with Tim that you should run the load tool from
> another box for more realistic results.
>
> Bruno
>

