Concurrent HTTP request performance issue

2,097 views
Skip to first unread message

Paweł Zięba

unread,
Jan 23, 2015, 10:03:16 AM1/23/15
to ve...@googlegroups.com

I'm trying to use Vert.x as a client app to test performance of a server application. The test scenario involves creating some long running tasks on external server via HTTP and for each task it long polls the server for results. The Vert.x client app is deployed as follows (4 event loops on my machine):
1. Worker verticle which prepare the data,
2. Worker verticle which starts the test flow by sending events to start the process for each data item prepared by the first verticle,
3. Verticle responsible for handling the test flow, which involves sending HTTP requests until server process is completed (the long polling using a timer).

The problem is that when the load reaches about 1000 concurrent requests in the client, the server responses are handled by Vert.x with a high latency. Many responses take more than 5 sec. to be handled, however the HAProxy logs show that none of the request takes more than 200 ms.

I guess that Vert.x waits for some resource. At first, I though that the event loop is overwhelmed so I measured the time between setting the timer and when it is actually run. It occurred that the timer is executed on time (after requested delay). So I guess that this is not the reason. 

My second hunch was that Vert.x HTTP client waits for HTTP connection to be available. So I tried to increase the size of the connection pool of HttpClient. Unfortunately, that leads to java.nio.channels.ClosedChannelException(s) and I could not find the reason why they occur. 

What could be the reason of such high latency of response handling? What can be done about it? Is setting a HttpClient connection pool of size greater than 1 (default) used on a single verticle a good thing? If yes, what could be done to avoid ClosedChannelException exceptions?

I would be very grateful for any help. 

Besides, I have also some more general questions. Is there a way to monitor the load of the event loop? E.g. how many events wait for being handled? How long on average does an event wait for being handled?

Below you can find some code chunks that could be useful in understanding the problem:

Application instance runs a single verticle that deploys three other verticles:
@Override
public void start() {
   
JsonObject config = container.config();
    container
.deployWorkerVerticle(UserCreator.class.getName(), config, 1, false,
            v
-> container.deployVerticle(ResourceCreator.class.getName(), config,
                    v2
-> container.deployVerticle(PerformanceTests.class.getName(), config)));

   
this.vertx.eventBus().registerHandler(ERROR_HANDLER, errorMessage -> {
        container
.logger().error("ERROR " + errorMessage.body());
        container
.exit();
   
});
}

HttpClient is configured as follows (as a verticle field):
    private HttpClient httpClient;
    private HttpClient getHttpClient() {
        if (httpClient == null) {
            httpClient = vertx.createHttpClient()
                    .setHost(container.config().getString("host", "localhost"))
                    .setPort(container.config().getInteger("port", 3000));
        }
        return httpClient;
    }

Sending check status request (long polling):
        ResourceStatusHandler handler = new ResourceStatusHandler(auth, id, System.currentTimeMillis());

        container.logger().debug("Http status request sent");
        getHttpClient().get("/resource/" + id, handler)
                .exceptionHandler(new HttpExceptionHandler(vertx))
                .putHeader("Authorization", auth)
                .end();

Finally, the response handler:
    private class ResourceStatusHandler implements Handler<HttpClientResponse> {
        private long startedAt;
        private String auth;
        private String id;

        public ResourceStatusHandler(String auth, String id, long startedAt) {
            this.auth = auth;
            this.id = id;
            this.startedAt = startedAt;
        }

        @Override
        public void handle(HttpClientResponse event) {
            if (event.statusCode() != 200) {
                Util.reportHttpError(vertx, event);
                return;
            }

            container.logger().debug("Got status in " + (System.currentTimeMillis() - this.startedAt));
            event.bodyHandler(body -> {
                JsonObject bodyJson = new JsonObject(body.toString());

                String state = bodyJson.getString("state");

                if ("PROCESSING".equals(state)) {
                    vertx.setTimer(500, l -> getResourceStatus(id, auth));
                } else if ("FINISHED".equals(state)) {
                    numberOfFinishedResources++;
                    resources.get(id).finishedAt(System.currentTimeMillis());
                }
            });
        }
    }


Thanks. 

Tim Fox

unread,
Jan 23, 2015, 10:19:37 AM1/23/15
to ve...@googlegroups.com
Which version of Vert.x?
--
You received this message because you are subscribed to the Google Groups "vert.x" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vertx+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Paweł Zięba

unread,
Jan 23, 2015, 10:34:19 AM1/23/15
to ve...@googlegroups.com
2.1.5

--
You received this message because you are subscribed to a topic in the Google Groups "vert.x" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/vertx/LiAvgym4oXo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to vertx+un...@googlegroups.com.

Tim Fox

unread,
Jan 23, 2015, 10:39:47 AM1/23/15
to ve...@googlegroups.com
It's very hard to tell how your application works by just looking at snippets.

I suggest you put together a working reproducer that demonstrates the problem, e.g. in GitHub, then someone can take a look.

Paweł Zięba

unread,
Jan 23, 2015, 10:42:22 AM1/23/15
to ve...@googlegroups.com
Sure. I'll try to reproduce the problem with smaller project during the weekend and share it with you through GitHub. 

Paweł Zięba

unread,
Jan 24, 2015, 9:25:37 AM1/24/15
to ve...@googlegroups.com
Unfortunately, I'm not able to reproduce it on smaller project. It seems to work efficiently for reasonable number of concurrent requests (by reasonable I mean max. 1k). For larger number it gets slower, but because I couldn't make HAProxy log every request (probably because of Keep-Alive + pipelining set on client), I couldn't reason if this is server or client issue. Most likely server as I tried two servers (node.js and vert.x) and results differs significantly (vert.x is faster btw.).

Here's my test project if you're interested: https://github.com/pz07/vertx-http-client-performance-test
And here's some test results for some configurations: https://github.com/pz07/vertx-http-client-performance-test/blob/master/vertx-test-results.txt

However, I think that there is some kind of concurrency problem in HttpClient implementation. If I change worker thread from being single threaded to multi threaded (by changing false to true in https://github.com/pz07/vertx-http-client-performance-test/blob/master/vertx-http-client-performance-test-client/src/main/java/pz/StartVerticle.java:15), I get inconsistent response body (sometimes it's empty, sometimes is more than was sent). This is probably why my office project is so unstable and gets strange results from time to time. I'll check it on Monday.

Regards.
pz

Tim Fox

unread,
Jan 24, 2015, 10:08:29 AM1/24/15
to ve...@googlegroups.com
Thanks.

I had a quick look at your application, but couldn't get a clear understanding of what it's trying to do.

I can see a worker verticle and non worker verticle and various messages being sent across the event bus, and some posts being made from the worker verticle...

Perhaps you can talk through it?

Paweł Zięba

unread,
Jan 24, 2015, 11:05:59 AM1/24/15
to ve...@googlegroups.com
Thanks for quick reply.

The application on github (let's call it application G) is supposed to simulate a real life application (application P) that I'm trying to build at work. I want to use the P application to carry on performance tests of yet another application (application S). Basically the idea is to post a big number of long running tasks on the server, long polling for the status of the task and once it's finished write to logs how long it took to finish the task.

In application on github:
- worker verticle simulates posting a big number of tasks on server. Because it has potentially long running for loop, I made it a worker verticle instead of a "normal" one,
- non-worker verticle does the long polling job. Once it receive the message that task was successfully posted, it starts polling for the status of the task. If the task is still under processing, it tries once again in 500 ms. If the task is finished, it mark the task as finished, log the messages and finishes work for this task.

Because it's all about performance test application, it's critical that the overhead of the test application itself is relatively small comparing to the application being tested. The problem was that when, under high load, I compared the time between request sent and response received in application P to response time on HAProxy where requests were sent, the difference was huge (around 5sec. in testing application an average compared to less than 200ms on HAProxy for every request). Unfortunately, I can't show it here, because HAProxy logs requests differently and I don't know how to make it log each request while preserving keep-alive and pipelining on vert.x side.

On Monday I'll check real HAProxy configuration and try to make HAProxy log every request. After that I'll rerun the tests.

Regards.


Tim Fox

unread,
Jan 24, 2015, 12:28:05 PM1/24/15
to ve...@googlegroups.com
On 24/01/15 16:05, Paweł Zięba wrote:
Thanks for quick reply.

The application on github (let's call it application G) is supposed to simulate a real life application (application P) that I'm trying to build at work. I want to use the P application to carry on performance tests of yet another application (application S). Basically the idea is to post a big number of long running tasks on the server, long polling for the status of the task and once it's finished write to logs how long it took to finish the task.

In application on github:
- worker verticle simulates posting a big number of tasks on server. Because it has potentially long running for loop, I made it a worker verticle instead of a "normal" one

I can't see anything in the code that is blocking, so I can't see any reason why this should be a worker verticle. What makes you think the for loop is long running?

Other than that, tbh I still haven't really got my head around what you're trying to show here, and unfortunately I don't really have time to look into it in more detail (Vert.x 3 release to do...)

But it seems to me the event bus communication is redundant - you could probably do everything in a single verticle and scale up with number instances

Jochen Mader

unread,
Jan 25, 2015, 7:42:21 AM1/25/15
to ve...@googlegroups.com
To cut the guessing a little shorter:
Get a trial version of Yourkit and take a look at your application.
Much easier than trying to use telepathic powers on your code ;)
Helped me finding a hidden blocking call in an external library.
Jochen Mader | Lead IT Consultant

codecentric AG | Elsenheimerstr. 55a | 80687 München | Deutschland
tel: +49 89 215486633 | fax: +49 89 215486699 | mobil: +49 152 51862390
www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de

Sitz der Gesellschaft: Düsseldorf | HRB 63043 | Amtsgericht Düsseldorf
Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns
Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen Schütz

Paweł Zięba

unread,
Jan 26, 2015, 3:16:25 AM1/26/15
to ve...@googlegroups.com
"I can't see anything in the code that is blocking, so I can't see any reason why this should be a worker verticle. What makes you think the for loop is long running?"
I made it a worker verticle because of potentially long running for loop. In every iteration it perform non-blocking operation, but even though I guess it can take some time to send thousands of HTTP request and set event handlers for the response. 

"But it seems to me the event bus communication is redundant - you could probably do everything in a single verticle and scale up with number instances"
This is a way to go. I'll try to make it simpler (although the real life example is more complicated). 

In the end it would be great to have some kind of monitoring tools for event loop. I've read that this is expected in 3.0 version (and I've seen in the code measures package), so looking forward for it and I'll give it a try. 

Thanks.
pz 

Tim Fox

unread,
Jan 26, 2015, 3:56:34 AM1/26/15
to ve...@googlegroups.com
On 26/01/15 08:16, Paweł Zięba wrote:
"I can't see anything in the code that is blocking, so I can't see any reason why this should be a worker verticle. What makes you think the for loop is long running?"
I made it a worker verticle because of potentially long running for loop. In every iteration it perform non-blocking operation, but even though I guess it can take some time to send thousands of HTTP request and set event handlers for the response.

No, this is very quick. Remember sending is _asynchronous_ so the actual send occurs some time after the call has returned. Making the call has very little overhead.



"But it seems to me the event bus communication is redundant - you could probably do everything in a single verticle and scale up with number instances"
This is a way to go. I'll try to make it simpler (although the real life example is more complicated).

In my experience it usually pays to make the _simplest_ possible example that demonstrates the problem. You can get there by incrementally removing code, and at each stage running the tests to see if the problem still exists. At some point you may find the problem ceases to exist and that should give you a bit clue as to where the issue is :)

Lenders

unread,
Jan 27, 2015, 1:24:26 PM1/27/15
to ve...@googlegroups.com
Just a thought but I've had similar issues when the OS ran out of resources, have you checked the open file handles etc?

Peter

Disclaimer: This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this message, or the taking of any action based on it, is strictly prohibited. 

Paweł Zięba

unread,
Jan 30, 2015, 8:40:21 AM1/30/15
to ve...@googlegroups.com
Tim, 
you're right. Making a call is not a problem here. It takes less than 200ms to make 10k requests on my machine.

Landers, 
I have already checked with netstat the number of open connections. It was ok, not over my OS limits.

In the end I've fixed it by deploying more instances of the same verticle. 
Still I think monitoring tools would be very helpful in gaining better insight of what's happening inside of the vert.x instance and I'm looking forward for it.

Thanks for your help.


--
Reply all
Reply to author
Forward
0 new messages