Vert.x event loop utilizes 100% CPU in simple application

855 views

Kushan Athukorala

Nov 9, 2014, 12:22:31 AM
to ve...@googlegroups.com

Hello,

I am trying to figure out the maximum TPS (transactions per second) that can be achieved by a simple HttpServer with a single Vert.x instance (one JVM) and two verticle instances running on a 2-core server.

As you can see in the code below, there are no blocking calls, and the server runs on two event loops, configured via JVM_OPTS.
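For reference, pinning the event-loop count via JVM_OPTS would look something like the following sketch. The property name vertx.pool.eventloop.size is the Vert.x 2.x tuning option; treat the exact setting as an assumption about this setup:

```shell
# Assumed JVM_OPTS for pinning the Vert.x 2.x event-loop pool to 2 threads
# (vertx.pool.eventloop.size is the Vert.x 2 tuning property; verify against
# your Vert.x version before relying on it).
export JVM_OPTS="-Dvertx.pool.eventloop.size=2"
```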


public class HttpServer extends Verticle {

    public void start() {
        vertx.createHttpServer().requestHandler(new Handler<HttpServerRequest>() {
            public void handle(HttpServerRequest req) {
                req.response().headers().set("Content-Type", "text/plain; charset=UTF-8");
                //req.response().headers().set("Content-Length", "27");
                req.response().setChunked(true);
                req.response().write("<status>AUTHORIZED</status>", "UTF-8").end();
            }
        }).listen(8080);
    }
}

I loaded this HttpServer with 1000 JMeter threads from a separate server machine and achieved 60,000 TPS. The JMeter request is a simple HTTP GET, and the response contains only a few bytes, as you can see in the code.

While this was happening, I separately observed the HttpServer's performance using jvisualvm and did CPU profiling to check the CPU utilization. CPU utilization was 100% at 60,000 TPS.

When I presented these results, some argued that CPU utilization cannot be 100% for this kind of simple application, which involves little server-side processing. Furthermore, they said that the TPS would be much lower if the response contained a few kilobytes or a JSON body were used in a POST request, and that Vert.x would therefore perform poorly in large applications.

Is this argument correct?

If it is, in what ways can I further optimize the CPU utilization of the Vert.x event loop?

Regards,
Kushan

Tim Fox

Nov 9, 2014, 2:30:39 AM
to ve...@googlegroups.com
Not really sure what you're getting at here

Tim Fox

Nov 9, 2014, 2:37:29 AM
to ve...@googlegroups.com
But if you need convincing about Vert.x performance, here are some independent benchmarks that were done some time back:

http://www.techempower.com/benchmarks/#section=data-r8&hw=i7&test=plaintext

As you can see, Vert.x is right at the top in terms of performance compared to everything else. The benchmark uses a trivial HTTP server just like the one in your example.

Jez P

Nov 9, 2014, 3:59:39 AM
to ve...@googlegroups.com
Surely your maximum TPS is entirely application-dependent (and hardware-dependent). If you add some heavy processing then yes, your throughput would drop, assuming you remain CPU-bound. Those who argued that the CPU utilization cannot be 100% are wrong: firstly, you proved that it was 100% for this simple application; and secondly, their position amounts to claiming that four threads each doing "while (true) i = 1" (another simple application) wouldn't lead to 100% CPU usage.

But so what? What's the value of the maximum TPS on a single box without a comparison to other platforms? Absolutely none in its own right. That's why Tim points out the benchmark comparisons in his message.

I guess my question is what exactly are you trying to demonstrate? Why do you want to determine the maximum TPS (and why on your specific hardware config?) - what meaning are you trying to ascribe to it?

Alexander Lehmann

Nov 9, 2014, 6:39:48 AM
to ve...@googlegroups.com
I fail to see what is unexpected here: you are making HTTP calls with JMeter as fast as the platform and machine allow, so the CPU utilization is 100%.

The limiting factor for an HTTP server is usually not the CPU but the clients' network or the processing behind the requests (e.g. database or filesystem).

Kushan Athukorala

Nov 9, 2014, 7:16:58 AM
to ve...@googlegroups.com
Hi Tim,

Thanks for sharing the benchmark results.

Regards,
Kushan

Kushan Athukorala

Nov 9, 2014, 7:51:27 AM
to ve...@googlegroups.com
Hi Jez,

Thanks for your explanation.

We are developing a business application that requires 10K to 100K TPS. We are still in the technology selection stage of the product. I am trying to make the case for Vert.x over the other async technologies. The problem is that it is difficult to convince some people without a clear explanation of why this simple app becomes CPU-bound at 60K TPS.

Regards,
Kushan

Kushan Athukorala

Nov 9, 2014, 8:00:19 AM
to ve...@googlegroups.com
Hi Alexander,

I agree with you.

Thanks,
Kushan

Jez P

Nov 9, 2014, 8:21:16 AM
to ve...@googlegroups.com
Hi Kushan,

I think you have to start with the other async technologies too and see at what point they become CPU-bound (on the same hardware) and what their limiting throughput is. But you have to make it a like-for-like comparison (i.e. access via an HTTP server, which means that if they don't provide one you need to build one). Given that you're exploring which async technology to adopt, you need relative benchmarks, not just the benchmark for a single technology on its own.

I think Alexander's explanation is the clearest one - do the people you're trying to convince actually understand non-blocking i/o? In an ideal world you want to be CPU-bound (as long as your algorithms aren't horribly inefficient). Blocking threads will reduce the average CPU usage but will absolutely hammer performance and a single server will scale a lot less effectively.
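To make that concrete, here is a toy sketch (plain Java, not Vert.x code; class and method names are illustrative) of why blocking hammers throughput: 100 "requests" that each block for 10 ms on a 2-thread pool take roughly 500 ms of wall-clock time, even though the CPU is nearly idle throughout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model of blocking handlers: a fixed pool stands in for the event
// loops, and Thread.sleep stands in for a blocking call (DB, filesystem).
public class BlockingDemo {

    static long serveBlocking(int requests, int threads, int blockMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(blockMs); // simulated blocking I/O
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long elapsed = serveBlocking(100, 2, 10);
        System.out.println("100 blocking requests on 2 threads took ~"
                + elapsed + " ms");
    }
}
```

The CPU does almost nothing for those ~500 ms; a non-blocking design would keep the same two threads serving other requests during the waits, which is exactly why being CPU-bound is the goal.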

Cheers,

Jez

Tim Fox

Nov 9, 2014, 8:27:41 AM
to ve...@googlegroups.com
On 09/11/14 12:51, Kushan Athukorala wrote:
Hi Jez,

Thanks for your explanation.

We are developing  a business application, which requires 10K to 100K TPS. We are still in the technology selection stage of the product. I try to convince vertx over the other async technologies. The question is, it is difficult to convince some people, without a clear explanation why this simple app become CPU-bound at 60K TPS.

I am puzzled why you think this is strange/bad. Note you are using a *single thread* here (you are only running a single instance of your verticle), and you're getting 60k req/responses per second. That seems pretty damned good to me.



Kushan Athukorala

Nov 9, 2014, 9:06:38 AM
to ve...@googlegroups.com
Hi Tim,

Small correction to your comment: this simple app uses two event loops and two verticle instances, which, in turn, use *two threads*. I selected this configuration because the server machine had two cores.

Thanks,
Kushan

Jez P

Nov 9, 2014, 10:26:33 AM
to ve...@googlegroups.com
However, that doesn't really change the substance of Tim's comment. 30k TPS per core is hardly disappointing. Do you have comparable stats from competitor technologies on the same hardware?

Kushan Athukorala

Nov 9, 2014, 11:48:48 PM
to ve...@googlegroups.com
Hi Jez,

I will do the same with Netty and get back to you soon.

Regards,
Kushan

Jez P

Nov 10, 2014, 1:46:07 AM
to ve...@googlegroups.com
Hi Kushan,

I'd be surprised if you see much difference with Netty, since Vert.x is built right on top of Netty and Norman Maurer (a Netty committer) has contributed a lot to Vert.x. I would expect that you'll essentially be exercising the same code under the hood.

However, at least the numbers will give you some context. 

Cheers,

Jez

Tim Fox

Nov 10, 2014, 2:31:25 AM
to ve...@googlegroups.com
You should let Vert.x decide how many event loops to use, and deploy 2 * (number of cores) verticle instances.

Tim Fox

Nov 10, 2014, 2:34:34 AM
to ve...@googlegroups.com
These results have already been obtained with Netty and many other frameworks:

http://www.techempower.com/benchmarks/#section=data-r8&hw=i7&test=plaintext

The system under test is a trivial HTTP server almost exactly like in your example.

Everything is open, on GitHub, and reproducible: https://github.com/TechEmpower/FrameworkBenchmarks

Kushan Athukorala

Nov 10, 2014, 7:14:28 AM
to ve...@googlegroups.com
Hi Tim,

Could you let me know the logic behind your comment?


"You should let Vert.x decide how many event loops, and use 2 * number of cores verticles."

When I tested the simple HttpServer, I found that the following two conditions should be satisfied to get optimum performance.

1. Number of event loops >= number of cores
2. Number of verticle instances >= number of event loops

Please let me know whether my observation is correct.

Thanks,
Kushan

Tim Fox

Nov 10, 2014, 7:26:24 AM
to ve...@googlegroups.com
On 10/11/14 12:14, Kushan Athukorala wrote:
Hi Tim,

Could you let me know the logic behind your comment?

"You should let Vert.x decide how many event loops,

Vert.x will automatically set the number of event loops to 2 * the number of cores. There's rarely a good reason to override that.


and use 2 * number of cores verticles."

And you need as many verticles as cores

Tim Fox

Nov 10, 2014, 7:27:25 AM
to ve...@googlegroups.com
On 10/11/14 12:26, Tim Fox wrote:
On 10/11/14 12:14, Kushan Athukorala wrote:
Hi Tim,

Could you let me know the logic behind your comment?

"You should let Vert.x decide how many event loops,

Vert.x will automatically set the number of event loops to 2 * the number of cores. There's rarely a good reason to override that.

and use 2 * number of cores verticles."

And you need as many verticles as cores

I mean as event loops ;)

What I'm trying to say is: don't try to override what Vert.x does; you will probably make performance worse.
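In concrete terms, the advice above amounts to leaving the event-loop pool at its default and scaling by deploying more verticle instances. With the Vert.x 2 command line that would look roughly like this (the instance count shown is illustrative for a 2-core box):

```shell
# Let Vert.x pick the event-loop pool size (2 * cores by default)
# and deploy one verticle instance per event loop.
vertx run HttpServer.java -instances 4
```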