Is Dropwizard by default using Jetty NIO?


Raymond Hon

May 6, 2014, 9:47:38 PM
to dropwiz...@googlegroups.com
Hi guys,

I want to know: does Jetty use NIO by default?
Currently I am using Dropwizard to build my contextual engine, and I am running into a scaling problem with HTTP POSTs when the POST body is not small (~100 KB).
Since I am not familiar with Jetty NIO: is there only one thread listening on the port that dispatches to a pool of worker threads, or a thread per core?
If my HTTP POST body is large, will the dispatching thread be held up too long pulling the content and handing it off to worker threads, in turn dragging down the performance/throughput of the server?
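
For reference, the Dropwizard 0.7 config knobs I am asking about look roughly like this (a minimal sketch; the values are illustrative, not the shipped defaults):

server:
  minThreads: 8             # floor of Jetty's request-handling worker pool
  maxThreads: 1024          # ceiling of the worker pool
  applicationConnectors:
    - type: http
      port: 8080
      acceptorThreads: 1    # threads accepting new connections
      selectorThreads: 2    # NIO selector threads multiplexing reads/writes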

thanks
ray


Aaron Baff

May 6, 2014, 10:34:44 PM
to dropwiz...@googlegroups.com
I'm working with Ray, so I'll fill in some more details. First, we're on v0.7.0.

With just a simple GET request, no data payload and no work, we get more or less the performance we expect.

The controller that handles the large POST request (we have a few different test samples; the largest is ~83 KByte) does no work and immediately returns an "OK", yet I'm seeing a significant scaling penalty in throughput. Here are average throughput figures for a 54 KByte sample, using the JMeter Summary Report and HTTPClient4, with a second server acting as a remote JMeter server.

1 thread:  ~76.6/sec
2 threads: ~111.9/sec
4 threads: ~170.6/sec

As you can see, for such a simple accept-the-data-and-return-immediately handler, scaling should be at least close to linear, while the results above clearly are not.

JVM (1.7.0_17):
~12GB memory
-XX:+UseConcMarkSweepGC
-XX:SurvivorRatio=16
-XX:PermSize=128m
-XX:MaxPermSize=256m
-XX:NewSize=512m

Server (both machines, JMeter Remote & DropWizard Application instance):
Xeon E5462, 4C8T
32GB memory
CentOS, kernel 2.6.32-358.el6.x86_64

--Aaron

Tatu Saloranta

May 7, 2014, 2:25:30 AM
to dropwiz...@googlegroups.com
Have you checked how much bandwidth there is available between machines?
54 kB * 170 req/sec works out to ~9,180 kB/sec (roughly 73 Mbit/sec) of payload alone, which with protocol overhead would come close to saturating a 100 Mbit/sec link.
If there is congestion at the link level, latencies increase correspondingly, limiting throughput.

With 4 concurrent requests, the use of NIO really should not make any difference, for what that's worth. NIO may be beneficial at high levels of concurrency, but as a general rule of thumb, developers' expectations are much higher than the actual benefits. It sometimes feels like NIO was a solution looking for a problem. :)

-+ Tatu +-




jonathan mukiibi

May 7, 2014, 3:42:19 AM
to dropwiz...@googlegroups.com
Yeah, Jetty is the default server in there; just use that.

Aaron Baff

May 7, 2014, 1:27:56 PM
to dropwiz...@googlegroups.com
Good thought, but they are connected via switched 1 Gbit Ethernet, and I just transferred a large file between the two machines at >50 MByte/sec (400 Mbit/sec), so I don't think we're capped on transfer between the machines. I also have JMeter set to use keep-alives.

I just did another set of runs, same case as before but watching CPU utilization (via top), and found that at 8 threads and above it was basically maxing out one CPU core (>95% consistently). Also, as you'll see below, average latency increases as more threads are added, which seems to indicate some kind of contention somewhere.

Threads   Avg Latency   Std Dev   Throughput
1         6 ms          2.42       79.6/sec
2         7 ms          3.0       108.5/sec
4         7 ms          4.5       167.2/sec
8         10 ms         5.39      208.1/sec
16        15 ms         7.09      227.3/sec
32        26 ms         13.7      227.4/sec
64        45 ms         27.85     231.8/sec

--Aaron

Tatu Saloranta

May 7, 2014, 2:09:33 PM
to dropwiz...@googlegroups.com
Ok good, just wanted to suggest something you can easily rule out.
Another thing that could help is to double-check that nothing in your code could possibly be using synchronization in the wrong place.

From the symptoms it sounds possible that some part of Jetty's request processing is using just a single thread. I think only the NIO-based connector (not BIO) would do that, although by default it should be configured to use some fraction or multiple of the number of available cores, and not 1, unless on a single-core system.

Someone more familiar with configuring Jetty connectors may be able to help more.

-+ Tatu +-

Aaron Baff

May 7, 2014, 2:23:22 PM
to dropwiz...@googlegroups.com
In the test case above, none of our code is actually being used. The controller simply returns Response.ok("OK now.").build().
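
For concreteness, the resource looks roughly like this (a minimal sketch with hypothetical class and path names, not our actual code):

import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

@Path("/ingest")  // hypothetical path
public class IngestResource {

    // No entity parameter, so our code never touches the POST body.
    @POST
    public Response ingest() {
        return Response.ok("OK now.").build();
    }
}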

Tatu Saloranta

May 7, 2014, 3:17:51 PM
to dropwiz...@googlegroups.com
Ok, so the payload is only on the request side (POST payload), with a minimal response.

One thing that should not matter, but that I remember having an effect on the client side: if the content being sent is not read (in this case, the service is just ignoring it?), the container will have to read the content. So maybe add code to read and drop the POST payload (or bind it to a byte[] or String), as in the sketch below.
Another thing that can add CPU load is gzip compression; but then again, that should scale linearly.
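
Something along these lines (a sketch only, reusing the hypothetical resource names from the earlier message; the point is just that the payload gets read and discarded):

import java.io.IOException;
import java.io.InputStream;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

@Path("/ingest")  // hypothetical path, as in the earlier sketch
public class DrainingIngestResource {

    // Explicitly read and drop the POST payload so the container is
    // not left holding unconsumed request content.
    @POST
    public Response ingest(InputStream body) throws IOException {
        byte[] buf = new byte[8192];
        while (body.read(buf) != -1) {
            // discard
        }
        return Response.ok("OK now.").build();
    }
}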

-+ Tatu +-

Aaron Baff

May 7, 2014, 3:26:53 PM
to dropwiz...@googlegroups.com
Tatu,

For the purposes of the test we're ignoring the content of the POST that is being sent, just to see what the scaling looks like without our processing code active. In production we will be doing something with the POST content, so we do actually want the container to read the content and do its normal stuff with it.

We're actually thinking of going another route: instead of the POST being sent to us, we'll have the client send us a link from which we can then go fetch the data.

--Aaron

Tatu Saloranta

May 7, 2014, 3:31:23 PM
to dropwiz...@googlegroups.com
On Wed, May 7, 2014 at 12:26 PM, Aaron Baff <driz...@gmail.com> wrote:

> Tatu,
>
> For the purposes of the test we're ignoring the content of the POST that is being sent, just to see what the scaling looks like without our processing

I understand that. I was only suggesting explicit reading because I remember some odd artifacts from other cases, although those were on the client side.
Without all of the content being read from the connection, it cannot be reused. I don't know how Jetty handles this; ideally it will simply read the content through.

> code active. In production we will be doing something with the POST content, so we do actually want the container to read the content and do its normal stuff with it.

Yes.

> We're actually thinking of going another route: instead of the POST being sent to us, we'll have the client send us a link from which we can then go fetch the data.

Ok. The approach the test shows should work, but sending a link would have lower latency.

-+ Tatu +-

Lance N.

May 7, 2014, 5:24:57 PM
to dropwiz...@googlegroups.com
Do you create a new HTTP connection for each upload? Or do you reuse one connection in each thread?

Nathan Fisher

May 7, 2014, 8:56:54 PM
to dropwiz...@googlegroups.com

Hi Aaron & Raymond,

A couple of questions:
  1. What is your goal/expectation for RPS?
  2. What is your expected latency at that RPS?
  3. Have you profiled the application to see where the bottleneck is, using jConsole or Censum?
  4. Are there any errors being emitted in the server logs (Dropwizard and syslog)?
  5. Do you have iptables enabled on either the client and/or server (if it's a problem, it should be visible in syslog)?
  6. Can iptables temporarily be disabled on both (until you eliminate it as a limiting factor)?
  7. What are your settings for the following sysctl parameters [1]?
    • net.core.somaxconn # per-socket listen backlog limit
    • net.ipv4.tcp_max_syn_backlog # system-wide limit on in-flight TCP handshakes
    • net.core.rmem_max # maximum socket receive buffer memory
    • net.ipv4.tcp_max_tw_buckets
    • net.ipv4.ip_conntrack_max
    • net.ipv4.netfilter.ip_conntrack_max
    • net.ipv4.netfilter.ip_conntrack_tcp_timeout_established
  8. What thread limits are configured for Dropwizard?
  9. What is the full command used to run the Dropwizard app, including any switches/env vars (e.g. max heap size)?
  10. Are these two hosts the only active devices on the switch?
  11. If no, how busy is the switch overall (no switch fabric will/can provide 1 Gb/port sustained for all ports)?
  12. Can you share the actual contents of the POST method (or a minimal sample that reproduces the performance limit you're seeing)?
  13. Can you share the JMeter test plan and JMeter settings/profile/etc.?
  14. If no, can you share the high-level details (e.g. how long your test plan runs for, how many threads, etc.)?
  15. Can you create a test plan with "stepped intervals" using the JMeter Throughput Shaping Timer [2] to isolate the point at which the app starts failing and to validate stable throughput (run each target RPS for at least a couple of minutes)?
For analysis I find it useful to plot out the results rather than take an average/standard deviation. Gil Tene has a good presentation [3] that covers why average and standard deviation aren't the best metrics for latency measurements.

Here's a sample plot I put together using Python and pandas, operating on the output associated with the attached JMeter test plan: http://bit.ly/jmAnalysis

The example provided shows only a little noise at the beginning of the test. I've had some tests where the RPS plot was all out of whack, which is a good indication that the system configuration (network + OS + client + app) isn't able to sustain the level of throughput you're throwing at it. I'd be curious how stable the throughput is for each of your tests.

1 - To retrieve the current value, execute: sysctl ${VALUE}

Kind Regards,
--
Nathan Fisher