Netty throughput benchmarks


samantp

Feb 29, 2016, 5:55:15 AM
to Netty discussions
Hello,

Greetings! I have a question about Netty.

While doing some experiments, we found that with a simple Netty TCP server that just returns a fixed response, and a single-threaded client sending data over a single channel continuously for as long as the channel's isWritable() was true, we could get very high throughput; 100K requests per second was normal. The client-side channel handler's read in this case just drained the response data returned.
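
For reference, the server handler and the client send loop in that first experiment looked roughly like the following (a simplified sketch against Netty 4; the class names and the "PING"/"OK" payloads are placeholders for illustration, not our actual code):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.Unpooled;
    import io.netty.channel.Channel;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;
    import io.netty.util.CharsetUtil;
    import io.netty.util.ReferenceCountUtil;

    // Server side (illustrative): drop the request bytes, always write back the same fixed response.
    class FixedResponseHandler extends ChannelInboundHandlerAdapter {
        private static final ByteBuf RESPONSE =
                Unpooled.unreleasableBuffer(Unpooled.copiedBuffer("OK\n", CharsetUtil.UTF_8));

        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            ReferenceCountUtil.release(msg);          // never inspect the request
            ctx.writeAndFlush(RESPONSE.duplicate());  // fixed response, no per-request allocation
        }
    }

    // Client side (illustrative): one thread keeps pushing requests over a single channel for as
    // long as the channel stays writable; a separate inbound handler (not shown) just reads and
    // releases the responses.
    class SingleChannelLoadLoop {
        private static final ByteBuf REQUEST =
                Unpooled.unreleasableBuffer(Unpooled.copiedBuffer("PING\n", CharsetUtil.UTF_8));

        static void run(Channel ch) {
            while (ch.isActive()) {
                if (ch.isWritable()) {
                    ch.writeAndFlush(REQUEST.duplicate());
                } // when the outbound buffer is full, skip until it drains
            }
        }
    }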

However, when we modify the clients to use large channel pools, such that a send over a channel is not done until the response to the previous request has been received (by acquiring the channel for the send and releasing it on read), the throughput for the same input load goes down by almost a factor of 10. We tried to compensate for the delay between successive requests on a channel by using very large channel pools.
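
The acquire/release pattern we moved to is roughly the following (again a sketch, here expressed with Netty 4's io.netty.channel.pool.FixedChannelPool; the PooledRequestSender/ResponseHandler names are illustrative and error handling is trimmed):

    import io.netty.bootstrap.Bootstrap;
    import io.netty.buffer.ByteBuf;
    import io.netty.channel.Channel;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;
    import io.netty.channel.pool.AbstractChannelPoolHandler;
    import io.netty.channel.pool.ChannelPool;
    import io.netty.channel.pool.FixedChannelPool;
    import io.netty.util.ReferenceCountUtil;
    import io.netty.util.concurrent.FutureListener;

    // Illustrative sketch: one request in flight per channel. A channel is acquired for the
    // send and only released back to the pool once the response for that request is read.
    class PooledRequestSender {
        private final ChannelPool pool;

        // bootstrap is expected to already have group/channel/remoteAddress configured
        PooledRequestSender(Bootstrap bootstrap, int poolSize) {
            pool = new FixedChannelPool(bootstrap, new AbstractChannelPoolHandler() {
                @Override
                public void channelCreated(Channel ch) {
                    ch.pipeline().addLast(new ResponseHandler());
                }
            }, poolSize);
        }

        void send(ByteBuf request) {
            pool.acquire().addListener((FutureListener<Channel>) f -> {
                if (f.isSuccess()) {
                    f.getNow().writeAndFlush(request);   // channel stays "busy" until the read
                } else {
                    ReferenceCountUtil.release(request); // could not get a channel
                }
            });
        }

        // Drains the response and frees the channel for the next request.
        private class ResponseHandler extends ChannelInboundHandlerAdapter {
            @Override
            public void channelRead(ChannelHandlerContext ctx, Object msg) {
                ReferenceCountUtil.release(msg);
                pool.release(ctx.channel());
            }
        }
    }

The important property for the benchmark is that each channel carries exactly one outstanding request at a time.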

In this scenario, although we can support a large number of concurrent channels without any problem (we tried 20,000 on our low-end machines), the throughput is very low.

Is this expected, especially considering that every channel now carries only sparse data, in fact only one request at a time to be read and serviced by the server?

Or, for such a scenario where a large number of connections send very sparse data and behave in a synchronous request-response style, would different Netty settings be more beneficial?

We tried to search the web for Netty benchmarks and expected numbers, but could not find figures for the scenario mentioned above.

Help on this would be much appreciated. 


Regards,
--samantp.

Rogan Dawes

Feb 29, 2016, 7:48:01 AM
to Netty discussions
It makes sense to benchmark the scenarios in which you will use the app.

If your clients will be making a small number of requests, with time in between them, but you are concerned about how scalable the solution is, implement a client that does the above, then spin up a whole lot of them and see how your server scales.

If you will have a small number of clients sending large numbers of packets, and throughput is the critical factor rather than latency, create a benchmark that measures *that*.

From your email, it sounds like you are testing the latter, but deploying the former in reality.

Rogan



samantp

Mar 1, 2016, 1:48:35 AM
to Netty discussions, ro...@dawes.za.net
Thanks much for the reply Rogan.

Actually, we want both (if that is achievable with async I/O and Netty, that is). The scenario is that many clients (e.g. 10,000) will be sending requests with time in between them, say 10 requests per second each, and one Netty server should handle all these clients and give an aggregate throughput of 100,000 requests per second.

We tried this scenario and found that with anything above a few hundred clients, the aggregate throughput given by Netty does not exceed 15-20K requests per second. The situation is more or less unchanged for 1,000 or 10,000 concurrent connections. But if the same traffic is sent over one or two channels, 100K requests per second is very much achievable.

Therefore, we were assuming that the overheads of reading from each channel (user-kernel mode switches, epoll, I/O, etc.; we are not sure which), and that too with a yield of only one or two records per read, become prominent compared to the request processing time in the worker, especially considering that the amount of processing to be done is not heavy (say, a simple hardcoded response is sent). Moreover, the server is generally idle in terms of CPU utilization.
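
For context, the server itself is essentially the stock Netty bootstrap pattern, something like the sketch below (thread counts and port are illustrative, not our exact settings):

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    class FixedResponseServer {
        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);    // accepts connections
            EventLoopGroup workers = new NioEventLoopGroup();  // defaults to 2 * cores unless overridden
            try {
                ServerBootstrap b = new ServerBootstrap();
                b.group(boss, workers)
                 .channel(NioServerSocketChannel.class)
                 .childHandler(new ChannelInitializer<SocketChannel>() {
                     @Override
                     protected void initChannel(SocketChannel ch) {
                         // FixedResponseHandler as sketched in the first mail (illustrative)
                         ch.pipeline().addLast(new FixedResponseHandler());
                     }
                 });
                b.bind(8080).sync().channel().closeFuture().sync();
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }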

Regards,
--samantp

Rogan Dawes

Mar 1, 2016, 2:28:12 AM
to Netty discussions
OK, so now we know what we are looking at :-) 

A possibly stupid question: how did you simulate your client load? Did you run several instances on a single machine, or several instances over 100 machines, etc.? I would hate to be optimising/debugging the server when the problem is in the client simulation!

Also, were all the clients synchronized/sending their messages at the same time, or spread out? i.e. did you start them all at the same time, and have a method triggering a request at System.currentTimeMillis() % 10000? Or was there a degree of randomness in the sending of the messages from the client?

Rogan

samantp

Mar 1, 2016, 3:48:14 AM
to Netty discussions, ro...@dawes.za.net
Hello Rogan,

Please find my responses inline.


On Tuesday, 1 March 2016 12:58:12 UTC+5:30, Rogan Dawes wrote:
OK, so now we know what we are looking at :-) 

A possibly stupid question: how did you simulate your client load? Did you run several instances on a single machine, or several instances over 100 machines, etc.? I would hate to be optimising/debugging the server when the problem is in the client simulation!

I knew this question was coming :), and it is clearly not a stupid one. No, we did not actually run that many clients. Instead we had a single-threaded (or at most 10-thread) client sending over a channel pool of 100, 500, 10,000 channels and so on. So this client just acquires a channel and writes, and the handler, in its read function, drains the response sent by the server. Everything was async: the acquire of the channel from the pool, the write to the channel, and the read of the response in the handler on the client side. In fact, if we have multiple clients (on the order of 3-4), each with a fraction of the connections of a single client with a large connection pool, the situation does not change much.


Also, were all the clients synchronized/sending their messages at the same time, or spread out? i.e. did you start them all at the same time, and have a method triggering a request at System.currentTimeMillis() % 10000? Or was there a degree of randomness in the sending of the messages from the client?

So the messages were spread out, but only slightly, since it was a while loop on the client side that sent data on the acquired channel. A sleep was done after sending around n records, where n is a function of the number of channels in the pool, e.g. sleeping for a millisecond after sending 20,000 requests or more.
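
Concretely, the driver loop was something like this (illustrative sketch; PooledRequestSender is the hypothetical sender from the earlier sketch, and batchSize was derived from the pool size):

    import io.netty.buffer.Unpooled;
    import io.netty.util.CharsetUtil;

    class ClientDriver {
        // Fire-and-forget sends with a coarse throttle: sleep for a millisecond after every
        // batch so that pending acquires/writes do not pile up without bound.
        static void drive(PooledRequestSender sender, long totalRequests, int batchSize)
                throws InterruptedException {
            for (long sent = 1; sent <= totalRequests; sent++) {
                sender.send(Unpooled.copiedBuffer("PING\n", CharsetUtil.UTF_8));
                if (sent % batchSize == 0) {
                    Thread.sleep(1);
                }
            }
        }
    }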

Rogan Dawes

Mar 1, 2016, 4:05:09 AM
to Netty discussions
So I wonder, then, given your description of the client setup, whether you had periods of "silence" from your client that would explain why fewer messages were sent/received.

i.e. if the following represents the aggregate message flow over time for your few clients making continuous requests:

OOOOOOOOOOOOOOOO == 100,000 requests per second

and the following represents the aggregate message flow over time of the "many clients making intermittent requests":

ooOOoo_____ooOOoo___

I would hardly be surprised to find that the total message count is significantly lower.

I'd be curious to see what happens if you simply remove the sleep from your clients.

Rogan


samantp

Mar 1, 2016, 4:33:49 AM
to Netty discussions, ro...@dawes.za.net
Hello Rogan,

We even tried without the sleep, but got similar results. In fact, the sleep was added a bit later in the game, to avoid queueing up too many pending requests on the client-side channel pool and the out-of-memory errors that caused for very large numbers of requests.

Regards,
--Parikshit N. Samant.

Rogan Dawes

Mar 1, 2016, 4:46:40 AM
to Netty discussions
Ok, it might be interesting to run a few tests, then, scaling the number of (simulated) clients from small to large, and see where the performance starts to drop off.

Not having done much performance testing myself, I'd nonetheless suggest trying to profile your server to see where the bottlenecks appear while doing so.

Rogan


samantp

Mar 1, 2016, 5:44:29 AM
to Netty discussions, ro...@dawes.za.net
Certainly, Rogan.

And thanks a ton for spending time thinking it over.

Regards,
--samantp.

Peeyush Sharma

Apr 13, 2017, 6:33:51 PM
to Netty discussions, ro...@dawes.za.net
Hi samantp,

Did you find the problem in your setup? It would be great for this thread to know.

Thanks
Peeyush

samantp

Apr 14, 2017, 1:42:21 AM
to Netty discussions, ro...@dawes.za.net
Hi Peeyush,

Unfortunately, we had to cut our experiments short, and we needed to redesign so that far fewer parallel channels were used, in order to maintain high throughput.

Regards,
--Parikshit N. Samant.