Recommended settings for a load-tester?

Andrew Cholakian

Apr 20, 2012, 12:07:55 AM
to asyncht...@googlegroups.com
Hi all, I'm building an HTTP benchmarker/load-tester using AHC ( https://github.com/andrewvc/engulf ) and am wondering what the best AHC configuration for me to use would be.

In general AHC works great; however, when I hit a server with keep-alive disabled, it seems to choke up. Running only 4 concurrent requests against an nginx server with keep-alive disabled, the process locks up with no error messages. Enabling keep-alive prevents this completely.

Does anyone have any ideas about what the issue could be? My guess is that it's file-descriptor related, but I've tried upping the available FDs with ulimit, without luck.
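For concreteness, the client setup I have in mind is roughly the following (a sketch, not engulf's actual code; the builder method names are from the 1.7-era com.ning API and may differ in other versions):

    import com.ning.http.client.AsyncHttpClient;
    import com.ning.http.client.AsyncHttpClientConfig;

    // Pooling stays on client-side; keep-alive is disabled on the nginx side
    // (keepalive_timeout 0), so the server closes every connection anyway.
    AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
        .setMaximumConnectionsPerHost(4)   // matches my concurrency of 4
        .setRequestTimeoutInMs(30000)
        .build();
    AsyncHttpClient client = new AsyncHttpClient(config);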

Stéphane Landelle

Apr 20, 2012, 1:37:42 AM
to asyncht...@googlegroups.com
Hi Andrew,

Are you sure you want to build your own project?
Competition is a good thing, but we've been building Gatling for a year now, and it uses AHC.

Cheers,

Stephane


Andrew Cholakian

Apr 20, 2012, 10:48:55 AM
to asyncht...@googlegroups.com
Hey Stephane, thanks for the link. Gatling looks like a really cool project!

I'd probably use it if it weren't for the fact that engulf is just
really fun to hack on. I love networks, systems, Clojure, and
performance problems, so as cool as Gatling is, it doesn't quite fit
the bill for me personally :).

I'm wondering, however, Stephane, whether you've hit this issue at all.
With 4 concurrent requests and keep-alive disabled, AHC seems to choke
after the 16,000th request or so on my OS X box hitting nginx locally.
Have you seen anything like this? It feels like there must be some kind
of allocation bug at work, no?

-- Andrew

Stéphane Landelle

Apr 20, 2012, 11:03:34 AM
to asyncht...@googlegroups.com
Hi Andrew,

Yep, I had a better look at your project after I replied, and it seems fun and totally different from Gatling, so... good luck and have fun!
I haven't run such a long test in a while, so I can't tell right now. I'll probably run one next week and let you know the results.

Cheers,

Stephane 


Hubert Iwaniuk

Apr 20, 2012, 5:46:17 PM
to asyncht...@googlegroups.com
Hi Andrew,

You are most likely running out of file descriptors.
I've never tuned the TCP stack on OS X, but since it's really BSD, that should be no problem.
If your test starts choking, check netstat (e.g. netstat -an | grep FIN_WAIT); you should see a lot of sockets in the FIN_WAIT states.

Cheers,
Hubert.


Andrew Cholakian

Apr 20, 2012, 7:57:27 PM
to asyncht...@googlegroups.com
My question there is: with only 4 concurrent clients, FDs should not be an issue, correct? AHC's connection pool should handle this properly, no? This is running at an extremely fast rate against a local nginx server, so I'm wondering if FDs are somehow held for a short period after close?

Hubert Iwaniuk

Apr 21, 2012, 10:23:58 AM
to asyncht...@googlegroups.com
I thought you said no keep-alive is used.
The connection pool is only used for keep-alive.
Even if you set max connections per client to, say, 4 and sent 100 requests at the same time, only 4 would use connections from the pool; the remaining 96 would create new connections and close them as soon as their responses finished.
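In builder terms it's roughly this (1.7-era com.ning API; exact method names may vary between versions):

    import com.ning.http.client.AsyncHttpClientConfig;

    // Sketch: the pooling switch and the per-host cap are separate settings.
    // Pooling only matters for connections that can be kept alive; without
    // keep-alive, every request opens and closes its own socket regardless.
    AsyncHttpClientConfig config = new AsyncHttpClientConfig.Builder()
        .setAllowPoolingConnection(true)
        .setMaximumConnectionsPerHost(4)
        .build();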

Cheers,
Hubert Iwaniuk

Andrew Cholakian

Apr 21, 2012, 1:48:04 PM
to asyncht...@googlegroups.com
Hubert, that's exactly the case: no keep-alive is used. What I meant is that with only 4 simultaneous clients, no more than 4 FDs should be consumed at any one time, correct? netstat shows that only 4 are in use simultaneously.

Stephane, I included notes on my tests below, in case you're interested in helping repro them with Gatling.

The key factor here, it would seem, is connections per second (not reqs/second). With keep-alive disabled, if I test against a slow server, one that can only manage ~200 reqs/second, AHC is fine; however, if I test against a server that can do ~1,600 reqs/second, AHC falls over consistently just after 16k reqs have completed (in 3 runs it failed between 16,100 and 16,300 reqs).

It would seem that allocating a large number of connections in a short time span is the issue. Any ideas what this could be?

Testing Methodology:

Software config:

1. Connection pooling enabled in AHC
2. Engulf set to Concurrency: 4, Requests: 20,000
3. All tests performed over localhost

Test Scenarios:

1. Fast nginx with keep-alive disabled
In this scenario we're directly hitting a small static HTML file served by nginx. In separate test runs AHC always hangs after ~16k reqs (observed range 16,100-16,300). Throughput was 1,600 reqs/second up to the point of failure.

2. Slow nginx with keep-alive disabled
Nginx was proxied to python -m SimpleHTTPServer, which served a simple directory listing. This slowed throughput to ~200 reqs/second. AHC completed the full test, fulfilling all 20,000 requests.

3. Fast nginx with keep-alive enabled
Same as test 1, but with keep-alive. In this case throughput hit ~ 6,000 reqs/second, and all 20,000 requests completed.

Notes:

In all tests netstat indicated that there were only 4 simultaneous connections to the server open.

My feeling is that there's a time-based allocation issue: some resource, perhaps FDs, is not released fast enough for rapid, repeated connection attempts to succeed. For very high throughput without keep-alive this is clearly an issue.
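For anyone who wants to reproduce this outside engulf, the test boils down to something like the following (a Java sketch against the 1.7-era com.ning API rather than my actual Clojure code; adjust the URL to wherever your nginx listens):

    import com.ning.http.client.AsyncCompletionHandler;
    import com.ning.http.client.AsyncHttpClient;
    import com.ning.http.client.AsyncHttpClientConfig;
    import com.ning.http.client.Response;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.Semaphore;

    public class NoKeepAliveRepro {
        public static void main(String[] args) throws Exception {
            // Default config: connection pooling enabled, as in the tests above.
            AsyncHttpClient client =
                new AsyncHttpClient(new AsyncHttpClientConfig.Builder().build());
            final int total = 20000;                     // Requests: 20,000
            final Semaphore inFlight = new Semaphore(4); // Concurrency: 4
            final CountDownLatch done = new CountDownLatch(total);
            for (int i = 0; i < total; i++) {
                inFlight.acquire(); // never more than 4 requests in flight
                client.prepareGet("http://localhost:8080/") // nginx, keepalive_timeout 0
                    .execute(new AsyncCompletionHandler<Response>() {
                        @Override
                        public Response onCompleted(Response r) {
                            inFlight.release();
                            done.countDown();
                            return r;
                        }
                        @Override
                        public void onThrowable(Throwable t) {
                            inFlight.release();
                            done.countDown();
                        }
                    });
            }
            done.await(); // against fast nginx this hangs around request ~16k for me
            client.close();
        }
    }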

Tatu Saloranta

Apr 21, 2012, 2:04:35 PM
to asyncht...@googlegroups.com

Isn't that what HTTP clients are supposed to do per the HTTP spec?
Limit the number of TCP connections used concurrently when connecting to a specific endpoint, regardless of whether those connections are reused or not.

-+ Tatu +-

Andrew Cholakian

Apr 21, 2012, 3:26:44 PM
to asyncht...@googlegroups.com
Well, I think there's a bit of a misunderstanding here. In this setup AHC only ever has 4 connections open at once; that part works exactly as expected. The problem seems to be that if I set up and tear down thousands of connections each second, AHC hangs. So even though only 4 connections are ever open at a time, some resource seems to become exhausted.

Tatu Saloranta

Apr 21, 2012, 10:57:30 PM
to asyncht...@googlegroups.com

Ok, right. And that's where not using a connection pool will do it --
there are only 64k port numbers to use.
The problem with TCP is that while a specific connection (src-ip/port to
dst-ip/port) does get closed, it must not be reused within a certain
additional time window (to avoid possible data corruption due to
in-flight packets etc.; read a TCP book like Stevens for details).
This window used to be defined as 2 minutes, meaning ports only get
"recycled" by the OS after it elapses. That gives a theoretical upper
bound of about 500 ports used per second. And on Windows, in my
experience, exhaustion happens even faster.
The delay is configurable, but only at the OS level; it can't be changed
from Java. And usually having to change it suggests other issues.
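(Back-of-the-envelope: ~65,536 ports / 120 seconds of TIME_WAIT ≈ 546 new
connections per second, sustained, as an absolute ceiling. At the ~1,600
connections/second you measured, the full 64k range would be exhausted in
65,536 / 1,600 ≈ 41 seconds.)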

Apologies if I've totally misunderstood the setup here; it's just that
the above problem has bitten me in the past, so I thought I'd mention it.

-+ Tatu +-

Hubert Iwaniuk

Apr 22, 2012, 10:01:53 AM
to asyncht...@googlegroups.com
Simply put: use keep-alive.

Otherwise you are testing the client's (the traffic source's) TCP stack
as well as the server's.
With any load test you should make sure you are not saturating the
client, and not using keep-alive with a well-performing client (like
AHC) is a recipe for exactly that.

BTW, according to your project.clj you are testing an old version of
AHC, and the master branch doesn't compile.

Cheers,
Hubert.

Andrew Cholakian

Apr 22, 2012, 11:32:48 AM
to asyncht...@googlegroups.com
Tatu, thanks for the reply. That makes sense, in terms of port starvation. I am curious, however, why the magic number here seems to be 16000 not 64k, though that could be an OSX dependent thing, I'll try and find out more.

I'm also surprised that it didn't raise an exception.

Also, Hubert, I appreciate the input! I agree that keep-alive is right for some testing, but not all. If you're testing a stack and want to gauge pure application throughput, keep-alive should definitely be turned on. However, if you're testing your full stack and want to simulate thousands of different users hitting it, you definitely don't want keep-alive enabled.

Also, Hubert, to your point about testing the load tester vs. the server: you're 100% correct. That's the tricky thing about building a load-tester correctly! I'm planning to tackle this in engulf by adding slave support, so you can just gang a whole bunch of servers together to help on that front. Currently I'm really using nginx to load-test engulf, not the other way around.

BTW, thanks for pointing out that master isn't building right now for engulf. If you do want to try it, the jar download does work. I'm going to try to fix up master soon; it's in a half-refactored state at the moment.

-- Andrew

Tatu Saloranta

Apr 22, 2012, 12:37:49 PM
to asyncht...@googlegroups.com
On Sun, Apr 22, 2012 at 8:32 AM, Andrew Cholakian <andr...@gmail.com> wrote:
> Tatu, thanks for the reply. That makes sense, in terms of port starvation. I
> am curious, however, why the magic number here seems to be 16000 not 64k,
> though that could be an OSX dependent thing, I'll try and find out more.

Yeah. From what I remember, the number was lower on Windows too (I
think about 30k or so). But this was a few years back.

> I'm also surprised that it didn't raise an exception.

True, it would seem appropriate to get an exception in this case.

On testing: this limit would not apply if you could test from multiple
machines. That's more work, but it would also make the test more
realistic with respect to load.

-+ Tatu +-

Stéphane Landelle

Apr 24, 2012, 6:37:43 AM
to asyncht...@googlegroups.com
Hi,

You might also be running short of ephemeral ports. Note that the default ephemeral port range is fairly small; on OS X I believe it's 49152-65535, i.e. only 16,384 ports, which would line up nicely with your ~16k figure.

Cheers,

Steph


Yatindra AP

Apr 26, 2012, 1:11:13 PM
to asyncht...@googlegroups.com
Hi Andrew,

I have my own load-testing tool based on AHC, and so far I have not hit the 16K issue you're referring to, but I do make a lot more concurrent connections than 4 (usually 1,000). I've tested against a lot of servers; the fastest I ever saw was nginx + keep-alive + gzip on localhost, at about 16K requests/second with 1,000 concurrent connections. That was on CentOS 5.6 with some OS tuning: the FD limit was raised to 100,000 and the ephemeral port range was maxed out (i.e. 1024-65535).

Yatin

Yatindra AP

Apr 26, 2012, 1:30:50 PM
to asyncht...@googlegroups.com
Forgot to add: Total requests: 1,000,000, with nginx serving an 80 KB HTML page (17 KB gzipped).

Andrew Cholakian

Apr 28, 2012, 12:09:41 PM
to asyncht...@googlegroups.com
Yatindra, thanks for the data point! It sounds like your test used keep-alive, which also works well for me (I get about 6k reqs/sec on my local box with it). It's only with keep-alive disabled that I have issues.

BTW, Stephane, thanks for the note about ephemeral ports; I believe that's what Tatu was discussing, and it seems to be the likely culprit.

Thanks for all the help guys!