AsyncHTTPClient slower?

Gnep

Oct 31, 2011, 2:04:30 PM

to Tornado Web Server

Hi,

I am testing the performance of async and sync I/O with Tornado.

The case is simple: use AsyncHTTPClient and HTTPClient to fetch Google,
and launch 1000 concurrent requests. The result is surprising:

AsyncHTTPClient: 36 seconds to finish all requests
HTTPClient: 26 seconds to finish all requests

Can somebody shed some light?

Ben Darnell

Nov 1, 2011, 12:54:28 AM

to python-...@googlegroups.com

How are you doing your "1000 concurrent requests" through HTTPClient?
With 1000 threads? I'm also surprised that you found the synchronous
HTTPClient to be faster than AsyncHTTPClient (since HTTPClient.fetch
just makes a new AsyncHTTPClient and IOLoop for each request), but
you'll probably need to post your code for us to be able to make sense
of these numbers.

-Ben

Gnep

Nov 2, 2011, 2:55:29 PM

to Tornado Web Server

OK. What I am doing is using Tornado to query SimpleDB.

For better performance, I use AsyncHTTPClient to query SDB through
its REST URL. I set max_clients=1000 (the log messages show that the
queue is empty). Then I launch a stress client from another host (both
client and server are EC2 instances, so the network lag should be
minimal). The client spawns 1000 child processes that call Tornado to
run the query.

The result is ~900 seconds for the client to finish. However, the
response time for an individual SDB query is ~10s (also from the
log).

Is there something wrong with my test, or is there anything that can
be tuned further?

Phil Whelan

Nov 2, 2011, 3:49:16 PM

to python-...@googlegroups.com

Hi Gnep,

Is it possible that the delay is with SimpleDB? If a request to SimpleDB is taking 10s, it sounds likely that the request is using a fair amount of memory. It is possible that 1000 clients performing this request in parallel would take longer than 1 client performing it synchronously if you're hitting resource limits. Maybe try a simpler SimpleDB request to prove that the bottleneck is not in Tornado.

Cheers,

Phil

Roberto Aguilar

Nov 2, 2011, 11:47:02 PM

to python-...@googlegroups.com

I'd be curious to know if 10 workers making 10 requests each is faster
than both 1 worker making 100 requests and 100 workers making 1
request each. This might go along with Phil's assumption that you're
hitting resource limits.

Maybe Amazon is throttling your IP for making that many parallel requests.

-Roberto.

Gnep

Nov 3, 2011, 4:57:15 AM

to Tornado Web Server

I did some A/B tests.
Using HTTPClient, 100 concurrent requests from the client side: 90-100
seconds
Using AsyncHTTPClient(max_clients=1000), same stress: ~90 seconds

Note that the result depends on the network and on AWS itself. However,
my observation is that AsyncHTTPClient brings little or no advantage in
my tests.

Throttling could be a reason; however, I would expect to receive a
message like "Request from [my API key] is throttled." from AWS if I
were being throttled. See
http://www.interrobangpath.org/index.php?q=aHR0cHM6Ly9mb3J1bXMuYXdzLmFtYXpvbi5jb20vdGhyZWFkLmpzcGE%2FbWVzc2FnZUlEPTIwOTE2NyMyMDkxNjc%3D

Cliff Wells

Nov 3, 2011, 2:44:13 PM

to python-...@googlegroups.com

There are at least two possible definitions of "performance" here:

1) how long it takes to complete the entire test
2) how long it takes for the longest request to complete

The key difference here is this: imagine these requests were made, not
by a single test program, but by 1000 users. With the synchronous
client, the 1000th user wouldn't even begin to see a result until the
previous 999 users had finished, whereas with the async client, he would
probably wait no longer than the average time.

Threading would reveal a similar pattern. Running tasks in serial will
always complete faster than running them in parallel if all you measure
is the overall time to complete all tasks. If you consider the time of
the *slowest* task (i.e. the 1000th task in your test), then suddenly
the async (or threaded) system will appear to have much better
performance. Standard deviation is the statistical tool used to reveal
this number:

http://www.blackbeak.com/2008/04/16/using-standard-deviations-to-determine-web-analytics-benchmarks/

In general, for real-world systems, it's usually preferable to have
vastly smaller standard deviation at the expense of a slightly higher
average; all users wait 1s longer so that no user waits 1000s.

Regards,
Cliff
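
The two metrics Cliff describes can be computed from a list of per-request completion times. The sketch below is a toy illustration, not a measurement: `serial_waits` and `summarize` are hypothetical helper names, and the service time of 1.0 is an arbitrary assumption. It models a strictly serial client, where the k-th request finishes only after the k-1 requests before it, and compares it with a hypothetical run in which every request finishes at the same moment.

```python
import statistics


def serial_waits(n_requests, service_time):
    """Completion time seen by each request when they run one after another."""
    return [service_time * (k + 1) for k in range(n_requests)]


def summarize(times):
    """Return the overall time, the mean wait, and the spread of the waits."""
    return {
        "max": max(times),                  # when the last request finished
        "mean": statistics.mean(times),     # average wait per request
        "stdev": statistics.pstdev(times),  # population standard deviation
    }


if __name__ == "__main__":
    # Serial model: waits grow linearly with the position in the queue.
    print(summarize(serial_waits(4, 1.0)))
    # Hypothetical run where all four requests finish together at t=3.0.
    print(summarize([3.0] * 4))
```

With real data, the list passed to `summarize` would come from timestamps recorded per request rather than from a model.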

Gnep

Nov 3, 2011, 10:19:15 PM

to Tornado Web Server

I don't really understand why "Running tasks in serial will always
complete faster than running them in parallel if all you measure
is the overall time to complete all tasks."

Another note: I replaced the SDB URL in AsyncHTTPClient with
"http://www.google.com", and the result is similar. I also found that
setting max_clients=10 works better than 1000, because in the latter
case some requests simply time out.
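
The effect of a cap such as max_clients can be sketched without Tornado. The example below is only an analogy built on the standard library's asyncio, not Tornado's implementation: `run_capped` is a hypothetical helper, `asyncio.sleep` stands in for a network fetch, and the semaphore plays the role of the limit on simultaneous requests. It records the peak number of requests in flight so the cap can be checked.

```python
import asyncio


async def run_capped(n_tasks, limit, delay=0.01):
    """Run n_tasks simulated fetches with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def one(i):
        nonlocal in_flight, peak
        async with sem:                 # wait here while `limit` are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)  # stand-in for the real HTTP request
            in_flight -= 1
            return i

    results = await asyncio.gather(*(one(i) for i in range(n_tasks)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_capped(20, 5))
    print(len(results), "tasks done, peak concurrency", peak)
```

Requests beyond the cap wait for a free slot instead of all hitting the remote server at once.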

Cliff Wells

Nov 3, 2011, 10:53:58 PM

to python-...@googlegroups.com

On Thu, 2011-11-03 at 19:19 -0700, Gnep wrote:
> I don't really understand why "Running tasks in serial will always
> complete faster than running them in parallel if all you measure
> is the overall time to complete all tasks. "

Because running tasks in parallel implies overhead and resource
sharing/consumption that running them in serial does not (e.g. context
switches, bandwidth limits, disk I/O limits, memory pressure, CPU cache
misses, etc). This overhead may exist in the client, on the remote
server, in the network, or more likely, be present in every part of the
stack.

Async applications suffer less from this than threaded equivalents
because they are serialized at the execution level (so context switches
aren't involved), but your particular test involves other services that
are probably not async, and some of the overhead (such as bandwidth,
memory, etc) still apply for async apps in any case.

Note that I'm not saying your conclusion is wrong (you may be right),
but rather that your testing methodology is inconclusive. You'd need a
much more isolated environment (read: not cloud-based, not testing
against a server with unknown performance properties, etc). Try your
same test on a LAN and test against something with a more predictable
performance baseline (i.e. Nginx serving a static file rather than
SDB).

Cliff
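
A controlled baseline of the kind suggested above can be approximated on a single machine. The sketch below is an assumption-laden stand-in, not the setup Cliff describes: instead of Nginx on a LAN it uses Python's built-in `http.server` on the loopback interface, and `StaticHandler` and `benchmark_local` are hypothetical names. It issues requests one at a time and records the status code and latency of each.

```python
import http.client
import http.server
import threading
import time


class StaticHandler(http.server.BaseHTTPRequestHandler):
    """Serves a fixed body, standing in for a static file."""
    BODY = b"hello\n"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()
        self.wfile.write(self.BODY)

    def log_message(self, format, *args):
        pass  # keep the benchmark output quiet


def benchmark_local(n_requests=50):
    """Fetch / from a local server n_requests times; return statuses, latencies."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), StaticHandler)
    port = server.server_address[1]  # port 0 lets the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    statuses, latencies = [], []
    try:
        for _ in range(n_requests):
            start = time.perf_counter()
            conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/")
            resp = conn.getresponse()
            resp.read()
            conn.close()
            latencies.append(time.perf_counter() - start)
            statuses.append(resp.status)
    finally:
        server.shutdown()
        server.server_close()
    return statuses, latencies


if __name__ == "__main__":
    statuses, latencies = benchmark_local()
    print(len(statuses), "requests, slowest", max(latencies), "seconds")
```

Because the target's behavior is known and stable, any difference between two clients run against it is more likely to come from the clients themselves.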

Gnep

Nov 3, 2011, 11:42:44 PM

to Tornado Web Server

I see. The test I did uses the same instance and the same code base.
Also, in my latest test it fetches Google, not SDB. That is pretty much
a standard benchmark, isn't it?

Cliff Wells

Nov 4, 2011, 1:03:47 AM

to python-...@googlegroups.com

On Thu, 2011-11-03 at 20:42 -0700, Gnep wrote:
> I see. The test I did uses the same instance and the same code base.
> Also, in my latest test it fetches Google, not SDB. That is pretty much
> a standard benchmark, isn't it?

It seems somewhat better than your first test, but I think I'd still
prefer a more controlled environment, such as Nginx serving a static file
on the same LAN as your client. The Internet is an unreliable
test platform (unless you are testing your Internet connection).

Cliff

Gnep

Nov 4, 2011, 6:54:39 AM

to Tornado Web Server

But the code will eventually be running on the Internet!

Andrew Fort

Nov 4, 2011, 1:40:39 PM

to python-...@googlegroups.com

On Fri, Nov 4, 2011 at 3:54 AM, Gnep <jass...@gmail.com> wrote:
> But the code will eventually be running on the Internet!

What is the problem you are trying to solve, again?

Cliff Wells

Nov 4, 2011, 1:45:22 PM

to python-...@googlegroups.com

On Fri, 2011-11-04 at 03:54 -0700, Gnep wrote:
> But the code will eventually be running on the Internet!

Yes, but your benchmark won't be. You need to ask yourself: am I
benchmarking the internet, or am I benchmarking a particular piece of
software? There might be a real issue with AsyncHTTPClient, but until
you've actually demonstrated that, I don't think anyone will spend much
time trying to track it down.

The test you've configured has far too many variables to determine where
the bottleneck might be when making lots of concurrent requests: is it
your network throughput? Is it AsyncHTTPClient? Is it the database server?
Is it your cloud platform? Your test raises more questions than it
answers.

Cliff

Ben Darnell

Nov 4, 2011, 3:28:49 PM

to python-...@googlegroups.com

On Fri, Nov 4, 2011 at 10:45 AM, Cliff Wells <cl...@develix.com> wrote:
> On Fri, 2011-11-04 at 03:54 -0700, Gnep wrote:
>> But the code will be eventually running on Internet!
>
> Yes, but your benchmark won't be. You need to ask yourself: am I
> benchmarking the internet, or am I benchmarking a particular piece of
> software? There might be a real issue with AsyncHTTPClient, but until
> you've actually demonstrated that, I don't think anyone will spend much
> time trying to track it down.
>
> The test you've configured has far too many variables to decide where
> bottleneck might be when making lots of concurrent requests: is it your
> network throughput? Is it AsyncHTTPClient? Is it the database server?
> Is it your cloud platform? Your test raises more questions than it
> answers.

More important than any of these variables is that you still haven't
told us much about what you're actually testing. You've said you're
sending 100 concurrent requests with the synchronous http client, but
haven't specified whether you're using threads or processes, etc. If
you post your code we'll probably be able to tell what's causing the
difference, but without that we're just taking shots in the dark.

-Ben

Gnep

Nov 5, 2011, 1:26:30 AM

to Tornado Web Server

Ben,

Here is the code:

Client:
#!/usr/bin/python26
# coding: utf-8
import time
import multiprocessing
from jsonrpclib import Server

def test(server, url):
    print server.get_url(url)

if __name__ == '__main__':
    try:
        round = 1
        concurrency = 100
        url = 'http://www.baidu.com'
        server = Server('http://localhost:8080/')
        task = []
        begin = int(time.time())
        for i in range(round):
            for ii in range(concurrency):
                p = multiprocessing.Process(target=test, args=(server, url, ))
                task.append(p)
                p.start()
            for p in task:
                p.join()
        print int(time.time()) - begin
    except Exception, e:
        print e

Server:
from tornadorpc import async
from tornadorpc import private, start_server
from tornadorpc.json import JSONRPCHandler
from tornado.httpclient import AsyncHTTPClient, HTTPClient

class Handler(JSONRPCHandler):

    @async
    def get_url(self, url):
        try:
            client = AsyncHTTPClient(max_clients=1000)
            client.fetch(url, self._handle_response)
        except Exception, e:
            self.result(e)

    def _handle_response(self, response):
        self.result(response.code)

start_server(Handler, port=8080, debug=True)

Ben Darnell

Nov 7, 2011, 1:12:12 AM

to python-...@googlegroups.com

The difference is that your first example is using multiple processes
while the second is using a single process (and is therefore limited
to a single cpu). The first finishes faster because it is using
resources that the second can't even see. However, forking a new
process for each request is a very inefficient way to do things. In
general you'll get the best results by making a fixed number of
processes (probably equal to the number of CPUs in your machine) and
reusing those instead of creating new processes each time.

-Ben
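
Ben's suggestion of a fixed set of reused processes might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the thread's actual client: `run_with_fixed_pool` is a hypothetical helper, the built-in `len` stands in for the real blocking call (such as `server.get_url`), and the "fork" start method assumes a POSIX system. It is written for Python 3, unlike the Python 2.6 code posted above.

```python
import multiprocessing
import os


def run_with_fixed_pool(func, items, workers=None):
    """Apply func to every item using a fixed pool of worker processes.

    func must be picklable (a built-in or a top-level function).
    """
    # Default to one worker per CPU, as suggested in the thread.
    workers = workers or os.cpu_count() or 1
    # Assumption: POSIX, where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        # The same worker processes are reused for all items.
        return pool.map(func, items)


if __name__ == "__main__":
    urls = ["http://localhost:8080/"] * 100
    # len is only a placeholder for the real per-URL request.
    results = run_with_fixed_pool(len, urls)
    print(len(results), "calls handled by a fixed pool")
```

Compared with starting one `multiprocessing.Process` per request, the pool pays the process start-up cost once per worker rather than once per request.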
