Re: [tornado] Increased latency of async http requests when upgrading from 2.0 to 2.1

Ben Darnell

unread,

Aug 28, 2012, 5:01:15 PM8/28/12

to python-...@googlegroups.com

Hmm, I don't see anything in the 2.0-2.1 diffs that would make things
significantly worse, so we'll need more information to track this
down. Are you talking about client or server-side requests, for one
thing?

Mixing and matching tornado components across versions is likely to
break things - 2.1's IOStream introduced several new methods that are
used elsewhere in tornado.

-Ben

On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com> wrote:
> We recently upgraded from 2.0 to 2.3 and noticed that average latency of our
> async http requests increased significantly. After a bit of investigation
> we found that this latency was introduced by the changes to iostream.py from
> 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high latency. 2.1
> with the 2.0 version of iostream.py runs with low latency). Any suggestions
> on what specifically might have caused this? We are seeing this increased
> latency in all of our async requests. The vast majority of our requests are
> POST with a small body (<1k)
>
> Also, do you foresee any issues in running 2.3 with an older version of
> iostream.py?
>
> thanks for the help!

Nathan

unread,

Aug 28, 2012, 5:16:10 PM8/28/12

to python-...@googlegroups.com, b...@bendarnell.com

These are server-side requests. Our app takes in a GET request, does some processing then fans out to 25-40 remote clients via async POST requests with a 200ms timeout. It then waits for all the responses, does a small bit of calculation and writes a result. With 2.0, the average response time among these requests is ~45ms. With 2.1, it increases to ~85ms.

Let me know what additional information might be helpful.

Nathan

unread,

Aug 29, 2012, 2:24:32 AM8/29/12

to python-...@googlegroups.com, b...@bendarnell.com

Digging into the issue further, I found this is where the extra latency is introduced:

https://github.com/facebook/tornado/commit/894fa93e88fd4075711fc69db29f5df7b2b39228#tornado/iostream.py

Nathan

unread,

Aug 29, 2012, 2:47:43 AM8/29/12

to python-...@googlegroups.com, b...@bendarnell.com

Manually undoing that change in 2.3 seems to remove the latency. Do you foresee any harm / tradeoffs in us running 2.3 with this modification:

@@ -206,10 +206,8 @@ class IOStream(object):
                     self._write_buffer.append(data[i:i + WRITE_BUFFER_CHUNK_SIZE])
             else:
                 self._write_buffer.append(data)
+        self._add_io_state(self.io_loop.WRITE)
         self._write_callback = stack_context.wrap(callback)
-        self._handle_write()
-        if self._write_buffer:
-            self._add_io_state(self.io_loop.WRITE)
         self._maybe_add_error_listener()

Ben Darnell

unread,

Aug 29, 2012, 1:01:01 PM8/29/12

to python-...@googlegroups.com

Cool. That change should be safe if you want to patch it in to your
copy. Calling _handle_write here is normally a performance
improvement, though. Are you perhaps calling IOStream.write or
RequestHandler.flush in a loop? The 2.1 changes essentially reduced
the amount of buffering that happened in IOStream, on the assumption
that the upper levels (e.g. RequestHandler) would do their own
buffering, but lots of tiny writes could slow things down.

-Ben

Nathan

unread,

Aug 29, 2012, 4:18:32 PM8/29/12

to python-...@googlegroups.com, b...@bendarnell.com

Here is a toy version of what i'm doing:

https://gist.github.com/3518102

Using this toy app, I observed an average latency of 0.1681434235 when running 2.3 and 0.040904263 when running 2.0. (tested by sequentially hitting the endpoint 1000 times and averaging results). I ran it on an ec2 machine and had it ping out to another ec2 machine in the same az.

Thanks for the help!
Nathan

Ben Darnell

unread,

Sep 3, 2012, 1:55:22 AM9/3/12

to python-...@googlegroups.com

Thanks for posting code; that's really helpful. I'm getting some
really interesting and surprising results here. I'm testing on my
macbook, which is a pretty different environment from ec2 but the
results I'm getting are similar enough that I think the results may be
applicable in your case as well.

First of all, I can reproduce your numbers: the exact numbers differ,
but I am seeing ~4x difference between tornado 2.0 and 2.1. However,
that doesn't appear to mean that 2.0 is faster - on 2.0 I'm getting
lots of getaddrinfo errors in the logs:
[W 120902 22:41:41 simple_httpclient:272] uncaught exception
Traceback (most recent call last):
File "/Users/bdarnell/src/nathan/.tox/tornado20/lib/python2.7/site-packages/tornado/simple_httpclient.py",
line 270, in cleanup
File "/Users/bdarnell/src/nathan/.tox/tornado20/lib/python2.7/site-packages/tornado/simple_httpclient.py",
line 166, in __init__
gaierror: [Errno 8] nodename nor servname provided, or not known
[Errno 8] nodename nor servname provided, or not known

In fact, the response times on tornado 2.0 are bimodal, and in the
fast case things are mostly erroring out - when all the requests
succeed, 2.0 and 2.1 are about the same speed.

I see the first 255 HTTP requests sent succeed, and then a run of
failures. If I insert a sleep between requests from my client code
(just a command line curl loop), everything works. The default file
descriptor limit on a mac happens to be 256. Increasing the file
descriptor limit on the server processes also fixes the problem.

It looks like what's happening is simply file descriptor exhaustion.
For some reason the sockets in 2.0 aren't getting closed quickly
enough, but the new fast-path writes and other changes in 2.1 shake
things up enough that we aren't accumulating a backlog of to-be-closed
sockets.

Update your benchmark to check response.code and see if you're also
seeing errors in the "fast" 2.0 results. If you are, see if raising
your ulimit fixes the problem. If you're still seeing major
performance differences even without errors, I'll take another look at
what's going on.

-Ben

Reply all

Reply to author

Forward