We recently upgraded from 2.0 to 2.3 and noticed that average latency of our async http requests increased significantly. After a bit of investigation we found that this latency was introduced by the changes to iostream.py from 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high latency. 2.1 with the 2.0 version of iostream.py runs with low latency). Any suggestions on what specifically might have caused this? We are seeing this increased latency in all of our async requests. The vast majority of our requests are POST with a small body (<1k)
Also, do you foresee any issues in running 2.3 with an older version of iostream.py?
Hmm, I don't see anything in the 2.0-2.1 diffs that would make things
significantly worse, so we'll need more information to track this
down. Are you talking about client or server-side requests, for one
thing?
Mixing and matching tornado components across versions is likely to
break things - 2.1's IOStream introduced several new methods that are
used elsewhere in tornado.
On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com> wrote:
> We recently upgraded from 2.0 to 2.3 and noticed that average latency of our
> async http requests increased significantly. After a bit of investigation
> we found that this latency was introduced by the changes to iostream.py from
> 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high latency. 2.1
> with the 2.0 version of iostream.py runs with low latency). Any suggestions
> on what specifically might have caused this? We are seeing this increased
> latency in all of our async requests. The vast majority of our requests are
> POST with a small body (<1k)
> Also, do you foresee any issues in running 2.3 with an older version of
> iostream.py?
These are server-side requests. Our app takes in a GET request, does some processing then fans out to 25-40 remote clients via async POST requests with a 200ms timeout. It then waits for all the responses, does a small bit of calculation and writes a result. With 2.0, the average response time among these requests is ~45ms. With 2.1, it increases to ~85ms.
Let me know what additional information might be helpful.
On Tuesday, August 28, 2012 2:01:38 PM UTC-7, Ben Darnell wrote:
> Hmm, I don't see anything in the 2.0-2.1 diffs that would make things > significantly worse, so we'll need more information to track this > down. Are you talking about client or server-side requests, for one > thing?
> Mixing and matching tornado components across versions is likely to > break things - 2.1's IOStream introduced several new methods that are > used elsewhere in tornado.
> -Ben
> On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com <javascript:>> > wrote: > > We recently upgraded from 2.0 to 2.3 and noticed that average latency of > our > > async http requests increased significantly. After a bit of > investigation > > we found that this latency was introduced by the changes to iostream.py > from > > 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high latency. 2.1 > > with the 2.0 version of iostream.py runs with low latency). Any > suggestions > > on what specifically might have caused this? We are seeing this > increased > > latency in all of our async requests. The vast majority of our requests > are > > POST with a small body (<1k)
> > Also, do you foresee any issues in running 2.3 with an older version of > > iostream.py?
On Tuesday, August 28, 2012 2:16:10 PM UTC-7, Nathan wrote:
> These are server-side requests. Our app takes in a GET request, does some > processing then fans out to 25-40 remote clients via async POST requests > with a 200ms timeout. It then waits for all the responses, does a small bit > of calculation and writes a result. With 2.0, the average response time > among these requests is ~45ms. With 2.1, it increases to ~85ms.
> Let me know what additional information might be helpful.
> On Tuesday, August 28, 2012 2:01:38 PM UTC-7, Ben Darnell wrote:
>> Hmm, I don't see anything in the 2.0-2.1 diffs that would make things >> significantly worse, so we'll need more information to track this >> down. Are you talking about client or server-side requests, for one >> thing?
>> Mixing and matching tornado components across versions is likely to >> break things - 2.1's IOStream introduced several new methods that are >> used elsewhere in tornado.
>> -Ben
>> On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com> wrote: >> > We recently upgraded from 2.0 to 2.3 and noticed that average latency >> of our >> > async http requests increased significantly. After a bit of >> investigation >> > we found that this latency was introduced by the changes to iostream.py >> from >> > 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high latency. >> 2.1 >> > with the 2.0 version of iostream.py runs with low latency). Any >> suggestions >> > on what specifically might have caused this? We are seeing this >> increased >> > latency in all of our async requests. The vast majority of our requests >> are >> > POST with a small body (<1k)
>> > Also, do you foresee any issues in running 2.3 with an older version of >> > iostream.py?
> On Tuesday, August 28, 2012 2:16:10 PM UTC-7, Nathan wrote:
>> These are server-side requests. Our app takes in a GET request, does some >> processing then fans out to 25-40 remote clients via async POST requests >> with a 200ms timeout. It then waits for all the responses, does a small bit >> of calculation and writes a result. With 2.0, the average response time >> among these requests is ~45ms. With 2.1, it increases to ~85ms.
>> Let me know what additional information might be helpful.
>> On Tuesday, August 28, 2012 2:01:38 PM UTC-7, Ben Darnell wrote:
>>> Hmm, I don't see anything in the 2.0-2.1 diffs that would make things >>> significantly worse, so we'll need more information to track this >>> down. Are you talking about client or server-side requests, for one >>> thing?
>>> Mixing and matching tornado components across versions is likely to >>> break things - 2.1's IOStream introduced several new methods that are >>> used elsewhere in tornado.
>>> -Ben
>>> On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com> wrote: >>> > We recently upgraded from 2.0 to 2.3 and noticed that average latency >>> of our >>> > async http requests increased significantly. After a bit of >>> investigation >>> > we found that this latency was introduced by the changes to >>> iostream.py from >>> > 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high latency. >>> 2.1 >>> > with the 2.0 version of iostream.py runs with low latency). Any >>> suggestions >>> > on what specifically might have caused this? We are seeing this >>> increased >>> > latency in all of our async requests. The vast majority of our >>> requests are >>> > POST with a small body (<1k)
>>> > Also, do you foresee any issues in running 2.3 with an older version >>> of >>> > iostream.py?
Cool. That change should be safe if you want to patch it in to your
copy. Calling _handle_write here is normally a performance
improvement, though. Are you perhaps calling IOStream.write or
RequestHandler.flush in a loop? The 2.1 changes essentially reduced
the amount of buffering that happened in IOStream, on the assumption
that the upper levels (e.g. RequestHandler) would do their own
buffering, but lots of tiny writes could slow things down.
On Wed, Aug 29, 2012 at 2:47 AM, Nathan <nat...@mopub.com> wrote:
> Manually undoing that change in 2.3 seems to remove the latency. Do you
> foresee any harm / tradeoffs in us running 2.3 with this modification:
>> On Tuesday, August 28, 2012 2:16:10 PM UTC-7, Nathan wrote:
>>> These are server-side requests. Our app takes in a GET request, does some
>>> processing then fans out to 25-40 remote clients via async POST requests
>>> with a 200ms timeout. It then waits for all the responses, does a small bit
>>> of calculation and writes a result. With 2.0, the average response time
>>> among these requests is ~45ms. With 2.1, it increases to ~85ms.
>>> Let me know what additional information might be helpful.
>>> On Tuesday, August 28, 2012 2:01:38 PM UTC-7, Ben Darnell wrote:
>>>> Hmm, I don't see anything in the 2.0-2.1 diffs that would make things
>>>> significantly worse, so we'll need more information to track this
>>>> down. Are you talking about client or server-side requests, for one
>>>> thing?
>>>> Mixing and matching tornado components across versions is likely to
>>>> break things - 2.1's IOStream introduced several new methods that are
>>>> used elsewhere in tornado.
>>>> -Ben
>>>> On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com> wrote:
>>>> > We recently upgraded from 2.0 to 2.3 and noticed that average latency
>>>> > of our
>>>> > async http requests increased significantly. After a bit of
>>>> > investigation
>>>> > we found that this latency was introduced by the changes to
>>>> > iostream.py from
>>>> > 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high latency.
>>>> > 2.1
>>>> > with the 2.0 version of iostream.py runs with low latency). Any
>>>> > suggestions
>>>> > on what specifically might have caused this? We are seeing this
>>>> > increased
>>>> > latency in all of our async requests. The vast majority of our
>>>> > requests are
>>>> > POST with a small body (<1k)
>>>> > Also, do you foresee any issues in running 2.3 with an older version
>>>> > of
>>>> > iostream.py?
Using this toy app, I observed an average latency of 0.1681434235 when running 2.3 and 0.040904263 when running 2.0. (tested by sequentially hitting the endpoint 1000 times and averaging results). I ran it on an ec2 machine and had it ping out to another ec2 machine in the same az.
On Wednesday, August 29, 2012 10:01:24 AM UTC-7, Ben Darnell wrote:
> Cool. That change should be safe if you want to patch it in to your > copy. Calling _handle_write here is normally a performance > improvement, though. Are you perhaps calling IOStream.write or > RequestHandler.flush in a loop? The 2.1 changes essentially reduced > the amount of buffering that happened in IOStream, on the assumption > that the upper levels (e.g. RequestHandler) would do their own > buffering, but lots of tiny writes could slow things down.
> -Ben
> On Wed, Aug 29, 2012 at 2:47 AM, Nathan <nat...@mopub.com <javascript:>> > wrote: > > Manually undoing that change in 2.3 seems to remove the latency. Do you > > foresee any harm / tradeoffs in us running 2.3 with this modification:
> >> On Tuesday, August 28, 2012 2:16:10 PM UTC-7, Nathan wrote:
> >>> These are server-side requests. Our app takes in a GET request, does > some > >>> processing then fans out to 25-40 remote clients via async POST > requests > >>> with a 200ms timeout. It then waits for all the responses, does a > small bit > >>> of calculation and writes a result. With 2.0, the average response > time > >>> among these requests is ~45ms. With 2.1, it increases to ~85ms.
> >>> Let me know what additional information might be helpful.
> >>> On Tuesday, August 28, 2012 2:01:38 PM UTC-7, Ben Darnell wrote:
> >>>> Hmm, I don't see anything in the 2.0-2.1 diffs that would make things > >>>> significantly worse, so we'll need more information to track this > >>>> down. Are you talking about client or server-side requests, for one > >>>> thing?
> >>>> Mixing and matching tornado components across versions is likely to > >>>> break things - 2.1's IOStream introduced several new methods that are > >>>> used elsewhere in tornado.
> >>>> -Ben
> >>>> On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com> wrote: > >>>> > We recently upgraded from 2.0 to 2.3 and noticed that average > latency > >>>> > of our > >>>> > async http requests increased significantly. After a bit of > >>>> > investigation > >>>> > we found that this latency was introduced by the changes to > >>>> > iostream.py from > >>>> > 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high > latency. > >>>> > 2.1 > >>>> > with the 2.0 version of iostream.py runs with low latency). Any > >>>> > suggestions > >>>> > on what specifically might have caused this? We are seeing this > >>>> > increased > >>>> > latency in all of our async requests. The vast majority of our > >>>> > requests are > >>>> > POST with a small body (<1k)
> >>>> > Also, do you foresee any issues in running 2.3 with an older > version > >>>> > of > >>>> > iostream.py?
Thanks for posting code; that's really helpful. I'm getting some
really interesting and surprising results here. I'm testing on my
macbook, which is a pretty different environment from ec2 but the
results I'm getting are similar enough that I think the results may be
applicable in your case as well.
First of all, I can reproduce your numbers: the exact numbers differ,
but I am seeing ~4x difference between tornado 2.0 and 2.1. However,
that doesn't appear to mean that 2.0 is faster - on 2.0 I'm getting
lots of getaddrinfo errors in the logs:
[W 120902 22:41:41 simple_httpclient:272] uncaught exception
Traceback (most recent call last):
File "/Users/bdarnell/src/nathan/.tox/tornado20/lib/python2.7/site-packages/torn ado/simple_httpclient.py",
line 270, in cleanup
File "/Users/bdarnell/src/nathan/.tox/tornado20/lib/python2.7/site-packages/torn ado/simple_httpclient.py",
line 166, in __init__
gaierror: [Errno 8] nodename nor servname provided, or not known
[Errno 8] nodename nor servname provided, or not known
In fact, the response times on tornado 2.0 are bimodal, and in the
fast case things are mostly erroring out - when all the requests
succeed, 2.0 and 2.1 are about the same speed.
I see the first 255 HTTP requests sent succeed, and then a run of
failures. If I insert a sleep between requests from my client code
(just a command line curl loop), everything works. The default file
descriptor limit on a mac happens to be 256. Increasing the file
descriptor limit on the server processes also fixes the problem.
It looks like what's happening is simply file descriptor exhaustion.
For some reason the sockets in 2.0 aren't getting closed quickly
enough, but the new fast-path writes and other changes in 2.1 shake
things up enough that we aren't accumulating a backlog of to-be-closed
sockets.
Update your benchmark to check response.code and see if you're also
seeing errors in the "fast" 2.0 results. If you are, see if raising
your ulimit fixes the problem. If you're still seeing major
performance differences even without errors, I'll take another look at
what's going on.
> Using this toy app, I observed an average latency of 0.1681434235 when
> running 2.3 and 0.040904263 when running 2.0. (tested by sequentially
> hitting the endpoint 1000 times and averaging results). I ran it on an ec2
> machine and had it ping out to another ec2 machine in the same az.
> Thanks for the help!
> Nathan
> On Wednesday, August 29, 2012 10:01:24 AM UTC-7, Ben Darnell wrote:
>> Cool. That change should be safe if you want to patch it in to your
>> copy. Calling _handle_write here is normally a performance
>> improvement, though. Are you perhaps calling IOStream.write or
>> RequestHandler.flush in a loop? The 2.1 changes essentially reduced
>> the amount of buffering that happened in IOStream, on the assumption
>> that the upper levels (e.g. RequestHandler) would do their own
>> buffering, but lots of tiny writes could slow things down.
>> -Ben
>> On Wed, Aug 29, 2012 at 2:47 AM, Nathan <nat...@mopub.com> wrote:
>> > Manually undoing that change in 2.3 seems to remove the latency. Do you
>> > foresee any harm / tradeoffs in us running 2.3 with this modification:
>> >> On Tuesday, August 28, 2012 2:16:10 PM UTC-7, Nathan wrote:
>> >>> These are server-side requests. Our app takes in a GET request, does
>> >>> some
>> >>> processing then fans out to 25-40 remote clients via async POST
>> >>> requests
>> >>> with a 200ms timeout. It then waits for all the responses, does a
>> >>> small bit
>> >>> of calculation and writes a result. With 2.0, the average response
>> >>> time
>> >>> among these requests is ~45ms. With 2.1, it increases to ~85ms.
>> >>> Let me know what additional information might be helpful.
>> >>> On Tuesday, August 28, 2012 2:01:38 PM UTC-7, Ben Darnell wrote:
>> >>>> Hmm, I don't see anything in the 2.0-2.1 diffs that would make things
>> >>>> significantly worse, so we'll need more information to track this
>> >>>> down. Are you talking about client or server-side requests, for one
>> >>>> thing?
>> >>>> Mixing and matching tornado components across versions is likely to
>> >>>> break things - 2.1's IOStream introduced several new methods that are
>> >>>> used elsewhere in tornado.
>> >>>> -Ben
>> >>>> On Tue, Aug 28, 2012 at 3:46 PM, Nathan <nat...@mopub.com> wrote:
>> >>>> > We recently upgraded from 2.0 to 2.3 and noticed that average
>> >>>> > latency
>> >>>> > of our
>> >>>> > async http requests increased significantly. After a bit of
>> >>>> > investigation
>> >>>> > we found that this latency was introduced by the changes to
>> >>>> > iostream.py from
>> >>>> > 2.0 to 2.1. (2.0 runs with low latency. 2.1 runs with high
>> >>>> > latency.
>> >>>> > 2.1
>> >>>> > with the 2.0 version of iostream.py runs with low latency). Any
>> >>>> > suggestions
>> >>>> > on what specifically might have caused this? We are seeing this
>> >>>> > increased
>> >>>> > latency in all of our async requests. The vast majority of our
>> >>>> > requests are
>> >>>> > POST with a small body (<1k)
>> >>>> > Also, do you foresee any issues in running 2.3 with an older
>> >>>> > version
>> >>>> > of
>> >>>> > iostream.py?