Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Timeouts in tests using Tornado + Motor
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  12 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Daniel Kador  
View profile  
 More options Oct 4 2012, 8:50 pm
From: Daniel Kador <d...@keen.io>
Date: Thu, 4 Oct 2012 17:50:26 -0700 (PDT)
Local: Thurs, Oct 4 2012 8:50 pm
Subject: Timeouts in tests using Tornado + Motor

Hi!

I'm running into (what I perceive to be) a strange problem with my tests.
 I've got an application written on top of Tornado that communicates with
Mongo through the Motor library.  It's humming away nicely in production
(daemonized using supervisord).  The only problems (that I can detect) are
with my tests.  I'm very happy to acknowledge that this could be a problem
in my application layer, or in the Motor layer, or maybe even in MongoDB
itself - mostly I'm looking for some guidance as I'm not really sure where
to start digging in.

But let me step back and describe what I've noticed in more detail.

I have approximately ~150 tests that extend
the tornado.testing.AsyncHTTPTestCase class, spread across a number of my
own classes.  I have ~50 that are simple unit tests.  They both communicate
to Mongo through Motor.  The unit tests are fast and never fail (I mention
this only as a data point that may show Motor by itself may not be the
culprit).  The AsyncHTTPTestCase tests all pass when their various test
classes are run individually.

But when I run them all together, after about ~65 tests running and
passing, I start getting timeouts in various spots.  Sometimes it's when
interacting directly with Motor.  Sometimes it's when making an async HTTP
request to one of the APIs I've created (which, under the covers, is
talking to Mongo through Motor).  But it's always when calling (from a
test) self.wait().

It's getting to be a serious problem, as now I can't actually run all my
tests locally (at one time).  Obviously not a recipe for test-driven
development success.

I'm relatively sure I've done something stupid, but I haven't figured out
where yet.  Would really appreciate some help here.

Thanks!

-Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben Darnell  
View profile  
 More options Oct 4 2012, 11:53 pm
From: Ben Darnell <b...@bendarnell.com>
Date: Thu, 4 Oct 2012 20:53:08 -0700
Local: Thurs, Oct 4 2012 11:53 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor
This sounds like it might be a file descriptor leak.  Your tests might
be opening sockets and not closing them, which would cause problems
once you hit the file descriptor limit, which is relatively low by
default on many platforms.  Try increasing the limit by running "uname
-n 10000" before running your tests (if you're on linux and the OS
won't let you, try increasing the settings in
/etc/security/limits.conf).  If that works, then you're running out of
file descriptors.  Once you've confirmed the problem, a better
long-term solution is to ensure that you're closing all the file
descriptors you're opening.  This includes sockets and regular files,
as well as other file descriptors created by IOLoop and
AsyncHTTPClient (could you be forgetting to call super.tearDown() in
your tests?).

-Ben


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
A. Jesse Jiryu Davis  
View profile  
 More options Oct 5 2012, 2:23 am
From: "A. Jesse Jiryu Davis" <ajesseda...@gmail.com>
Date: Thu, 4 Oct 2012 23:23:11 -0700 (PDT)
Local: Fri, Oct 5 2012 2:23 am
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

Agreed. Are you using MotorConnection or MotorReplicaSetConnection? Do you
create a new connection per test? Does it help to call close() on that
connection in tearDown() rather than simply relying on it to be
garbage-collected when it's dereferenced?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Kador  
View profile  
 More options Oct 5 2012, 2:38 pm
From: Daniel Kador <d...@keen.io>
Date: Fri, 5 Oct 2012 11:38:59 -0700 (PDT)
Local: Fri, Oct 5 2012 2:38 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

Awesome, I was thinking along the same lines.  I've seen this problem on
both my local development platform (Mac) and my CI box (Ubuntu).  I've
looked into the file descriptor problem before and didn't come up with
anything definitive, but I'll dig in more.  Both Jesse's and Ben's
suggestions sound like good places to start.

I'm using MotorReplicaSetConnection.  I don't create a new one per test, I
have a single module-level connection reference.  I've tried explicitly
setting up the connection and then tearing it down on every test, which
didn't make a noticeable difference (except to significantly slow down my
tests :P).

I'll dig in more and report back. I don't /believe/ I'm forgetting to call
super.tearDown(), but I'll double-check.  If it's that simple I'll be
relieved (and slightly embarrassed).

Thanks so much!

-Dan

On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Kador  
View profile  
 More options Oct 5 2012, 5:11 pm
From: Daniel Kador <d...@keen.io>
Date: Fri, 5 Oct 2012 14:11:19 -0700 (PDT)
Local: Fri, Oct 5 2012 5:11 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

Quick update - I believe I've ruled out file descriptors as the source of
my problem.  I wrote a small script that just repeatedly runs "lsof | wc
-l" (which, as I understand it, should give me the number of open file
descriptors on my system) and stores the results in a texfile.  Before I
launched any tests, my open file descriptor count hovered around 7,300.  As
I ran tests, the number fluctuated a bit, but never went above 7,405 and
never went below 7,221.  So it feels like I'm not blowing out the
descriptor count anywhere.

I also verified that the tearDown methods in AsyncTestCase and
AsyncHTTPTestCase are being called on every test.

Some more details (not sure if they're useful or relevant, but I'm hoping
they are): when I run tests my CPU gets pegged.  One of my cores goes to
about 90% and hovers there for the whole run.  Also, I'm overriding
get_new_ioloop() to return the singleton instance.

Jesse, question for you: do I need to explicitly close cursors opened by
Motor?  I'm pretty sure I don't need to do anything with the Connection
object, but I couldn't find anything that told me my responsibility re: the
cursor lifecycle.

Will keep digging in.

Thanks again!

-Dan


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
A. Jesse Jiryu Davis  
View profile  
 More options Oct 5 2012, 5:53 pm
From: "A. Jesse Jiryu Davis" <ajesseda...@gmail.com>
Date: Fri, 5 Oct 2012 14:53:00 -0700 (PDT)
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

No, I'll add this to the docs -- a MotorCursor's __del__ calls the PyMongo
Cursor's close(), which checks if the cursor needs to be closed on the
server and if so sends the closeCursors command. So if you've fully
iterated a cursor (with to_list, or next_object until it returns None) then
the server has already closed the cursor, otherwise the client cursor
closes itself on the server when it's dereferenced.

Motor's own test suite checks for leaked cursors after every test:

https://github.com/ajdavis/mongo-python-driver/blob/motor/test/motor/...

So you could do something similar in your tests, or manually run
db.serverStatus().cursors in the Mongo shell while the tests are going. If
it rises a lot, then we have a problem.

Is it Python or MongoDB that mostly uses CPU? I'm not surprised your CPU is
pegged, nothing *else* would be the bottleneck in your unittests.

Can you run lsof|grep PID|wc -l, where PID is the PID of your Python
process, and make sure that process itself isn't opening more than a dozen
or so connections? What about with mongod's PID?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
A. Jesse Jiryu Davis  
View profile  
 More options Oct 5 2012, 6:09 pm
From: "A. Jesse Jiryu Davis" <ajesseda...@gmail.com>
Date: Fri, 5 Oct 2012 15:09:38 -0700 (PDT)
Local: Fri, Oct 5 2012 6:09 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

I've updated the docs about MotorCursor:

http://emptysquare.net/motor/pymongo/api/motor/motor_cursor.html

Let me know what you find out regarding your tests, Dan, or any other way I
can help. I want anyone who uses Motor to succeed. =)

On Friday, October 5, 2012 2:53:00 PM UTC-7, A. Jesse Jiryu Davis wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Kador  
View profile  
 More options Oct 5 2012, 6:21 pm
From: Daniel Kador <d...@keen.io>
Date: Fri, 5 Oct 2012 15:21:18 -0700 (PDT)
Local: Fri, Oct 5 2012 6:21 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

The doc additions looks great, thanks for handling it so quickly.

db.serverStatus().cursors reports nothing interesting at all.  The cursors
are definitely being closed (which is good).

The CPU is being used by Python.  Mongo hardly has a blip.

I ran lsof -p PID | wc -l, where PID was the PID of my Python process.  The
count stays steady around 55.  Most of the results are related to the .so
files loaded by Python.  There are about 10 related to sockets, which seems
pretty reasonable to me.

Mongo stays pretty consistent around 113.  Again, that doesn't sound crazy.

It definitely *feels* like the Python layer is leaking something,
somewhere.  Is there an easy way to see what's registered on the ioloop?
 Or just simply log what the ioloop is currently working on? I wonder if my
code is somehow not terminating in some cases?  Or taking much longer than
I think?

On Friday, October 5, 2012 3:09:38 PM UTC-7, A. Jesse Jiryu Davis wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Kador  
View profile  
 More options Oct 5 2012, 7:44 pm
From: Daniel Kador <d...@keen.io>
Date: Fri, 5 Oct 2012 16:44:56 -0700 (PDT)
Local: Fri, Oct 5 2012 7:44 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

I appear to have found a fix.  I was previously returning an Application
instance in self.get_app() with the debug kwarg set to True (tests were
inheriting the default value for running in dev mode) and now everything is
quite speedy.  More importantly, the tests actually all run!  I'm
satisfied, but if Ben or Jesse wants me to dig in more to figure out what,
exactly, debug mode is doing, I'm happy to help out.

One thing I noticed: I was able to reproduce the problem in a small test
class with ~100 tests.  The ONLY way I could repro, though, was if I had
debug mode on AND I returned the singleton IOLoop from IOLoop.instance() in
self.get_new_ioloop() AND each test made a simple HTTP request AND my
application used my request handlers (if I tried with a barebones app it
was all fine).

So it *appears* as if there's some strange interaction between my handlers
and debug mode?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben Darnell  
View profile  
 More options Oct 5 2012, 11:47 pm
From: Ben Darnell <b...@bendarnell.com>
Date: Fri, 5 Oct 2012 20:47:03 -0700
Local: Fri, Oct 5 2012 11:47 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor
Oh, I see.  What's happening is that each time you create a debug-mode
application, it calls autoreload.start(), which schedules a callback
every 500ms on the IOLoop to check the modification times of all the
source files.  As you run more tests, those accumulate, and eventually
the server can hardly do anything else.  Autoreload should definitely
check to see if it's being started on the same IOLoop more than once.
(and maybe it should warn if it's being started on two different
IOLoops in the same process, since that's not really useful).

-Ben

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
A. Jesse Jiryu Davis  
View profile  
 More options Oct 6 2012, 12:38 am
From: "A. Jesse Jiryu Davis" <ajesseda...@gmail.com>
Date: Fri, 5 Oct 2012 21:38:00 -0700 (PDT)
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

Interesting. =)

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Kador  
View profile  
 More options Oct 8 2012, 2:58 pm
From: Daniel Kador <d...@keen.io>
Date: Mon, 8 Oct 2012 11:58:05 -0700 (PDT)
Local: Mon, Oct 8 2012 2:58 pm
Subject: Re: [tornado] Timeouts in tests using Tornado + Motor

Love that idea.  I was all set to implement a fix myself, but then I
noticed you already handled it on Friday. :)  Thanks!

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »