I'm running into (what I perceive to be) a strange problem with my tests. I've got an application written on top of Tornado that communicates with Mongo through the Motor library. It's humming away nicely in production (daemonized using supervisord). The only problems (that I can detect) are with my tests. I'm very happy to acknowledge that this could be a problem in my application layer, or in the Motor layer, or maybe even in MongoDB itself - mostly I'm looking for some guidance as I'm not really sure where to start digging in.
But let me step back and describe what I've noticed in more detail.
I have approximately ~150 tests that extend the tornado.testing.AsyncHTTPTestCase class, spread across a number of my own classes. I have ~50 that are simple unit tests. They both communicate to Mongo through Motor. The unit tests are fast and never fail (I mention this only as a data point that may show Motor by itself may not be the culprit). The AsyncHTTPTestCase tests all pass when their various test classes are run individually.
But when I run them all together, after about ~65 tests running and passing, I start getting timeouts in various spots. Sometimes it's when interacting directly with Motor. Sometimes it's when making an async HTTP request to one of the APIs I've created (which, under the covers, is talking to Mongo through Motor). But it's always when calling (from a test) self.wait().
It's getting to be a serious problem, as now I can't actually run all my tests locally (at one time). Obviously not a recipe for test-driven development success.
I'm relatively sure I've done something stupid, but I haven't figured out where yet. Would really appreciate some help here.
This sounds like it might be a file descriptor leak. Your tests might
be opening sockets and not closing them, which would cause problems
once you hit the file descriptor limit, which is relatively low by
default on many platforms. Try increasing the limit by running "uname
-n 10000" before running your tests (if you're on linux and the OS
won't let you, try increasing the settings in
/etc/security/limits.conf). If that works, then you're running out of
file descriptors. Once you've confirmed the problem, a better
long-term solution is to ensure that you're closing all the file
descriptors you're opening. This includes sockets and regular files,
as well as other file descriptors created by IOLoop and
AsyncHTTPClient (could you be forgetting to call super.tearDown() in
your tests?).
On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote:
> Hi!
> I'm running into (what I perceive to be) a strange problem with my tests.
> I've got an application written on top of Tornado that communicates with
> Mongo through the Motor library. It's humming away nicely in production
> (daemonized using supervisord). The only problems (that I can detect) are
> with my tests. I'm very happy to acknowledge that this could be a problem
> in my application layer, or in the Motor layer, or maybe even in MongoDB
> itself - mostly I'm looking for some guidance as I'm not really sure where
> to start digging in.
> But let me step back and describe what I've noticed in more detail.
> I have approximately ~150 tests that extend the
> tornado.testing.AsyncHTTPTestCase class, spread across a number of my own
> classes. I have ~50 that are simple unit tests. They both communicate to
> Mongo through Motor. The unit tests are fast and never fail (I mention this
> only as a data point that may show Motor by itself may not be the culprit).
> The AsyncHTTPTestCase tests all pass when their various test classes are run
> individually.
> But when I run them all together, after about ~65 tests running and passing,
> I start getting timeouts in various spots. Sometimes it's when interacting
> directly with Motor. Sometimes it's when making an async HTTP request to
> one of the APIs I've created (which, under the covers, is talking to Mongo
> through Motor). But it's always when calling (from a test) self.wait().
> It's getting to be a serious problem, as now I can't actually run all my
> tests locally (at one time). Obviously not a recipe for test-driven
> development success.
> I'm relatively sure I've done something stupid, but I haven't figured out
> where yet. Would really appreciate some help here.
Agreed. Are you using MotorConnection or MotorReplicaSetConnection? Do you create a new connection per test? Does it help to call close() on that connection in tearDown() rather than simply relying on it to be garbage-collected when it's dereferenced?
On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
> This sounds like it might be a file descriptor leak. Your tests might > be opening sockets and not closing them, which would cause problems > once you hit the file descriptor limit, which is relatively low by > default on many platforms. Try increasing the limit by running "uname > -n 10000" before running your tests (if you're on linux and the OS > won't let you, try increasing the settings in > /etc/security/limits.conf). If that works, then you're running out of > file descriptors. Once you've confirmed the problem, a better > long-term solution is to ensure that you're closing all the file > descriptors you're opening. This includes sockets and regular files, > as well as other file descriptors created by IOLoop and > AsyncHTTPClient (could you be forgetting to call super.tearDown() in > your tests?).
> -Ben
> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io <javascript:>> > wrote: > > Hi!
> > I'm running into (what I perceive to be) a strange problem with my > tests. > > I've got an application written on top of Tornado that communicates with > > Mongo through the Motor library. It's humming away nicely in production > > (daemonized using supervisord). The only problems (that I can detect) > are > > with my tests. I'm very happy to acknowledge that this could be a > problem > > in my application layer, or in the Motor layer, or maybe even in MongoDB > > itself - mostly I'm looking for some guidance as I'm not really sure > where > > to start digging in.
> > But let me step back and describe what I've noticed in more detail.
> > I have approximately ~150 tests that extend the > > tornado.testing.AsyncHTTPTestCase class, spread across a number of my > own > > classes. I have ~50 that are simple unit tests. They both communicate > to > > Mongo through Motor. The unit tests are fast and never fail (I mention > this > > only as a data point that may show Motor by itself may not be the > culprit). > > The AsyncHTTPTestCase tests all pass when their various test classes are > run > > individually.
> > But when I run them all together, after about ~65 tests running and > passing, > > I start getting timeouts in various spots. Sometimes it's when > interacting > > directly with Motor. Sometimes it's when making an async HTTP request > to > > one of the APIs I've created (which, under the covers, is talking to > Mongo > > through Motor). But it's always when calling (from a test) self.wait().
> > It's getting to be a serious problem, as now I can't actually run all my > > tests locally (at one time). Obviously not a recipe for test-driven > > development success.
> > I'm relatively sure I've done something stupid, but I haven't figured > out > > where yet. Would really appreciate some help here.
Awesome, I was thinking along the same lines. I've seen this problem on both my local development platform (Mac) and my CI box (Ubuntu). I've looked into the file descriptor problem before and didn't come up with anything definitive, but I'll dig in more. Both Jesse's and Ben's suggestions sound like good places to start.
I'm using MotorReplicaSetConnection. I don't create a new one per test, I have a single module-level connection reference. I've tried explicitly setting up the connection and then tearing it down on every test, which didn't make a noticeable difference (except to significantly slow down my tests :P).
I'll dig in more and report back. I don't /believe/ I'm forgetting to call super.tearDown(), but I'll double-check. If it's that simple I'll be relieved (and slightly embarrassed).
Thanks so much!
-Dan
On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis wrote:
> Agreed. Are you using MotorConnection or MotorReplicaSetConnection? Do you > create a new connection per test? Does it help to call close() on that > connection in tearDown() rather than simply relying on it to be > garbage-collected when it's dereferenced?
> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
>> This sounds like it might be a file descriptor leak. Your tests might >> be opening sockets and not closing them, which would cause problems >> once you hit the file descriptor limit, which is relatively low by >> default on many platforms. Try increasing the limit by running "uname >> -n 10000" before running your tests (if you're on linux and the OS >> won't let you, try increasing the settings in >> /etc/security/limits.conf). If that works, then you're running out of >> file descriptors. Once you've confirmed the problem, a better >> long-term solution is to ensure that you're closing all the file >> descriptors you're opening. This includes sockets and regular files, >> as well as other file descriptors created by IOLoop and >> AsyncHTTPClient (could you be forgetting to call super.tearDown() in >> your tests?).
>> -Ben
>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote: >> > Hi!
>> > I'm running into (what I perceive to be) a strange problem with my >> tests. >> > I've got an application written on top of Tornado that communicates >> with >> > Mongo through the Motor library. It's humming away nicely in >> production >> > (daemonized using supervisord). The only problems (that I can detect) >> are >> > with my tests. I'm very happy to acknowledge that this could be a >> problem >> > in my application layer, or in the Motor layer, or maybe even in >> MongoDB >> > itself - mostly I'm looking for some guidance as I'm not really sure >> where >> > to start digging in.
>> > But let me step back and describe what I've noticed in more detail.
>> > I have approximately ~150 tests that extend the >> > tornado.testing.AsyncHTTPTestCase class, spread across a number of my >> own >> > classes. I have ~50 that are simple unit tests. They both communicate >> to >> > Mongo through Motor. The unit tests are fast and never fail (I mention >> this >> > only as a data point that may show Motor by itself may not be the >> culprit). >> > The AsyncHTTPTestCase tests all pass when their various test classes >> are run >> > individually.
>> > But when I run them all together, after about ~65 tests running and >> passing, >> > I start getting timeouts in various spots. Sometimes it's when >> interacting >> > directly with Motor. Sometimes it's when making an async HTTP request >> to >> > one of the APIs I've created (which, under the covers, is talking to >> Mongo >> > through Motor). But it's always when calling (from a test) >> self.wait().
>> > It's getting to be a serious problem, as now I can't actually run all >> my >> > tests locally (at one time). Obviously not a recipe for test-driven >> > development success.
>> > I'm relatively sure I've done something stupid, but I haven't figured >> out >> > where yet. Would really appreciate some help here.
Quick update - I believe I've ruled out file descriptors as the source of my problem. I wrote a small script that just repeatedly runs "lsof | wc -l" (which, as I understand it, should give me the number of open file descriptors on my system) and stores the results in a texfile. Before I launched any tests, my open file descriptor count hovered around 7,300. As I ran tests, the number fluctuated a bit, but never went above 7,405 and never went below 7,221. So it feels like I'm not blowing out the descriptor count anywhere.
I also verified that the tearDown methods in AsyncTestCase and AsyncHTTPTestCase are being called on every test.
Some more details (not sure if they're useful or relevant, but I'm hoping they are): when I run tests my CPU gets pegged. One of my cores goes to about 90% and hovers there for the whole run. Also, I'm overriding get_new_ioloop() to return the singleton instance.
Jesse, question for you: do I need to explicitly close cursors opened by Motor? I'm pretty sure I don't need to do anything with the Connection object, but I couldn't find anything that told me my responsibility re: the cursor lifecycle.
On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
> Awesome, I was thinking along the same lines. I've seen this problem on > both my local development platform (Mac) and my CI box (Ubuntu). I've > looked into the file descriptor problem before and didn't come up with > anything definitive, but I'll dig in more. Both Jesse's and Ben's > suggestions sound like good places to start.
> I'm using MotorReplicaSetConnection. I don't create a new one per test, I > have a single module-level connection reference. I've tried explicitly > setting up the connection and then tearing it down on every test, which > didn't make a noticeable difference (except to significantly slow down my > tests :P).
> I'll dig in more and report back. I don't /believe/ I'm forgetting to call > super.tearDown(), but I'll double-check. If it's that simple I'll be > relieved (and slightly embarrassed).
> Thanks so much!
> -Dan
> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis wrote:
>> Agreed. Are you using MotorConnection or MotorReplicaSetConnection? Do >> you create a new connection per test? Does it help to call close() on that >> connection in tearDown() rather than simply relying on it to be >> garbage-collected when it's dereferenced?
>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
>>> This sounds like it might be a file descriptor leak. Your tests might >>> be opening sockets and not closing them, which would cause problems >>> once you hit the file descriptor limit, which is relatively low by >>> default on many platforms. Try increasing the limit by running "uname >>> -n 10000" before running your tests (if you're on linux and the OS >>> won't let you, try increasing the settings in >>> /etc/security/limits.conf). If that works, then you're running out of >>> file descriptors. Once you've confirmed the problem, a better >>> long-term solution is to ensure that you're closing all the file >>> descriptors you're opening. This includes sockets and regular files, >>> as well as other file descriptors created by IOLoop and >>> AsyncHTTPClient (could you be forgetting to call super.tearDown() in >>> your tests?).
>>> -Ben
>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote: >>> > Hi!
>>> > I'm running into (what I perceive to be) a strange problem with my >>> tests. >>> > I've got an application written on top of Tornado that communicates >>> with >>> > Mongo through the Motor library. It's humming away nicely in >>> production >>> > (daemonized using supervisord). The only problems (that I can detect) >>> are >>> > with my tests. I'm very happy to acknowledge that this could be a >>> problem >>> > in my application layer, or in the Motor layer, or maybe even in >>> MongoDB >>> > itself - mostly I'm looking for some guidance as I'm not really sure >>> where >>> > to start digging in.
>>> > But let me step back and describe what I've noticed in more detail.
>>> > I have approximately ~150 tests that extend the >>> > tornado.testing.AsyncHTTPTestCase class, spread across a number of my >>> own >>> > classes. I have ~50 that are simple unit tests. They both >>> communicate to >>> > Mongo through Motor. The unit tests are fast and never fail (I >>> mention this >>> > only as a data point that may show Motor by itself may not be the >>> culprit). >>> > The AsyncHTTPTestCase tests all pass when their various test classes >>> are run >>> > individually.
>>> > But when I run them all together, after about ~65 tests running and >>> passing, >>> > I start getting timeouts in various spots. Sometimes it's when >>> interacting >>> > directly with Motor. Sometimes it's when making an async HTTP request >>> to >>> > one of the APIs I've created (which, under the covers, is talking to >>> Mongo >>> > through Motor). But it's always when calling (from a test) >>> self.wait().
>>> > It's getting to be a serious problem, as now I can't actually run all >>> my >>> > tests locally (at one time). Obviously not a recipe for test-driven >>> > development success.
>>> > I'm relatively sure I've done something stupid, but I haven't figured >>> out >>> > where yet. Would really appreciate some help here.
No, I'll add this to the docs -- a MotorCursor's __del__ calls the PyMongo Cursor's close(), which checks if the cursor needs to be closed on the server and if so sends the closeCursors command. So if you've fully iterated a cursor (with to_list, or next_object until it returns None) then the server has already closed the cursor, otherwise the client cursor closes itself on the server when it's dereferenced.
Motor's own test suite checks for leaked cursors after every test:
So you could do something similar in your tests, or manually run db.serverStatus().cursors in the Mongo shell while the tests are going. If it rises a lot, then we have a problem.
Is it Python or MongoDB that mostly uses CPU? I'm not surprised your CPU is pegged, nothing *else* would be the bottleneck in your unittests.
Can you run lsof|grep PID|wc -l, where PID is the PID of your Python process, and make sure that process itself isn't opening more than a dozen or so connections? What about with mongod's PID?
On Friday, October 5, 2012 2:11:19 PM UTC-7, Daniel Kador wrote:
> Quick update - I believe I've ruled out file descriptors as the source of > my problem. I wrote a small script that just repeatedly runs "lsof | wc > -l" (which, as I understand it, should give me the number of open file > descriptors on my system) and stores the results in a texfile. Before I > launched any tests, my open file descriptor count hovered around 7,300. As > I ran tests, the number fluctuated a bit, but never went above 7,405 and > never went below 7,221. So it feels like I'm not blowing out the > descriptor count anywhere.
> I also verified that the tearDown methods in AsyncTestCase and > AsyncHTTPTestCase are being called on every test.
> Some more details (not sure if they're useful or relevant, but I'm hoping > they are): when I run tests my CPU gets pegged. One of my cores goes to > about 90% and hovers there for the whole run. Also, I'm overriding > get_new_ioloop() to return the singleton instance.
> Jesse, question for you: do I need to explicitly close cursors opened by > Motor? I'm pretty sure I don't need to do anything with the Connection > object, but I couldn't find anything that told me my responsibility re: the > cursor lifecycle.
> Will keep digging in.
> Thanks again!
> -Dan
> On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
>> Awesome, I was thinking along the same lines. I've seen this problem on >> both my local development platform (Mac) and my CI box (Ubuntu). I've >> looked into the file descriptor problem before and didn't come up with >> anything definitive, but I'll dig in more. Both Jesse's and Ben's >> suggestions sound like good places to start.
>> I'm using MotorReplicaSetConnection. I don't create a new one per test, >> I have a single module-level connection reference. I've tried explicitly >> setting up the connection and then tearing it down on every test, which >> didn't make a noticeable difference (except to significantly slow down my >> tests :P).
>> I'll dig in more and report back. I don't /believe/ I'm forgetting to >> call super.tearDown(), but I'll double-check. If it's that simple I'll be >> relieved (and slightly embarrassed).
>> Thanks so much!
>> -Dan
>> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis >> wrote:
>>> Agreed. Are you using MotorConnection or MotorReplicaSetConnection? Do >>> you create a new connection per test? Does it help to call close() on that >>> connection in tearDown() rather than simply relying on it to be >>> garbage-collected when it's dereferenced?
>>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
>>>> This sounds like it might be a file descriptor leak. Your tests might >>>> be opening sockets and not closing them, which would cause problems >>>> once you hit the file descriptor limit, which is relatively low by >>>> default on many platforms. Try increasing the limit by running "uname >>>> -n 10000" before running your tests (if you're on linux and the OS >>>> won't let you, try increasing the settings in >>>> /etc/security/limits.conf). If that works, then you're running out of >>>> file descriptors. Once you've confirmed the problem, a better >>>> long-term solution is to ensure that you're closing all the file >>>> descriptors you're opening. This includes sockets and regular files, >>>> as well as other file descriptors created by IOLoop and >>>> AsyncHTTPClient (could you be forgetting to call super.tearDown() in >>>> your tests?).
>>>> -Ben
>>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote: >>>> > Hi!
>>>> > I'm running into (what I perceive to be) a strange problem with my >>>> tests. >>>> > I've got an application written on top of Tornado that communicates >>>> with >>>> > Mongo through the Motor library. It's humming away nicely in >>>> production >>>> > (daemonized using supervisord). The only problems (that I can >>>> detect) are >>>> > with my tests. I'm very happy to acknowledge that this could be a >>>> problem >>>> > in my application layer, or in the Motor layer, or maybe even in >>>> MongoDB >>>> > itself - mostly I'm looking for some guidance as I'm not really sure >>>> where >>>> > to start digging in.
>>>> > But let me step back and describe what I've noticed in more detail.
>>>> > I have approximately ~150 tests that extend the >>>> > tornado.testing.AsyncHTTPTestCase class, spread across a number of my >>>> own >>>> > classes. I have ~50 that are simple unit tests. They both >>>> communicate to >>>> > Mongo through Motor. The unit tests are fast and never fail (I >>>> mention this >>>> > only as a data point that may show Motor by itself may not be the >>>> culprit). >>>> > The AsyncHTTPTestCase tests all pass when their various test classes >>>> are run >>>> > individually.
>>>> > But when I run them all together, after about ~65 tests running and >>>> passing, >>>> > I start getting timeouts in various spots. Sometimes it's when >>>> interacting >>>> > directly with Motor. Sometimes it's when making an async HTTP >>>> request to >>>> > one of the APIs I've created (which, under the covers, is talking to >>>> Mongo >>>> > through Motor). But it's always when calling (from a test) >>>> self.wait().
>>>> > It's getting to be a serious problem, as now I can't actually run all >>>> my >>>> > tests locally (at one time). Obviously not a recipe for test-driven >>>> > development success.
>>>> > I'm relatively sure I've done something stupid, but I haven't figured >>>> out >>>> > where yet. Would really appreciate some help here.
> No, I'll add this to the docs -- a MotorCursor's __del__ calls the PyMongo > Cursor's close(), which checks if the cursor needs to be closed on the > server and if so sends the closeCursors command. So if you've fully > iterated a cursor (with to_list, or next_object until it returns None) then > the server has already closed the cursor, otherwise the client cursor > closes itself on the server when it's dereferenced.
> Motor's own test suite checks for leaked cursors after every test:
> So you could do something similar in your tests, or manually run > db.serverStatus().cursors in the Mongo shell while the tests are going. If > it rises a lot, then we have a problem.
> Is it Python or MongoDB that mostly uses CPU? I'm not surprised your CPU > is pegged, nothing *else* would be the bottleneck in your unittests.
> Can you run lsof|grep PID|wc -l, where PID is the PID of your Python > process, and make sure that process itself isn't opening more than a dozen > or so connections? What about with mongod's PID?
> On Friday, October 5, 2012 2:11:19 PM UTC-7, Daniel Kador wrote:
>> Quick update - I believe I've ruled out file descriptors as the source of >> my problem. I wrote a small script that just repeatedly runs "lsof | wc >> -l" (which, as I understand it, should give me the number of open file >> descriptors on my system) and stores the results in a texfile. Before I >> launched any tests, my open file descriptor count hovered around 7,300. As >> I ran tests, the number fluctuated a bit, but never went above 7,405 and >> never went below 7,221. So it feels like I'm not blowing out the >> descriptor count anywhere.
>> I also verified that the tearDown methods in AsyncTestCase and >> AsyncHTTPTestCase are being called on every test.
>> Some more details (not sure if they're useful or relevant, but I'm hoping >> they are): when I run tests my CPU gets pegged. One of my cores goes to >> about 90% and hovers there for the whole run. Also, I'm overriding >> get_new_ioloop() to return the singleton instance.
>> Jesse, question for you: do I need to explicitly close cursors opened by >> Motor? I'm pretty sure I don't need to do anything with the Connection >> object, but I couldn't find anything that told me my responsibility re: the >> cursor lifecycle.
>> Will keep digging in.
>> Thanks again!
>> -Dan
>> On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
>>> Awesome, I was thinking along the same lines. I've seen this problem on >>> both my local development platform (Mac) and my CI box (Ubuntu). I've >>> looked into the file descriptor problem before and didn't come up with >>> anything definitive, but I'll dig in more. Both Jesse's and Ben's >>> suggestions sound like good places to start.
>>> I'm using MotorReplicaSetConnection. I don't create a new one per test, >>> I have a single module-level connection reference. I've tried explicitly >>> setting up the connection and then tearing it down on every test, which >>> didn't make a noticeable difference (except to significantly slow down my >>> tests :P).
>>> I'll dig in more and report back. I don't /believe/ I'm forgetting to >>> call super.tearDown(), but I'll double-check. If it's that simple I'll be >>> relieved (and slightly embarrassed).
>>> Thanks so much!
>>> -Dan
>>> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis >>> wrote:
>>>> Agreed. Are you using MotorConnection or MotorReplicaSetConnection? Do >>>> you create a new connection per test? Does it help to call close() on that >>>> connection in tearDown() rather than simply relying on it to be >>>> garbage-collected when it's dereferenced?
>>>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
>>>>> This sounds like it might be a file descriptor leak. Your tests might >>>>> be opening sockets and not closing them, which would cause problems >>>>> once you hit the file descriptor limit, which is relatively low by >>>>> default on many platforms. Try increasing the limit by running "uname >>>>> -n 10000" before running your tests (if you're on linux and the OS >>>>> won't let you, try increasing the settings in >>>>> /etc/security/limits.conf). If that works, then you're running out of >>>>> file descriptors. Once you've confirmed the problem, a better >>>>> long-term solution is to ensure that you're closing all the file >>>>> descriptors you're opening. This includes sockets and regular files, >>>>> as well as other file descriptors created by IOLoop and >>>>> AsyncHTTPClient (could you be forgetting to call super.tearDown() in >>>>> your tests?).
>>>>> -Ben
>>>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote: >>>>> > Hi!
>>>>> > I'm running into (what I perceive to be) a strange problem with my >>>>> tests. >>>>> > I've got an application written on top of Tornado that communicates >>>>> with >>>>> > Mongo through the Motor library. It's humming away nicely in >>>>> production >>>>> > (daemonized using supervisord). The only problems (that I can >>>>> detect) are >>>>> > with my tests. I'm very happy to acknowledge that this could be a >>>>> problem >>>>> > in my application layer, or in the Motor layer, or maybe even in >>>>> MongoDB >>>>> > itself - mostly I'm looking for some guidance as I'm not really sure >>>>> where >>>>> > to start digging in.
>>>>> > But let me step back and describe what I've noticed in more detail.
>>>>> > I have approximately ~150 tests that extend the >>>>> > tornado.testing.AsyncHTTPTestCase class, spread across a number of >>>>> my own >>>>> > classes. I have ~50 that are simple unit tests. They both >>>>> communicate to >>>>> > Mongo through Motor. The unit tests are fast and never fail (I >>>>> mention this >>>>> > only as a data point that may show Motor by itself may not be the >>>>> culprit). >>>>> > The AsyncHTTPTestCase tests all pass when their various test classes >>>>> are run >>>>> > individually.
>>>>> > But when I run them all together, after about ~65 tests running and >>>>> passing, >>>>> > I start getting timeouts in various spots. Sometimes it's when >>>>> interacting >>>>> > directly with Motor. Sometimes it's when making an async HTTP >>>>> request to >>>>> > one of the APIs I've created (which, under the covers, is talking to >>>>> Mongo >>>>> > through Motor). But it's always when calling (from a test) >>>>> self.wait().
>>>>> > It's getting to be a serious problem, as now I can't actually run >>>>> all my >>>>> > tests locally (at one time). Obviously not a recipe for test-driven >>>>> > development success.
>>>>> > I'm relatively sure I've done something stupid, but I haven't >>>>> figured out >>>>> > where yet. Would really appreciate some help here.
The doc additions looks great, thanks for handling it so quickly.
db.serverStatus().cursors reports nothing interesting at all. The cursors are definitely being closed (which is good).
The CPU is being used by Python. Mongo hardly has a blip.
I ran lsof -p PID | wc -l, where PID was the PID of my Python process. The count stays steady around 55. Most of the results are related to the .so files loaded by Python. There are about 10 related to sockets, which seems pretty reasonable to me.
Mongo stays pretty consistent around 113. Again, that doesn't sound crazy.
It definitely *feels* like the Python layer is leaking something, somewhere. Is there an easy way to see what's registered on the ioloop? Or just simply log what the ioloop is currently working on? I wonder if my code is somehow not terminating in some cases? Or taking much longer than I think?
On Friday, October 5, 2012 3:09:38 PM UTC-7, A. Jesse Jiryu Davis wrote:
> Let me know what you find out regarding your tests, Dan, or any other way > I can help. I want anyone who uses Motor to succeed. =)
> On Friday, October 5, 2012 2:53:00 PM UTC-7, A. Jesse Jiryu Davis wrote:
>> No, I'll add this to the docs -- a MotorCursor's __del__ calls the >> PyMongo Cursor's close(), which checks if the cursor needs to be closed on >> the server and if so sends the closeCursors command. So if you've fully >> iterated a cursor (with to_list, or next_object until it returns None) then >> the server has already closed the cursor, otherwise the client cursor >> closes itself on the server when it's dereferenced.
>> Motor's own test suite checks for leaked cursors after every test:
>> So you could do something similar in your tests, or manually run >> db.serverStatus().cursors in the Mongo shell while the tests are going. If >> it rises a lot, then we have a problem.
>> Is it Python or MongoDB that mostly uses CPU? I'm not surprised your CPU >> is pegged, nothing *else* would be the bottleneck in your unittests.
>> Can you run lsof|grep PID|wc -l, where PID is the PID of your Python >> process, and make sure that process itself isn't opening more than a dozen >> or so connections? What about with mongod's PID?
>> On Friday, October 5, 2012 2:11:19 PM UTC-7, Daniel Kador wrote:
>>> Quick update - I believe I've ruled out file descriptors as the source >>> of my problem. I wrote a small script that just repeatedly runs "lsof | wc >>> -l" (which, as I understand it, should give me the number of open file >>> descriptors on my system) and stores the results in a texfile. Before I >>> launched any tests, my open file descriptor count hovered around 7,300. As >>> I ran tests, the number fluctuated a bit, but never went above 7,405 and >>> never went below 7,221. So it feels like I'm not blowing out the >>> descriptor count anywhere.
>>> I also verified that the tearDown methods in AsyncTestCase and >>> AsyncHTTPTestCase are being called on every test.
>>> Some more details (not sure if they're useful or relevant, but I'm >>> hoping they are): when I run tests my CPU gets pegged. One of my cores >>> goes to about 90% and hovers there for the whole run. Also, I'm overriding >>> get_new_ioloop() to return the singleton instance.
>>> Jesse, question for you: do I need to explicitly close cursors opened by >>> Motor? I'm pretty sure I don't need to do anything with the Connection >>> object, but I couldn't find anything that told me my responsibility re: the >>> cursor lifecycle.
>>> Will keep digging in.
>>> Thanks again!
>>> -Dan
>>> On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
>>>> Awesome, I was thinking along the same lines. I've seen this problem >>>> on both my local development platform (Mac) and my CI box (Ubuntu). I've >>>> looked into the file descriptor problem before and didn't come up with >>>> anything definitive, but I'll dig in more. Both Jesse's and Ben's >>>> suggestions sound like good places to start.
>>>> I'm using MotorReplicaSetConnection. I don't create a new one per >>>> test, I have a single module-level connection reference. I've tried >>>> explicitly setting up the connection and then tearing it down on every >>>> test, which didn't make a noticeable difference (except to significantly >>>> slow down my tests :P).
>>>> I'll dig in more and report back. I don't /believe/ I'm forgetting to >>>> call super.tearDown(), but I'll double-check. If it's that simple I'll be >>>> relieved (and slightly embarrassed).
>>>> Thanks so much!
>>>> -Dan
>>>> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis >>>> wrote:
>>>>> Agreed. Are you using MotorConnection or MotorReplicaSetConnection? Do >>>>> you create a new connection per test? Does it help to call close() on that >>>>> connection in tearDown() rather than simply relying on it to be >>>>> garbage-collected when it's dereferenced?
>>>>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
>>>>>> This sounds like it might be a file descriptor leak. Your tests >>>>>> might >>>>>> be opening sockets and not closing them, which would cause problems >>>>>> once you hit the file descriptor limit, which is relatively low by >>>>>> default on many platforms. Try increasing the limit by running >>>>>> "uname >>>>>> -n 10000" before running your tests (if you're on linux and the OS >>>>>> won't let you, try increasing the settings in >>>>>> /etc/security/limits.conf). If that works, then you're running out >>>>>> of >>>>>> file descriptors. Once you've confirmed the problem, a better >>>>>> long-term solution is to ensure that you're closing all the file >>>>>> descriptors you're opening. This includes sockets and regular files, >>>>>> as well as other file descriptors created by IOLoop and >>>>>> AsyncHTTPClient (could you be forgetting to call super.tearDown() in >>>>>> your tests?).
>>>>>> -Ben
>>>>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote: >>>>>> > Hi!
>>>>>> > I'm running into (what I perceive to be) a strange problem with my >>>>>> tests. >>>>>> > I've got an application written on top of Tornado that communicates >>>>>> with >>>>>> > Mongo through the Motor library. It's humming away nicely in >>>>>> production >>>>>> > (daemonized using supervisord). The only problems (that I can >>>>>> detect) are >>>>>> > with my tests. I'm very happy to acknowledge that this could be a >>>>>> problem >>>>>> > in my application layer, or in the Motor layer, or maybe even in >>>>>> MongoDB >>>>>> > itself - mostly I'm looking for some guidance as I'm not really >>>>>> sure where >>>>>> > to start digging in.
>>>>>> > But let me step back and describe what I've noticed in more detail.
>>>>>> > I have approximately ~150 tests that extend the >>>>>> > tornado.testing.AsyncHTTPTestCase class, spread across a number of >>>>>> my own >>>>>> > classes. I have ~50 that are simple unit tests. They both >>>>>> communicate to >>>>>> > Mongo through Motor. The unit tests are fast and never fail (I >>>>>> mention this >>>>>> > only as a data point that may show Motor by itself may not be the >>>>>> culprit). >>>>>> > The AsyncHTTPTestCase tests all pass when their various test >>>>>> classes are run >>>>>> > individually.
>>>>>> > But when I run them all together, after about ~65 tests running and >>>>>> passing, >>>>>> > I start getting timeouts in various spots. Sometimes it's when >>>>>> interacting >>>>>> > directly with Motor. Sometimes it's when making an async HTTP >>>>>> request to >>>>>> > one of the APIs I've created (which, under the covers, is talking >>>>>> to Mongo >>>>>> > through Motor). But it's always when calling (from a test) >>>>>> self.wait().
>>>>>> > It's getting to be a serious problem, as now I can't actually run >>>>>> all my >>>>>> > tests locally (at one time). Obviously not a recipe for >>>>>> test-driven >>>>>> > development success.
>>>>>> > I'm relatively sure I've done something stupid, but I haven't >>>>>> figured out >>>>>> > where yet. Would really appreciate some help here.
I appear to have found a fix. I was previously returning an Application instance in self.get_app() with the debug kwarg set to True (tests were inheriting the default value for running in dev mode) and now everything is quite speedy. More importantly, the tests actually all run! I'm satisfied, but if Ben or Jesse wants me to dig in more to figure out what, exactly, debug mode is doing, I'm happy to help out.
One thing I noticed: I was able to reproduce the problem in a small test class with ~100 tests. The ONLY way I could repro, though, was if I had debug mode on AND I returned the singleton IOLoop from IOLoop.instance() in self.get_new_ioloop() AND each test made a simple HTTP request AND my application used my request handlers (if I tried with a barebones app it was all fine).
So it *appears* as if there's some strange interaction between my handlers and debug mode?
On Friday, October 5, 2012 3:21:19 PM UTC-7, Daniel Kador wrote:
> The doc additions looks great, thanks for handling it so quickly.
> db.serverStatus().cursors reports nothing interesting at all. The cursors > are definitely being closed (which is good).
> The CPU is being used by Python. Mongo hardly has a blip.
> I ran lsof -p PID | wc -l, where PID was the PID of my Python process. > The count stays steady around 55. Most of the results are related to the > .so files loaded by Python. There are about 10 related to sockets, which > seems pretty reasonable to me.
> Mongo stays pretty consistent around 113. Again, that doesn't sound crazy.
> It definitely *feels* like the Python layer is leaking something, > somewhere. Is there an easy way to see what's registered on the ioloop? > Or just simply log what the ioloop is currently working on? I wonder if my > code is somehow not terminating in some cases? Or taking much longer than > I think?
> On Friday, October 5, 2012 3:09:38 PM UTC-7, A. Jesse Jiryu Davis wrote:
>> Let me know what you find out regarding your tests, Dan, or any other way >> I can help. I want anyone who uses Motor to succeed. =)
>> On Friday, October 5, 2012 2:53:00 PM UTC-7, A. Jesse Jiryu Davis wrote:
>>> No, I'll add this to the docs -- a MotorCursor's __del__ calls the >>> PyMongo Cursor's close(), which checks if the cursor needs to be closed on >>> the server and if so sends the closeCursors command. So if you've fully >>> iterated a cursor (with to_list, or next_object until it returns None) then >>> the server has already closed the cursor, otherwise the client cursor >>> closes itself on the server when it's dereferenced.
>>> Motor's own test suite checks for leaked cursors after every test:
>>> So you could do something similar in your tests, or manually run >>> db.serverStatus().cursors in the Mongo shell while the tests are going. If >>> it rises a lot, then we have a problem.
>>> Is it Python or MongoDB that mostly uses CPU? I'm not surprised your CPU >>> is pegged, nothing *else* would be the bottleneck in your unittests.
>>> Can you run lsof|grep PID|wc -l, where PID is the PID of your Python >>> process, and make sure that process itself isn't opening more than a dozen >>> or so connections? What about with mongod's PID?
>>> On Friday, October 5, 2012 2:11:19 PM UTC-7, Daniel Kador wrote:
>>>> Quick update - I believe I've ruled out file descriptors as the source >>>> of my problem. I wrote a small script that just repeatedly runs "lsof | wc >>>> -l" (which, as I understand it, should give me the number of open file >>>> descriptors on my system) and stores the results in a texfile. Before I >>>> launched any tests, my open file descriptor count hovered around 7,300. As >>>> I ran tests, the number fluctuated a bit, but never went above 7,405 and >>>> never went below 7,221. So it feels like I'm not blowing out the >>>> descriptor count anywhere.
>>>> I also verified that the tearDown methods in AsyncTestCase and >>>> AsyncHTTPTestCase are being called on every test.
>>>> Some more details (not sure if they're useful or relevant, but I'm >>>> hoping they are): when I run tests my CPU gets pegged. One of my cores >>>> goes to about 90% and hovers there for the whole run. Also, I'm overriding >>>> get_new_ioloop() to return the singleton instance.
>>>> Jesse, question for you: do I need to explicitly close cursors opened >>>> by Motor? I'm pretty sure I don't need to do anything with the Connection >>>> object, but I couldn't find anything that told me my responsibility re: the >>>> cursor lifecycle.
>>>> Will keep digging in.
>>>> Thanks again!
>>>> -Dan
>>>> On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
>>>>> Awesome, I was thinking along the same lines. I've seen this problem >>>>> on both my local development platform (Mac) and my CI box (Ubuntu). I've >>>>> looked into the file descriptor problem before and didn't come up with >>>>> anything definitive, but I'll dig in more. Both Jesse's and Ben's >>>>> suggestions sound like good places to start.
>>>>> I'm using MotorReplicaSetConnection. I don't create a new one per >>>>> test, I have a single module-level connection reference. I've tried >>>>> explicitly setting up the connection and then tearing it down on every >>>>> test, which didn't make a noticeable difference (except to significantly >>>>> slow down my tests :P).
>>>>> I'll dig in more and report back. I don't /believe/ I'm forgetting to >>>>> call super.tearDown(), but I'll double-check. If it's that simple I'll be >>>>> relieved (and slightly embarrassed).
>>>>> Thanks so much!
>>>>> -Dan
>>>>> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis >>>>> wrote:
>>>>>> Agreed. Are you using MotorConnection or MotorReplicaSetConnection? >>>>>> Do you create a new connection per test? Does it help to call close() on >>>>>> that connection in tearDown() rather than simply relying on it to be >>>>>> garbage-collected when it's dereferenced?
>>>>>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
>>>>>>> This sounds like it might be a file descriptor leak. Your tests >>>>>>> might >>>>>>> be opening sockets and not closing them, which would cause problems >>>>>>> once you hit the file descriptor limit, which is relatively low by >>>>>>> default on many platforms. Try increasing the limit by running >>>>>>> "uname >>>>>>> -n 10000" before running your tests (if you're on linux and the OS >>>>>>> won't let you, try increasing the settings in >>>>>>> /etc/security/limits.conf). If that works, then you're running out >>>>>>> of >>>>>>> file descriptors. Once you've confirmed the problem, a better >>>>>>> long-term solution is to ensure that you're closing all the file >>>>>>> descriptors you're opening. This includes sockets and regular >>>>>>> files, >>>>>>> as well as other file descriptors created by IOLoop and >>>>>>> AsyncHTTPClient (could you be forgetting to call super.tearDown() in >>>>>>> your tests?).
>>>>>>> -Ben
>>>>>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote: >>>>>>> > Hi!
>>>>>>> > I'm running into (what I perceive to be) a strange problem with my >>>>>>> tests. >>>>>>> > I've got an application written on top of Tornado that >>>>>>> communicates with >>>>>>> > Mongo through the Motor library. It's humming away nicely in >>>>>>> production >>>>>>> > (daemonized using supervisord). The only problems (that I can >>>>>>> detect) are >>>>>>> > with my tests. I'm very happy to acknowledge that this could be a >>>>>>> problem >>>>>>> > in my application layer, or in the Motor layer, or maybe even in >>>>>>> MongoDB >>>>>>> > itself - mostly I'm looking for some guidance as I'm not really >>>>>>> sure where >>>>>>> > to start digging in.
>>>>>>> > But let me step back and describe what I've noticed in more >>>>>>> detail.
>>>>>>> > I have approximately ~150 tests that extend the >>>>>>> > tornado.testing.AsyncHTTPTestCase class, spread across a number of >>>>>>> my own >>>>>>> > classes. I have ~50 that are simple unit tests. They both >>>>>>> communicate to >>>>>>> > Mongo through Motor. The unit tests are fast and never fail (I >>>>>>> mention this >>>>>>> > only as a data point that may show Motor by itself may not be the >>>>>>> culprit). >>>>>>> > The AsyncHTTPTestCase tests all pass when their various test >>>>>>> classes are run >>>>>>> > individually.
>>>>>>> > But when I run them all together, after about ~65 tests running >>>>>>> and passing, >>>>>>> > I start getting timeouts in various spots. Sometimes it's when >>>>>>> interacting >>>>>>> > directly with Motor. Sometimes it's when making an async HTTP >>>>>>> request to >>>>>>> > one of the APIs I've created (which, under the covers, is talking >>>>>>> to Mongo >>>>>>> > through Motor). But it's always when calling (from a test) >>>>>>> self.wait().
>>>>>>> > It's getting to be a serious problem, as now I can't actually run >>>>>>> all my >>>>>>> > tests locally (at one time). Obviously not a recipe for >>>>>>> test-driven >>>>>>> > development success.
>>>>>>> > I'm relatively sure I've done something stupid, but I haven't >>>>>>> figured out >>>>>>> > where yet. Would really appreciate some help here.
Oh, I see. What's happening is that each time you create a debug-mode
application, it calls autoreload.start(), which schedules a callback
every 500ms on the IOLoop to check the modification times of all the
source files. As you run more tests, those accumulate, and eventually
the server can hardly do anything else. Autoreload should definitely
check to see if it's being started on the same IOLoop more than once.
(and maybe it should warn if it's being started on two different
IOLoops in the same process, since that's not really useful).
On Fri, Oct 5, 2012 at 4:44 PM, Daniel Kador <d...@keen.io> wrote:
> I appear to have found a fix. I was previously returning an Application
> instance in self.get_app() with the debug kwarg set to True (tests were
> inheriting the default value for running in dev mode) and now everything is
> quite speedy. More importantly, the tests actually all run! I'm satisfied,
> but if Ben or Jesse wants me to dig in more to figure out what, exactly,
> debug mode is doing, I'm happy to help out.
> One thing I noticed: I was able to reproduce the problem in a small test
> class with ~100 tests. The ONLY way I could repro, though, was if I had
> debug mode on AND I returned the singleton IOLoop from IOLoop.instance() in
> self.get_new_ioloop() AND each test made a simple HTTP request AND my
> application used my request handlers (if I tried with a barebones app it was
> all fine).
> So it appears as if there's some strange interaction between my handlers and
> debug mode?
> On Friday, October 5, 2012 3:21:19 PM UTC-7, Daniel Kador wrote:
>> The doc additions looks great, thanks for handling it so quickly.
>> db.serverStatus().cursors reports nothing interesting at all. The cursors
>> are definitely being closed (which is good).
>> The CPU is being used by Python. Mongo hardly has a blip.
>> I ran lsof -p PID | wc -l, where PID was the PID of my Python process.
>> The count stays steady around 55. Most of the results are related to the
>> .so files loaded by Python. There are about 10 related to sockets, which
>> seems pretty reasonable to me.
>> Mongo stays pretty consistent around 113. Again, that doesn't sound
>> crazy.
>> It definitely *feels* like the Python layer is leaking something,
>> somewhere. Is there an easy way to see what's registered on the ioloop? Or
>> just simply log what the ioloop is currently working on? I wonder if my code
>> is somehow not terminating in some cases? Or taking much longer than I
>> think?
>> On Friday, October 5, 2012 3:09:38 PM UTC-7, A. Jesse Jiryu Davis wrote:
>>> Let me know what you find out regarding your tests, Dan, or any other way
>>> I can help. I want anyone who uses Motor to succeed. =)
>>> On Friday, October 5, 2012 2:53:00 PM UTC-7, A. Jesse Jiryu Davis wrote:
>>>> No, I'll add this to the docs -- a MotorCursor's __del__ calls the
>>>> PyMongo Cursor's close(), which checks if the cursor needs to be closed on
>>>> the server and if so sends the closeCursors command. So if you've fully
>>>> iterated a cursor (with to_list, or next_object until it returns None) then
>>>> the server has already closed the cursor, otherwise the client cursor closes
>>>> itself on the server when it's dereferenced.
>>>> Motor's own test suite checks for leaked cursors after every test:
>>>> So you could do something similar in your tests, or manually run
>>>> db.serverStatus().cursors in the Mongo shell while the tests are going. If
>>>> it rises a lot, then we have a problem.
>>>> Is it Python or MongoDB that mostly uses CPU? I'm not surprised your CPU
>>>> is pegged, nothing *else* would be the bottleneck in your unittests.
>>>> Can you run lsof|grep PID|wc -l, where PID is the PID of your Python
>>>> process, and make sure that process itself isn't opening more than a dozen
>>>> or so connections? What about with mongod's PID?
>>>> On Friday, October 5, 2012 2:11:19 PM UTC-7, Daniel Kador wrote:
>>>>> Quick update - I believe I've ruled out file descriptors as the source
>>>>> of my problem. I wrote a small script that just repeatedly runs "lsof | wc
>>>>> -l" (which, as I understand it, should give me the number of open file
>>>>> descriptors on my system) and stores the results in a texfile. Before I
>>>>> launched any tests, my open file descriptor count hovered around 7,300. As
>>>>> I ran tests, the number fluctuated a bit, but never went above 7,405 and
>>>>> never went below 7,221. So it feels like I'm not blowing out the descriptor
>>>>> count anywhere.
>>>>> I also verified that the tearDown methods in AsyncTestCase and
>>>>> AsyncHTTPTestCase are being called on every test.
>>>>> Some more details (not sure if they're useful or relevant, but I'm
>>>>> hoping they are): when I run tests my CPU gets pegged. One of my cores goes
>>>>> to about 90% and hovers there for the whole run. Also, I'm overriding
>>>>> get_new_ioloop() to return the singleton instance.
>>>>> Jesse, question for you: do I need to explicitly close cursors opened
>>>>> by Motor? I'm pretty sure I don't need to do anything with the Connection
>>>>> object, but I couldn't find anything that told me my responsibility re: the
>>>>> cursor lifecycle.
>>>>> Will keep digging in.
>>>>> Thanks again!
>>>>> -Dan
>>>>> On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
>>>>>> Awesome, I was thinking along the same lines. I've seen this problem
>>>>>> on both my local development platform (Mac) and my CI box (Ubuntu). I've
>>>>>> looked into the file descriptor problem before and didn't come up with
>>>>>> anything definitive, but I'll dig in more. Both Jesse's and Ben's
>>>>>> suggestions sound like good places to start.
>>>>>> I'm using MotorReplicaSetConnection. I don't create a new one per
>>>>>> test, I have a single module-level connection reference. I've tried
>>>>>> explicitly setting up the connection and then tearing it down on every test,
>>>>>> which didn't make a noticeable difference (except to significantly slow down
>>>>>> my tests :P).
>>>>>> I'll dig in more and report back. I don't /believe/ I'm forgetting to
>>>>>> call super.tearDown(), but I'll double-check. If it's that simple I'll be
>>>>>> relieved (and slightly embarrassed).
>>>>>> Thanks so much!
>>>>>> -Dan
>>>>>> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu Davis
>>>>>> wrote:
>>>>>>> Agreed. Are you using MotorConnection or MotorReplicaSetConnection?
>>>>>>> Do you create a new connection per test? Does it help to call close() on
>>>>>>> that connection in tearDown() rather than simply relying on it to be
>>>>>>> garbage-collected when it's dereferenced?
>>>>>>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
>>>>>>>> This sounds like it might be a file descriptor leak. Your tests
>>>>>>>> might
>>>>>>>> be opening sockets and not closing them, which would cause problems
>>>>>>>> once you hit the file descriptor limit, which is relatively low by
>>>>>>>> default on many platforms. Try increasing the limit by running
>>>>>>>> "uname
>>>>>>>> -n 10000" before running your tests (if you're on linux and the OS
>>>>>>>> won't let you, try increasing the settings in
>>>>>>>> /etc/security/limits.conf). If that works, then you're running out
>>>>>>>> of
>>>>>>>> file descriptors. Once you've confirmed the problem, a better
>>>>>>>> long-term solution is to ensure that you're closing all the file
>>>>>>>> descriptors you're opening. This includes sockets and regular
>>>>>>>> files,
>>>>>>>> as well as other file descriptors created by IOLoop and
>>>>>>>> AsyncHTTPClient (could you be forgetting to call super.tearDown() in
>>>>>>>> your tests?).
>>>>>>>> -Ben
>>>>>>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> wrote:
>>>>>>>> > Hi!
>>>>>>>> > I'm running into (what I perceive to be) a strange problem with my
>>>>>>>> > tests.
>>>>>>>> > I've got an application written on top of Tornado that
>>>>>>>> > communicates with
>>>>>>>> > Mongo through the Motor library. It's humming away nicely in
>>>>>>>> > production
>>>>>>>> > (daemonized using supervisord). The only problems (that I can
>>>>>>>> > detect) are
>>>>>>>> > with my tests. I'm very happy to acknowledge that this could be a
>>>>>>>> > problem
>>>>>>>> > in my application layer, or in the Motor layer, or maybe even in
>>>>>>>> > MongoDB
>>>>>>>> > itself - mostly I'm looking for some guidance as I'm not really
>>>>>>>> > sure where
>>>>>>>> > to start digging in.
>>>>>>>> > But let me step back and describe what I've noticed in more
>>>>>>>> > detail.
>>>>>>>> > I have approximately ~150 tests that extend the
>>>>>>>> > tornado.testing.AsyncHTTPTestCase class, spread across a number of
>>>>>>>> > my own
>>>>>>>> > classes. I have ~50 that are simple unit tests. They both
>>>>>>>> > communicate to
>>>>>>>> > Mongo through Motor. The unit tests are fast and never fail (I
>>>>>>>> > mention this
>>>>>>>> > only as a data point that may show Motor by itself may not be the
>>>>>>>> > culprit).
>>>>>>>> > The AsyncHTTPTestCase tests all pass when their various test
>>>>>>>> > classes are run
>>>>>>>> > individually.
>>>>>>>> > But when I run them all together, after about ~65 tests running
>>>>>>>> > and passing,
>>>>>>>> > I start getting timeouts in various spots. Sometimes it's when
On Friday, October 5, 2012 8:47:27 PM UTC-7, Ben Darnell wrote:
> Oh, I see. What's happening is that each time you create a debug-mode > application, it calls autoreload.start(), which schedules a callback > every 500ms on the IOLoop to check the modification times of all the > source files. As you run more tests, those accumulate, and eventually > the server can hardly do anything else. Autoreload should definitely > check to see if it's being started on the same IOLoop more than once. > (and maybe it should warn if it's being started on two different > IOLoops in the same process, since that's not really useful).
> -Ben
> On Fri, Oct 5, 2012 at 4:44 PM, Daniel Kador <d...@keen.io <javascript:>> > wrote: > > I appear to have found a fix. I was previously returning an Application > > instance in self.get_app() with the debug kwarg set to True (tests were > > inheriting the default value for running in dev mode) and now everything > is > > quite speedy. More importantly, the tests actually all run! I'm > satisfied, > > but if Ben or Jesse wants me to dig in more to figure out what, exactly, > > debug mode is doing, I'm happy to help out.
> > One thing I noticed: I was able to reproduce the problem in a small test > > class with ~100 tests. The ONLY way I could repro, though, was if I had > > debug mode on AND I returned the singleton IOLoop from IOLoop.instance() > in > > self.get_new_ioloop() AND each test made a simple HTTP request AND my > > application used my request handlers (if I tried with a barebones app it > was > > all fine).
> > So it appears as if there's some strange interaction between my handlers > and > > debug mode?
> > On Friday, October 5, 2012 3:21:19 PM UTC-7, Daniel Kador wrote:
> >> The doc additions looks great, thanks for handling it so quickly.
> >> db.serverStatus().cursors reports nothing interesting at all. The > cursors > >> are definitely being closed (which is good).
> >> The CPU is being used by Python. Mongo hardly has a blip.
> >> I ran lsof -p PID | wc -l, where PID was the PID of my Python process. > >> The count stays steady around 55. Most of the results are related to > the > >> .so files loaded by Python. There are about 10 related to sockets, > which > >> seems pretty reasonable to me.
> >> Mongo stays pretty consistent around 113. Again, that doesn't sound > >> crazy.
> >> It definitely *feels* like the Python layer is leaking something, > >> somewhere. Is there an easy way to see what's registered on the > ioloop? Or > >> just simply log what the ioloop is currently working on? I wonder if my > code > >> is somehow not terminating in some cases? Or taking much longer than I > >> think?
> >> On Friday, October 5, 2012 3:09:38 PM UTC-7, A. Jesse Jiryu Davis > wrote:
> >>> Let me know what you find out regarding your tests, Dan, or any other > way > >>> I can help. I want anyone who uses Motor to succeed. =)
> >>> On Friday, October 5, 2012 2:53:00 PM UTC-7, A. Jesse Jiryu Davis > wrote:
> >>>> No, I'll add this to the docs -- a MotorCursor's __del__ calls the > >>>> PyMongo Cursor's close(), which checks if the cursor needs to be > closed on > >>>> the server and if so sends the closeCursors command. So if you've > fully > >>>> iterated a cursor (with to_list, or next_object until it returns > None) then > >>>> the server has already closed the cursor, otherwise the client cursor > closes > >>>> itself on the server when it's dereferenced.
> >>>> Motor's own test suite checks for leaked cursors after every test:
> >>>> So you could do something similar in your tests, or manually run > >>>> db.serverStatus().cursors in the Mongo shell while the tests are > going. If > >>>> it rises a lot, then we have a problem.
> >>>> Is it Python or MongoDB that mostly uses CPU? I'm not surprised your > CPU > >>>> is pegged, nothing *else* would be the bottleneck in your unittests.
> >>>> Can you run lsof|grep PID|wc -l, where PID is the PID of your Python > >>>> process, and make sure that process itself isn't opening more than a > dozen > >>>> or so connections? What about with mongod's PID?
> >>>> On Friday, October 5, 2012 2:11:19 PM UTC-7, Daniel Kador wrote:
> >>>>> Quick update - I believe I've ruled out file descriptors as the > source > >>>>> of my problem. I wrote a small script that just repeatedly runs > "lsof | wc > >>>>> -l" (which, as I understand it, should give me the number of open > file > >>>>> descriptors on my system) and stores the results in a texfile. > Before I > >>>>> launched any tests, my open file descriptor count hovered around > 7,300. As > >>>>> I ran tests, the number fluctuated a bit, but never went above 7,405 > and > >>>>> never went below 7,221. So it feels like I'm not blowing out the > descriptor > >>>>> count anywhere.
> >>>>> I also verified that the tearDown methods in AsyncTestCase and > >>>>> AsyncHTTPTestCase are being called on every test.
> >>>>> Some more details (not sure if they're useful or relevant, but I'm > >>>>> hoping they are): when I run tests my CPU gets pegged. One of my > cores goes > >>>>> to about 90% and hovers there for the whole run. Also, I'm > overriding > >>>>> get_new_ioloop() to return the singleton instance.
> >>>>> Jesse, question for you: do I need to explicitly close cursors > opened > >>>>> by Motor? I'm pretty sure I don't need to do anything with the > Connection > >>>>> object, but I couldn't find anything that told me my responsibility > re: the > >>>>> cursor lifecycle.
> >>>>> Will keep digging in.
> >>>>> Thanks again!
> >>>>> -Dan
> >>>>> On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
> >>>>>> Awesome, I was thinking along the same lines. I've seen this > problem > >>>>>> on both my local development platform (Mac) and my CI box (Ubuntu). > I've > >>>>>> looked into the file descriptor problem before and didn't come up > with > >>>>>> anything definitive, but I'll dig in more. Both Jesse's and Ben's > >>>>>> suggestions sound like good places to start.
> >>>>>> I'm using MotorReplicaSetConnection. I don't create a new one per > >>>>>> test, I have a single module-level connection reference. I've > tried > >>>>>> explicitly setting up the connection and then tearing it down on > every test, > >>>>>> which didn't make a noticeable difference (except to significantly > slow down > >>>>>> my tests :P).
> >>>>>> I'll dig in more and report back. I don't /believe/ I'm forgetting > to > >>>>>> call super.tearDown(), but I'll double-check. If it's that simple > I'll be > >>>>>> relieved (and slightly embarrassed).
> >>>>>> Thanks so much!
> >>>>>> -Dan
> >>>>>> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu > Davis > >>>>>> wrote:
> >>>>>>> Agreed. Are you using MotorConnection or > MotorReplicaSetConnection? > >>>>>>> Do you create a new connection per test? Does it help to call > close() on > >>>>>>> that connection in tearDown() rather than simply relying on it to > be > >>>>>>> garbage-collected when it's dereferenced?
> >>>>>>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
> >>>>>>>> This sounds like it might be a file descriptor leak. Your tests > >>>>>>>> might > >>>>>>>> be opening sockets and not closing them, which would cause > problems > >>>>>>>> once you hit the file descriptor limit, which is relatively low > by > >>>>>>>> default on many platforms. Try increasing the limit by running > >>>>>>>> "uname > >>>>>>>> -n 10000" before running your tests (if you're on linux and the > OS > >>>>>>>> won't let you, try increasing the settings in > >>>>>>>> /etc/security/limits.conf). If that works, then you're running > out > >>>>>>>> of > >>>>>>>> file descriptors. Once you've confirmed the problem, a better > >>>>>>>> long-term solution is to ensure that you're closing all the file > >>>>>>>> descriptors you're opening. This includes sockets and regular > >>>>>>>> files, > >>>>>>>> as well as other file descriptors created by IOLoop and > >>>>>>>> AsyncHTTPClient (could you be forgetting to call super.tearDown() > in > >>>>>>>> your tests?).
> >>>>>>>> -Ben
> >>>>>>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> > wrote: > >>>>>>>> > Hi!
> >>>>>>>> > I'm running into (what I perceive to be) a strange problem with > my > >>>>>>>> > tests. > >>>>>>>> > I've got an application written on top of Tornado that > >>>>>>>> > communicates with > >>>>>>>> > Mongo through the Motor library. It's humming away nicely in > >>>>>>>> > production > >>>>>>>> > (daemonized using supervisord). The only problems (that I can > >>>>>>>> > detect) are > >>>>>>>> > with my tests. I'm very happy to acknowledge that this could > be a > >>>>>>>> > problem > >>>>>>>> > in my application layer, or in the Motor layer, or maybe even > in > >>>>>>>> > MongoDB > >>>>>>>> > itself - mostly I'm looking for some guidance as I'm not really > >>>>>>>> > sure where > >>>>>>>> > to start digging in.
> >>>>>>>> > But let me step back and describe what I've noticed in more > >>>>>>>> > detail.
> >>>>>>>> > I have approximately ~150 tests that extend the > >>>>>>>> > tornado.testing.AsyncHTTPTestCase class, spread across a number
On Friday, October 5, 2012 8:47:27 PM UTC-7, Ben Darnell wrote:
> Oh, I see. What's happening is that each time you create a debug-mode > application, it calls autoreload.start(), which schedules a callback > every 500ms on the IOLoop to check the modification times of all the > source files. As you run more tests, those accumulate, and eventually > the server can hardly do anything else. Autoreload should definitely > check to see if it's being started on the same IOLoop more than once. > (and maybe it should warn if it's being started on two different > IOLoops in the same process, since that's not really useful).
> -Ben
> On Fri, Oct 5, 2012 at 4:44 PM, Daniel Kador <d...@keen.io <javascript:>> > wrote: > > I appear to have found a fix. I was previously returning an Application > > instance in self.get_app() with the debug kwarg set to True (tests were > > inheriting the default value for running in dev mode) and now everything > is > > quite speedy. More importantly, the tests actually all run! I'm > satisfied, > > but if Ben or Jesse wants me to dig in more to figure out what, exactly, > > debug mode is doing, I'm happy to help out.
> > One thing I noticed: I was able to reproduce the problem in a small test > > class with ~100 tests. The ONLY way I could repro, though, was if I had > > debug mode on AND I returned the singleton IOLoop from IOLoop.instance() > in > > self.get_new_ioloop() AND each test made a simple HTTP request AND my > > application used my request handlers (if I tried with a barebones app it > was > > all fine).
> > So it appears as if there's some strange interaction between my handlers > and > > debug mode?
> > On Friday, October 5, 2012 3:21:19 PM UTC-7, Daniel Kador wrote:
> >> The doc additions looks great, thanks for handling it so quickly.
> >> db.serverStatus().cursors reports nothing interesting at all. The > cursors > >> are definitely being closed (which is good).
> >> The CPU is being used by Python. Mongo hardly has a blip.
> >> I ran lsof -p PID | wc -l, where PID was the PID of my Python process. > >> The count stays steady around 55. Most of the results are related to > the > >> .so files loaded by Python. There are about 10 related to sockets, > which > >> seems pretty reasonable to me.
> >> Mongo stays pretty consistent around 113. Again, that doesn't sound > >> crazy.
> >> It definitely *feels* like the Python layer is leaking something, > >> somewhere. Is there an easy way to see what's registered on the > ioloop? Or > >> just simply log what the ioloop is currently working on? I wonder if my > code > >> is somehow not terminating in some cases? Or taking much longer than I > >> think?
> >> On Friday, October 5, 2012 3:09:38 PM UTC-7, A. Jesse Jiryu Davis > wrote:
> >>> Let me know what you find out regarding your tests, Dan, or any other > way > >>> I can help. I want anyone who uses Motor to succeed. =)
> >>> On Friday, October 5, 2012 2:53:00 PM UTC-7, A. Jesse Jiryu Davis > wrote:
> >>>> No, I'll add this to the docs -- a MotorCursor's __del__ calls the > >>>> PyMongo Cursor's close(), which checks if the cursor needs to be > closed on > >>>> the server and if so sends the closeCursors command. So if you've > fully > >>>> iterated a cursor (with to_list, or next_object until it returns > None) then > >>>> the server has already closed the cursor, otherwise the client cursor > closes > >>>> itself on the server when it's dereferenced.
> >>>> Motor's own test suite checks for leaked cursors after every test:
> >>>> So you could do something similar in your tests, or manually run > >>>> db.serverStatus().cursors in the Mongo shell while the tests are > going. If > >>>> it rises a lot, then we have a problem.
> >>>> Is it Python or MongoDB that mostly uses CPU? I'm not surprised your > CPU > >>>> is pegged, nothing *else* would be the bottleneck in your unittests.
> >>>> Can you run lsof|grep PID|wc -l, where PID is the PID of your Python > >>>> process, and make sure that process itself isn't opening more than a > dozen > >>>> or so connections? What about with mongod's PID?
> >>>> On Friday, October 5, 2012 2:11:19 PM UTC-7, Daniel Kador wrote:
> >>>>> Quick update - I believe I've ruled out file descriptors as the > source > >>>>> of my problem. I wrote a small script that just repeatedly runs > "lsof | wc > >>>>> -l" (which, as I understand it, should give me the number of open > file > >>>>> descriptors on my system) and stores the results in a texfile. > Before I > >>>>> launched any tests, my open file descriptor count hovered around > 7,300. As > >>>>> I ran tests, the number fluctuated a bit, but never went above 7,405 > and > >>>>> never went below 7,221. So it feels like I'm not blowing out the > descriptor > >>>>> count anywhere.
> >>>>> I also verified that the tearDown methods in AsyncTestCase and > >>>>> AsyncHTTPTestCase are being called on every test.
> >>>>> Some more details (not sure if they're useful or relevant, but I'm > >>>>> hoping they are): when I run tests my CPU gets pegged. One of my > cores goes > >>>>> to about 90% and hovers there for the whole run. Also, I'm > overriding > >>>>> get_new_ioloop() to return the singleton instance.
> >>>>> Jesse, question for you: do I need to explicitly close cursors > opened > >>>>> by Motor? I'm pretty sure I don't need to do anything with the > Connection > >>>>> object, but I couldn't find anything that told me my responsibility > re: the > >>>>> cursor lifecycle.
> >>>>> Will keep digging in.
> >>>>> Thanks again!
> >>>>> -Dan
> >>>>> On Friday, October 5, 2012 11:38:59 AM UTC-7, Daniel Kador wrote:
> >>>>>> Awesome, I was thinking along the same lines. I've seen this > problem > >>>>>> on both my local development platform (Mac) and my CI box (Ubuntu). > I've > >>>>>> looked into the file descriptor problem before and didn't come up > with > >>>>>> anything definitive, but I'll dig in more. Both Jesse's and Ben's > >>>>>> suggestions sound like good places to start.
> >>>>>> I'm using MotorReplicaSetConnection. I don't create a new one per > >>>>>> test, I have a single module-level connection reference. I've > tried > >>>>>> explicitly setting up the connection and then tearing it down on > every test, > >>>>>> which didn't make a noticeable difference (except to significantly > slow down > >>>>>> my tests :P).
> >>>>>> I'll dig in more and report back. I don't /believe/ I'm forgetting > to > >>>>>> call super.tearDown(), but I'll double-check. If it's that simple > I'll be > >>>>>> relieved (and slightly embarrassed).
> >>>>>> Thanks so much!
> >>>>>> -Dan
> >>>>>> On Thursday, October 4, 2012 11:23:11 PM UTC-7, A. Jesse Jiryu > Davis > >>>>>> wrote:
> >>>>>>> Agreed. Are you using MotorConnection or > MotorReplicaSetConnection? > >>>>>>> Do you create a new connection per test? Does it help to call > close() on > >>>>>>> that connection in tearDown() rather than simply relying on it to > be > >>>>>>> garbage-collected when it's dereferenced?
> >>>>>>> On Thursday, October 4, 2012 8:53:31 PM UTC-7, Ben Darnell wrote:
> >>>>>>>> This sounds like it might be a file descriptor leak. Your tests > >>>>>>>> might > >>>>>>>> be opening sockets and not closing them, which would cause > problems > >>>>>>>> once you hit the file descriptor limit, which is relatively low > by > >>>>>>>> default on many platforms. Try increasing the limit by running > >>>>>>>> "uname > >>>>>>>> -n 10000" before running your tests (if you're on linux and the > OS > >>>>>>>> won't let you, try increasing the settings in > >>>>>>>> /etc/security/limits.conf). If that works, then you're running > out > >>>>>>>> of > >>>>>>>> file descriptors. Once you've confirmed the problem, a better > >>>>>>>> long-term solution is to ensure that you're closing all the file > >>>>>>>> descriptors you're opening. This includes sockets and regular > >>>>>>>> files, > >>>>>>>> as well as other file descriptors created by IOLoop and > >>>>>>>> AsyncHTTPClient (could you be forgetting to call super.tearDown() > in > >>>>>>>> your tests?).
> >>>>>>>> -Ben
> >>>>>>>> On Thu, Oct 4, 2012 at 5:50 PM, Daniel Kador <d...@keen.io> > wrote: > >>>>>>>> > Hi!
> >>>>>>>> > I'm running into (what I perceive to be) a strange problem with > my > >>>>>>>> > tests. > >>>>>>>> > I've got an application written on top of Tornado that > >>>>>>>> > communicates with > >>>>>>>> > Mongo through the Motor library. It's humming away nicely in > >>>>>>>> > production > >>>>>>>> > (daemonized using supervisord). The only problems (that I can > >>>>>>>> > detect) are > >>>>>>>> > with my tests. I'm very happy to acknowledge that this could > be a > >>>>>>>> > problem > >>>>>>>> > in my application layer, or in the Motor layer, or maybe even > in > >>>>>>>> > MongoDB > >>>>>>>> > itself - mostly I'm looking for some guidance as I'm not really > >>>>>>>> > sure where > >>>>>>>> > to start digging in.
> >>>>>>>> > But let me step back and describe what I've noticed in more > >>>>>>>> > detail.