mongo dies on lengthy read

169 views
Skip to first unread message

Daniel W

unread,
Aug 16, 2012, 9:06:32 AM8/16/12
to mongod...@googlegroups.com
I'm doing a lengthy read via pymongo, essentially taking every doc from a 600,000 doc collection. Avg Obj size is 5843.
My log looks like this before and up to the mongo death:

Wed Aug 15 19:35:22 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:968 reslen:4195882 212ms
Wed Aug 15 19:35:24 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:964 reslen:4195202 853ms
Wed Aug 15 19:35:27 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:966 reslen:4195701 1581ms
Wed Aug 15 19:35:33 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:963 reslen:4196709 3606ms
Wed Aug 15 19:35:33 [conn1] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 reslen:1338 457ms

Note that I've broken up the 600k docs into chunks of 50k in an effort to solve this problem.  
What is going on with that last command, and why does it cause mongo to shut down?  

Scott Hernandez

unread,
Aug 16, 2012, 9:43:27 AM8/16/12
to mongod...@googlegroups.com
Is that the end of the log? I do not see any shutdown, or error
messages at all. Can you provide the logs showing it shutting down?

Do you simply mean that the query doesn't return all of the documents,
and not that the server is unavailable? If so, what are you doing in
your python script with the documents? Are you spending logs of time
as you process or work on each document?
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb

Daniel W

unread,
Aug 16, 2012, 9:54:32 AM8/16/12
to mongod...@googlegroups.com
Yes, that is the end of the log.  Is there a way I can increase the logging detail?  

I just tried again, and this time there wasn't even the serverStatus line, it just up and 'died':

[root@domU-12-31-39-18-3D-20 ~]# tail -f /var/log/mongodb.log
Thu Aug 16 09:35:21 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:968 reslen:4196559 193ms
Thu Aug 16 09:35:21 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:970 reslen:4198150 243ms
Thu Aug 16 09:35:24 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:962 reslen:4194949 471ms
Thu Aug 16 09:35:24 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:968 reslen:4196960 387ms
Thu Aug 16 09:35:26 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:964 reslen:4198129 271ms
Thu Aug 16 09:35:32 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:968 reslen:4198101 229ms
Thu Aug 16 09:35:33 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:968 reslen:4198132 483ms
Thu Aug 16 09:35:34 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:965 reslen:4195774 176ms
Thu Aug 16 09:35:35 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:966 reslen:4197451 409ms
Thu Aug 16 09:35:38 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:964 reslen:4197107 802ms

By 'died' I mean that all my connections are cutoff and the mongo process dies.  I'm starting the mongo process as per the docs:

 /usr/bin/mongod --fork --logpath /var/log/mongodb.log --logappend


Connections are cut off:

Mongostat:
insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
     0      0      0      0       1       0       0  8.03g  16.2g   773m     89        0          0       0|0     0|0    69b     4m     3   09:35:01 
     0      0      0      0       0       0       0  8.03g  16.2g   551m     45        0          0       0|0     0|0    44b     2m     3   09:35:11 
     0      0      0      0       0       0       0  8.03g  16.2g   340m     54        0          0       0|0     0|0    52b     2m     3   09:35:21 
     0      0      0      0       0       0       0  8.03g  16.2g   177m     35        0          0       0|0     0|0    27b     1m     3   09:35:31 
Thu Aug 16 09:36:08 Socket recv() errno:104 Connection reset by peer 127.0.0.1:27017
Thu Aug 16 09:36:08 SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [1] server [127.0.0.1:27017
Thu Aug 16 09:36:08 DBClientCursor::init call() failed
can't get data: DBClientBase::findN: transport error: 127.0.0.1 query: { serverStatus: 1 }
Thu Aug 16 09:36:18 trying reconnect to 127.0.0.1
Thu Aug 16 09:36:18 reconnect 127.0.0.1 failed couldn't connect to server 127.0.0.1

Pymongo:
pymongo.errors.AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused

I am not spending any time in pymongo, just doing a [row for row in cursor]:
        users=[]
        i=0
        chunksize=4000
        while keep_going:
            minU=minuser+i*chunksize
            maxU=minU+chunksize if minU+chunksize<=maxuser else maxuser
            logger.info('Requesting ' + str(minU) + ' to ' + str(maxU))
            cursor=mongo.USERS.find({'id':{'$gte':minU, '$lt':maxU}}, {'_id':0})
            if maxU>=maxuser:
                keep_going=False
            else:
                i+=1
            users+=[row for row in cursor]

Let me know if there's any more info I can provide...

Scott Hernandez

unread,
Aug 16, 2012, 10:01:26 AM8/16/12
to mongod...@googlegroups.com
Please search your system logs for any reference to "mongod" and see
if the OOM Killer got it.
http://www.mongodb.org/display/DOCS/The+Linux+Out+of+Memory+OOM+Killer

What does "free -ltm", and "uname -a" look like?

Daniel W

unread,
Aug 16, 2012, 10:17:24 AM8/16/12
to mongod...@googlegroups.com
Yep, that's it (right?):

Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: Out of memory: kill process 5866 (mongod) score 266194 or a child
Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: Killed process 5866 (mongod)
Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: init invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

(should I run these commands while mongod is running?  here they are w/o mongo)

[root@domU-12-31-39-18-3D-20 log]# free -ltm
             total       used       free     shared    buffers     cached
Mem:         17507        513      16994          0          2         34
Low:         17507        513      16994
High:            0          0          0
-/+ buffers/cache:        475      17031
Swap:            0          0          0
Total:       17507        513      16994

[root@domU-12-31-39-18-3D-20 log]# uname -a
Linux domU-12-31-39-18-3D-20 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

I'm running this on an EC2 Type: m2.xlarge

I'm not linux saavy enough to know how to follow this advice: "It is useful for the linux kernel to have swap space to use in emergencies" from http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-Swap

Scott Hernandez

unread,
Aug 16, 2012, 10:52:53 AM8/16/12
to mongod...@googlegroups.com
You will want to add some swap to your system, yes.
Reply all
Reply to author
Forward
0 new messages