mongo dies on lengthy read

Daniel W

unread,

Aug 16, 2012, 9:06:32 AM8/16/12

to mongod...@googlegroups.com

I'm doing a lengthy read via pymongo, essentially taking every doc from a 600,000 doc collection. Avg Obj size is 5843.

My log looks like this before and up to the mongo death:

Wed Aug 15 19:35:22 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:968 reslen:4195882 212ms

Wed Aug 15 19:35:24 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:964 reslen:4195202 853ms

Wed Aug 15 19:35:27 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:966 reslen:4195701 1581ms

Wed Aug 15 19:35:33 [conn3] getmore HAW.USERS query: { id: { $gte: 550000, $lt: 600000 } } cursorid:449182167489307279 nreturned:963 reslen:4196709 3606ms

Wed Aug 15 19:35:33 [conn1] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 reslen:1338 457ms

Note that I've broken up the 600k docs into chunks of 50k in an effort to solve this problem.

What is going on with that last command, and why does it cause mongo to shut down?

Scott Hernandez

unread,

Aug 16, 2012, 9:43:27 AM8/16/12

to mongod...@googlegroups.com

Is that the end of the log? I do not see any shutdown, or error
messages at all. Can you provide the logs showing it shutting down?

Do you simply mean that the query doesn't return all of the documents,
and not that the server is unavailable? If so, what are you doing in
your python script with the documents? Are you spending logs of time
as you process or work on each document?

> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb

Daniel W

unread,

Aug 16, 2012, 9:54:32 AM8/16/12

to mongod...@googlegroups.com

Yes, that is the end of the log. Is there a way I can increase the logging detail?

I just tried again, and this time there wasn't even the serverStatus line, it just up and 'died':

[root@domU-12-31-39-18-3D-20 ~]# tail -f /var/log/mongodb.log

Thu Aug 16 09:35:21 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:968 reslen:4196559 193ms

Thu Aug 16 09:35:21 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:970 reslen:4198150 243ms

Thu Aug 16 09:35:24 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:962 reslen:4194949 471ms

Thu Aug 16 09:35:24 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:968 reslen:4196960 387ms

Thu Aug 16 09:35:26 [conn2] getmore HAW.USERS query: { id: { $gte: 580000, $lt: 585000 } } cursorid:3408029629320346164 nreturned:964 reslen:4198129 271ms

Thu Aug 16 09:35:32 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:968 reslen:4198101 229ms

Thu Aug 16 09:35:33 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:968 reslen:4198132 483ms

Thu Aug 16 09:35:34 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:965 reslen:4195774 176ms

Thu Aug 16 09:35:35 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:966 reslen:4197451 409ms

Thu Aug 16 09:35:38 [conn2] getmore HAW.USERS query: { id: { $gte: 585000, $lt: 590000 } } cursorid:7656187167682946319 nreturned:964 reslen:4197107 802ms

By 'died' I mean that all my connections are cutoff and the mongo process dies. I'm starting the mongo process as per the docs:

/usr/bin/mongod --fork --logpath /var/log/mongodb.log --logappend

Connections are cut off:

Mongostat:

insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time

0 0 0 0 1 0 0 8.03g 16.2g 773m 89 0 0 0|0 0|0 69b 4m 3 09:35:01

0 0 0 0 0 0 0 8.03g 16.2g 551m 45 0 0 0|0 0|0 44b 2m 3 09:35:11

0 0 0 0 0 0 0 8.03g 16.2g 340m 54 0 0 0|0 0|0 52b 2m 3 09:35:21

0 0 0 0 0 0 0 8.03g 16.2g 177m 35 0 0 0|0 0|0 27b 1m 3 09:35:31

Thu Aug 16 09:36:08 Socket recv() errno:104 Connection reset by peer 127.0.0.1:27017

Thu Aug 16 09:36:08 SocketException: remote: 127.0.0.1:27017 error: 9001 socket exception [1] server [127.0.0.1:27017]

Thu Aug 16 09:36:08 DBClientCursor::init call() failed

can't get data: DBClientBase::findN: transport error: 127.0.0.1 query: { serverStatus: 1 }

Thu Aug 16 09:36:18 trying reconnect to 127.0.0.1

Thu Aug 16 09:36:18 reconnect 127.0.0.1 failed couldn't connect to server 127.0.0.1

Pymongo:

pymongo.errors.AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused

I am not spending any time in pymongo, just doing a [row for row in cursor]:

users=[]

i=0

chunksize=4000

while keep_going:

minU=minuser+i*chunksize

maxU=minU+chunksize if minU+chunksize<=maxuser else maxuser

logger.info('Requesting ' + str(minU) + ' to ' + str(maxU))

cursor=mongo.USERS.find({'id':{'$gte':minU, '$lt':maxU}}, {'_id':0})

if maxU>=maxuser:

keep_going=False

else:

i+=1

users+=[row for row in cursor]

Let me know if there's any more info I can provide...

Scott Hernandez

unread,

Aug 16, 2012, 10:01:26 AM8/16/12

to mongod...@googlegroups.com

Please search your system logs for any reference to "mongod" and see
if the OOM Killer got it.
http://www.mongodb.org/display/DOCS/The+Linux+Out+of+Memory+OOM+Killer

What does "free -ltm", and "uname -a" look like?

Daniel W

unread,

Aug 16, 2012, 10:17:24 AM8/16/12

to mongod...@googlegroups.com

Yep, that's it (right?):

Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: Out of memory: kill process 5866 (mongod) score 266194 or a child

Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: Killed process 5866 (mongod)

Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: init invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

(should I run these commands while mongod is running? here they are w/o mongo)

[root@domU-12-31-39-18-3D-20 log]# free -ltm

total used free shared buffers cached

Mem: 17507 513 16994 0 2 34

Low: 17507 513 16994

High: 0 0 0

-/+ buffers/cache: 475 17031

Swap: 0 0 0

Total: 17507 513 16994

[root@domU-12-31-39-18-3D-20 log]# uname -a

Linux domU-12-31-39-18-3D-20 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

I'm running this on an EC2 Type: m2.xlarge

I'm not linux saavy enough to know how to follow this advice: "It is useful for the linux kernel to have swap space to use in emergencies" from http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-Swap

Scott Hernandez

unread,

Aug 16, 2012, 10:52:53 AM8/16/12

to mongod...@googlegroups.com

You will want to add some swap to your system, yes.

Reply all

Reply to author

Forward