Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
mongo dies on lengthy read
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  6 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Daniel W  
View profile  
 More options Aug 16 2012, 9:06 am
From: Daniel W <dweitzenf...@gmail.com>
Date: Thu, 16 Aug 2012 06:06:32 -0700 (PDT)
Local: Thurs, Aug 16 2012 9:06 am
Subject: mongo dies on lengthy read

I'm doing a lengthy read via pymongo, essentially taking every doc from a
600,000 doc collection. Avg Obj size is 5843.
My log looks like this before and up to the mongo death:

Wed Aug 15 19:35:22 [conn3] getmore HAW.USERS query: { id: { $gte: 550000,
$lt: 600000 } } cursorid:449182167489307279 nreturned:968 reslen:4195882
212ms
Wed Aug 15 19:35:24 [conn3] getmore HAW.USERS query: { id: { $gte: 550000,
$lt: 600000 } } cursorid:449182167489307279 nreturned:964 reslen:4195202
853ms
Wed Aug 15 19:35:27 [conn3] getmore HAW.USERS query: { id: { $gte: 550000,
$lt: 600000 } } cursorid:449182167489307279 nreturned:966 reslen:4195701
1581ms
Wed Aug 15 19:35:33 [conn3] getmore HAW.USERS query: { id: { $gte: 550000,
$lt: 600000 } } cursorid:449182167489307279 nreturned:963 reslen:4196709
3606ms
Wed Aug 15 19:35:33 [conn1] command admin.$cmd command: { serverStatus: 1 }
ntoreturn:1 reslen:1338 457ms

Note that I've broken up the 600k docs into chunks of 50k in an effort to
solve this problem.  
What is going on with that last command, and why does it cause mongo to
shut down?  


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Scott Hernandez  
View profile   Translate to Translated (View Original)
 More options Aug 16 2012, 9:43 am
From: Scott Hernandez <scotthernan...@gmail.com>
Date: Thu, 16 Aug 2012 06:43:27 -0700
Local: Thurs, Aug 16 2012 9:43 am
Subject: Re: [mongodb-user] mongo dies on lengthy read
Is that the end of the log? I do not see any shutdown, or error
messages at all. Can you provide the logs showing it shutting down?

Do you simply mean that the query doesn't return all of the documents,
and not that the server is unavailable? If so, what are you doing in
your python script with the documents? Are you spending logs of time
as you process or work on each document?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel W  
View profile  
 More options Aug 16 2012, 9:54 am
From: Daniel W <dweitzenf...@gmail.com>
Date: Thu, 16 Aug 2012 06:54:32 -0700 (PDT)
Local: Thurs, Aug 16 2012 9:54 am
Subject: Re: [mongodb-user] mongo dies on lengthy read

Yes, that is the end of the log.  Is there a way I can increase the logging
detail?  

I just tried again, and this time there wasn't even the serverStatus line,
it just up and 'died':

[root@domU-12-31-39-18-3D-20 ~]# tail -f /var/log/mongodb.log
Thu Aug 16 09:35:21 [conn2] getmore HAW.USERS query: { id: { $gte: 580000,
$lt: 585000 } } cursorid:3408029629320346164 nreturned:968 reslen:4196559
193ms
Thu Aug 16 09:35:21 [conn2] getmore HAW.USERS query: { id: { $gte: 580000,
$lt: 585000 } } cursorid:3408029629320346164 nreturned:970 reslen:4198150
243ms
Thu Aug 16 09:35:24 [conn2] getmore HAW.USERS query: { id: { $gte: 580000,
$lt: 585000 } } cursorid:3408029629320346164 nreturned:962 reslen:4194949
471ms
Thu Aug 16 09:35:24 [conn2] getmore HAW.USERS query: { id: { $gte: 580000,
$lt: 585000 } } cursorid:3408029629320346164 nreturned:968 reslen:4196960
387ms
Thu Aug 16 09:35:26 [conn2] getmore HAW.USERS query: { id: { $gte: 580000,
$lt: 585000 } } cursorid:3408029629320346164 nreturned:964 reslen:4198129
271ms
Thu Aug 16 09:35:32 [conn2] getmore HAW.USERS query: { id: { $gte: 585000,
$lt: 590000 } } cursorid:7656187167682946319 nreturned:968 reslen:4198101
229ms
Thu Aug 16 09:35:33 [conn2] getmore HAW.USERS query: { id: { $gte: 585000,
$lt: 590000 } } cursorid:7656187167682946319 nreturned:968 reslen:4198132
483ms
Thu Aug 16 09:35:34 [conn2] getmore HAW.USERS query: { id: { $gte: 585000,
$lt: 590000 } } cursorid:7656187167682946319 nreturned:965 reslen:4195774
176ms
Thu Aug 16 09:35:35 [conn2] getmore HAW.USERS query: { id: { $gte: 585000,
$lt: 590000 } } cursorid:7656187167682946319 nreturned:966 reslen:4197451
409ms
Thu Aug 16 09:35:38 [conn2] getmore HAW.USERS query: { id: { $gte: 585000,
$lt: 590000 } } cursorid:7656187167682946319 nreturned:964 reslen:4197107
802ms

By 'died' I mean that all my connections are cutoff and the mongo process
dies.  I'm starting the mongo process as per the docs:

 /usr/bin/mongod --fork --logpath /var/log/mongodb.log --logappend

Connections are cut off:

Mongostat:
insert  query update delete getmore command flushes mapped  vsize    res
faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
     0      0      0      0       1       0       0  8.03g  16.2g   773m  
  89        0          0       0|0     0|0    69b     4m     3   09:35:01
     0      0      0      0       0       0       0  8.03g  16.2g   551m  
  45        0          0       0|0     0|0    44b     2m     3   09:35:11
     0      0      0      0       0       0       0  8.03g  16.2g   340m  
  54        0          0       0|0     0|0    52b     2m     3   09:35:21
     0      0      0      0       0       0       0  8.03g  16.2g   177m  
  35        0          0       0|0     0|0    27b     1m     3   09:35:31
Thu Aug 16 09:36:08 Socket recv() errno:104 Connection reset by peer
127.0.0.1:27017
Thu Aug 16 09:36:08 SocketException: remote: 127.0.0.1:27017 error: 9001
socket exception [1] server [127.0.0.1:27017]
Thu Aug 16 09:36:08 DBClientCursor::init call() failed
can't get data: DBClientBase::findN: transport error: 127.0.0.1 query: {
serverStatus: 1 }
Thu Aug 16 09:36:18 trying reconnect to 127.0.0.1
Thu Aug 16 09:36:18 reconnect 127.0.0.1 failed couldn't connect to server
127.0.0.1

Pymongo:
pymongo.errors.AutoReconnect: could not connect to localhost:27017: [Errno
111] Connection refused

I am not spending any time in pymongo, just doing a [row for row in cursor]:
        users=[]
        i=0
        chunksize=4000
        while keep_going:
            minU=minuser+i*chunksize
            maxU=minU+chunksize if minU+chunksize<=maxuser else maxuser
            logger.info('Requesting ' + str(minU) + ' to ' + str(maxU))
            cursor=mongo.USERS.find({'id':{'$gte':minU, '$lt':maxU}},
{'_id':0})
            if maxU>=maxuser:
                keep_going=False
            else:
                i+=1
            users+=[row for row in cursor]

Let me know if there's any more info I can provide...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Scott Hernandez  
View profile   Translate to Translated (View Original)
 More options Aug 16 2012, 10:01 am
From: Scott Hernandez <scotthernan...@gmail.com>
Date: Thu, 16 Aug 2012 07:01:26 -0700
Local: Thurs, Aug 16 2012 10:01 am
Subject: Re: [mongodb-user] mongo dies on lengthy read
Please search your system logs for any reference to "mongod" and see
if the OOM Killer got it.
http://www.mongodb.org/display/DOCS/The+Linux+Out+of+Memory+OOM+Killer

What does "free -ltm", and "uname -a" look like?


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel W  
View profile   Translate to Translated (View Original)
 More options Aug 16 2012, 10:17 am
From: Daniel W <dweitzenf...@gmail.com>
Date: Thu, 16 Aug 2012 07:17:24 -0700 (PDT)
Local: Thurs, Aug 16 2012 10:17 am
Subject: Re: [mongodb-user] mongo dies on lengthy read

Yep, that's it (right?):

Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: Out of memory: kill process
5866 (mongod) score 266194 or a child
Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: Killed process 5866 (mongod)
Aug 16 09:36:08 domU-12-31-39-18-3D-20 kernel: init invoked oom-killer:
gfp_mask=0x201d2, order=0, oomkilladj=0

(should I run these commands while mongod is running?  here they are w/o
mongo)

[root@domU-12-31-39-18-3D-20 log]# free -ltm
             total       used       free     shared    buffers     cached
Mem:         17507        513      16994          0          2         34
Low:         17507        513      16994
High:            0          0          0
-/+ buffers/cache:        475      17031
Swap:            0          0          0
Total:       17507        513      16994

[root@domU-12-31-39-18-3D-20 log]# uname -a
Linux domU-12-31-39-18-3D-20 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28
EST 2008 x86_64 x86_64 x86_64 GNU/Linux

I'm running this on an EC2 Type: m2.xlarge

I'm not linux saavy enough to know how to follow this advice: "It is useful
for the linux kernel to have swap space to use in emergencies" from
http://www.mongodb.org/display/DOCS/Production+Notes#ProductionNotes-...


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Scott Hernandez  
View profile  
 More options Aug 16 2012, 10:52 am
From: Scott Hernandez <scotthernan...@gmail.com>
Date: Thu, 16 Aug 2012 07:52:53 -0700
Local: Thurs, Aug 16 2012 10:52 am
Subject: Re: [mongodb-user] mongo dies on lengthy read
You will want to add some swap to your system, yes.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »