Re: Index creation goes past 100% and appears to loop forever

Tad Marshall

Sep 19, 2012, 7:40:23 PM

to mongod...@googlegroups.com

You might as well cancel the index build: once it has passed 100%, it is never going to finish.

Is this a capped collection? The case I've seen of an infinite index build was in a capped collection that had looped back into itself. Other commands besides index builds will also loop forever; db.collection.find().itcount() for example.

Has this server had a hard crash such that it had to recover using the journal? Is this part of a replica set? Primary or secondary (or both at different times)?

Can you run db.<collection>.validate() and post the output? This will not loop forever (so long as you don't use the {full:true} variant).

On Wednesday, September 19, 2012 11:49:22 AM UTC-4, John Murdoch wrote:

I have a MongoDB collection with ~5.5M records. My attempts to index it fail, whether on a single field or with a compound index: the indexing process proceeds normally, but when it reaches 100%, where I presume it should stop, it goes past 100% and just keeps going. I've left it running for 10 hours and it never finished, even though no other processes were running (and no one else was using the DB).

The fields I try to index on are longs or doubles.

I'm running v2.2.0 on x64 Windows.

Am I right to think that this is abnormal behaviour? Any ideas what I can do?

Wed Sep 05 10:22:37 [conn1] 415000000/5576219 7442%
Wed Sep 05 10:22:48 [conn1] 417000000/5576219 7478%
Wed Sep 05 10:22:59 [conn1] 419000000/5576219 7514%
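The progress lines above have the form processed/total followed by a percentage. As a rough sketch of how the overrun can be spotted mechanically, the snippet below parses those three lines and flags any where the processed count exceeds the total. The helper name parseProgress is hypothetical (it is not part of MongoDB), and the regular expression assumes the log format shown in this post.

```javascript
// Sample lines copied from the log excerpt in this post.
const logLines = [
  "Wed Sep 05 10:22:37 [conn1] 415000000/5576219 7442%",
  "Wed Sep 05 10:22:48 [conn1] 417000000/5576219 7478%",
  "Wed Sep 05 10:22:59 [conn1] 419000000/5576219 7514%",
];

// Hypothetical helper: extracts "processed/total percent%" from the end of a line.
// Returns null when the line does not match the assumed format.
function parseProgress(line) {
  const match = line.match(/(\d+)\/(\d+)\s+(\d+)%\s*$/);
  if (!match) return null;
  const processed = Number(match[1]);
  const total = Number(match[2]);
  return {
    processed,
    total,
    reportedPercent: Number(match[3]),
    // An index build should never process more records than the collection holds.
    overrun: processed > total,
  };
}

const progress = logLines.map(parseProgress);
for (const p of progress) {
  console.log(`${p.processed}/${p.total} (${p.reportedPercent}%) overrun=${p.overrun}`);
}
```

Every sample line is flagged as an overrun, which matches the symptom described: the build has scanned far more entries than the 5,576,219 records the collection reports.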

John Murdoch

Sep 30, 2012, 3:14:17 AM

to mongod...@googlegroups.com

Hi, Tad, I tried to respond earlier but somehow the message got stuck in moderation.

On Thursday, September 20, 2012 12:40:23 AM UTC+1, Tad Marshall wrote:

Is this a capped collection? The case I've seen of an infinite index build was in a capped collection that had looped back into itself. Other commands besides index builds will also loop forever; db.collection.find().itcount() for example.

No, it is not.

Has this server had a hard crash such that it had to recover using the journal?

Yes, it has seen crashes. It is a development server whose process I have sometimes had to kill.

> Is this part of a replica set? Primary or secondary (or both at different times)?

No, it is not.

Can you run db.<collection>.validate() and post the output? This will not loop forever (so long as you don't use the {full:true} variant).

> db.player.validate()

{
    "ns" : "wot.player",
    "firstExtent" : "1:1f8c000 ns:wot.player",
    "lastExtent" : "2c:2000 ns:wot.player",
    "extentCount" : 41,
    "datasize" : 23240866636,
    "nrecords" : 5576219,
    "lastExtentSize" : 2146426864,
    "padding" : 1.659999999958666,
    "firstExtentDetails" : {
        "loc" : "1:1f8c000",
        "xnext" : "1:1fa0000",
        "xprev" : "null",
        "nsdiag" : "wot.player",
        "size" : 81920,
        "firstRecord" : "1:1f8fa58",
        "lastRecord" : "1:1f93474"
    },
    "deletedCount" : 731321,
    "deletedSize" : 1960113780,
    "nIndexes" : 1,
    "keysPerIndex" : {
        "wot.player.$_id_" : 5576195
    },
    "valid" : true,
    "errors" : [ ],
    "warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
    "ok" : 1
}

>

Tad Marshall

Oct 1, 2012, 8:50:23 AM

to mongod...@googlegroups.com

Hi John,

Thanks for the additional information.

It seems that you have corruption in your 'player' collection, but it may be hard to diagnose what is wrong or how it got that way. Can you check the Windows Event Log and see if there are any NTFS-related errors reported?

The "keysPerIndex" count for _id should match the document count, but it is lower by 24 documents. The padding factor of about 1.66 suggests that there has been a lot of document movement over time, which can cause the indexes to be updated a lot, so a hard crash without journaling enabled might have left the index incompletely updated. A hard crash with journaling should not be able to do this, unless the disk itself has problems or you ran out of disk space at a crucial moment.
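To make the arithmetic explicit, here is a minimal sketch in plain JavaScript that compares each index's key count with the record count. The object copies only the relevant fields from the validate() output posted earlier in the thread; the helper name missingKeysPerIndex is hypothetical and is not a MongoDB API.

```javascript
// Relevant fields copied from the validate() output posted in this thread.
const validateResult = {
  ns: "wot.player",
  nrecords: 5576219,
  keysPerIndex: { "wot.player.$_id_": 5576195 },
};

// Hypothetical helper: for each index, how many keys are missing
// relative to the number of records in the collection.
function missingKeysPerIndex(result) {
  const missing = {};
  for (const [indexName, keyCount] of Object.entries(result.keysPerIndex)) {
    missing[indexName] = result.nrecords - keyCount;
  }
  return missing;
}

const missing = missingKeysPerIndex(validateResult);
console.log(missing);
```

This comparison is only meaningful for indexes that hold exactly one key per document, such as _id. A sparse or multikey index can legitimately have fewer or more keys than there are records.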

Because the validate command did not report any index problems, it looks like the _id index is just incomplete, but all keys are in-order and findable.

Because your attempt to index the collection went into an infinite loop, I'm afraid that validate({full:true}) would do the same thing, but you could try it and see, assuming that there is a time when blocking updates for an extended period would not be a problem. Validate with the full option should tell us more about what is wrong.

If you want, you could create a Community Private ticket and upload a copy of your database for us to look at, but it's hard to know in advance whether we would be able to determine anything other than some details of what the corruption is; we might be able to figure out what happened, but we might not. Even if uploading your database is not an attractive option, it would be valuable to create a "Core server" Jira ticket to collect as many details as possible so that we can correlate this information with other reports that we may get.

The safest way to repair the database, assuming that you have available disk space, is to use mongodump with the --repair option to extract everything possible from the database and then use mongorestore to restore it to a fresh database, leaving the original to the side until you have verified that your new copy works properly and is not missing anything. mongodump is documented at http://docs.mongodb.org/manual/reference/mongodump/ .

Let us know how you'd like to proceed.

Tad

John Murdoch

Oct 12, 2012, 5:07:58 PM

to mongod...@googlegroups.com

Tad,

Thanks a lot for your help.

I am short of disk space, so running out of space could have been the root cause of this. I will try to free up space and then do the mongodump --repair / mongorestore.