cannot find documents after "successful" insert


Roee Kitan

unread,
Jul 8, 2014, 10:55:01 AM7/8/14
to mongod...@googlegroups.com
I already asked this question on Stack Overflow, so to avoid duplication I'll just post the link to my SO question here.
Since I didn't receive any answer there, I hoped maybe here I could find some help.
Thanks.

William Berkeley

unread,
Jul 8, 2014, 3:20:10 PM7/8/14
to mongod...@googlegroups.com
Reading the SO question. It looks like the documents were inserted - you can find them by a query on _id. What's an example of a "missing" document and a query that should find the "missing" document but doesn't?

-Will

Roee Kitan

unread,
Jul 9, 2014, 3:41:42 AM7/9/14
to mongod...@googlegroups.com
Hi Will,
1. I clearly see that even though the log says a document was inserted successfully, the collection length didn't change.
2. When I search by _id I can find it, but with any other query (ISODate comparison, date-range query, or even iterating over collection.find() and looking for the _id) it's just missing.
3. I updated the SO question and added the queries I tested.

Asya Kamsky

unread,
Jul 9, 2014, 4:56:15 AM7/9/14
to mongodb-user
Stack Overflow is more for straightforward Q&As - your case clearly has something else going on, so it's probably better discussed on the forum, where back-and-forth is easier to handle and more expected...

Asya



--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/45c3f4a6-31c0-4b91-8018-7a3977f650a5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Roee Kitan

unread,
Jul 9, 2014, 5:38:34 AM7/9/14
to mongod...@googlegroups.com
Got it. I posted a comment on SO saying that this forum is now the main thread.
And I just tested something that might help!
This is what I did:
1) Search for one of the "missing" documents by its _id > document found > search by its date > missing:
> use myDb
> searchId = BinData(0,"alnYNFB8XxGo+KB8A70vCw==")
> db.files.find({"_id": searchId})
{ "_id" : BinData(0,"alnYNFB8XxGo+KB8A70vCw=="), "dId" : "537cbafd81536a0aa03f6be3", "cN" : 1, "ty" : 0, "sT" : ISODate("2014-07-08T06:53:20.534Z"), "eT" : ISODate("2014-07-08T06:53:30.845Z"), "hS" : 40, "sId" : 1 }
> searchDate = ISODate("2014-07-08T06:53:20.534Z")
> db.files.find({"sT": searchDate})

I find nothing (the "usual" result until now...)
Then check this out:
2) Copy the DB, then search again by date:
> db.copyDatabase('myDb', 'testDb')
> use testDb
> db.files.find({"sT": searchDate})
{ "_id" : BinData(0,"alnYNFB8XxGo+KB8A70vCw=="), "dId" : "537cbafd81536a0aa03f6be3", "cN" : 1, "ty" : 0, "sT" : ISODate("2014-07-08T06:53:20.534Z"), "eT" : ISODate("2014-07-08T06:53:30.845Z"), "hS" : 40, "sId" : 1 }

And here it is: it works only on the copied DB.

To test, I shut down the mongod service, copied the data folder, and opened it on another server, and the problem remained.
I packed the DB, and the RAR file is 45 MB; I can share it if you could use it.

William Berkeley

unread,
Jul 10, 2014, 9:52:45 AM7/10/14
to mongod...@googlegroups.com

That is interesting. It's good that copyDB provides a potential fix for the problem. Could you run a query for a missing document on the "bad" database and the same query on the copied "good" database with .explain() and post the results here?

-Will

Roee Kitan

unread,
Jul 10, 2014, 11:05:20 AM7/10/14
to mongod...@googlegroups.com
So yeah, somehow copyDatabase indeed solves it (btw, repairDatabase doesn't).
Per your request:
> use myDb (original - with problem)
> db.files.find({"sT": ISODate("2014-07-08T06:53:20.534Z")}).explain()
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 557107,
        "nscanned" : 557107,
        "nscannedObjectsAllPlans" : 557107,
        "nscannedAllPlans" : 557107,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 4352,
        "nChunkSkips" : 0,
        "millis" : 1339,
        "server" : "R-THINK:5570",
        "filterSet" : false
}

> use testDb (after copy - looking good)
> db.files.find({"sT": ISODate("2014-07-08T06:53:20.534Z")}).explain()
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 1,
        "nscannedObjects" : 654176,
        "nscanned" : 654176,
        "nscannedObjectsAllPlans" : 654176,
        "nscannedAllPlans" : 654176,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 5110,
        "nChunkSkips" : 0,
        "millis" : 1480,
        "server" : "R-THINK:5570",
        "filterSet" : false
}

hope it helps :)

Will Berkeley

unread,
Jul 11, 2014, 10:53:34 AM7/11/14
to mongod...@googlegroups.com
Hi Roee. It's clear that corruption happened at some point. Try running validate on the collection to see if anything obvious comes up. Beyond that, diagnosing any root cause for corruption without a lot of information about the history of the collection is impossible, and the best thing to do is probably to use the fixed copy and move on with life.
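In the mongo shell, that validation run would look something like this (database and collection names taken from earlier in this thread; passing true asks for a full validation):

```
> use myDb
> db.files.validate(true)
```

Among other counters, the output reports "nrecords" and "objectsFound"; on a healthy collection they should be equal.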

-Will

Asya Kamsky

unread,
Jul 12, 2014, 2:40:57 AM7/12/14
to mongodb-user
The fact that a collection scan sees only 557107 documents in the old DB but 654176 in the new one tells me that there is in fact corruption in the documents (likely in the pointers from one record to the next).

Traversing the _id index allows both DBs to get to the record directly, but traversing the collection records without the index seems to stop short...

Asya




Roee Kitan

unread,
Jul 13, 2014, 3:09:43 AM7/13/14
to mongod...@googlegroups.com
Asya, Will,

Running 'validate' on the "damaged" collection clearly shows a difference between "nrecords" (654177) and "objectsFound" (557107).
The use of this collection was pretty basic: I have a service that records binary information on multiple channels (10-20, could be higher). Every 10 seconds it flushes a small document for each channel containing the path to the record file on disk, start time, end time, and channel id.
So when I see a "hole" in inserts, I see that all inserts (from all the channels) fail for about 1-15 minutes and then continue to really insert successfully.
Then, every ~10 minutes (after a certain space threshold), I sort the collection by start time and delete the oldest 1000 records (deleting each record separately).
(btw, I disabled the deletion task when I started debugging this problem.)
You may contact me privately to get a copy of the DB (~40 MB). It's easy to reproduce the problem.
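The cleanup step described above can be sketched in the mongo shell like this (a sketch based on this post's description: "sT" is the start-time field from the documents shown earlier, and records are deleted one at a time):

```
> // delete the 1000 oldest records, ordered by start time ("sT"), one by one
> db.files.find({}, {_id: 1}).sort({sT: 1}).limit(1000).forEach(function (doc) {
...     db.files.remove({_id: doc._id});
... })
```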

So the conclusion is (which will also go to SO as an answer):
While working with Mongo, even with journaling enabled, the data might somehow get corrupted in such a way that inserts look successful but actually aren't, and you can never know about it unless you save all the inserted document ids and search for them with collection.find().
Some (or, as in my situation, most) of the inserts probably still work, while some will randomly fall into the corrupted black hole.
Since this situation can happen just like that, the procedure for working with the DB must include a background task that checks collection validity. I can think of a few options for doing that:
1. Compare the "count" property in collection.stats() to the result of collection.find().length(). If they're not the same, I think it means a problem.
2. Use collection.validate({full: true}) and compare "nrecords" with "objectsFound". Again, it looks to me like they should be equal as well.
3. Add another random unique id field to the document. Then you can really validate inserts by searching by this second unique id.
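The first two checks above can be sketched as plain functions (the function names are mine, not part of any MongoDB API; the inputs are the counters you would read off stats(), a full scan, and validate({full: true}) on a live server):

```javascript
// Check 1: does stats().count match an actual full scan
// (collection.find().length())?
function statsMatchesScan(statsCount, scanCount) {
    return statsCount === scanCount;
}

// Check 2: does validate()'s "nrecords" match "objectsFound"?
function validateLooksClean(nrecords, objectsFound) {
    return nrecords === objectsFound;
}

// With the numbers reported earlier in this thread, the damaged
// collection fails check 2:
console.log(validateLooksClean(654177, 557107)); // false
```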

When a problem is detected, I found that copying the DB (db.copyDatabase) fixes the problem in the newly created DB.
Since you cannot rename a DB, you can then delete the old DB, copy the new DB to a second copy under the original DB's name, and then delete the first copy.
To do that, just make sure you always keep enough disk space to copy the entire DB.
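Put together, that copy-and-rename workaround reads like this in the mongo shell (a sketch: myDb is the database name used in this thread, myDb_tmp is a temporary name made up for illustration, and each copy step needs enough free disk space for the whole DB):

```
> db.copyDatabase('myDb', 'myDb_tmp')   // 1) copy the damaged DB under a temp name
> use myDb
> db.dropDatabase()                     // 2) drop the damaged original
> db.copyDatabase('myDb_tmp', 'myDb')   // 3) copy back under the original name
> use myDb_tmp
> db.dropDatabase()                     // 4) drop the temporary copy
```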

is there anything i should add?

thank you for helping.

Asya Kamsky

unread,
Jul 13, 2014, 3:38:46 AM7/13/14
to mongodb-user, gal...@gmail.com
I think you should hold off posting that until you know exactly what happened.  For example, we *know* that the inserts were in fact successful - you didn't lose a single record, you just weren't able to query for them without an index (we think because of corruption but that would need to be confirmed).   So it's actually not the case that _any_ inserts failed.

Considering I've not encountered this sort of thing before (*without clear evidence of disk failure*) I would want to debug this just a little bit further before declaring that this is something that's common or needs to be guarded against in a particular way.

Btw, are you saying that you can reproduce this? I'm not sure how to interpret "it's easy to reproduce the problem" - do you mean to see the problem on your (damaged) DB file? I have no doubt - but if you mean that you can reproduce this sort of corruption then I'm *extremely* interested in that.

Do you happen to have the logs for mongod for the duration you've been running it?

Actually, best way to get the info to us would be to open a ticket in https://jira.mongodb.org/browse/SUPPORT project (it's community private, so anything you attach there would be private to our engineers and you).

What would be most helpful would be if you can attach all the logs you have for this mongod, the database files and ideally your code which caused this (especially if you're saying you can reproduce it).

All the OS, driver, mongod versions, and any information about the storage configuration, etc. would also help.

Thanks for helping us debug this - if this is something that might be _caused_ by MongoDB it would be extremely crucial to get it fixed - unfortunately, keep in mind that it's always possible for the file system to corrupt some blocks (which may have MongoDB data) though usually that type of corruption manifests very differently (usually you get LOTS of invalid BSON error messages in the logs).

Asya




Roee Kitan

unread,
Jul 13, 2014, 4:27:11 AM7/13/14
to mongod...@googlegroups.com, gal...@gmail.com
hi Asya,
So I'll open a ticket for now.
It's very easy to reproduce the problem; I'll write down all the details in the ticket.
Thanks.

Roee Kitan

unread,
Jul 13, 2014, 5:30:23 AM7/13/14
to mongod...@googlegroups.com, gal...@gmail.com
ticket: SUPPORT-1062

Asya Kamsky

unread,
Jul 14, 2014, 2:53:28 PM7/14/14
to mongodb-user, gal...@gmail.com
Hi Roee,

I replied in the ticket yesterday but haven't heard back from you - emailing here in case you are not properly getting e-mail notifications from Jira (you're supposed to be alerted by email when the ticket is updated by us).

Asya



Roee Kitan

unread,
Jul 20, 2014, 5:25:13 AM7/20/14
to mongod...@googlegroups.com, gal...@gmail.com
Hi Asya,
I'm sorry it took me time; it's just that terrorists continuously try to bomb everything around my area (Israel, near Tel Aviv), and I shit you not :))
We're trying to continue living our lives, and I hope it will all be over soon.
Anyway, I have all the logs and code samples, and I even found a time when the power went down, but I still need to verify whether the problem started before or after that time.

peace all :)

Asya Kamsky

unread,
Jul 21, 2014, 1:54:20 AM7/21/14
to mongodb-user

Wow, OK, yes. Please do take care of safety for yourself and your loved ones, and worry about debugging things second.  

Asya
