Thanks - it's good to know that there is no acceptable level of corruption; that answers one of my questions.
The corruption seemed to be at the BSON level, which is really bad news. I tried to add an index back onto the collection and it failed with the error message "Invalid BSONElement type: 43" (I think it was 43; there have been a few different numbers across all the testing I've done).
To test the import, I ran mongoimport with the same JSON file into a new, empty collection, from the first server over to the second server:
mongoimport --host [second server reference] --db user_keywords_test_import --collection user_data --file /tmp/userdata_2014-02-01T00:00:00Z_2014-02-08T00:00:00Z.json
This reported:
check 9 3181232
Sat Feb 8 21:14:53.326 imported 3181232 objects
I then logged in to server2 and ran a complete count on this new, clean collection:
use user_keywords_test_import
db.user_data.count()
3181232
That seems great. However, a count restricted to the same date range fails:
db.user_data.count( {log_time: { $gt : ISODate('2014-02-01T00:00:00Z'), $lte : ISODate('2014-02-08T00:00:00Z') } } )
It failed with:
Sat Feb 8 21:24:00.384 count failed: { "ok" : 0, "errmsg" : "10320 BSONElement: bad type 116" } at src/mongo/shell/query.js:180
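A side note on reading this error, offered as a sketch rather than a diagnosis: the BSON spec defines only a small set of element type bytes, and neither 116 nor 43 is among them. Both are printable ASCII characters, which may hint that the server is reading text where it expects a type byte. The helper names below are hypothetical, and this is plain JavaScript for Node rather than anything built into the mongo shell.

```javascript
// Element type bytes defined by the BSON spec (decimal values).
// 1..19 are the ordinary types (19, decimal128, only exists in newer
// versions); 127 is max key and 255 is min key.
function isKnownBsonType(code) {
  return (code >= 1 && code <= 19) || code === 127 || code === 255;
}

// Describe a reported type code, including the ASCII character it would be
// if the byte were really part of some text. (Hypothetical helper.)
function describeTypeCode(code) {
  var ch = (code >= 32 && code <= 126) ? String.fromCharCode(code) : null;
  return { code: code, known: isKnownBsonType(code), asAscii: ch };
}

console.log(describeTypeCode(116)); // type code from the count failure
console.log(describeTypeCode(43));  // type code from the index build failure
```

This only classifies the number in the error message; it says nothing about how the bad byte got into the document.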
So what's wrong? I ran validate(true):
db.user_data.validate(true)
{
"ns" : "user_keywords_test_import.user_data",
"firstExtent" : "0:2000 ns:user_keywords_test_import.user_data",
"lastExtent" : "5:45007000 ns:user_keywords_test_import.user_data",
"extentCount" : 19,
"extents" : [
...
all the extents
...
]
"datasize" : 2845897088,
"nrecords" : 3181232,
"lastExtentSize" : 840650752,
"padding" : 1,
"firstExtentDetails" : {
"loc" : "0:2000",
"xnext" : "0:1a000",
"xprev" : "null",
"nsdiag" : "user_keywords_test_import.user_data",
"size" : 49152,
"firstRecord" : "0:20b0",
"lastRecord" : "0:dc70"
},
"lastExtentDetails" : {
"loc" : "5:45007000",
"xnext" : "null",
"xprev" : "5:1d5cd000",
"nsdiag" : "user_keywords_test_import.user_data",
"size" : 840650752,
"firstRecord" : "5:450070b0",
"lastRecord" : "5:64669e10"
},
"objectsFound" : 3181232,
"invalidObjects" : 19,
"bytesWithHeaders" : 2896796800,
"bytesWithoutHeaders" : 2845897088,
"deletedCount" : 11,
"deletedSize" : 313861744,
"nIndexes" : 1,
"keysPerIndex" : {
"user_keywords_test_import.user_data.$_id_" : 3181232
},
"valid" : false,
"errors" : [
"invalid bson object detected (see logs for more info)"
],
"advice" : "ns corrupt, requires repair",
"ok" : 1
}
Looking at /var/log/mongodb/mongo.log, I find these entries:
Sun Feb 9 21:00:20.693 [conn119] Assertion: 10320:BSONElement: bad type 116
0xde46e1 0xda5e1b 0x6eb769 0xa45dd0 0xa47775 0xa5a073 0xa5ab8b 0x8167b1 0xa7f28f 0x8f1c02 0x8e049a 0x8e1402 0x8e2472 0xa85630 0xa89efc 0x9fe119 0x9ff633 0x6e8518 0xdd0cae 0x7fdc65897e9a
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xda5e1b]
/usr/bin/mongod(_ZNK5mongo11BSONElement4sizeEv+0x1f9) [0x6eb769]
/usr/bin/mongod(_ZNK5mongo7Matcher13matchesDottedEPKcRKNS_11BSONElementERKNS_7BSONObjEiRKNS_14ElementMatcherEbPNS_12MatchDetailsE+0x14f0) [0xa45dd0]
/usr/bin/mongod(_ZNK5mongo7Matcher7matchesERKNS_7BSONObjEPNS_12MatchDetailsE+0xe5) [0xa47775]
/usr/bin/mongod(_ZNK5mongo19CoveredIndexMatcher7matchesERKNS_7BSONObjERKNS_7DiskLocEPNS_12MatchDetailsEb+0xd3) [0xa5a073]
/usr/bin/mongod(_ZNK5mongo19CoveredIndexMatcher14matchesCurrentEPNS_6CursorEPNS_12MatchDetailsE+0xeb) [0xa5ab8b]
/usr/bin/mongod(_ZN5mongo6Cursor14currentMatchesEPNS_12MatchDetailsE+0x41) [0x8167b1]
/usr/bin/mongod(_ZN5mongo8runCountEPKcRKNS_7BSONObjERSsRi+0x9af) [0xa7f28f]
/usr/bin/mongod(_ZN5mongo8CmdCount3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x122) [0x8f1c02]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0x8e049a]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xc02) [0x8e1402]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x5f2) [0x8e2472]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x40) [0xa85630]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa89efc]
/usr/bin/mongod() [0x9fe119]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x383) [0x9ff633]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e8518]
/usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdd0cae]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fdc65897e9a]
Sun Feb 9 21:00:20.699 [conn119] Count with ns: user_keywords_test_import.user_data and query: { log_time: { $gt: new Date(1391212800000), $lte: new Date(1391817600000) } } failed with exception: 10320 BSONElement: bad type 116 code: 10320
Sun Feb 9 21:00:20.699 [conn119] command user_keywords_test_import.$cmd command: { count: "user_data", query: { log_time: { $gt: new Date(1391212800000), $lte: new Date(1391817600000) } }, fields: {} } ntoreturn:1 keyUpdates:0 numYields: 3 locks(micros) r:4273449 reslen:81 2146ms
Sun Feb 9 21:01:33.141 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52eea6283d14c8ef2d27ae11')
Sun Feb 9 21:01:33.142 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52eea62b3d14c8992d27ae10')
Sun Feb 9 21:01:33.142 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52eea6303d14c8992d27ae13')
Sun Feb 9 21:01:33.142 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52eea6323d14c8db2d27ae16')
Sun Feb 9 21:01:43.195 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f5113d3d14c8d62642b9d6')
Sun Feb 9 21:01:43.195 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f5113f3d14c87b2642b9d9')
Sun Feb 9 21:01:43.195 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f511423d14c8e62642b9ce')
Sun Feb 9 21:01:43.294 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f51f493d14c8483542b9ea')
Sun Feb 9 21:01:43.294 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f51f493d14c8723542b9e4')
Sun Feb 9 21:01:43.294 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f51f503d14c8503642b9d3')
Sun Feb 9 21:01:43.294 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f51f503d14c80a3642b9dc')
Sun Feb 9 21:01:43.294 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f51f503d14c8993542b9e3')
Sun Feb 9 21:01:43.384 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f52cf23d14c8694542b9ec')
Sun Feb 9 21:01:43.384 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f52cf43d14c8c74542b9e6')
Sun Feb 9 21:01:43.384 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f52cf73d14c8e24542b9dc')
Sun Feb 9 21:01:43.384 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f52cf73d14c82e4642b9d6')
Sun Feb 9 21:01:43.384 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f52cf83d14c8364542b9f1')
Sun Feb 9 21:01:43.384 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f52cfa3d14c8674642b9d1')
Sun Feb 9 21:01:43.384 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52f52cfa3d14c84e4642b9d1')
Sun Feb 9 21:01:43.810 [conn119] validating index 0: user_keywords_test_import.user_data.$_id_
Sun Feb 9 21:01:43.932 [conn119] command user_keywords_test_import.$cmd command: { validate: "user_data", full: true } ntoreturn:1 keyUpdates:0 locks(micros) r:13831388 reslen:4351 13831ms
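The log names 19 documents, which matches "invalidObjects" : 19 in the validate output. One way to follow up would be to collect those _id values and search for them in the JSON export to see what the source lines look like. A minimal sketch, assuming the log lines keep exactly the format shown above; extractInvalidIds is a hypothetical helper, written as plain JavaScript for Node:

```javascript
// Pull the ObjectId hex strings out of "Invalid bson detected" log lines.
// Assumes the line format shown in the log excerpt above.
function extractInvalidIds(logText) {
  var pattern = /Invalid bson detected in \S+ with _id: ObjectId\('([0-9a-f]{24})'\)/g;
  var ids = [];
  var match;
  while ((match = pattern.exec(logText)) !== null) {
    ids.push(match[1]); // capture group 1 is the 24-character hex id
  }
  return ids;
}

// Two lines copied from the log: one reports an invalid document, one does not.
var sample = [
  "Sun Feb 9 21:01:33.141 [conn119] Invalid bson detected in user_keywords_test_import.user_data with _id: ObjectId('52eea6283d14c8ef2d27ae11')",
  "Sun Feb 9 21:01:43.810 [conn119] validating index 0: user_keywords_test_import.user_data.$_id_"
].join("\n");

console.log(extractInvalidIds(sample));
```

Each returned id could then be searched for in the export file to compare the raw JSON of a bad document with that of a good one.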
Understanding this is beyond me - I just want to finish developing the app to do *reliable stuff* which is already way behind schedule!
You also mentioned that the _id field carries the same time as the log_time value. Can you let me know how you get the time value from the _id values? I'd be very interested to learn about this.
E.g. for one document:
"_id" : { "$oid" : "52f385693d14c8f96942b9e3" } ... "log_time" : { "$date" : 1391691113771 }
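For reference, my understanding is that the first 4 bytes (8 hex characters) of an ObjectId hold a big-endian count of seconds since the Unix epoch, so the creation time can be read straight off the hex string. A sketch under that assumption, using the document above; objectIdToDate is a hypothetical helper in plain JavaScript, not a driver function:

```javascript
// Assumption: the first 4 bytes (8 hex characters) of an ObjectId are a
// big-endian count of seconds since the Unix epoch.
function objectIdToDate(hex) {
  var seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000); // Date expects milliseconds
}

var d = objectIdToDate("52f385693d14c8f96942b9e3");
console.log(d.getTime());     // 1391691113000
console.log(d.toISOString());
```

The result, 1391691113000, is the log_time value 1391691113771 truncated to whole seconds, so the two agree to the second but not to the millisecond. In the mongo shell, ObjectId("52f385693d14c8f96942b9e3").getTimestamp() should give the same date.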
MB