Delete corrupted documents after running db.coll.validate()


Shanshan Zhang

Sep 8, 2016, 10:45:40 AM
to mongodb-user
Hi, to whom it may concern,
I have a problem with my current database. I used PyMongo to build the database.

I'm running MongoDB 3.0.12. Everything was fine until one day db.coll.count(some criteria) returned a "bad BSONElement type 105" error.

Then I ran db.coll.validate(true) to locate the bad documents. It seems a number of documents are corrupted, even though PyMongo inserted them into MongoDB successfully in the first place.

2016-09-08T09:37:06.776-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: invalid bson type in object with _id: "extrain_06/81264"
2016-09-08T09:37:06.776-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: not null terminated string in object with _id: "extrain_06/81270"
2016-09-08T09:37:06.776-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: invalid bson type in object with _id: "extrain_06/81271"
2016-09-08T09:37:06.776-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: invalid bson type in object with _id: "extrain_06/81287"
2016-09-08T09:37:06.776-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: invalid bson in object with _id: "extrain_06/81289"
2016-09-08T09:37:13.503-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: not null terminated string in object with _id: "extrapub_048/66412"
2016-09-08T09:37:13.503-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: invalid bson in object with unknown _id
2016-09-08T09:37:13.503-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: not null terminated string in object with _id: "extrain_44/55081"
2016-09-08T09:37:13.503-0400 I STORAGE  [conn7] Invalid object detected in linkedin.profiles: not null terminated string in object with _id: "extrapub_167/89020"

Hundreds more......
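The validate() entries above follow a fixed pattern, so the _ids can be pulled out mechanically before deciding what to delete. This is a minimal sketch using only the Python standard library, assuming the mongod log is a plain text file and that every _id is a string, as in the lines shown; an ObjectId _id is logged in a different form and is not matched. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches validate() log lines such as:
#   ... Invalid object detected in linkedin.profiles: invalid bson type in object with _id: "extrain_06/81264"
#   ... Invalid object detected in linkedin.profiles: invalid bson in object with unknown _id
# Assumption: _ids are strings printed in double quotes.
LINE_RE = re.compile(
    r'Invalid object detected in \S+: (?P<reason>.+?) in object with '
    r'(?:_id: "(?P<id>[^"]*)"|(?P<unknown>unknown _id))'
)


def summarize_validate_log(lines):
    """Return (ids, unknown_count, reasons) for invalid-object log lines.

    ids           -- string _ids that a delete could target
    unknown_count -- entries whose _id the server could not read
    reasons       -- Counter of error messages, e.g. "invalid bson type"
    """
    ids, unknown_count, reasons = [], 0, Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an invalid-object line
        reasons[m.group("reason")] += 1
        if m.group("unknown"):
            unknown_count += 1
        else:
            ids.append(m.group("id"))
    return ids, unknown_count, reasons
```

Fed the lines of the mongod log (its location depends on your configuration), this returns the _ids a delete could target and counts how many entries, like the "unknown _id" one above, cannot be addressed by _id at all.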


I have two questions:
1. I want to delete all those corrupted documents, but some of them are reported with an unknown _id (e.g. "not null terminated string in object with unknown _id"). How can I delete these documents?
2. Next time I insert data with PyMongo, how can I avoid invalid insertions in advance?
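For question 1, a minimal PyMongo-style sketch, assuming the string _ids have already been collected from the validate() output. Collection.delete_many with an $in filter is standard PyMongo (3.0 and later); the helper names and the batch size of 1000 are hypothetical. Whether the server can actually remove a document it cannot parse is not guaranteed, and entries reported with "unknown _id" cannot be targeted this way at all.

```python
def build_delete_filters(ids, batch_size=1000):
    """Yield delete filters matching the given _ids, batch_size ids at a time."""
    for start in range(0, len(ids), batch_size):
        yield {"_id": {"$in": list(ids[start:start + batch_size])}}


def delete_corrupted(collection, ids, batch_size=1000):
    """Run delete_many for each batch of _ids; return the total deleted count.

    collection is assumed to be a PyMongo Collection, for example
    MongoClient()["linkedin"]["profiles"] against a local mongod.
    """
    total = 0
    for flt in build_delete_filters(ids, batch_size):
        total += collection.delete_many(flt).deleted_count
    return total
```

Batching keeps each $in list to a modest size; with a few hundred ids a single call would also work.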


Best,

Shanshan Zhang

Sep 8, 2016, 9:17:37 PM
to mongodb-user
Why has nobody answered my question? Is it unusual to use MongoDB this way?

I want to add more information to this post.

This is not where my problem started. I was using MongoDB on an old computer and now I have a new one, so I tried several different ways to transfer the data.
My first attempt was mongodump followed by mongorestore. mongodump didn't raise any error, but mongorestore always failed with a corrupted document error. So I used bsondump to check each .bson file in my dump, and bsondump also reported corrupted documents. So some documents really are corrupted.
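To make those bsondump errors concrete: a .bson file written by mongodump is a sequence of BSON documents, each starting with its int32 total length. The sketch below (standard library only, names hypothetical) walks such a buffer and reports the offset and reason for each bad document, with messages modeled loosely on the server's. It follows the type bytes listed in the BSON specification but is not a complete validator (no UTF-8 or field-name checks). It also suggests one plausible reading of "bad BSONElement type 105": 105 is 0x69, the ASCII code for "i", which is what you get when a byte of text is read where a type byte is expected.

```python
import struct

# Byte length of fixed-size values, keyed by BSON type byte.
FIXED = {0x01: 8, 0x06: 0, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
         0x10: 4, 0x11: 8, 0x12: 8, 0x13: 16, 0x7F: 0, 0xFF: 0}


class BSONError(ValueError):
    pass


def _int32(buf, pos, end):
    """Read a little-endian int32 at pos without reading past end."""
    if pos + 4 > end:
        raise BSONError("truncated length field")
    return struct.unpack_from("<i", buf, pos)[0]


def _cstring_end(buf, pos, end):
    """Index just past the NUL terminating the cstring starting at pos."""
    nul = buf.find(b"\x00", pos, end)
    if nul < 0:
        raise BSONError("not null terminated string")
    return nul + 1


def _string_end(buf, pos, end):
    """Index just past a length-prefixed BSON string starting at pos."""
    stop = pos + 4 + _int32(buf, pos, end)
    if stop < pos + 5 or stop > end or buf[stop - 1] != 0:
        raise BSONError("not null terminated string")
    return stop


def check_document(buf, pos, end):
    """Validate one BSON document at pos; return the index just past it."""
    size = _int32(buf, pos, end)
    doc_end = pos + size
    if size < 5 or doc_end > end or buf[doc_end - 1] != 0:
        raise BSONError("invalid bson document size")
    pos += 4
    while buf[pos] != 0:
        etype = buf[pos]
        pos = _cstring_end(buf, pos + 1, doc_end)  # skip the element name
        if etype in FIXED:
            pos += FIXED[etype]
        elif etype in (0x02, 0x0D, 0x0E):  # string, JavaScript code, symbol
            pos = _string_end(buf, pos, doc_end)
        elif etype in (0x03, 0x04):  # embedded document, array
            pos = check_document(buf, pos, doc_end)
        elif etype == 0x05:  # binary: int32 length, subtype byte, payload
            n = _int32(buf, pos, doc_end)
            if n < 0:
                raise BSONError("negative binary length")
            pos += 5 + n
        elif etype == 0x0B:  # regex: pattern cstring, options cstring
            pos = _cstring_end(buf, _cstring_end(buf, pos, doc_end), doc_end)
        elif etype == 0x0C:  # DBPointer: string plus 12-byte ObjectId
            pos = _string_end(buf, pos, doc_end) + 12
        elif etype == 0x0F:  # code with scope: int32 total length (min 14)
            n = _int32(buf, pos, doc_end)
            if n < 14:
                raise BSONError("invalid code-with-scope length")
            pos += n
        else:
            raise BSONError(f"invalid bson type {etype}")
        if pos >= doc_end:
            raise BSONError("element overruns its document")
    if pos != doc_end - 1:
        raise BSONError("early document terminator")
    return doc_end


def scan_dump(data):
    """Yield (offset, error or None) for each document in a .bson dump buffer.

    After a bad document, skip ahead by its length prefix if that prefix is
    plausible; otherwise the next boundary is unknown and scanning stops.
    """
    pos = 0
    while pos < len(data):
        try:
            nxt = check_document(data, pos, len(data))
            yield pos, None
        except BSONError as exc:
            yield pos, str(exc)
            size = struct.unpack_from("<i", data, pos)[0] if pos + 4 <= len(data) else 0
            if size < 5 or pos + size > len(data):
                return
            nxt = pos + size
        pos = nxt
```

Passing the bytes of one dump file (read in binary mode) to scan_dump lists every offset with its problem. A corrupted length prefix ends the scan, which mirrors why tools reading such a file sequentially can fail outright.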
 

Then I tried to repair the database by starting mongod with the --repair flag, but the repair failed.

Next I followed a post suggesting mongodump --repair instead of mongod --repair, because mongodump --repair is less strict than mongod --repair. The dump failed as well.

Finally, I copied the database folder directly to the new computer and set the data path (dbpath) in /etc/mongod.conf to that location. Now when I start mongod, I can connect to the database, and simple queries like db.coll.findOne() and db.coll.count() work fine. Only db.coll.count(some criteria) fails, with the bad BSONElement type 105 error. I think this is because a count with criteria makes MongoDB examine the documents in the collection, while a plain count() only reads the collection's stored record count, so only the filtered count runs into the invalid documents.
  
That brings me back to the original problem: I have some corrupted documents in the database.
Luckily, db.coll.validate() can locate the corrupted documents for me, which is why I asked how to delete a corrupted document that is reported with an "unknown _id".



If anyone has a clue, please let me know. Thanks very much in advance.
Or if you have a better way to migrate from an old computer to a new one, please let me know.

Best,

John Murphy

Sep 21, 2016, 2:48:39 AM
to mongodb-user

Hi Shanshan,

According to the log entries provided, you do appear to have unusual database corruption. This should not occur during normal operation and may be the result of an improper shutdown or a mongod crash, especially if you are running the MMAPv1 storage engine with journaling disabled.

Could you confirm the state of journaling by reviewing (and possibly attaching) the output of the db.serverCmdLineOpts() command from your mongo shell?
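The same information can be gathered from a PyMongo script: db.serverCmdLineOpts() in the shell runs the getCmdLineOpts admin command, which PyMongo can issue with client.admin.command("getCmdLineOpts"). The helper below is a hypothetical sketch that classifies the result. It assumes the "argv"/"parsed" layout of that command's output and that a disabled journal shows up either as --nojournal in argv or as storage.journal.enabled: false in the parsed options; if neither is set, 64-bit builds journal by default.

```python
def journaling_status(opts):
    """Return (engine, state) from a getCmdLineOpts-style result dict.

    engine is None when no storage engine was set explicitly
    (MMAPv1 is the default in MongoDB 3.0).
    """
    argv = opts.get("argv", [])
    storage = opts.get("parsed", {}).get("storage", {})
    enabled = storage.get("journal", {}).get("enabled")
    engine = storage.get("engine")
    if "--nojournal" in argv or enabled is False:
        state = "disabled"
    elif enabled is True or "--journal" in argv:
        state = "enabled"
    else:
        state = "not set explicitly (64-bit builds journal by default)"
    return engine, state
```

Checking both argv and the parsed options covers settings given on the command line as well as those read from /etc/mongod.conf.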

If you are running a replica set, you can restore a copy of your data from another member server. This involves shutting down the good mongod instance and then copying the data files from its dbpath location to your corrupted server.

However, if you are running a standalone server, the recommended method to clean up a database is the mongod --repair command line option. Note that this operation does not save any corrupt data during the repair process.

As you have attempted the database repair without success, could you include more information about the failure you are seeing, along with the exact command line you used?

For more information on how to perform a database repair you can review the repairDatabase command documentation.

Regards,
John Murphy
