How to find orphan documents and delete them?

2,779 views
Skip to first unread message

mongo...@gmail.com

unread,
Apr 6, 2017, 7:14:13 PM4/6/17
to mongodb-user
I'm quite new to MongoDB and I'm doing some tests, but clearly I want to delete the test data when my tests are over and don't leave them in the database.
MongoDB has no concept of primary key - foreign key constraint, but documents in a collection can reference documents in other collections.
Therefore how can I find all the orphan documents of the "Child" collection, which reference to the "Parent" collection doesn't exist any more?
This means the "parent" document has been deleted but the "children" haven't.
I'm coming from SQL, and in SQL would be something of the type:
DELETE FROM Children WHERE ParentId NOT IN (SELECT Id FROM Parents)


Rhys Campbell

unread,
Apr 7, 2017, 5:01:04 AM4/7/17
to mongodb-user
You won't be able to do it in one operation but you can use $lookup to find the documents and then delete those documents...

Akshat

unread,
Apr 27, 2017, 2:08:04 AM4/27/17
to mongodb-user
Hi,

 
 I want to delete the test data when my tests are over and don't leave them in the database.

If your collection contains only test data and you want to remove all data along with the collection you can use db.collection.drop() function.

Depending on your use case, you can utilise cursor.forEach method to to iterate the documents in the child collection and perform a function.Let's use a schema example as below:

Sample Parent Collection :--


{ "_id" : 2, "name" : "jesson", "class" : "10", "section" : "2" }
{ "_id" : 4, "name" : "alisha", "class" : "10", "section" : "2" }



Sample Child Collection:--

{
 
"_id" : 12,
 
"UID" : 2,
 
"studentname" : "jesson",
 
"examversion" : 1,
 
"MathsScore" : 68,
 
"PhysicsScore" : 69,
 
"ChemistryScore" : 50
}
{
 
"_id" : 14,
 
"UID" : 4,
 
"studentname" : "alisha",
 
"examversion" : 1,
 
"MathsScore" : 68,
 
"PhysicsScore" : 69,
 
"ChemistryScore" : 50
}




Removal method:--

db
.child.find().forEach( function(document)
 
{
     
var PRNTcursor=db.parent.find({"_id":document.UID})
     
if (PRNTcursor.hasNext() == false)
         
{
            db
.child.remove({"UID":document.UID})
         
}
 
} );





If you have a large collection, cursor.forEach may not be suitable for your use case. As it iterates all documents within the collection to search a few.

As an alternative, you can use aggregation pipeline as suggested by Rhys. :--

Step 1:-- Utilise Aggregation pipeline stage $lookup to find child documents with UID value that no longer exist in parents.


db
.child.aggregate([{$lookup:{from:"parent",localField:"UID",foreignField:"_id",as:"predecessor" }},{$project:{"_id":1,UID:1,SizeOFPredecessor:{$size: "$predecessor"}}},{$match:{"SizeOFPredecessor" :0}}]).pretty();


{ "_id" : 17, "UID" : 7, "SizeOFPredecessor" : 0 }
{ "_id" : 19, "UID" : 9, "SizeOFPredecessor" : 0 }



Step 2:-- deleteMany() will  delete those child document whose parents are no longer exists in the parent collections.


db
.child.deleteMany({"_id": {$in: [ 17,19 ] }});




Depending on your application use case on using parent/child references schema, I would also suggest to review:--

I would also recommend to enroll on a free course at MongoDB University. Especially the M102 course which cover data modeling and aggregation pipeline.

Thanks,
Akshat
Reply all
Reply to author
Forward
0 new messages