MongoDB - realise a graph based filesystem with $graphLookup

Jack Kn

Jun 6, 2017, 5:46:41 PM
to mongodb-user
Hi,

Today I read about the $graphLookup feature and was really impressed by this (for me) new feature.
Currently I solve these kinds of problems with eval, and now I am going to port those functions to $graphLookup.

One of these functions is "getAllSubFolders()". Here I ran into trouble, because my
connectFromField is an object rather than an array (as in the MongoDB documentation examples).

To illustrate:

Folder documents are structured like this:

{
    "_id" : ObjectId("58286a66d43ee415567ab4c8"),
    "Children" : {
        "File_58286ce0d43ee415567ab4e5" : "58286ce0d43ee415567ab4e5",
        "Folder_5829d284d43ee41fa74f7e50" : "5829d284d43ee41fa74f7e50",
    },
    "UserID": "1"
   ...
}

As mentioned, I need an aggregation pipeline (with $graphLookup?) that returns all subfolders.

To be more specific: My current problem is startWith. I need something like:

db.folders.aggregate( [
{ $match : { "UserID" : "1"  } },
   {
      $graphLookup: {
         from: "folders",
         startWith: "getObjectValues($Children)", //returns an array with the object values
         connectFromField: "getObjectValues($Children)", // not sure about this 
         connectToField: "_id",
         as: "sub_folders"
      }
   }
]);

But of course this won't work, because there is no getObjectValues function.
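
For reference, the hypothetical getObjectValues helper named in the pipeline above would correspond to something like the following in plain client-side JavaScript. This is only an illustration of the intended result: the aggregation pipeline cannot call such a function, and the sample document is copied from the post above with its _id written as a plain string.

```javascript
// Hypothetical helper from the question: returns the values of an
// embedded document as an array. Plain JavaScript, not a pipeline operator.
function getObjectValues(obj) {
  // Treat a missing or null sub-document as "no children".
  return obj ? Object.values(obj) : [];
}

// Sample folder document from the post (the _id is a plain string here).
const folder = {
  _id: "58286a66d43ee415567ab4c8",
  Children: {
    File_58286ce0d43ee415567ab4e5: "58286ce0d43ee415567ab4e5",
    Folder_5829d284d43ee41fa74f7e50: "5829d284d43ee41fa74f7e50",
  },
  UserID: "1",
};

// The ids that startWith would need to begin the traversal.
const childIds = getObjectValues(folder.Children);
console.log(childIds);
```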

Asya Kamsky

Jun 7, 2017, 10:59:52 AM
to mongodb-user
Can you explain what the Children sub-object represents?

That is:
 "File_58286ce0d43ee415567ab4e5" : "58286ce0d43ee415567ab4e5",
 "Folder_5829d284d43ee41fa74f7e50" : "5829d284d43ee41fa74f7e50"

What are the key names? Are they just "File" and "Folder" labels with the field value appended? I see the same hex string in both keys and values...

Also, are those strings? Or are they strings that really map to ObjectId types in the "_id"?

Asya


--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/7f669ca3-3d2c-455c-afe3-87ce14274e22%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Asya Kamsky
Lead Product Manager
MongoDB

Jack Kn

Jun 7, 2017, 12:46:53 PM
to mongodb-user
 "File_58286ce0d43ee415567ab4e5" : "58286ce0d43ee415567ab4e5",
 "Folder_5829d284d43ee41fa74f7e50" : "5829d284d43ee41fa74f7e50"

Yes, the same hex string. The File/Folder prefix is just there to distinguish between files and folders. I know it is a bad model, but the software is nearly finished and a change would be painful.
The value is just a string in the database, but it references another file or folder document. So in the example above,
5829d284d43ee41fa74f7e50 refers to a folder document by its _id.

In the abstract:
"Folder_<ObjectId as string>" : "<same ObjectId as string, without prefix>".
Are they just "File" and "Folder" labels but with the field value appended?
Yes. Exactly.

Asya Kamsky

Jun 8, 2017, 10:26:56 AM
to mongodb-user
The fact that your model has variable key names is problem number one for the type of query you want to do. Currently the $graphLookup stage only takes a string field name for connectFromField and connectToField. If they took expressions, it would help. But because MongoDB has views, which let you apply an aggregation pipeline to a collection (and use the resulting view as a collection in $graphLookup), you can change the schema without changing your application at all.

Original documents:
db.folders.find().pretty()
{
    "_id" : ObjectId("58286a66d43ee415567ab4c8"),
    "Children" : {
        "File_58286ce0d43ee415567ab4e5" : "58286ce0d43ee415567ab4e5",
        "Folder_5829d284d43ee41fa74f7e50" : "5829d284d43ee41fa74f7e50"
    },
    "UserID" : "1"
}
{
    "_id" : ObjectId("5829d284d43ee41fa74f7e50"),
    "Children" : {
        "File_58286ce0d43ee415567ab4e6" : "58286ce0d43ee415567ab4e6",
        "Folder_5829d284d43ee41fa74f7e51" : "5829d284d43ee41fa74f7e51"
    },
    "UserID" : "1"
}

db.folders.aggregate([
  {$addFields:{
     Children:{
        $arrayToObject:{
          $map:{
             input:{$objectToArray:"$Children"},
             in: {
                k:{$substr:[
                      "$$this.k",
                      0,
                      {$indexOfCP:["$$this.k","_"]}
                ]},
                v:"$$this.v"}
          }
        }
     }
  }}
]).pretty()
{
    "_id" : ObjectId("58286a66d43ee415567ab4c8"),
    "Children" : {
        "File" : "58286ce0d43ee415567ab4e5",
        "Folder" : "5829d284d43ee41fa74f7e50"
    },
    "UserID" : "1"
}
{
    "_id" : ObjectId("5829d284d43ee41fa74f7e50"),
    "Children" : {
        "File" : "58286ce0d43ee415567ab4e6",
        "Folder" : "5829d284d43ee41fa74f7e51"
    },
    "UserID" : "1"
}

So now define a view on "folders":

db.createView("foldersKeynamesFixed", "folders", [ {$addFields:{Children:{$arrayToObject:{$map:{input:{$objectToArray:"$Children"}, in:{k:{$substr:["$$this.k",0,{$indexOfCP:["$$this.k","_"]}]},v:"$$this.v"}}}}}} ])

And now the keys in "Children" are "fixed" - no extra string there.

Can you now do $graphLookup?
You can, but you won't get the right output. The reason is that an ObjectId() value never compares equal to a string, and the aggregation pipeline currently has no way to convert between ObjectId and string (adding this improvement is tracked in SERVER-24947).
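
A hedged note for readers on newer servers: to my knowledge, MongoDB 4.0 and later provide a $toObjectId expression, which was not available when this was written. Assuming that operator exists on your server, a view stage along these lines could expose the folder references as real ObjectId values. The field name ChildFolderIds and the view name foldersWithIds are made up for this sketch, and it has not been run against the poster's data.

```javascript
// Sketch only: assumes a server with $toObjectId (MongoDB 4.0+, as far as I know).
// "ChildFolderIds" is a hypothetical field name introduced for this example.
const childFolderIdsStage = {
  $addFields: {
    ChildFolderIds: {
      $map: {
        // Keep only the entries of Children whose key starts with "Folder_".
        input: {
          $filter: {
            // A missing or null Children is treated as an empty document.
            input: {
              $objectToArray: { $ifNull: ["$Children", { $literal: {} }] },
            },
            as: "kv",
            cond: { $eq: [{ $substrCP: ["$$kv.k", 0, 7] }, "Folder_"] },
          },
        },
        as: "kv",
        // Convert the 24-character hex string into an ObjectId.
        in: { $toObjectId: "$$kv.v" },
      },
    },
  },
};

// In the mongo shell the stage could back a view (hypothetical view name):
//   db.createView("foldersWithIds", "folders", [childFolderIdsStage])
// after which $graphLookup could use connectFromField: "ChildFolderIds".
console.log(JSON.stringify(childFolderIdsStage, null, 2));
```

Because ChildFolderIds is an array field with a fixed name, it can be used directly as connectFromField, which sidesteps both the variable key names and the type mismatch.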

Here's how it would work if you were storing ObjectId values rather than their string equivalents:

> db.folders.find().pretty()
{
    "_id" : ObjectId("5829d284d43ee41fa74f7e50"),
    "Children" : {
        "File_58286ce0d43ee415567ab4e6" : ObjectId("58286ce0d43ee415567ab4e6"),
        "Folder_5829d284d43ee41fa74f7e51" : ObjectId("5829d284d43ee41fa74f7e51")
    },
    "UserID" : "1"
}
{
    "_id" : ObjectId("58286a66d43ee415567ab4c8"),
    "Children" : {
        "File_58286ce0d43ee415567ab4e5" : ObjectId("58286ce0d43ee415567ab4e5"),
        "Folder_5829d284d43ee41fa74f7e50" : ObjectId("5829d284d43ee41fa74f7e50")
    },
    "UserID" : "1"
}
> db.foldersKeynamesFixed.find().pretty()
{
    "_id" : ObjectId("5829d284d43ee41fa74f7e50"),
    "Children" : {
        "File" : ObjectId("58286ce0d43ee415567ab4e6"),
        "Folder" : ObjectId("5829d284d43ee41fa74f7e51")
    },
    "UserID" : "1"
}
{
    "_id" : ObjectId("58286a66d43ee415567ab4c8"),
    "Children" : {
        "File" : ObjectId("58286ce0d43ee415567ab4e5"),
        "Folder" : ObjectId("5829d284d43ee41fa74f7e50")
    },
    "UserID" : "1"
}
{
    "_id" : ObjectId("5829d284d43ee41fa74f7e51"),
    "UserID" : "1",
    "Children" : null
}
> db.foldersKeynamesFixed.aggregate([
    {$match:{_id:ObjectId("58286a66d43ee415567ab4c8")}},
    {$graphLookup:{
       from:"foldersKeynamesFixed",
       startWith:"$Children.Folder",
       connectFromField:"Children.Folder",
       connectToField:"_id",
       as:"allChildrenFolders"
    }}
  ]).pretty()
{
    "_id" : ObjectId("58286a66d43ee415567ab4c8"),
    "Children" : {
        "File" : ObjectId("58286ce0d43ee415567ab4e5"),
        "Folder" : ObjectId("5829d284d43ee41fa74f7e50")
    },
    "UserID" : "1",
    "allChildrenFolders" : [
        {
            "_id" : ObjectId("5829d284d43ee41fa74f7e51"),
            "UserID" : "1",
            "Children" : null
        },
        {
            "_id" : ObjectId("5829d284d43ee41fa74f7e50"),
            "Children" : {
                "File" : ObjectId("58286ce0d43ee415567ab4e6"),
                "Folder" : ObjectId("5829d284d43ee41fa74f7e51")
            },
            "UserID" : "1"
        }
    ]
}


However, I see even more problems in your schema.   How do you represent a folder that has multiple folders and files as its children?   Is that why you appended the extra part to the "Folder" key, to avoid key name collisions?  That's properly achieved with arrays, not subdocuments!  In any case, it's usually more common to store the *parent* if you only want to store a single (fixed) number of hierarchy relationships.

I recently did some work with another user with exactly this sort of use case (files and folders).  Here is the sample aggregation I gave them to get all files given a path to folder (when the folder does not store the entire path, but only a parent reference):


Hope this helps, I know rewriting things can be hard, but it'll be harder to deploy a flawed design that doesn't let you do what you need to do with the data!   Trust me, you will *still* need to rewrite it, just under much more painful circumstances...   :(

Asya





Jack Kn

Jun 20, 2017, 4:40:02 PM
to mongodb-user
Thanks for your help. We successfully changed our database schema to support the $graphLookup feature.

Works like a charm :)

We now have two arrays, to distinguish between folders and files:

"Children" : {
    "Folders" : [ { "$oid" : "581bc233d43ee401131d82b8" }, { "$oid" : "581bc23ad43ee401131d82b9" }, { "$oid" : "581bc23dd43ee401153fbe07" } ],
    "Files" : [ { "$oid" : "581bb4abd43ee401131d82a0" } ]
}
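
With the folder references stored as an array of ObjectIds, the pipeline from the first post no longer needs a helper function. The pipeline actually used was not posted, so the following is only a sketch. It assumes the collection is still called folders and that the array lives at Children.Folders as shown above; the function name subFoldersPipeline is made up for this example.

```javascript
// Builds the pipeline for one user. In the mongo shell it could be run as
//   db.folders.aggregate(subFoldersPipeline("1"))
// The function name is hypothetical; field names follow the thread.
function subFoldersPipeline(userId) {
  return [
    { $match: { UserID: userId } },
    {
      $graphLookup: {
        from: "folders",
        // Start from the ObjectIds in this folder's Children.Folders array...
        startWith: "$Children.Folders",
        // ...and keep following the same array in every folder that is found.
        connectFromField: "Children.Folders",
        connectToField: "_id",
        as: "sub_folders",
      },
    },
  ];
}

console.log(JSON.stringify(subFoldersPipeline("1"), null, 2));
```

When connectFromField names an array, each element is followed individually, so no unwinding is needed before the $graphLookup stage.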

In any case, it's usually more common to store the *parent* if you only want to store a single (fixed) number of hierarchy relationships.

Can you tell me why it is more common to store the parent? Are there any advantages over storing children? I did some research but haven't found anything useful yet. Maybe I'm searching with the wrong keywords.

Asya Kamsky

Jun 24, 2017, 11:46:34 PM
to mongodb-user
The queries are more or less the same whether you store parents or children.

If you store children then each document has a potentially large (or small or empty) array of children.

If you store parents then every node has exactly one (or 0 or null in the case of root) parent.
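
To make the two layouts concrete, here is a small in-memory sketch in plain JavaScript (not MongoDB code) that collects all descendants of a folder under each model. The field names parent and children and the sample ids are invented for the illustration. With $graphLookup, the parent model would correspond roughly to startWith: "$_id", connectFromField: "_id", connectToField: "parent".

```javascript
// The same three-folder tree stored two ways. All names and ids are hypothetical.
const byChildren = [
  { _id: "root", children: ["a"] },
  { _id: "a", children: ["b"] },
  { _id: "b", children: [] },
];
const byParent = [
  { _id: "root", parent: null },
  { _id: "a", parent: "root" },
  { _id: "b", parent: "a" },
];

// Children model: follow each folder's own array of child ids.
function descendantsViaChildren(docs, rootId) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  const root = byId.get(rootId);
  const queue = root ? [...root.children] : [];
  const found = [];
  while (queue.length > 0) {
    const id = queue.shift();
    // Skip ids already visited and ids that point at no document.
    if (found.includes(id) || !byId.has(id)) continue;
    found.push(id);
    queue.push(...byId.get(id).children);
  }
  return found;
}

// Parent model: repeatedly look up the folders whose parent was just visited.
function descendantsViaParent(docs, rootId) {
  const found = [];
  let frontier = [rootId];
  while (frontier.length > 0) {
    const next = docs
      .filter((d) => frontier.includes(d.parent) && !found.includes(d._id))
      .map((d) => d._id);
    found.push(...next);
    frontier = next;
  }
  return found;
}

console.log(descendantsViaChildren(byChildren, "root")); // [ 'a', 'b' ]
console.log(descendantsViaParent(byParent, "root")); // [ 'a', 'b' ]
```

Both traversals return the same set; the difference is in the documents. The children arrays grow with the number of entries in a folder, while every document in the parent model carries exactly one small reference.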

Asya

