Storing a tree in Mongo

Jim Cortez

unread,

Nov 16, 2009, 6:13:33 PM11/16/09

to mongodb-user

Hello all,
I am trying to find the best way to store a large (50000+ nodes)
hierarchical tree in Mongo. Each node can have n parents and n leaves.

The source is a mongo collection of edge definitions like so:
{parent-id:"process0", child-id:"process1"}
{parent-id:"process0", child-id:"process2"}
{parent-id:"process1", child-id:"process4"}
{parent-id:"process2", child-id:"process5"}

For experimental sake, I wrote a small python script that constructs the
entire tree in memory (and makes some neat graphvis graphs). My goal now
is to write a REST api to traverse this graph. Sample URLS:

/processTree/ <-gives all root nodes
/processTree/process0 <-gives me all the children of process0
/processTree/process0/process1 <- gives me all the children of process1

What is the best way to model this tree data in Mongo?

Resolving the huge amounts of node is very slow, so I think that using
my preprocessed tree would be better. Currently, I have my data modeled
like so:

{ "_id" : ObjectId("..."),
"name" : "process0",
"children:" : [
{"children:" : [
{"children:" : [],
"name" : "process4"}
],
"name" : "process1"},
{"children:" : [],
"name" : "process2"}
],
},
{ "_id" : ObjectId("..."),
"name" : "process7"
"children:" : [
{"children:" : [],
"name" : "process8"}
],
}

However, I have found that trying to query this is tough. How do I get
all the children of process1?

I would prefer quick lookups, space does not matter at the moment.
Thoughts? Concerns? Political Commentary?
Thanks,
Jim

Dwight Merriman

unread,

Nov 16, 2009, 6:31:48 PM11/16/09

to mongod...@googlegroups.com

one option would be to put the full path of a node in each object,
perhaps in _id:

{
path : "a.b.c.d.e.f",
...
}

index on path

then, "get all children of "a.b.c" would be the query :

{ path : /^a.b.c./ }

which would use the index

note there is a limit on index key sizes this will up up to path
strings about 2KB long (not sure exact limit from memory)

also - you could simply put the path string in _id if it uniquely
identifies a document.

Mathias Stearn

unread,

Nov 16, 2009, 6:35:20 PM11/16/09

to mongod...@googlegroups.com

I'm going to write up a wiki page to summarize different ways to store trees based off of http://seancribbs.com/tech/2009/09/28/modeling-a-tree-in-a-document-database.

For your case I'd suggest storing a parent_id and a children list:

{_id: ObjectId()

,name: "foo"

,parent_id: ObjectId() // or null or missing

,children: [ObjectId(), ObjectId(), ...] // or [] or missing

}

Then you could easily take the last part of the url and do something like:

children = find_one({name:FROM_URL}).children

child_objs = find({_id:{$in: children}}).toArray()

If names are globaly unique then you could store them in place of all of the ObjectIds and then the "children" field will give you all the data you need without a second query.

--Mathias

On Mon, Nov 16, 2009 at 6:13 PM, Jim Cortez <j...@jimcortez.com> wrote:

Michael Schurter

unread,

Nov 16, 2009, 6:40:28 PM11/16/09

to mongod...@googlegroups.com

On Mon, Nov 16, 2009 at 3:13 PM, Jim Cortez <j...@jimcortez.com> wrote:
>
> Hello all,
> I am trying to find the best way to store a large (50000+ nodes)
> hierarchical tree in Mongo. Each node can have n parents and n leaves.
>
> The source is a mongo collection of edge definitions like so:
> {parent-id:"process0", child-id:"process1"}
> {parent-id:"process0", child-id:"process2"}
> {parent-id:"process1", child-id:"process4"}
> {parent-id:"process2", child-id:"process5"}
>
> For experimental sake, I wrote a small python script that constructs the
> entire tree in memory (and makes some neat graphvis graphs). My goal now
> is to write a REST api to traverse this graph. Sample URLS:
>
> /processTree/ <-gives all root nodes
> /processTree/process0 <-gives me all the children of process0
> /processTree/process0/process1 <- gives me all the children of process1
>
> What is the best way to model this tree data in Mongo?
>
> Resolving the huge amounts of node is very slow, so I think that using

Out of curiosity are you pulling 1 doc at a time? If so perhaps
resolving all docs using a server-side function might speed things up?

If the bottleneck is roundtrips to the server it could help. If the
bottleneck is already processing, well I probably just killed your
server . ;-)

Jim Cortez

unread,

Nov 16, 2009, 7:23:34 PM11/16/09

to mongod...@googlegroups.com

The processing is the shear amount of data. I already do server-side processing and serve things up using my own REST framework. One of the goals is to do some dynamic visualization in flex.
Jim

Michael Schurter wrote:

Out of curiosity are you pulling 1 doc at a time?  If so perhaps
resolving all docs using a server-side function might speed things up?

If the bottleneck is roundtrips to the server it could help.  If the
bottleneck is already processing, well I probably just killed your
server .  ;-)

--

You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=.

Jim Cortez

unread,

Nov 16, 2009, 7:25:41 PM11/16/09

to mongod...@googlegroups.com

Mathias,
Thanks for the idea of just storing parents and their children, that sounds like a decent approach. I actually read that article you mentioned but, could not decide on an approach that would allow me to do some easy queries.
Thanks,
Jim

Mathias Stearn wrote:

--~--~---------~--~----~------------~-------~--~----~

You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com

To unsubscribe from this group, send email to mongodb-user...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en
-~----------~----~----~----~------~----~------~--~---

jnunemaker

unread,

Nov 17, 2009, 8:29:59 AM11/17/09

to mongodb-user

I went along Dwight's path for an app we are building. It is a website
management system so it has items that get stored in a directory like
structure. Each document has parent_ids which is an array and
parent_id which is an object id. You can get all ancestors by querying
in parent_ids and all direct children by using parent_id. We also have
a path key that stores the full URL path to each document so querying
nested documents by URL is quick for front side rendering. Working
well for us thus far.

It is called materialized path. There are a ton of results on google
for it. Technically we have two materialized paths (path and
parent_ids). In our situation denormalizing a bit made sense as there
are different ways we need to get to our data depending on the
situation.

On Nov 16, 7:25 pm, Jim Cortez <j...@jimcortez.com> wrote:
> Mathias,
> Thanks for the idea of just storing parents and their children, that
> sounds like a decent approach. I actually read that article you
> mentioned but, could not decide on an approach that would allow me to do
> some easy queries.
> Thanks,
> Jim
>
> Mathias Stearn wrote:
> > I'm going to write up a wiki page to summarize different ways to store
> > trees based off of

> >http://seancribbs.com/tech/2009/09/28/modeling-a-tree-in-a-document-d....

Reply all

Reply to author

Forward