Recursive queries/indexes, for Hierarchical Documents

1,267 views
Skip to first unread message

Emil C

unread,
Dec 16, 2010, 6:42:06 PM12/16/10
to ravendb
I know this might be a bit of a silly question, probably doing it
wrong but I'll ask any way.

Lets say I have a set of Hierarchical Documents.
The only information they contain about their place in the hierarchy
is their parent id. If its null we have reached the top.

Getting parents and children are the easy part.

What I want to do is query for all the descendants and ancestors.
I think I have to parts to this question.
1. To get the ancestors to a document is their any way to make a
recursive query?
Get the parent parent until parent is null.

2. Is their any way to make a index/projections where we can map the
ancestor ids to a document. And then query the ancestor id or the
document id? Because if I could get that index it would be easy to say
get all documents by ancestor id for document id equal to the
documentid i want to get the children from.

Lets say I have this structure.

Document 1
Document 1.1
Document 1.1.1
Document 1.1.2
Document 1.2
Document 1.2.1

I want a index looking somthing like this
DocumentID AncestorID
1.1 1
1.1.1 1
1.1.1 1.1
1.1.2 1
1.1.2 1.1
1.2 1
1.2.1 1
1.2.1 1.2

Regards, Emil C

Emil C

unread,
Dec 17, 2010, 6:27:52 PM12/17/10
to ravendb
Is this a hard nut to crack?
Or am I unclear in what I want to do?
Or am I just insane to want to do this?

Regars, Emil C

Rob Ashton

unread,
Dec 18, 2010, 6:01:14 AM12/18/10
to ravendb
I will show you a solution in T-30 minutes

Rob Ashton

unread,
Dec 18, 2010, 6:40:50 AM12/18/10
to ravendb
Or rather I won't now I've actually thought about the problem.

The problem is while you can do a look up on the parent id, you can't
load another document up within an index definition - it would
seriously cause issues (that said, I did just write an
abstractviewgenerator that did something like that - urgh!)

My advice would be to store ALL the parent ids on each document and
then you haven't got to have this issue, you can just map across the
top-most one in your definition and query that.

Another solution would be to store the entire hierarchy as a single
document, but I don't know how big your situation is or whether that
would be a bad solution or not

On Dec 18, 11:01 am, Rob Ashton <robash...@CodeOfRob.com> wrote:
> I will show you a solution in T-30 minutes
>
> On Dec 17, 11:27 pm, Emil C <emil.card...@gmail.com> wrote:
>
O>

Ayende Rahien

unread,
Dec 18, 2010, 2:12:34 PM12/18/10
to rav...@googlegroups.com
Or, something that should be possible would  be to use Live Projections to traverse the full tree on doc load.
That way, you can write something like:


(datebase, results) => 
   from result in results
   let _ = database.Include( Hierarchy(x=>x.ParentId) )
   select result;

Please note that this does not work yet. It is just an idea about the syntax.
Thoughts?

Emil C

unread,
Dec 18, 2010, 6:35:51 PM12/18/10
to ravendb
First of all thanks for all you time to answer this question.

After Robs answer my thought where to put the ancestor on the document
and do a query on that.
The problem is when I move a document in the hierarchy I need to
update all descendants with the new ancestor information and that can
be a big number.

Is there any way I could make a trigger(or something like it) to make
use of ravendbs background worker thread to update the documents with
the new ancestor? Then I could make a solution that would bog down the
system when there is lots of documents.


Ayende: About the syntax. I think that would be great. I would call it
Recursive in stead of Hierarchy and have an max number of levels
parameter.
If you were to implement this. When do you think it would be ready for
testing?

Regards, Emil C




On Dec 18, 8:12 pm, Ayende Rahien <aye...@ayende.com> wrote:
> Or, something that should be possible would  be to use Live Projections to
> traverse the full tree on doc load.
> That way, you can write something like:
>
> (datebase, results) =>
>    from result in results
>    let _ = database.Include( Hierarchy(x=>x.ParentId) )
>    select result;
>
> Please note that this *does not work yet*. It is just an idea about the

Emil C

unread,
Dec 18, 2010, 7:55:50 PM12/18/10
to ravendb
I thought about the live projection solution to the problem.

Just have one question. Is it possible to query on the live projection
created?
Because that would be a way to get descendants. To say get documents
who got the document as a ancestor.

Regards, Emil C

Ayende Rahien

unread,
Dec 19, 2010, 8:11:18 AM12/19/10
to rav...@googlegroups.com
Eric,
Yes, a trigger to recognize that, then a background task to handle that would probably be ideal.

Regarding timing, I am focused on the new auto map/reduce and auto live projections right now. 
It should be pretty easy to implement, though, take a look at ITranslatorDatabaseAccessor, we need to implement Include there, but that is about it.

Ayende Rahien

unread,
Dec 19, 2010, 8:11:33 AM12/19/10
to rav...@googlegroups.com
No, you cannot query on the live projection

Emil C

unread,
Dec 19, 2010, 2:15:56 PM12/19/10
to ravendb
Is there a way to make the live projection work the other way around?
To get descendants.

Something like,, let _ =
database.Include( Hierarchy(x=>database.Include(y=> y.ParentID ==
x.Id) )

Regards, Emil

Ayende Rahien

unread,
Dec 19, 2010, 3:14:14 PM12/19/10
to rav...@googlegroups.com
You want to include what, all the items recursively that starts with a given node?

Emil C

unread,
Dec 19, 2010, 4:32:45 PM12/19/10
to ravendb
Yes. All the descendent to a given node.

/Emil

Rob Ashton

unread,
Dec 19, 2010, 6:00:10 PM12/19/10
to ravendb
Youch, that wouldn't necessarily be hard but would involve multiple
queries if you wanted to recurse down the ladder - pretty handy though

On Dec 19, 9:32 pm, Emil C <emil.card...@gmail.com> wrote:
> Yes. All the descendent to a given node.
>
> /Emil
> -

Emil C

unread,
Dec 19, 2010, 7:03:37 PM12/19/10
to ravendb
Bonus if it were possible to get the result paged. :)

/Emil

Ayende Rahien

unread,
Dec 20, 2010, 1:41:58 AM12/20/10
to rav...@googlegroups.com
Not going to work, no.
RavenDB queries are pretty simple, all told. They are basically reading a pre-computed index.
I suggest going with putting the full path in the document and projecting it out, along with a background task to do fixups if needed.

Emil C

unread,
Dec 20, 2010, 5:36:14 AM12/20/10
to ravendb
The solution I'm going for.
Thanks for all the input.

/Emil C

motorvei

unread,
Dec 22, 2010, 6:52:15 PM12/22/10
to rav...@googlegroups.com
Hi, sorry for breaking in, but I am getting completely hooked on RavenDB, it's a fantastic product, and I'm currently loading about 100.000 deeply hierarchical documents from an object database (400 million objects) into RavenDB, to see how it holds up. I am so far a bit unsure whether these documents should be broken up into smaller docs, as they really represent trees of other documents that will be updated individually, but then there is the question of how to manage the hierarchical structure, hence my interest in this thread.

One good and efficient way of managing such a structure, and at the same time be able to quickly access descendants, siblings, children and parents, is the ORDPATH (hierarchyid) implementation in MS SQL Server 2008. The structure itself is kept flat, but the ORDPATH implementation allows simple queries to access a hierarchy of arbitrary depth. In order to efficiently move objects in the hierarchy, there is the set based Re-parent operation.

My gut feeling is that an implementation of this would fit RavenDB very well!

Cheers

Ayende Rahien

unread,
Dec 23, 2010, 12:07:28 AM12/23/10
to ravendb
You are talking about making the hierarchical notion a core part of RavenDB?

motorvei

unread,
Dec 27, 2010, 11:23:28 AM12/27/10
to rav...@googlegroups.com
Hierarchical notion a core part: Yes and no :-) Consider a class

class HierarchyMember()
{
  string Name { get; set; }
  string Hierarchy { get; set; }
  HierarchyId Node { get; set; } 
}

Instances of this would be stored as normal documents in RavenDB, but Node is of a type that would work similarly to SQL Server 2008 HierarchyId (therefore I just called it HierarchyId, although I doubt that the SQL Server version can be used..), and represent a position in a tree hierarchy. A string representation of the contents of this field would be

/1/ - top node in the tree
/1/1/ - first child node
/1/2/ - second child node
/1/2/1/ - first child of second child node

The binary representation of this Type in SQL Server is supposed to be very efficient, so huge trees can be represented in a  compact way. However, as I see it, the most important point is to allow efficient indexing of such  members, 

//Depth first
Map = hierarchyMembers => 
from member in hierarchyMembers
select new { Hierarchy = member.Hierarchy, Node = member.Node }

//Breadth first
Map = hierarchyMembers => 

from member in hierarchyMembers
select new Hierarchy = member.Hierarchy, Level = member.Hierarchy.GetLevel(), Node = member.Node }

in order to allow the following queries:

hierarchyMembers.Hierarchy = "1" && hierarchyMembers.Node.IsDescendantOf( "/1/2/" ) // return all children
hierarchyMembers.Hierarchy = "1" && hierarchyMembers.Node.IsDescendantOf( "/1/2/" ) && hierarchyMembers.Level == 3  // return direct children

One would also need clever functions to create a new node "between" to existing ones, and to "reparent" a subtree (both are covered by SQL 2008 HierarchyId). Reparenting could easily run on the parent, and a PATCH would update the documents.

So, the indexed member would have to by hierarchical by nature, and would need client and server implementations, but documents and everything else don't need any structural changes to RavenDB, the way I see it. It would be a fantastic addition. It would allow me to store "right-sized" documents in RavenDB, and at the same time efficiently manage the documents in a hierarchy (multiple hierarchies for that matter, by definining multiple HierarchyId-members).

Hope I'm not wasting your time.

Cheers

Ayende Rahien

unread,
Dec 28, 2010, 3:30:44 AM12/28/10
to ravendb
No, this is interesting, but not really problematic. You can do it today very easily using StartWith in the Linq API
Using the exact string representation that you shown.

I understand that this is supposed to do more, and I would happily accept a pull request with this feature, but I am currently focusing on other aspects of RavenDB.
Reply all
Reply to author
Forward
0 new messages