Updating Documents With Recursive Map Reduce

Andrew Browne

Feb 9, 2014, 2:37:46 AM

to rav...@googlegroups.com

Hey All,

I have a couple of problems that I have solved with recursive map reduce. I wanted to float the scenarios here in case there is something much better I should be doing. These problems may be better solved with something like Neo4j, but if I can keep the whole app in RavenDB for now that would be a big win.

Our application has a nested folder structure - currently we model it as one document per folder and folders point to their parent folder.

Scenario 1:

User permissions are inherited down the folder hierarchy. We have a recursive map reduce that sets each folder's computed permissions to the computed permissions of its parent plus its own permissions. If a user is added at the top of the hierarchy, the changes are propagated down one level at a time. Currently the computed permissions are stored in a companion document to avoid conflicts with the application editing the document, but we are considering moving them to the document metadata so they can be used similarly to the permissions in the authorization bundle. So far performance seems fine. We are load testing right now, and the permissions can fall behind when inserting a large number of folders, but under normal conditions it all looks good.
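A minimal sketch of the level-by-level propagation described in Scenario 1, written as plain in-memory Python rather than as a RavenDB index. The folder ids, the `parent` and `own` field names, and both helper functions are assumptions made for illustration; they are not taken from the application or from any RavenDB API.

```python
# Hypothetical folders: each one points to its parent and has its own permissions.
folders = {
    "root": {"parent": None, "own": {"admins"}},
    "a": {"parent": "root", "own": {"team-a"}},
    "a1": {"parent": "a", "own": set()},
}

def propagate_once(folders, computed):
    """One round: each folder takes its parent's computed set plus its own."""
    updated = {}
    for folder_id, folder in folders.items():
        # The root has no parent, so it inherits an empty set.
        inherited = computed.get(folder["parent"], set())
        updated[folder_id] = inherited | folder["own"]
    return updated

def propagate_until_stable(folders):
    """Repeat rounds until nothing changes; returns (computed, rounds)."""
    computed = {folder_id: set() for folder_id in folders}
    rounds = 0
    while True:
        updated = propagate_once(folders, computed)
        if updated == computed:
            return computed, rounds
        computed = updated
        rounds += 1

computed, rounds = propagate_until_stable(folders)
```

In this sketch the number of rounds needed grows with the depth of the hierarchy, which mirrors the "one level at a time" behaviour and suggests why the computed permissions can lag during bulk inserts.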

Scenario 2:

When we display a folder we want to show breadcrumbs all the way to the top. To facilitate this we have a folder path property which is updated in a trigger. Once again, if the user changes the name of a folder higher in the hierarchy, the change propagates down and we append the current folder's name to the end of its parent's path.
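The path rule in Scenario 2 (a folder's path is its parent's path plus its own name) can be sketched in memory as follows. The sample folders, the field names, and the `rebuild_paths` helper are hypothetical; this only shows the rule itself, not how the trigger or the recursive map reduce applies it.

```python
# Hypothetical folders keyed by id, each pointing to its parent.
folders = {
    "1": {"name": "Projects", "parent": None},
    "2": {"name": "2014", "parent": "1"},
    "3": {"name": "Reports", "parent": "2"},
}

def rebuild_paths(folders, folder_id, parent_path=""):
    """Set a folder's path to parent path + own name, then update its children."""
    folder = folders[folder_id]
    folder["path"] = parent_path + "/" + folder["name"]
    for child_id, child in folders.items():
        if child["parent"] == folder_id:
            rebuild_paths(folders, child_id, folder["path"])

rebuild_paths(folders, "1")
path_before = folders["3"]["path"]

# Renaming a folder high in the hierarchy changes every path beneath it.
folders["1"]["name"] = "Work"
rebuild_paths(folders, "1")
path_after = folders["3"]["path"]
```

A rename at the top touches every descendant, which is the same fan-out the propagation has to handle.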

So my questions are:

1. Is there some other way to model the information that would avoid these recursive map reduce updates? I've browsed some older posts that were related to this but couldn't find a design that seemed better. We are talking about a large number of folders (10,000+) nested up to 10 levels deep, so the solutions that suggested storing the hierarchy in one document probably won't work for me.

2. Is there a more elegant way to solve this? I would love it if these properties could be computed in an index without the extra step of writing out a document, but I understand that you need to write to a document in the trigger to start the next recursion.

3. Should I be trying to solve this in my application somehow? I couldn't work out how to do this without running into race conditions if documents are being edited or indexes are being updated at the same time as I am querying for documents to update.

4. How commonly are people using recursive map/reduce in RavenDB? I haven't found too many examples around the web.

If the answer to my questions is that this is a good idea and I'm on the right track, I'll make sure to extract some standalone examples in case they are helpful to others here.

cheers

Andrew Browne

Picnic Software

Oren Eini (Ayende Rahien)

Feb 9, 2014, 5:02:03 AM

to ravendb

Do you always access a single folder by path?

Oren Eini

CEO

Hibernating Rhinos

Cellular: +972-52-548-6969

Office: +972-4-674-7811

Fax: +972-153-4622-7811

Andrew Browne

Feb 9, 2014, 7:30:15 AM

to rav...@googlegroups.com

Mostly we are accessing a single folder by id. If it helps, we could probably use the full path.

I should explain that we actually store two folder paths: one with the names, for display only, and one with the path of ids. The id version is used as a prefix in search so that we can search for items underneath a folder. One of the really nice things about the recursive map reduce is that operations like moving folders become trivial, because we just update the parent id and all the paths are kept up to date.
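A rough sketch of the id-path idea, assuming paths shaped like `/1/2/3/` (the exact stored format is not given in the post). The sample folders and the `id_path` and `descendants` helpers are hypothetical and run in memory; in the application the prefix match would be done by the search index.

```python
# Hypothetical folders keyed by id, each pointing to its parent.
folders = {
    "1": {"parent": None},
    "2": {"parent": "1"},
    "3": {"parent": "2"},
    "4": {"parent": "1"},
}

def id_path(folders, folder_id):
    """Build the path of ids from the root down, e.g. '/1/2/3/'."""
    parts = []
    while folder_id is not None:
        parts.append(folder_id)
        folder_id = folders[folder_id]["parent"]
    return "/" + "/".join(reversed(parts)) + "/"

def descendants(folders, folder_id):
    """Find everything underneath a folder with a prefix match on the id path."""
    prefix = id_path(folders, folder_id)
    return sorted(
        other for other in folders
        if other != folder_id and id_path(folders, other).startswith(prefix)
    )

under_2_before = descendants(folders, "2")

# Moving folder 3 only changes its parent id; its path follows from that.
folders["3"]["parent"] = "4"
under_2_after = descendants(folders, "2")
under_4_after = descendants(folders, "4")
```

The trailing slash in the prefix keeps `/1/2/` from matching an unrelated sibling such as `/1/20/`.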

Oren Eini (Ayende Rahien)

Feb 9, 2014, 9:27:06 AM

to ravendb

Why not use includes instead?

Itamar Syn-Hershko

Feb 9, 2014, 9:32:07 AM

to rav...@googlegroups.com

My 0.02 on this scenario: http://code972.com/blog/2014/01/769-modelling-hierarchical-data-with-ravendb (TLDR: model it differently and do all that in-memory)

--

Itamar Syn-Hershko

http://code972.com | @synhershko

Freelance Developer & Consultant

Author of RavenDB in Action

Kijana Woodard

Feb 9, 2014, 9:34:53 AM

to rav...@googlegroups.com

How many users can modify the folder hierarchy?
How often does that happen?

Itamar Syn-Hershko

Feb 9, 2014, 9:38:44 AM

to rav...@googlegroups.com

Kijana, unless such a change occurs every second you'll be more than set with using optimistic concurrency. The only parameter that should really worry you is the size of the hierarchy document, and depending on your scenario you may be able to break it down to multiple departments etc.

Kijana Woodard

Feb 9, 2014, 9:43:36 AM

to rav...@googlegroups.com

@itamar - good article. Agree.
Fwiw, I've used the same approach with SQL Server, modeling the hierarchy independently from the entities (using the hierarchyid feature).

The other bonus you get from this approach is being able to easily have the entities in multiple hierarchies.

Kijana Woodard

Feb 9, 2014, 9:44:55 AM

to rav...@googlegroups.com

Agree. Pressed send before I saw your post. Was leading to that conclusion through inquiry.

Kijana Woodard

Feb 9, 2014, 9:48:37 AM

to rav...@googlegroups.com

Oh. Also helps when you realize a single item needs to be in multiple places within a single hierarchy: this product should be under electronics, home/garden, and hot deals.

Andrew Browne

Feb 9, 2014, 6:59:27 PM

to rav...@googlegroups.com

Thanks. Storing the whole hierarchy in one/a few documents might be doable, but I have a couple more questions.

Seeing the referenced post has pushed me into thinking more about how the results of my recursive map/reduce are used.

Kijana's question is a good one - security changes will happen reasonably regularly at the bottom levels but most of the time much less than once a second. I'm more worried about having good search/query performance. So a single document with the hierarchy makes updates easier and loading the hierarchy quick - but working through how my queries/indexes would work is my new concern.

First we use a trigger similar to the one that comes with the authorization plugin to filter out things people can't see. Would I need to load that large document for each item in my read trigger?

We also use it when building our main search index in two ways. Pseudo code for the index:

    Map = items =>
        from item in items
        let security = LoadDocument("securitydocument/" + item.id)
        select new {
            name = item.name,
            readGroups = security.readGroups,
            path = item.path
        }

Then we can query like:

    path:/1/2/3/* AND (readGroups:"CurrentUser'sGroup1" OR readGroups:"CurrentUser'sGroup2")

Note: the securitydocument and the path field are the things that have been set up by the recursive map reduce.

If I were to switch to having a single document for the hierarchy, would I use the Recurse function somehow to index the full path and collect read groups going down the hierarchy? Is that going to kill my indexing performance because it'll be parsing that big hierarchy document in each Map step?
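To make the question concrete, here is a sketch of what "index the full path and collect read groups going down the hierarchy" could look like when done once, in memory, over a single hierarchy document. It does not use or imitate RavenDB's Recurse; the nested document shape, the field names, and the `flatten` helper are all assumptions for illustration.

```python
# Hypothetical single hierarchy document: folders nested under "children".
hierarchy = {
    "id": "1", "name": "Root", "readGroups": ["staff"],
    "children": [
        {"id": "2", "name": "Finance", "readGroups": ["finance"],
         "children": [
             {"id": "3", "name": "Payroll", "readGroups": [], "children": []},
         ]},
    ],
}

def flatten(node, parent_path="/", inherited=frozenset()):
    """Yield (id, id_path, read_groups) for a node and every descendant."""
    path = parent_path + node["id"] + "/"
    # A folder's read groups are everything inherited plus its own.
    groups = inherited | frozenset(node["readGroups"])
    yield node["id"], path, groups
    for child in node["children"]:
        yield from flatten(child, path, groups)

# One pass over the document gives a lookup usable for every item.
lookup = {folder_id: (path, groups) for folder_id, path, groups in flatten(hierarchy)}
```

The traversal visits each folder once, so the cost concern in the question is less about the walk itself and more about how often the whole document would have to be loaded and parsed.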

@oren I don't think Include will work to calculate these as it will only go up one level? I can use it to get the parent folder's id and name but I don't know how to get it to load the whole chain back up to the root.
