Very interesting! Luck - May it be with you!
--
In this scenario, would it be a better option to store data in a graph database like neo4j?On Tuesday, 12 March 2013 at 3:01 PM, Kiran Jonnalagadda wrote:
On Tuesday, 12 March 2013 at 1:00 PM, Mitesh Ashar wrote:Very interesting! Luck - May it be with you!
So I wrote this, packed up and headed to the office, and on the way realized I'm completely wrong. The closure table pattern isn't what we need in Nodular.Look at this slide comparing the various approaches: http://www.slideshare.net/billkarwin/models-for-hierarchical-data/69Now here's the situation: In a path like /foo/bar, we have to ensure that the name "foo" is unique within its container (the root node), and that "bar" is likewise unique within "foo". There can't be two nodes with the same name in the same container.To do this, I have a `parent_id` column in Node, and a unique constraint on (name, parent_id): https://github.com/hasgeek/nodular/blob/master/nodular/node.py#L27I do not know of any other way to establish this constraint on `name`. Let me know if you know of a way.What this means is that we have to use the adjacency list pattern regardless of whatever else we use to make subtree queries easier.The second major problem is URL traversal. If the user wants to go to /foo/bar/baz, we have to establish this path exists, and that takes at least as many queries as there are path segments (ie, O(n+1) efficiency):1. Find the root node2. Find 'foo' within root3. Find 'bar' within 'foo'4. Find 'baz' within 'bar'The deeper the tree gets, the more queries we have to run. This is obviously inefficient, and the closure table pattern does nothing to help. It can retrieve the full tree, but not specific paths.The path enumeration pattern, however, tackles exactly this problem. It stores the full path as a column in the row, so it's trivial to retrieve a path or subtree. The two problems with path enumeration (as per Karwin's slide above) -- querying for immediate children and maintaining referential integrity -- are both already solved by the adjacency list pattern.
The path enumeration pattern does have one problem: if a node is renamed, the paths of everything below it have to be amended recursively. Fortunately for us, renames are relatively rare, so it's okay for them to be inefficient.I'm going to have to throw out all of yesterday's work and retool for path enumeration. At least this is much easier to implement.Kiran--
You received this message because you are subscribed to the Google Groups "HasGeek Code" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hasgeek-code...@googlegroups.com.
To post to this group, send email to hasgee...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
You received this message because you are subscribed to the Google Groups "HasGeek Code" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hasgeek-code...@googlegroups.com.
To post to this group, send email to hasgee...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
A table with 2 column like urlpath, pkeys_as_dict. E.g: /jsfoo/2013/23-video-name, {'channel': 2, 'playlist': 34, 'video': 23} stored in db. Retrival time is O(1) since it is more like hashtable. Is this how path enumeration or traversal work ? if irrelevant pls ignore.
In this scenario, would it be a better option to store data in a graph database like neo4j?
The path enumeration pattern, however, tackles exactly this problem. It stores the full path as a column in the row, so it's trivial to retrieve a path or subtree. The two problems with path enumeration (as per Karwin's slide above) -- querying for immediate children and maintaining referential integrity -- are both already solved by the adjacency list pattern.The path enumeration pattern does have one problem: if a node is renamed, the paths of everything below it have to be amended recursively. Fortunately for us, renames are relatively rare, so it's okay for them to be inefficient.
Path enumeration with recursive path amendment is now in the repo with tests. http://git.io/ViZwQw
At Open Library, we had pages with /about.en and /about.it etc. and/about page picks the right document (node in your case) and displaysthe content.