Tree Behaviour Overhaul (?)

115 views
Skip to first unread message

Thom Seddon

unread,
Feb 8, 2012, 8:58:10 AM2/8/12
to cakeph...@googlegroups.com
I recently submitted a pull request for some changes to the Tree Behaviour to allow scoped trees to be independent (i.e. MPTT revolves around each scope), here:  https://github.com/cakephp/cakephp/pull/444
It works as it is (loads of tests included!) and after some moving and shaking it fit's in nicely.

The question now is: should we "patch" the current behaviour like this or would this be a good opportunity to overhaul the Tree Behaviour and build in some alternative hierarchical structures to give better solutions to different use cases and generally broaden the potential scope of usage.

I wrote a brief overview of potential solutions further down the pull request (https://github.com/cakephp/cakephp/pull/444#issuecomment-3846600):

I think you are quite right - MPTT isn't always the right answer, I think these would be the main options:

  • Adjacency List - just store parent_id
    • Pros - Simple, OK for retrieving a single level, good write performance
    • Cons - Poor recursive read performance, harder to correctly manage all children when removing part of tree (limited depth?)
  • Path Enumeration - store chain of ancestors e.g. path = "/1/2/3"
    • Pros - Good matching for simple trees e.g. page structure (breadcrumbs!), good write performance
    • Cons - Harder to display descendent's structure
  • Nested Sets (MPTT) - store lft and rght (similar to current implementation)
    • Pros - Good for counting descendent's, better recursive read performance
    • Cons - Writing a single node can touch entire tree, not easy to retrieve immediate ancestor/decendent
  • Closure Table - extra table to store all relations (ancestor_id, descendent_ic)
    • Pros - Good for retrieving all ancestors/descendents
    • Cons - Potentially very large table, more DB load on inserts/deletes
    • Can add "length" column to closure table to track depth from ancestor, improves read performance for retirieving direct ancestor/descendent's

The current implementation combines Agency List with Nested Set to to benefit from the Pro's of both, but also suffers from main con of Nested Set's (may have to touch large number of nodes on any insert/delete) although the tree separation in this pull request was aimed at mitigating this.

Also, NoSQL type data storage opens up other potential structures?

Overall, alternative structures would allow developers to make more informed decisions of how to tackle storing hierarchical data, In my opinion I think that at least adding path enumeration would be a valuable addition.

I would like to gauge response and any thoughts/comments, thanks.

AD7six

unread,
Feb 10, 2012, 7:06:38 AM2/10/12
to cakephp-core


On Feb 8, 2:58 pm, Thom Seddon <t...@seddonmedia.co.uk> wrote:
> I recently submitted a pull request for some changes to the Tree Behaviour
> to allow scoped trees to be independent (i.e. MPTT revolves around each
> scope), here:  https://github.com/cakephp/cakephp/pull/444
> It works as it is (loads of tests included!) and after some moving and
> shaking it fit's in nicely.
>
> The question now is: *should we "patch" the current behaviour like this or
> would this be a good opportunity to overhaul the Tree Behaviour and build
> in some alternative hierarchical structures* to give better solutions to
> different use cases and generally broaden the potential scope of usage.
>
> I wrote a brief overview of potential solutions further down the pull
> request (https://github.com/cakephp/cakephp/pull/444#issuecomment-3846600):
>
> I think you are quite right - MPTT isn't always the right answer, I think
>
>
>
>
>
>
>
> > these would be the main options:
>
> >    - *Adjacency List* - just store parent_id
> >       - *Pros* - Simple, OK for retrieving a single level, good write
> >       performance
> >       - *Cons* - Poor recursive read performance, harder to correctly
> >       manage all children when removing part of tree (limited depth?)
> >    - *Path Enumeration* - store chain of ancestors e.g. path = "/1/2/3"
> >       - *Pros* - Good matching for simple trees e.g. page structure
> >       (breadcrumbs!), good write performance
> >       - *Cons* - Harder to display descendent's structure
> >    - *Nested Sets (MPTT)* - store lft and rght (similar to current
> >    implementation)
> >       - *Pros* - Good for counting descendent's, better recursive read
> >       performance
> >       - *Cons* - Writing a single node can touch entire tree, not easy to
> >       retrieve immediate ancestor/decendent
> >    - *Closure Table* - extra table to store all relations (ancestor_id,
> >    descendent_ic)
> >       - *Pros* - Good for retrieving all ancestors/descendents
> >       - *Cons* - Potentially very large table, more DB load on
> >       inserts/deletes
> >       - Can add "length" column to closure table to track depth from
> >       ancestor, improves read performance for retirieving direct
> >       ancestor/descendent's
>
> > The current implementation combines Agency List with Nested Set to to
> > benefit from the Pro's of both, but also suffers from main con of Nested
> > Set's (may have to touch large number of nodes on any insert/delete)
> > although the tree separation in this pull request was aimed at mitigating
> > this.
>
> Also, NoSQL type data storage opens up other potential structures?
>
> Overall, alternative structures would allow developers to make more
> informed decisions of how to tackle storing hierarchical data, In my
> opinion I think that at least adding path enumeration would be a valuable
> addition.
>
> I would like to gauge response and any thoughts/comments, thanks.

I'm going to take a look (hopefully this weekend) at writing the core
implementation (parent field only), defining a cleaner api, and how to
permit multiple engine/implementations.

I have some vague ideas how to do that - if you're willing to think
about the problem your time is unlikely to be wasted. A branch on
cakephp/cakephp will appear when I get around to doing that if you
want to see what I come up with.

This would be something for 3.0, as otherwise we're restricted to
keeping within the existing api, which would likely mean many of the
existing sore-points with the tree behavior have to remain as they
are.

Cheers,

AD
Reply all
Reply to author
Forward
0 new messages