I'm trying to implement a tree structure, hence I'd like the Nested
Set model, but I need to be able to record the history of the tree
structure *somehow* so that I can access previous versions.
I've searched up and down online but all I can find are transactions
+rollbacks.
As far as I can see...
The simplest option would be to add a version number to the model, but
then every member of the set would have to be updated each time a
change is made - not feasible because of how fast that would make the
table would grow, and how slow that would make it to update?
I imagine I could log each action and then replay all the log actions
in a temporary table when I needed them, but that would get pretty
slow after a few updates.
The previous Nested Set could be serialised to an blob and stored.
That's currently looking like the best option, but only by process of
elimination...
I've probably missed something obvious that's staring me in the face
here! - any suggestions?
Thanks for taking the time to read this far, best regards,
Ed
You can find the solution at www.dbdesign11.com – see states of
relationships.
Vladimir Odrljin
When you make even small change to a tree (e.g. insert a node), you
have to recalculate half of the node encodings on average. This
feature alone invalidates nested sets approach for temporal
hierarchies. Google nested intervals.
Aha! Thank you very much to you and Vladimir!
Now I just need to sit in a darkened room with a lot of coffee while I
get my head round it all...
Mary Smith is a boss of 100 employees, and one employee is Jane
Jones. Try to solve the following using my solution:
1. Mary Smith changed her last name (she got divorced)
Jane Jones changed her last names (she got married)
Generally, every attribute of an entity can be changed at any
time.
2. One week Jane Jones works the morning shift. From 8-12 she works
with
customers and her boss is Mary Smith. From 12- 4pm Jane does data
entry
and her boss is Jim.
Every second week she works the afternoon shift and also has two
different bosses.
(Note that Jane can have in different periods some new bosses)
3. Jane can enter false data, to get a better salary. Or generally,
solve the problem of any
kind of crime, false data or tries to broke the DB solution.
Now try to apply my solution so that it keeps all information from1.,
2. and 3.
Note that Mary Smith --- Jane Jones is just one line in this hierarchy
and this example is a simple hierarchy, but this is a kind of the real
world example. Hope this example will help.
Vladimir Odrljin
In this model, the structure is kept in a table like this: Tree (lft,
rgt, node) where the node is a reference to the Nodes table and (lft,
rgt) is an integer pair. It is very small, and if you leave an
appropriate fill factor on the data pages, updates are very fast.
Think about how much fits on a 5K data page, with three 4-byte
integers. .
You could do snapshots of the whole tree with a timestamp. Or add a
timestamp column and keep it all in one table. This will let you do a
COMPLETE re-structuring every time -- probably not likely in the real
world.
If the table is growing, then we can put wide gaps in the (lft, rgt)
and keep adding nodes to the tree as long as we can preserve the
"between-ness" property of new nodes. This means you have to give up
the algebraic property (i.e sub-tree size rooted at this node = (rgt -
lft +1) /2 for all nodes); but do you need it?
My experience with applications that use Nested Set model, you handle
~1,000 nodes per thread in a newsgroup forum where trees grow in a
predictable fashion (i.e. add leaf nodes to an existing structure).
Talk to me off-line.