I'm looking for pointers as to how I could implement the following concept:
Say I have a tree of objects (a few hundred of them, 10 levels deep for
example), that I want to keep a 'history' of. Changes to some object that
cause an object property to change, or cause the tree structure to change,
will create a new generation of the object tree.
The idea is that a few current generations of the tree are available for
retrieval, essentially snapshots of past tree configurations. Generations
that are no longer needed can be freed/collected/recycled.
I want to avoid having to just dumb copy the entire tree to a new
generation, each time a new generation is required due to some object
change. Instead, I've been pondering COW schemes and such, but I don't see
how I can implement this in a reasonably speed- and memory efficient
fashion..
Does this problem sound familiar to anyone? Ideas or references are
welcome :)
Cheers,
Rob
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Here's a partial prototype of how to do PCOW balanced binary trees
in Java. View the entire thread starting with this posting.
http://groups.google.com/group/comp.programming.threads/msg/6c0775e9882516a2?hl=en&
I don't know how efficient it is. You'd need to modify it to do versioning.
The linearization points, where you do the atomic store of the new subtree,
is where you have to select which version you want. So the atomic store
would be more of a push since you'd want to get to earlier versions.
Most of the generational schemes use pointer with a hidden extra level
of indirection. They're easier to use at the application level since
you don't have copy nodes so much. And you don't have to do anything
programatically to get the different versions. Just set the version of
all memory you want. Every in that version of memory will be at the
level you want. Transactional memory might work. Just start transactions
for each version you want to save and don't end the transaction. The
transactions might be associated with threads so you might need to have
a bush of threads laying about.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
This sounds like a problem that comes up in version control systems.
Subversion (http://subversion.tigris.org/), for example, has some
documentation on their data structures, and how they've solved it. I
would assume some of the other open source version control software
does (git might have a very different model, for example), but I'm not
familiar with that documentation.
Michael
Is this tree something that can be exported to file/memory in it's
entirety? If so you should try to exploit that infrastructure. If you
haven't already, you should create your tree so that each node is aware
of its parent and can call a recursive function that moves to the base
of the tree. This way, the node which changes stores the changed info
in some structure and backs out of the hierarchy one parent at a time,
adding the identity of that parent each step of the way. This gives
you a precise map from the base to the node in question, and once there
you have the full detail of the change.
I've been working on something related, however it's intended for
parsing/exporting hierchical data from/to files. Although it's
open-source and might work for this it likely isn't complete,
solidified, or documented enough at this point, however I can tell you
how I've done certain things if you are interested.
ta0kira
Rob Kramer schrieb:
> Hi all,
>
> I'm looking for pointers as to how I could implement the following concept:
>
> Say I have a tree of objects (a few hundred of them, 10 levels deep for
> example), that I want to keep a 'history' of. Changes to some object that
> cause an object property to change, or cause the tree structure to change,
> will create a new generation of the object tree.
>
> The idea is that a few current generations of the tree are available for
> retrieval, essentially snapshots of past tree configurations. Generations
> that are no longer needed can be freed/collected/recycled.
>
> I want to avoid having to just dumb copy the entire tree to a new
> generation, each time a new generation is required due to some object
> change. Instead, I've been pondering COW schemes and such, but I don't see
> how I can implement this in a reasonably speed- and memory efficient
> fashion..
>
> Does this problem sound familiar to anyone? Ideas or references are
> welcome :)
>
> Cheers,
>
> Rob
>
>
Hi Rob,
what you are describing is a well-known technique in database
technology, where it is called multiversion concurrency control. As the
name indicates, the focus here is to provide concurrent access to data,
and several versions of a dataset are allowed to exist at the same time
to reduce/avoid locking.
If you do not need concurrent access, but simply want several versions
around e.g. for backup purposes, the following datastructure should do
the trick: all generations of objects live in the same tree. Every
object carries two timestamps, one for the beginning of its lifetime,
one for the end. Modifying an object means adding the new version to the
tree and setting the end-of-life timestamp of the original version.
Every access to the tree is associated with a timestamp and must choose
the version that was alive at this timestamp. Dead objects can be
deleted from the tree after some time.
In this implementation, the tree can be easily constructed from
available containers, e.g. std::multimap, but reading and writing
requires an additional argument (the timestamp) and will certainly be an
interesting design project :-)
However, even if the basic idea is simple, things get much more
complicated if you have concurrent operations that use more than one
object at the same time.
cheers,
Rupert