Any advice on a potential rejig of my application?

33 views
Skip to first unread message

Joops

unread,
Jun 21, 2011, 3:19:30 AM6/21/11
to Google App Engine
Hi all,

I am facing a massive rewrite, so I wanted to get some validation
before I strap on my lucky boots and go kick the crap out of my code.
(by which I mean refractor it after some serious pondering)

My program stores data in a tree structure, a user could have several
trees. Currently I have an entry in the datastore for each node. I can
walk up and down the tree, fetching nodes and I can do everything I
want to and all is fine.

Except that it hits the datastore loads.

I am considering re-plumbing it so that each whole tree is a data
store entry.
(unless it was over a certain size, in which case I would break a
nodes children out to a new data store entry)

So if a user needed the data for a specific node, then its appropriate
tree would be loaded from memcache or the datasore, and then the data
would be returned to the user.

Just for background - I am a basic level python user. (only use it in
my free time).

Ok… my questions…

- Does this seem reasonable to you? (My original architecture was very
stuck in how I would have managed the data in a relational database)

- Any caveats or things I should be aware of? (binary or JSON
serialisation performance, that kind of thing)

- Any thoughts on how I should address nodes? (Currently they have a
convenient ID which my client side code can use to identify which node
to do something to)
- I will most likely use guids, but they seem a bit heavy

Thanks very much

J

Robert Kluin

unread,
Jun 26, 2011, 3:31:07 AM6/26/11
to google-a...@googlegroups.com
Hey John,
I've used and seen a few ways to deal with trees depending on what
is actually in them.

I have used relational type models, where each node in the tree
points to its parent. You can make this queryable in some way by
storing a list of ancestors or descendants. Or, you can do some
denormalization. I've also used the each-node-is-an-entity method in
conjunction with a meta-data-entity that stores the actual structure
as something like a serialized nested dict. There are other subtle
variations on each of these themes. The right one probably mostly
depends on how you will write and *especially* on how you will need to
query and fetch the data. Pick the technique that makes fetching /
traversing / querying the tree easiest for your app.

If each node is light-weight, like a name, then storing the entire
tree serialized works well too (I frequently do this for meta-data
used in computations). Just be aware it makes querying a bit more
difficult. If the nodes are a little heavier, but still pretty small,
the idea of storing the tree until hit hits some critical size and
splitting it is cool. If you expect most trees will fit in a small
number of entities (like 1), this could be a very nice solution since
a single fetch by key gets the whole tree. If you've got a good way
to do this for your app, it might be worth looking in to.

The lucky boots + kick the crap of the code warranted some
responses; disappointed with the rest of the group community....


Robert

> --
> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>
>

Tim Hoffman

unread,
Jun 26, 2011, 4:23:30 AM6/26/11
to google-a...@googlegroups.com
I would suggest the approach is dependent on the most likely access pattern.

If you only use parts of the tree at a time (For instance url traversal of the tree)  then I would keep the nodes as seperate entities.

I have done this for a simple CMS.  All children have a reference property pointing to the parent.  And a list property describing the path elements (names).
In addition each parent holds an ordered list of names (children) and keys of the children.

This way you can retrieve all immediate children in a single db.get.  You can get children of arbitrary depths by querying based on the path.

Aggressive memcache use then means any re-use of parts of the tree on subsequant requests hit cache pretty well.

If one the other hand your using whole trees then ignore what I just said ;-)

Have fun.

T

Joops

unread,
Jun 26, 2011, 2:48:35 PM6/26/11
to Google App Engine
Cheers guys,

I have had quite a fun time over the course of this project trying to
optimise the tree for various different things at different times.
(It's a notes app, and people store the notes in a tree structure).

I have found a real tension between making it easy to get tasks (by
storing various child and ancestor lists) and making it cheap to keep
everything up to date.

It's going to be quite write heavy, so I want to make writing and
updating notes as cheap as possible.

As Robert mentioned, if I can get the tree into a single entity, it is
going to make writing and reading as cheap as possible.

Hopefully most people will have a lot less than 1MB of notes.
(not writing an evernote replacement!)

Thanks again both!

J.
Reply all
Reply to author
Forward
0 new messages