I have to cope with a tree/hierarchy of a few hundred items in my
Django project. Specifically, I'll have 500 leaves in a 3-level tree
with a dozen branches.
I won't need to write to this tree much; I'll just need to read it and
show the data in a drop-down menu.
I've been told to use django-treebeard or django-mptt to speed things
up, but I wrote a first draft on my own:
class Node(models.Model):
    name = models.CharField(max_length=50)
    parent = models.ForeignKey('self')

class Product(models.Model):
    name = models.CharField(max_length=50)
    parent = models.ForeignKey(Node)
    [some more info here]
The point is: how do I create the root of my tree? Should I add
"blank=True, null=True" to the parent field of my Node model, so as to
have:
class Node(models.Model):
    name = models.CharField(max_length=50)
    parent = models.ForeignKey('self', blank=True, null=True)
And then treat the node with a null parent as the root?
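Just to illustrate that convention (a minimal sketch in plain Python
with made-up data, not tied to the ORM): if the rows come back as
(id, name, parent_id) tuples, the root is the single row whose parent
is None. In the ORM itself the equivalent lookup would be something
like Node.objects.get(parent__isnull=True).

```python
# Rows as they might come back from the Node table: (id, name, parent_id).
# By convention, the root is the single row whose parent_id is None.
rows = [
    (1, "root", None),
    (2, "books", 1),
    (3, "fiction", 2),
]

def find_root(rows):
    """Return the (id, name, parent_id) tuple whose parent is NULL."""
    roots = [r for r in rows if r[2] is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, got %d" % len(roots))
    return roots[0]

print(find_root(rows))  # (1, 'root', None)
```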
Does my code make any sense? Do you think it will turn out to be too
slow for my hierarchy? Would I be better off relying on
django-treebeard or django-mptt?
Any help appreciated, thank you so much!
Regards,
--
Fabio Natali
> If you are really confident that your hierarchy will never grow deeper
> than three levels, the above solution might be good enough. But it's
> still less efficient than the materialized path or the nested sets
> patterns when it comes to reading a whole branch.
Dear Bruno, thank you so much for your fast and detailed reply.
I am glad to hear that the naive approach could possibly fit my
3-level tree, or at least that it deserves a try. I'll run some
benchmarks as you suggest.
All the best, Fabio.
--
Fabio Natali
That's the usual method.
>
> Does my code make any sense? Do you think it will turn out as too slow
> for my hierarchy? Should I better rely on django-treebeard or
> django-mptt?
500 leaves is really nothing for the database. If you want to do
anything with that size tree, you can easily just use a simple nested
hierarchy like this, pull all the rows back into Python and work with
them there. Particularly since you only have three levels in the tree:
queries on name or parent__name or parent__parent__name are as bad as
it's going to get (meaning you can often filter down to just the
branches you want).
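To illustrate that point (a hypothetical sketch with invented names,
not code from this thread): with at most three levels, once you've
pulled the rows back into Python, walking the parent links to filter a
branch is a couple of dict lookups. In the ORM the same filter would
be something like Product.objects.filter(parent__parent__name="root").

```python
# Hypothetical rows already pulled back from the database in one query:
# nodes keyed by id as (name, parent_id), products as (name, node_id).
nodes = {
    1: ("root", None),
    2: ("books", 1),
    3: ("fiction", 2),
    4: ("music", 1),
}
products = [("Dune", 3), ("Kind of Blue", 4)]

def grandparent_name(node_id):
    """Walk two parent links up the (at most) 3-level tree."""
    parent_id = nodes[node_id][1]
    if parent_id is None:
        return None
    grand_id = nodes[parent_id][1]
    return nodes[grand_id][0] if grand_id is not None else None

# Products whose node's grandparent is "root", filtered in Python
# rather than in SQL.
under_root = [name for name, nid in products if grandparent_name(nid) == "root"]
print(under_root)  # ['Dune']
```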
Things like the modified pre-order tree traversal data structures just
speed up retrieval so that you don't have to fetch all the rows; they
become useful for large trees, or trees with deep (and unknown) levels
of nesting. So start with something simple, by all means. It's
fun, it's a learning experience, it will probably work very well in
practice. And if you later decide to switch to one of the other
packages, it won't take a lot of changes to move things to the new model
(it will take some changes, but it's not really that difficult).
Regards,
Malcolm
Malcolm Tredinnick wrote:
[...]
> 500 leaves is really nothing for the database. If you want to do
> anything with that size tree, you can easily just use a simple nested
> hierarchy like this, pull all the rows back into Python and work with
> them there. Particularly since you only have three levels in the tree:
> queries on name or parent__name or parent__parent__name are as bad as
> it's going to get (thereby meaning you can often filter to just the
> branches you want).
Sorry Malcolm, but I'm not a native speaker and I'm not 100% sure I
understood you. When you say "a simple nested hierarchy like this", do
you mean my naive adjacency list implementation?
> Regards,
> Malcolm
Thanks and regards,
--
Fabio Natali
Hi Gustavo, thanks for your reply and for your django-treebeard, which
I'll surely use very soon!
> You can look at the treebeard benchmarks in
> http://django-treebeard.googlecode.com/svn/docs/index.html#module-tbbench
I see that my adjacency list is by far the worst bet in terms of speed
(and I'll be reading from the tree most of the time). My only hope is
that this overhead won't be significant for my small tree and that I
can rely on my own naive code for this small project.
> -tabo
Thanks and regards, Fabio.
--
Fabio Natali
Sorry. Yes, that's exactly what I meant. When I wrote "simple" I meant
"without any extra information to determine the position". After all,
all the extra information in other ways of representing trees is just
redundant information to speed up certain operations (often at the cost
of other operations).
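For instance (a hedged sketch with made-up data, not from the thread):
a materialized path is exactly that kind of redundant, precomputed
information. It can always be derived from the plain parent links; it
exists only so a whole branch can later be fetched with one string
prefix comparison instead of repeated parent lookups.

```python
# Derive a materialized path ("root/books/fiction") from adjacency
# rows keyed by id as (name, parent_id). The path duplicates the
# parent links purely to speed up branch retrieval later.
nodes = {
    1: ("root", None),
    2: ("books", 1),
    3: ("fiction", 2),
}

def materialized_path(node_id):
    """Walk parent links to the root and join the names into a path."""
    parts = []
    while node_id is not None:
        name, parent_id = nodes[node_id]
        parts.append(name)
        node_id = parent_id
    return "/".join(reversed(parts))

print(materialized_path(3))  # root/books/fiction
```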
Regards,
Malcolm
Size matters, though. Your numbers are for a lot more than 500 nodes and
I'll repeat my earlier claim that for something that small you just
aren't going to lose a lot of performance (in fact, depending on what
you're doing with the data, you'll often gain speed) by doing a naive
select of almost everything (or everything restricted by parent name,
say) and then just working in Python. You cut out a bunch of
database-level comparisons at that point.
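As a concrete sketch of that approach (hypothetical data, illustrative
names): one naive select brings back every node, and Python builds the
whole nested tree in a single pass with a children index, which is
cheap at 500 rows.

```python
from collections import defaultdict

# One naive select of everything: (id, name, parent_id) per node.
rows = [
    (1, "root", None),
    (2, "books", 1),
    (3, "music", 1),
    (4, "fiction", 2),
    (5, "jazz", 3),
]

def build_tree(rows):
    """Index children by parent id, then nest dicts from the root down."""
    children = defaultdict(list)
    names = {}
    root_id = None
    for node_id, name, parent_id in rows:
        names[node_id] = name
        if parent_id is None:
            root_id = node_id
        else:
            children[parent_id].append(node_id)

    def nest(node_id):
        return {"name": names[node_id],
                "children": [nest(c) for c in children[node_id]]}

    return nest(root_id)

tree = build_tree(rows)
print(tree["name"], [c["name"] for c in tree["children"]])
```

From the nested dicts, rendering a drop-down menu is just a recursive
walk, with no further queries.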
Your benchmarks match what other people see with those various
implementation strategies, so they look believable (the comparisons
between the various database engines look believable, too), and I
think they're good measures. But at the extremes of the scale (and,
let's be serious here: 500 is small), alternative strategies work very
well.
Don't misunderstand me: I think treebeard is an interesting project and
having multiple implementations is a good idea both for drop-in
practicality and as reference code. I think it's great what you've done.
Trees are a fascinating storage structure and there are lots of
trade-offs involved in code complexity and performance (insert vs.
different retrieval patterns vs. number of nodes, for example), so
anybody picking just one implementation is going to be disappointed at
some point.
(By the way, are those benchmark numbers averages for each run over the
1000 nodes, or totals for the "several times" that you did the
retrievals?)
Regards,
Malcolm
> Your benchmarks match what other people see with those various
> implementation strategies, so they look believable (and the comparisons
> between the various database engines look believable, too), so I think
> they're good measures, but at various ends of the scale (and, let's be
> serious here: 500 is small), alternative strategies work very well.
>
In RDBMS terms 500 is *tiny*, and handling the complexity in Python is a
very viable strategy at that level unless the rows are 100kB apiece.
> [...]
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/