Items tree in Django

Fabio Natali

unread,

Nov 17, 2008, 5:46:30 AM11/17/08

to django...@googlegroups.com

Hi everybody out there! :-)

I have to cope with a few hundreds item tree/hierarchy in my Django
project. Specifically, I'll have 500 hundreds leaves in a 3-levels
tree with a dozen branches.

I won't need much writing over this tree, I'll just need to read it
and show data in a drop down menu.

I've been told to use django-treebeard or django-mptt to speed up
things, but I wrote down a first draft on my own:

class Node(models.Model):
name = models.CharField(max_length=50)
parent = models.ForeignKey('self')

class Product(models.Model):
name = models.CharField(max_length=50)
parent = models.ForeignKey(Node)
[some more info here]

The point is, how can I create the root of my tree? Should I add some
"blank=True, null=True" properties to my Node model? So to have:

class Node(models.Model):
name = models.CharField(max_length=50)
parent = models.ForeignKey('self', blank=True, null=True)

And then consider the "null-parented" node as the root one?

Does my code make any sense? Do you think it will turn out as too slow
for my hierarchy? Should I better rely on django-treebeard or
django-mptt?

Any help appreciated, thank you so much!

Regards,

--
Fabio Natali

bruno desthuilliers

unread,

Nov 17, 2008, 6:01:35 AM11/17/08

to Django users

On 17 nov, 11:46, Fabio Natali <nat...@poisson.phc.unipi.it> wrote:
> Hi everybody out there! :-)
>
> I have to cope with a few hundreds item tree/hierarchy in my Django
> project. Specifically, I'll have 500 hundreds leaves in a 3-levels
> tree with a dozen branches.
>
> I won't need much writing over this tree, I'll just need to read it
> and show data in a drop down menu.
>
> I've been told to use django-treebeard

That's what I was about to suggest !-)

> or django-mptt to speed up
> things, but I wrote down a first draft on my own:
>
> class Node(models.Model):
> name = models.CharField(max_length=50)
> parent = models.ForeignKey('self')
>
> class Product(models.Model):
> name = models.CharField(max_length=50)
> parent = models.ForeignKey(Node)
> [some more info here]
>
> The point is, how can I create the root of my tree? Should I add some
> "blank=True, null=True" properties to my Node model? So to have:
>
> class Node(models.Model):
> name = models.CharField(max_length=50)
> parent = models.ForeignKey('self', blank=True, null=True)
>
> And then consider the "null-parented" node as the root one?

That's the canonical solution when using the adjacency list pattern,
yes.

> Does my code make any sense? Do you think it will turn out as too slow
> for my hierarchy?

If you are really confident that your hierarchy will never grow deeper
than three levels, the above solution might be good enough. But it's
still less efficient than the materialized path or the nested sets
patterns when it comes to reading a whole branch.

> Should I better rely on django-treebeard or
> django-mptt?

Well... I'm not sure there's any clearcut answer here. If you have
enough time, you may want to do a couple benchmarks on representative
data and use cases.

My 2 cents...

Fabio Natali

unread,

Nov 17, 2008, 6:24:37 AM11/17/08

to django...@googlegroups.com

bruno desthuilliers wrote:
[...]

> > I've been told to use django-treebeard
>
> That's what I was about to suggest !-)

> If you are really confident that your hierarchy will never grow deeper

> than three levels, the above solution might be good enough. But it's
> still less efficient than the materialized path or the nested sets
> patterns when it comes to reading a whole branch.

Dear Bruno, thank you so much for your fast and detailed reply.

I am glad to hear that the naive way could possibly fit my 3-levels
tree, or at least that it deserves a try. I'll have some benchmarks as
you suggest.

All the best, Fabio.

--
Fabio Natali

Malcolm Tredinnick

unread,

Nov 17, 2008, 6:27:33 AM11/17/08

to django...@googlegroups.com

On Mon, 2008-11-17 at 11:46 +0100, Fabio Natali wrote:
[...]

> The point is, how can I create the root of my tree? Should I add some
> "blank=True, null=True" properties to my Node model? So to have:
>
> class Node(models.Model):
> name = models.CharField(max_length=50)
> parent = models.ForeignKey('self', blank=True, null=True)
>
> And then consider the "null-parented" node as the root one?

That's the usual method.

>
> Does my code make any sense? Do you think it will turn out as too slow
> for my hierarchy? Should I better rely on django-treebeard or
> django-mptt?

500 leaves is really nothing for the database. If you want to do
anything with that size tree, you can easily just use a simple nested
hierarchy like this, pull all the rows back into Python and work with
them there. Particularly since you only have three levels in the tree:
queries on name or parent__name or parent__parent__name are as bad as
it's going to get (thereby meaning you can often filter to just the
branches you want).

Things like the modified pre-order tree traversal data structures just
speed up the retrieval so that you don't have to retrieve all the rows
and become useful for large trees, or trees with deep (and unknown)
levels of nesting. So start with something simple, by all means. It's
fun, it's a learning experience, it will probably work very well in
practice. And if you later decide to switch to one of the other
packages, it won't take a lot of changes to move things to the new model
(it will take some changes, but it's not really that difficult).

Regards,
Malcolm

Gustavo Picón

unread,

Nov 17, 2008, 2:07:04 PM11/17/08

to Django users

Disclaimer: I'm the author of django-treebeard

> I am glad to hear that the naive way could possibly fit my 3-levels
> tree, or at least that it deserves a try. I'll have some benchmarks as
> you suggest.

You can look at the treebeard benchmarks in
http://django-treebeard.googlecode.com/svn/docs/index.html#module-tbbench

I'll take as an example the results of reading branches in postgresql
8.3 in unordered trees:

- Nested sets: 5840ms
- Materialized Path: 7132ms
- Adjacency List: 50682ms

So as you can see, your approach (adjacency list) is by far the
slowest.

The only advantage of the adjacency list model is for trees that are
mostly write-only, look at the benchmark results for insertions in an
ordered tree with transactions (the optimal use for AL trees):

- Nested sets: 10941ms
- Materialized path: 3942ms
- Adjacency List: 896ms

So the usual recommendation is:

- if you're going to insert a lot more than you read, use adjacency
list
- if, as is the most common case, you're going to read your tree more
than you insert nodes, use nested sets or materialized path

-tabo

Fabio Natali

unread,

Nov 17, 2008, 4:30:14 PM11/17/08

to django...@googlegroups.com

Hi Malcom! And thank you very much for your kind reply.

Malcolm Tredinnick wrote:
[...]

> 500 leaves is really nothing for the database. If you want to do
> anything with that size tree, you can easily just use a simple nested
> hierarchy like this, pull all the rows back into Python and work with
> them there. Particularly since you only have three levels in the tree:
> queries on name or parent__name or parent__parent__name are as bad as
> it's going to get (thereby meaning you can often filter to just the
> branches you want).

Sorry Malcom, but I am not a native and I am not 100% sure I got your
line. When you say "a simple nested hierarchy like this", do you refer
to my naive adjacency list implementation?

> Regards,
> Malcolm

Thanks and regards,

--
Fabio Natali

Fabio Natali

unread,

Nov 17, 2008, 4:43:09 PM11/17/08

to django...@googlegroups.com

Gustavo Picón wrote:
[...]

> Disclaimer: I'm the author of django-treebeard

Hi Gustavo, thanks for your reply and for your django-treebeard, which
I'll surely use very soon!

> You can look at the treebeard benchmarks in
> http://django-treebeard.googlecode.com/svn/docs/index.html#module-tbbench

I see that my adjacency list is by far the worst bet as a matter of
speed (I'll have to read from the tree most of the time). My only hope
is that this speed overhead won't be significant for my small tree and
that I can rely on my own naive code for this small project.

> -tabo

Thanks and regards, Fabio.

--
Fabio Natali

Malcolm Tredinnick

unread,

Nov 18, 2008, 7:06:47 PM11/18/08

to django...@googlegroups.com

Sorry. Yes, that's exactly what I meant. When I wrote "simple" I meant
"without any extra information to determine the position". After all,
all the extra information in other ways of representing trees is just
redundant information to speed up certain operations (often at the cost
of other operations).

Regards,
Malcolm

Malcolm Tredinnick

unread,

Nov 18, 2008, 7:16:35 PM11/18/08

to django...@googlegroups.com

On Mon, 2008-11-17 at 11:07 -0800, Gustavo Picón wrote:
[...]

> So the usual recommendation is:
>
> - if you're going to insert a lot more than you read, use adjacency
> list
> - if, as is the most common case, you're going to read your tree more
> than you insert nodes, use nested sets or materialized path

Size matters, though. Your numbers are for a lot more than 500 nodes and
I'll repeat my earlier claim that for something that small you just
aren't going to lose a lot of performance (in fact, depending on what
you're doing with the data, you'll often gain speed) by doing a naive
select of almost everything (or everything restricted by parent name,
say) and then just working Python. You cut out a bunch of database level
comparisons at that point.

Your benchmarks match what other people see with those various
implementation strategies, so they look believable (and the comparisons
between the various database engines look believable, too), so I think
they're good measures, but at various ends of the scale (and, let's be
serious here: 500 is small), alternative strategies work very well.

Don't misunderstand me: I think treebeard is an interesting project and
having multiple implementations is a good idea both for drop-in
practicality and as reference code. I think it's great what you've done.
Trees are a fascinating storage structure and there are lots of
trade-offs involved in code complexity and performance (insert vs.
different retrieval patterns vs number of nodes, for example), so
anybody picking just one implementation is going to be disappointed at
some point.

(By the way, are those benchmark numbers averages for each run over the
1000 nodes, or totals for the "several times" that you did the
retrievals?)

Regards,
Malcolm

Gustavo Picón

unread,

Nov 18, 2008, 11:08:41 PM11/18/08

to Django users

On Nov 18, 7:16 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

> On Mon, 2008-11-17 at 11:07 -0800, Gustavo Picón wrote:
>
> [...]
>
> > So the usual recommendation is:
>
> > - if you're going to insert a lot more than you read, use adjacency
> > list
> > - if, as is the most common case, you're going to read your tree more
> > than you insert nodes, use nested sets or materialized path
>
> Size matters, though. Your numbers are for a lot more than 500 nodes and
> I'll repeat my earlier claim that for something that small you just
> aren't going to lose a lot of performance (in fact, depending on what
> you're doing with the data, you'll often gain speed) by doing a naive
> select of almost everything (or everything restricted by parent name,
> say) and then just working Python. You cut out a bunch of database level
> comparisons at that point.

That's debatable. For instance, for such a small table, both nested
sets and
materialized path trees will return sets ordered by an index that is
so
small that will fit completely in RAM. Databases are VERY good at
this, and
both nested sets and materialized paths take advantage of this fact.
There is no need for database level comparisons.

And I agree that you could order your trees in python, but you'll
waste
resources. Granted, not much for 500 records, but if you use this
approach
in a site with traffic you will be dependant on caching much earlier
than
would be really needed if you did things _right_ (just look at the
resource
hog wordpress is right now for design decisions like this).

A good suggestion would be, as usual: MEASURE, know your problem and
set
goals. That's why I published the results of these benchmarks (and the
benchmark sources themselves): to help people decide.

> Your benchmarks match what other people see with those various
> implementation strategies, so they look believable (and the comparisons
> between the various database engines look believable, too), so I think
> they're good measures, but at various ends of the scale (and, let's be
> serious here: 500 is small), alternative strategies work very well.

The comparison between the database engines would be better if I could
make mysql run entirely on RAM, like I did with postgres82/83 and
sqlite.
Any suggestion on how to do this is very welcome.

> Don't misunderstand me: I think treebeard is an interesting project and
> having multiple implementations is a good idea both for drop-in
> practicality and as reference code. I think it's great what you've done.
> Trees are a fascinating storage structure and there are lots of
> trade-offs involved in code complexity and performance (insert vs.
> different retrieval patterns vs number of nodes, for example), so
> anybody picking just one implementation is going to be disappointed at
> some point.

Agreed, and that's kinda why treebeard exists: I needed a materialized
path
tree, but the only tree library available for django was MPTT (that
uses
nested sets). So I wrote mine. Later I added AL and NS trees.

I made some decisions while writing these backends, there are useful
optimizations that I just didn't apply because each one has a
drawback,
for instance:

- Adding a `level` field to AL nodes, would speed up some reads, but
would make writes slower, and writes are the ONLY advantage of
using
AL trees.
- Adding holes between nodes (spread) in a nested sets tree. This
makes
writes/deletes faster (these are _very_ expensive in NS trees), but
then
some read operations would be slower.

So I decided that I would just write a classic backend for these, and
in the
future write the optimizations as subclasses from the original AL/NS
trees.

> (By the way, are those benchmark numbers averages for each run over the
> 1000 nodes, or totals for the "several times" that you did the
> retrievals?)

It's the time taken to retrieve every node in the tree (1000 nodes) 10
times.

Regards.

- tabo

Steve Holden

unread,

Nov 19, 2008, 9:26:18 PM11/19/08

to django...@googlegroups.com

Malcolm Tredinnick wrote:
>
> On Mon, 2008-11-17 at 11:07 -0800, Gustavo Picón wrote:
> [...]
>> So the usual recommendation is:
>>
>> - if you're going to insert a lot more than you read, use adjacency
>> list
>> - if, as is the most common case, you're going to read your tree more
>> than you insert nodes, use nested sets or materialized path
>
> Size matters, though. Your numbers are for a lot more than 500 nodes and
> I'll repeat my earlier claim that for something that small you just
> aren't going to lose a lot of performance (in fact, depending on what
> you're doing with the data, you'll often gain speed) by doing a naive
> select of almost everything (or everything restricted by parent name,
> say) and then just working Python. You cut out a bunch of database level
> comparisons at that point.
>

Sure, but ultimately the question then becomes "what data growth do you
expect", since at some stage you can expect a DBMS-based solution to be
required if volumes grow to the extent that a pure Python solution
starts to seriously overload memory. This is not an attempt to negate
your pragmatic approach, since I find pragmatism very Pythonic.

> Your benchmarks match what other people see with those various
> implementation strategies, so they look believable (and the comparisons
> between the various database engines look believable, too), so I think
> they're good measures, but at various ends of the scale (and, let's be
> serious here: 500 is small), alternative strategies work very well.
>

In RDBMS terms 500 is *tiny*, and handling the complexity in Python is a
very viable strategy at that level unless the rows are 100kB apiece.

> [...]

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Reply all

Reply to author

Forward