Performance on high activity trees

183 views
Skip to first unread message

TheBeardedTemplar

unread,
Sep 1, 2016, 5:20:57 PM9/1/16
to django-mptt-dev
Hey all,

We recently implemented Django MPTT on our production app - our tree has about 140,000 nodes with 400-500 new ones created each day (usually in bursts where 100-200 will be created in a few minutes). There are also lookups every few seconds where a few thousand nodes will be fetched. As soon as the server started getting significant load our database essentially melted. The db server spiked to 100% usage for several hours (this particular server had never been above 40% usage in the past). I'm curious if anyone is using django-mptt on similarly active trees and if so, what should I have done to prevent this (besides the obvious "test under load before deploying"). I found a blog post suggesting that heavy usage be done via raw SQL with later tree rebalancing, but the rebalance under load took more than half an hour and I imagine the user wouldn't be able to access those new nodes until it was done. Here's a log snippet (we're using postgreSQL):

2016-09-01 08:08:51 MDT [32744-1] user@company ERROR:  deadlock detected
2016-09-01 08:08:51 MDT [32744-2] user@company DETAIL:  Process 32744 waits for ShareLock on transaction 145224041; blocked by process 437.
        Process 437 waits for ShareLock on transaction 145223847; blocked by process 32352.
        Process 32352 waits for ExclusiveLock on tuple (4772,114) of relation 701507 of database 701506; blocked by process 335.
        Process 335 waits for ShareLock on transaction 145223980; blocked by process 337.
        Process 337 waits for ShareLock on transaction 145223940; blocked by process 32744.
        Process 32744:
                    UPDATE "nodes_node"
                    SET "lft" = CASE
                            WHEN "lft" > 10379
                                THEN "lft" + 2
                            ELSE "lft" END,
                        "rght" = CASE
                            WHEN "rght" > 10379
                                THEN "rght" + 2
                            ELSE "rght" END
                    WHERE "tree_id" = 1
                      AND ("lft" > 10379 OR "rght" > 10379)
        Process 437:
                    UPDATE "nodes_node"
                    SET "lft" = CASE
                            WHEN "lft" > 10387
                                THEN "lft" + 2
                            ELSE "lft" END,
                        "rght" = CASE
                            WHEN "rght" > 10387
                                THEN "rght" + 2
                            ELSE "rght" END
                    WHERE "tree_id" = 1
                      AND ("lft" > 10387 OR "rght" > 10387)
        Process 32352:
                    UPDATE "nodes_node"
                    SET "lft" = CASE
                            WHEN "lft" > 10375
                                THEN "lft" + 2
                            ELSE "lft" END,
                        "rght" = CASE
                            WHEN "rght" > 10375
                                THEN "rght" + 2
                            ELSE "rght" END
                    WHERE "tree_id" = 1
                      AND ("lft" > 10375 OR "rght" > 10375)
     Process 335:
                    UPDATE "nodes_node"
                    SET "lft" = CASE
                            WHEN "lft" > 10385
                                THEN "lft" + 2
                            ELSE "lft" END,
                        "rght" = CASE
                            WHEN "rght" > 10385
                                THEN "rght" + 2
                            ELSE "rght" END
                    WHERE "tree_id" = 1
                      AND ("lft" > 10385 OR "rght" > 10385)
    Process 337:
                    UPDATE "nodes_node"
                    SET "lft" = CASE
                            WHEN "lft" > 161906
                                THEN "lft" + 2
                            ELSE "lft" END,
                        "rght" = CASE
                            WHEN "rght" > 161906
                                THEN "rght" + 2
                            ELSE "rght" END
                    WHERE "tree_id" = 1
                      AND ("lft" > 161906 OR "rght" > 161906)

Thanks!

Matthias Kestenholz

unread,
Sep 1, 2016, 9:00:22 PM9/1/16
to django-...@googlegroups.com
Hey,

On Thu, Sep 1, 2016 at 11:20 PM TheBeardedTemplar <thebeard...@gmail.com> wrote:
Hey all,

We recently implemented Django MPTT on our production app - our tree has about 140,000 nodes with 400-500 new ones created each day (usually in bursts where 100-200 will be created in a few minutes). There are also lookups every few seconds where a few thousand nodes will be fetched. As soon as the server started getting significant load our database essentially melted. The db server spiked to 100% usage for several hours (this particular server had never been above 40% usage in the past). I'm curious if anyone is using django-mptt on similarly active trees and if so, what should I have done to prevent this (besides the obvious "test under load before deploying"). I found a blog post suggesting that heavy usage be done via raw SQL with later tree rebalancing, but the rebalance under load took more than half an hour and I imagine the user wouldn't be able to access those new nodes until it was done. Here's a log snippet (we're using postgreSQL):

I'm not sure whether using mptt for this use case is a good idea at all. It seems to me that you're writing much too much for mptt and would probably be better served using adjacency lists and recursive CTEs or something. I like this post by David Cramer (of Disqus and Sentry):
mptt is somewhat SQL database agnostic, and I think with the size of your tree and the load you're seeing you should think hard about using more database-specific functionality, especially since you're using PostgreSQL :-)

(Rebuilding a MPTT tree later does not change the fact that PostgreSQL has to write many many rows again because a really big percentage of the tree's lft/rght/tree_id attributes will have to change. I really like mptt, but your use case is probably not a good fit for it. Maybe talk to pgexperts as well? I only hear excellent things about them — Disclaimer: no affiliation at all.)

Best regards,
Matthias
 
--

TheBeardedTemplar

unread,
Sep 2, 2016, 9:52:55 AM9/2/16
to django-mptt-dev
Hi Matthias,

I really appreciate the help. I was originally looking into recursive PostgreSQL queries but settled on MPTT for the nice Django integration - it seems like it might be worth revisiting that approach. And thank you for the post - I'm looking forward to reading it.

-Keilan
Reply all
Reply to author
Forward
0 new messages