Scaling web2py

322 views
Skip to first unread message

Bruce Wade

unread,
Apr 7, 2012, 9:59:20 AM4/7/12
to web...@googlegroups.com
Hi,

So now that my site has been developed with web2py I am now looking to release it this month. However also this month we were hit by a sudden increase of members 10,650 new in the last 2 weeks which doesn't look to be slowing down. We are getting around 100,000 - 800,000 ad views a day.

Is this approach still the best option for scaling? http://www.web2pyslices.com/slice/show/1360/high-availability-cluster-with-pound We currently have 6 servers so this may work well. But is there some more detailed information on this approach where it has been proven to handle the load?

--
--
Regards,
Bruce Wade
http://ca.linkedin.com/in/brucelwade
http://www.wadecybertech.com

Massimo Di Pierro

unread,
Apr 7, 2012, 10:26:45 AM4/7/12
to web...@googlegroups.com
Hello Bruce,

The bottle neck is always the database. 1M reqs/day is 7 reqs/minute. It is not too much but make sure:

static files are not served by web2py
you cache as much as possible

If you need a code review to spot possible problems, let me know.

Massimo

Anthony

unread,
Apr 7, 2012, 10:33:38 AM4/7/12
to web...@googlegroups.com
1M reqs/day is 7 reqs/minute.

Now we know how Massimo gets so much done -- his days are 2400 hours long. ;-) 

Bruce Wade

unread,
Apr 7, 2012, 10:38:43 AM4/7/12
to web...@googlegroups.com
Thanks for the req/sec break down, we get most of our traffic in a 5-6 hour window period.

A code review would be good I am sure there are a lot of areas that could use some improvements.

--
Regards,
Bruce


On Sat, Apr 7, 2012 at 7:33 AM, Anthony <abas...@gmail.com> wrote:
1M reqs/day is 7 reqs/minute.

Now we know how Massimo gets so much done -- his days are 2400 hours long. ;-) 



Massimo Di Pierro

unread,
Apr 7, 2012, 11:16:01 AM4/7/12
to web...@googlegroups.com
Right....what? [changing color to red] Note to self. Never do math before fully waking up. 

Correction: 2 reqs/second/server. (1M/24/60/60/6 servers). Still ok from web2py prospective.
If all requests hit the database once: 12 reqs/second. It really depends of what those requests do.

Massimo

Bruce Wade

unread,
Apr 7, 2012, 11:42:03 AM4/7/12
to web...@googlegroups.com
Well I know the major bottle neck in the site is the binary tree. I am still trying to figure out how to do this the best and most efficient. If this was a stand alone app in c/c++ this would all be in memory which wouldn't be such a problem. However how this is implemented it needs to query through the database to find nodes. As you can imagine as we get more and more members this will take much longer to run.

See bellow code, binary_tree is called from a action and distributor is a database object IE: db(db.distributors.id == id).select().first()
def binary_tree(self, original_dist, distributor, levels, counter=0, month=None):
       
        children
= []
        left_count
= right_count = counter
        target_month
= month
       
       
if distributor.left_id and left_count < levels:
            left_count
+= 1
           
#print "left - A", distributor.left_id
            children
.append(self.binary_tree(original_dist, distributor.left_id, levels, left_count, target_month))
       
       
if distributor.right_id and right_count < levels:
            right_count
+= 1
           
#print "right - B", distributor.right_id
            children
.append(self.binary_tree(original_dist, distributor.right_id, levels, right_count, target_month))
               
       
if distributor.sponsor_id == original_dist:
            sponsored_by_original
= True
       
else:
            sponsored_by_original
= False
           
        tree
= {
           
'id': ("B" if distributor.side else "A") + "-" + str(distributor.id),
           
'name': "<img src='/static/images/" + \
           
("top_tree_icon.png" if distributor.id == original_dist else ("sponsored_tree_icon.png" if sponsored_by_original else  "user_tree_icon.png")) + \
           
"' id='img-" + ("B" if distributor.side else "A") + "-" + str(distributor.id) + "' /><span class='idlabel'>" + \
            str
("0"*(7-len(str(distributor.id))))+str(distributor.id) + "-" + ("B" if distributor.side else "A") + \
           
"</span><div class='userinfo' id='id-" + ("B" if distributor.side else "A") + "-" + \
            str
(distributor.id) + "'><ul><li class='userinfotop'><h4>" + distributor.display_name + "</h4><span>" + \
            self
.T("A CV:") + "<b>" + (str(distributor.left_id.binary_cv_total) if distributor.left_id else str(0)) + \
           
"</b></span><span>" + self.T("B CV:") + "<b>" + (str(distributor.right_id.binary_cv_total) if distributor.right_id else str(0)) + \
           
"</b></span></li><li  class='userinfobottom'><p>" + self.T("Sponsored by me ") + (self.T("YES") if sponsored_by_original else self.T("NO")) + \
           
"</p><p>" + (self.T("Has down lines") if distributor.right_id or distributor.left_id else self.T("Has no down lines")) + "</p><p>" + \
           
(self.T("Disqualified ") if not distributor.quali_bonus else self.T("Qualified ")) + self.T("to get bonus") + "</p></li></ul></div>",
           
'children': children
       
}
     
       
return tree
# This is the biggest bottle neck of our entire application. Need a much better way to handle tree traversing.
def traversingTree(self, distributor, list):        
       
if distributor:
            left_node
= distributor.left_id
            right_node
= distributor.right_id
           
if left_node:
                list
.append(left_node)
                self
.traversingTree(left_node,list)
       
           
if right_node:
                list
.append(right_node)
                self
.traversingTree(right_node,list)
   
   
def bottom_right_b(self, distributor):
        right_id
= None
        cur_dist
= distributor
       
while cur_dist.right_id:
            right_id
= cur_dist.right_id
            cur_dist
= cur_dist.right_id
       
print "Right Bottom", right_id
       
return right_id
   
   
def bottom_left_a(self, distributor):
        left_id
= None
        cur_dist
= distributor
       
while cur_dist.left_id:
            left_id
= cur_dist.left_id
            cur_dist
= cur_dist.left_id
       
print "Left Bottom", left_id
       
return left_id

Mengu

unread,
Apr 8, 2012, 10:31:47 AM4/8/12
to web...@googlegroups.com
hi bruce,

what are the specs of your web servers and db server/s? 

what web server are you using and how? 

Bruce Wade

unread,
Apr 8, 2012, 11:07:37 AM4/8/12
to web...@googlegroups.com
Each server:
Ubuntu 10.04
2GB Ram
80GM HD

Currently running apache 2 however switching to nginx + uwsgi.

Database postgresql 9.1

Massimo Di Pierro

unread,
Apr 8, 2012, 11:19:47 AM4/8/12
to web...@googlegroups.com
The recipe book described a 

class TreeProxy(object): ...

which implements unordered tree traversal. It is the most efficient way to store records in a tree and retrieve them efficiently in a single query.
Anyway, if this is the bottle neck, you should definitively cache it somehow.

Bruce Wade

unread,
Apr 8, 2012, 11:34:44 AM4/8/12
to web...@googlegroups.com
Yeah the TreeProxy won't work in our case, as this is a MLM tree which means there is really no order. For example:

              1
    2                  10
11    9            3      4

Etc, I tried to convince the CEO to use a balanced tree when we first started programming but that wasn't an option he wants people/nodes in the tree to determine where the new person goes and on which leg. So I am not sure how to query all left and or right without using a loop/recursive function.

Ron McOuat

unread,
Apr 9, 2012, 12:10:34 AM4/9/12
to web...@googlegroups.com

Bruce,

It might help, maybe not, but Pragmatic Programmers has a book called SQL AntiPatterns with Chapter 3 dedicated to tree structures in databases

http://pragprog.com/book/bksqla/sql-antipatterns

They show several alternatives to the usual starting point of adjacency lists to describe trees in databases.

I don't understand your problem domain of MLM and I guess am not sure why there would be only left and right child nodes at a particular level, I would think a person could sponsor more than 2 people.

The book discusses the SQL99 WITH keyword for recursive queries which PostgreSQL supports since 8.,4 as well as alternative ways to express a hierarchy in a table.

Hope it is some use.

Ron

Bruce Wade

unread,
Apr 9, 2012, 12:21:43 AM4/9/12
to web...@googlegroups.com
Thanks Ron I will take a look.

You are correct someone can sponsor more then 2 people, they can sponsor as many people as they wish. However the sponsor tree and binary tree a very different.

For example a binary tree has two legs, our company populates one leg, and the opposite leg is up to the member to populate.

IE: Say we populate B for both B and A
                       root
         A                            B
               B                             B
                    B                             B
                A                             A
            A      B                     A      B

Now if we populate B in a strait line, meaning the A found under B that contains A and B it is up to A to fill both of them legs under it. Don't worry if you don't understand this, this entire process is quite complicated and also now our tree has over 20,000 nodes in it already. We predict 200,000 -> 500,000 nodes by the end of the summer.

Where as a sponsor tree isn't a binary tree and is based on generations:
                                     you
sponsor L1     sponsor L1      sponsor L1    sponsor L1    ...
sponsor L2                           sponsor L2    sponsor L2
                                           sponsor L3
                                                ...

stefaan

unread,
Apr 10, 2012, 7:32:58 AM4/10/12
to web...@googlegroups.com



Well I know the major bottle neck in the site is the binary tree. I am still trying to figure out how to do this the best and most efficient.

 

Richard Vézina

unread,
Apr 10, 2012, 1:39:53 PM4/10/12
to web...@googlegroups.com
Yes AntiPattern cover 4 or 5 kind of tree representation, classify them depending of usage and gives pros and cons.

I choose Closure table since it was one of the more complet for my case. But if parent node is changing frequently it's not the more effecient tree antipattern.

Have look it is pretty instructive.

Book contain little more explanation, but there is this slide :


Richard

Bruce Wade

unread,
Apr 11, 2012, 9:45:14 AM4/11/12
to web...@googlegroups.com
Ok so I have read the chapter I think the best option is postgres recursive queries.

Richard Vézina

unread,
Apr 11, 2012, 10:05:27 AM4/11/12
to web...@googlegroups.com
Recursion is slow because the union... But it may fit yours need better I don't know. 

Richard

Bruce Wade

unread,
Apr 11, 2012, 10:09:40 AM4/11/12
to web...@googlegroups.com
Yeah not sure have never used recursion at the database level. However it seems to be the only option, none of the other options in that chapter fit my needs.

Richard Vézina

unread,
Apr 11, 2012, 10:29:25 AM4/11/12
to web...@googlegroups.com
I made a lot of that last year... Sometimes it was driving me nuts, needing to have the same columns for each table in the union... Good naming convention helps to make thing clearer when you get back to the code...

Note, I would try to make it with web2py first if I were needing to write those code again.

Richard

Bruce Wade

unread,
Apr 11, 2012, 10:41:05 AM4/11/12
to web...@googlegroups.com
Yeah probably is I already have it with web2py but it is slow with 20,000 nodes takes around 5-10 seconds to load the results.

Richard Vézina

unread,
Apr 11, 2012, 10:45:20 AM4/11/12
to web...@googlegroups.com
Maybe need some code optimisation... Also I would look at the database level and make sure you have index on the right columns.

Those links could helps :



Richard

villas

unread,
Apr 11, 2012, 2:58:34 PM4/11/12
to web...@googlegroups.com
I've just been working with a tree myself.  Database recursion (CTE) seems very effective and is now supported by most of the larger DBs (although sadly not Sqlite yet,  I don't think). 
This is the reference link for Postgres and a SQL query I wrote for Firebird, 
I thought it might vaguely help if you are heading in that direction:

http://old.storytotell.org/blog/2009/08/11/postgresql84-recursive-queries.html
        sql = """
            WITH RECURSIVE breadcrumb(id, descr, parent_id) AS (
              SELECT id, descr, parent_id FROM area a WHERE id = %s
              UNION ALL
                SELECT a.id, a.descr, a.parent_id FROM area a, breadcrumb
                WHERE breadcrumb.parent_id = a.id)
            SELECT * FROM breadcrumb;
        """
% id
        rows
= db.executesql(sql,as_dict=True)

However,  I also like the method of saving paths in a field (I believe the method is called "Materialized Path").  It seems simple and versatile and I already decided to try it next time I need a simple tree and of course this would work fine on Sqlite too. 

Best wishes,
David

Bruce Wade

unread,
Apr 11, 2012, 3:11:21 PM4/11/12
to web...@googlegroups.com
Thanks that will be helpful, I liked the idea of paths also. However if you have 200,000 + nodes in a tree the paths might be coming hard to work with.

villas

unread,
Apr 12, 2012, 8:50:56 AM4/12/12
to web...@googlegroups.com

Thanks that will be helpful, I liked the idea of paths also. However if you have 200,000 + nodes in a tree the paths might be coming hard to work with.


Yes, my tree was somewhat smaller!  It seems to me that "with recursive" is the easiest method to manage this,  although of course the DAL does not have any support for this.

I have been thinking more about the path method and for most cases can see that this would be a very high performance and easy method.  Having read through several papers on tree structures which seem absurdly complex,  I am amazed that the path method is not everyone's first choice for most simple tree structures.  Even more so than the widely used nested sets.  The paths are so intuitive and,  not least,  solve everyone's breadcrumb problems at a stroke!  I wonder to myself why on earth everyone only seems to mention the path method as an after-thought.

Best wishes
David

Reply all
Reply to author
Forward
0 new messages