Speed Improvements: Properties, Database Denormalization, Caching, Connection Pooling, etc.

65 views
Skip to first unread message

Carlos D.

unread,
Oct 30, 2013, 11:47:01 PM10/30/13
to django...@googlegroups.com
So, first thing: I come from a C++ background and I've been building my first site with Django.  As I normally try to do in C++, I don't want to store any more information than I have to since that can lead to big problems with data consistency.  In C++, I do this with private members / methods, and a defined public interface.  For my Django project, I've been using properties since it seems like a nice way to be able to access data as if it were a member, but not actually store that information.  However, that brings me to my problem.  Since so much of the database querying has been abstracted away from me, I definitely haven't been thinking about overhead with database access time.  This is definitely not a knock on Django, I find it to be a great thing -- I only need to think about this now that I have some speed issues.  Anyway, I have multiple properties that end up calling a method that is something like:

def _get_item_total(self, prop_id):
     total = 0
     for item in self.item_set.all():
           att1 = item.att1 #Did this for speed?  I thought this would save a couple of hits on the DB.
           total += (item.one_to_one.one_to_one.m2mfield_set.all().get(id=prop_id).value * att1.quantity * att1.weight)

Note: I understand that the first two 1-1 fields seem redundant.  We're using an existing DB that we're slowly adding additional information to some of the entries (we probably won't do all of them, even in a final product).  We don't want any entry that we haven't identified the additional information for to be displayed in a drop down on another model.  Maybe there's a better way to do this, but that's what I've implemented for now.  If there are any other additional problems with that line of code, I'd love to hear it!

Anyway, In some views, I'm looking at ~5 instances of models that each have a 5 or so properties that use this method.  Each the item set has, on average, 5 items.  This means that the line:

(item.one_to_one.one_to_one.m2mfield_set.all().get(id=specified_prop_id).value * att1.quantity * att1.weight)

Is being hit 5*5*5 = 125 times for some views.  If I assumed 5ms per connection (is this reasonable??) this is ~0.6 seconds by itself just for overhead opening and closing connections to the DB.  There's also whatever time is associated with executing this query (which seems like it may be substantial given the number of tables I'm moving through).

The great news is that there seems to be many ways I might be able to address this:
  1. DB Connection pooling.  (Might address time associated with opening / closing connection)
  2. It sounds like Django may soon be supporting keeping a connection open for a set period of time.  (Same as above)
  3. Switching my properties to actual columns in the DB.  I should be able to keep these consistent since many of the tables are actually static information.  I think there's one point where saving occurs for any of the dynamic DB information that goes into the calculation. (Should help with some overhead -- a complex query would be reduced to a simple query).
  4. Django's cache framework.  (Could make significant improvements -- but this cache is only kept for a certain period of time.  Some users will still have to wait.)
  5. Database tuning (Don't know much about this...)
  6. It seems like it might be inefficient that each property is individually moving through the first three tables (is this the right way to say this??) since the prop_id isn't used until we get to m2mfield_set_all().  I could condense the number of tables that need to be traversed   However, it seems like this would make the # of calls to the DB equal to 2*n+1 rather than 2*n where n is the number of prop_ids I'm using.
  7. Maybe I need to check out the query this is making and write my own custom query if it's being inefficient?
I guess my questions are:
  • Is that line of code terrible?  Should I be doing this a better way?
  • If not, how do I do some time profiling to determine which of the above I should do?  Or should I just do each one and see if it improves?

Carlos D.

unread,
Oct 31, 2013, 12:09:39 AM10/31/13
to django...@googlegroups.com
Note: one other thing is that the legacy DB we're using has a character varying (PostGres) type as its primary key, even though it's really just an integer that should probably be auto incrementing serial type.  We're not adding anything new to this DB so the auto behavior isn't a big deal.  It seems like this might be just another speed penalty (although it doesn't seem like that would be what's really limiting us right now).

Mike Dewhirst

unread,
Oct 31, 2013, 12:33:52 AM10/31/13
to django...@googlegroups.com
On 31/10/2013 3:09pm, Carlos D. wrote:
> Note: one other thing is that the legacy DB we're using has a character
> varying (PostGres) type as its primary key, even though it's really just
> an integer that should probably be auto incrementing serial type.

If you install South you should be able to adjust that varchar to
integer. I haven't tried it myself but South is definitely magical and
is the way I'd try first. Otherwise, I'd practise dump, edit the data
and reload until I could happily script the conversion.

I would say there are definite speed benefits if you can delegate
primary key increments to the ORM/database.

Also, install django-debug-toolbar which will count all your queries for
you as you try different approaches.

Good luck

Mike


We're
> not adding anything new to this DB so the auto behavior isn't a big
> deal. It seems like this might be just another speed penalty (although
> it doesn't seem like that would be what's really limiting us right now).
>
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-users...@googlegroups.com.
> To post to this group, send email to django...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/4f330629-3d27-4edd-af2a-59c0900fdf81%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Mike Dewhirst

unread,
Oct 31, 2013, 9:52:18 PM10/31/13
to django...@googlegroups.com
Couple of links which might be of interest ...

http://www.jeffknupp.com/blog/2012/02/14/profiling-django-applications-a-journey-from-1300-to-2-queries/

http://gun.io/blog/fast-as-fuck-django-part-1-using-a-profiler/

ymmv

Mike
> 1. DB Connection pooling. (Might address time associated with opening
> / closing connection)
> 2. It sounds like Django may soon be supporting keeping a connection
> open for a set period of time. (Same as above)
> 3. Switching my properties to actual columns in the DB. I should be
> able to keep these consistent since many of the tables are actually
> static information. I think there's one point where saving occurs
> for any of the dynamic DB information that goes into
> the calculation. (Should help with some overhead -- a complex query
> would be reduced to a simple query).
> 4. Django's cache framework. (Could make significant improvements --
> but this cache is only kept for a certain period of time. Some
> users will still have to wait.)
> 5. Database tuning (Don't know much about this...)
> 6. It seems like it might be inefficient that each property is
> individually moving through the first three tables (is this the
> right way to say this??) since the prop_id isn't used until we get
> to m2mfield_set_all(). I could condense the number of tables that
> need to be traversed However, it seems like this would make the #
> of calls to the DB equal to 2*n+1 rather than 2*n where n is the
> number of prop_ids I'm using.
> 7. Maybe I need to check out the query this is making and write my own
> custom query if it's being inefficient?
>
> I guess my questions are:
>
> * Is that line of code terrible? Should I be doing this a better way?
> * If not, how do I do some time profiling to determine which of the
> above I should do? Or should I just do each one and see if it improves?
>
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-users...@googlegroups.com.
> To post to this group, send email to django...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/9f1ff8a8-140a-4071-9c9f-551b55a2b204%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages