Project optimisation stage: Advice to boost speed of database queries across tables?

13 views
Skip to first unread message

Sam Walters

unread,
Dec 30, 2009, 1:46:53 AM12/30/09
to django...@googlegroups.com
Hi
I have reached the optimisation stage of my project.
I am trying to work out how to reduce the turnaround time for my queries.
I implemented 'db_index=True' for the fields where i determined there
should be a performance boost by indexing that particular field.
This has not really improved the speed of the queries.

Worst case senario is that my website will have to display about 500
out of 6000 entries in the db.

Each entry bridges a many-to-many relationship with another table and
also follows a reverse one-to-one which will return a set of related
entries. Often the query takes 12-14 seconds from start to finish.
(and 4-5 seconds to display around 100 results)
The major performance penalty thus far has been measured to be when
the MySQL statements for these queries are executed.
The many-to-may and one-to-many components only return 1-3 related
records at most for each entry and they always hit indexes so that
part of the design is optimised.

I have been using the following tools:

print connection.queries
import time
and looking at the raw MySQL to see whatother optmisations could be made.
use of MySQL EXPLAIN

Anyway, here are the details:

*in views.py the method that packages the results from the mysql query:
http://pastebin.com/m3eef56e5

*the models associated with this from two applications: 'directory' and 'common'
http://pastebin.com/m3868a1fc
http://pastebin.com/m18ec3765

*python manage.py sqlall directory && python manage.py sqlall common:

http://pastebin.com/m63a50593
http://pastebin.com/m6f958cda

As far as I can tell indexes are working and the queries should be fast.

Here is the MySQL per iteration (will execute 500 times for my worst
case scenario which takes 12 seconds on average)

SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
`common_addresstype` WHERE `common_addresstype`.`id` = 1;
SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
`common_addresstype` WHERE `common_addresstype`.`id` = 2;
SELECT `common_addradmin`.`id`,
`common_addradmin`.`surfaceMail_works`,
`common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
`common_addradmin`.`id` = 1;
SELECT `common_address`.`id`, `common_address`.`airfield_id`,
`common_address`.`country_id`, `common_address`.`addresstype_id`,
`common_address`.`addradmin_id`, `common_address`.`location_id`,
`common_address`.`street1`, `common_address`.`street2`,
`common_address`.`user_lat_dec`, `common_address`.`user_long_dec`,
`common_address`.`zoom` FROM `common_address` INNER JOIN
`common_address_directory` ON (`common_address`.`id` =
`common_address_directory`.`address_id`) WHERE
`common_address_directory`.`directory_id` = 4267;

Last but not least Explain for the above MySQL for 1 instance of 500 entries.

mysql> EXPLAIN SELECT `common_addresstype`.`id`,
`common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
`common_addresstype`.`id` = 1;
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | common_addresstype | const | PRIMARY |
PRIMARY | 4 | const | 1 | |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+

mysql> EXPLAIN SELECT `common_addresstype`.`id`,
`common_addresstype`.`adrtype` FROM `common_addresstype` WHERE
`common_addresstype`.`id` = 2;
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | common_addresstype | const | PRIMARY |
PRIMARY | 4 | const | 1 | |
+----+-------------+--------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT `common_addradmin`.`id`,
`common_addradmin`.`surfaceMail_works`,
`common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
`common_addradmin`.`id` = 1;
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | common_addradmin | const | PRIMARY |
PRIMARY | 4 | const | 1 | |
+----+-------------+------------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)


mysql> EXPLAIN SELECT `common_address`.`id`,
`common_address`.`airfield_id`, `common_address`.`country_id`,
`common_address`.`addresstype_id`, `common_address`.`addradmin_id`,
`common_address`.`location_id`, `common_address`.`street1`,
`common_address`.`street2`, `common_address`.`user_lat_dec`,
`common_address`.`user_long_dec`, `common_address`.`zoom` FROM
`common_address` INNER JOIN `common_address_directory` ON
(`common_address`.`id` = `common_address_directory`.`address_id`)
WHERE `common_address_directory`.`directory_id` = 4267;
+----+-------------+--------------------------+--------+--------------------------------------------------+---------------------------------------+---------+----------------------------------------------------+------+-------+
| id | select_type | table | type | possible_keys
| key
| key_len | ref |
rows | Extra |
+----+-------------+--------------------------+--------+--------------------------------------------------+---------------------------------------+---------+----------------------------------------------------+------+-------+
| 1 | SIMPLE | common_address_directory | ref |
address_id,directory_id_refs_id_4b74d39abef4d575 |
directory_id_refs_id_4b74d39abef4d575 | 4 | const
| 1 | |
| 1 | SIMPLE | common_address | eq_ref | PRIMARY
| PRIMARY
| 4 | aeroclub_devel.common_address_directory.address_id |
1 | |
+----+-------------+--------------------------+--------+--------------------------------------------------+---------------------------------------+---------+----------------------------------------------------+------+-------+
2 rows in set (0.00 sec)

What I dont want to do is use caching id rather solve this first. I
also dont want to refactor the db into less tables, it is designed
near perfect so far as db Object Oriented design principles and
requirements for the project.


cheers
-Sam

Adam Playford

unread,
Dec 30, 2009, 10:15:26 AM12/30/09
to Django users
I'm not an expert on this, but a few thoughts.

First, if I'm reading your message right, it sounds like your problem
probably isn't with the query, but with how many times you're running
it. If you think about it, if it's taking you 12 seconds to run *500*
queries, that's actually pretty darn good -- only 24 ms per query,
which is quite fast.

So (again, if I understand you correctly) I would try focusing on how
you can execute that query fewer times. It should be possible to not
run a different query for every single item; you'll just need to look
at how you're using Django's ORM.

A few thoughts:

1) If you haven't looked at select_related, give it a shot. It might
fix your problem entirely, particularly if you can find the right
depth: http://docs.djangoproject.com/en/1.1/ref/models/querysets/#id4

2) If that doesn't work, try setting up django-debugtoolbar. In fact,
I'd do that anyway. It's a great help at narrowing down problems like
this, because it shows you every query a page uses, and then shows you
how long each individual query takes to execute. This should let you
narrow in on the problem more precisely: http://github.com/robhudson/django-debug-toolbar

Good luck.

Adam

> *the models associated with this from two applications: 'directory' and 'common'http://pastebin.com/m3868a1fchttp://pastebin.com/m18ec3765


>
> *python manage.py sqlall directory && python manage.py sqlall common:
>

> http://pastebin.com/m63a50593http://pastebin.com/m6f958cda

> 1 row in set (0.00 sec)
>
> mysql> EXPLAIN SELECT `common_addradmin`.`id`,
> `common_addradmin`.`surfaceMail_works`,
> `common_addradmin`.`addr_enabled` FROM `common_addradmin` WHERE
> `common_addradmin`.`id` = 1;
> +----+-------------+------------------+-------+---------------+---------+-- -------+-------+------+-------+

> | id | select_type | table            | type  | possible_keys | key
>  | key_len | ref   | rows | Extra |

> +----+-------------+------------------+-------+---------------+---------+-- -------+-------+------+-------+


> |  1 | SIMPLE      | common_addradmin | const | PRIMARY       |
> PRIMARY | 4       | const |    1 |       |
> +----+-------------+------------------+-------+---------------+---------+-- -------+-------+------+-------+
> 1 row in set (0.00 sec)
>
> mysql> EXPLAIN SELECT `common_address`.`id`,
> `common_address`.`airfield_id`, `common_address`.`country_id`,
> `common_address`.`addresstype_id`, `common_address`.`addradmin_id`,
> `common_address`.`location_id`, `common_address`.`street1`,
> `common_address`.`street2`, `common_address`.`user_lat_dec`,
> `common_address`.`user_long_dec`, `common_address`.`zoom` FROM
> `common_address` INNER JOIN `common_address_directory` ON
> (`common_address`.`id` = `common_address_directory`.`address_id`)
> WHERE `common_address_directory`.`directory_id` = 4267;
> +----+-------------+--------------------------+--------+------------------- -------------------------------+---------------------------------------+--- ------+----------------------------------------------------+------+-------+

> | id | select_type | table                    | type   | possible_keys
>                                    | key
>     | key_len | ref                                                |
> rows | Extra |

> +----+-------------+--------------------------+--------+------------------- -------------------------------+---------------------------------------+--- ------+----------------------------------------------------+------+-------+

Nick Arnett

unread,
Dec 30, 2009, 10:41:02 AM12/30/09
to django...@googlegroups.com
On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford <adam.p...@gmail.com> wrote:
I'm not an expert on this, but a few thoughts.

First, if I'm reading your message right, it sounds like your problem
probably isn't with the query, but with how many times you're running
it.

I'll echo that... the problem is not the database - the queries are as good as it gets.  The problem is running them repeatedly.

If all else fails, I'd replace those queries that execute 500 times with raw SQL that uses the IN operator to get the required rows.

E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
`common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173)

I imagine there's an ORM query that will do the same thing, but I know MySQL far better than I know Django.

Nick

Sam Walters

unread,
Dec 30, 2009, 7:56:39 PM12/30/09
to django...@googlegroups.com
Thanks for the replies.

Yes, there is the option of going to raw MySQL. However the project
requirements mean i can't use raw SQL. (portability, readability)
From what i can see using django's db API i have to execute the
queries 500 times.
I am very familiar with the query documentation and i know that
select_related will prevent foward facing foreign keys translating to
an individual sql queries which hit the db and would slow it down.

Fact is even when i dont use 'select_related' the major performance
problem occurs with the 'many-to-many' and 'reverse foreign' keys
(some 75% of the performance penalty for my package method is with
these) and only 20% can be solved by select_related.

To be specific about how the multiplicities unfold:

search_querySet is a Directory.objects.filter(...

for s in search_querySet:
address_info = s.address_set.all() #.select_related(depth=2) -
yes i can/will put select related here but it really does not help
that much 20% tops
#address_info is usually 2-3 rows from an address table
for a in address_info:#.select_related(depth=2):
if a.addresstype.adrtype == 'Physical' and
a.addradmin.addr_enabled == True:
#further reduction in the number of rows which we need to
get values from.
related_phone=a.phone_set.all()
related_email=s.email_set.all()
#phones and emails are a small number of rows 2-3 tops

It is these lines which produce the performance hit.
I cant see a way of using django's query language to avoid having to
descend into each of the 500 'directory' objects because of the
necessity to get all rows from the related tables in phones an emails
and to inspect the type of 'address' object.

thanks for the ideas. will continue testing and looking for answers

-Sam

> --
>
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
>

Tomasz Zieliński

unread,
Jan 3, 2010, 8:05:23 AM1/3/10
to Django users

On 30 Gru 2009, 16:41, Nick Arnett <nick.arn...@gmail.com> wrote:
>
> E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
> `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173)
>
> I imagine there's an ORM query that will do the same thing, but I know MySQL
> far better than I know Django.
>

This is as simple as using .filter(addresstype__id__in=[1,6,8,52,173])

--
Tomasz Zielinski
http://pyconsultant.eu

Tomasz Zieliński

unread,
Jan 3, 2010, 8:11:10 AM1/3/10
to Django users
On 31 Gru 2009, 01:56, Sam Walters <mr.sam...@gmail.com> wrote:

> for s in search_querySet:
>         address_info = s.address_set.all() #.select_related(depth=2) -
> yes i can/will put select related here but it really does not help
> that much 20% tops
>         #address_info is usually 2-3 rows from an address table
>         for a in address_info:#.select_related(depth=2):
>             if a.addresstype.adrtype == 'Physical' and
> a.addradmin.addr_enabled == True:
>             #further reduction in the number of rows which we need to
> get values from.
>             related_phone=a.phone_set.all()
>             related_email=s.email_set.all()
>             #phones and emails are a small number of rows 2-3 tops
>
> It is these lines which produce the performance hit.
> I cant see a way of using django's query language to avoid having to
> descend into each of the 500 'directory' objects because of the
> necessity to get all rows from the related tables in phones an emails
> and to inspect the type of 'address' object.
>

I solved very similar problem by creating database VIEWs with data I
needed,
wrapping those VIEWs with unmanaged Django models and then using
simple .filter(...)-s.

Re those VIEWs, they are SELECTs with JOINs, which effectively
produce up to a few rows for single object (e.g. if you have person
with 3 phone numbers,
you're getting 3 rows), but it turns out to be much more efficient to
process/merge
that in Python code than to issue hundreds of SQL queries.

Sam Walters

unread,
Jan 4, 2010, 1:30:36 AM1/4/10
to django...@googlegroups.com
Hi Tomasz
Yes, i have followed a raw sql approach now im looking at my test data
to see which objects have multiple rows and cleaning that up.

Its a shame that '__in' has limited use under these scenarios:

directories = search_querySet.distinct()
addresses = Address.objects.filter(directory__in=directories)
addresses.values('directory__id', ..... *some other relevent fields*)

this certainly allows me to select the related sets of addresses for
each directory however i need to be able to relate each address object
back to its directory (by id would be great) trying to get the
directory 'id' packaged in the values() gives me an error saying
invalid field even though 'directory' is listed as a valid field.
If i could do that then i could iterate through each dictionary and
zip related items together based on their directory id's or something
nice like that.

"
Re those VIEWs, they are SELECTs with JOINs, which effectively
produce up to a few rows for single object (e.g. if you have person
with 3 phone numbers,
you're getting 3 rows), but it turns out to be much more efficient to
process/merge
that in Python code than to issue hundreds of SQL queries.
"

Yes this seems to be the best way, do you have any links where i can
see how various people have implemented this? Would be good to write a
'pythonic' solution


cheers

-sam
2010/1/4 Tomasz Zieliński <tomasz.z...@pyconsultant.eu>:

Tomasz Zieliński

unread,
Jan 5, 2010, 1:12:27 PM1/5/10
to Django users

On 4 Sty, 07:30, Sam Walters <mr.sam...@gmail.com> wrote:
> Hi Tomasz
> Yes, i have followed a raw sql approach now im looking at my test data
> to see which objects have multiple rows and cleaning that up.
>
> Its a shame that '__in' has limited use under these scenarios:
>
> directories = search_querySet.distinct()
> addresses = Address.objects.filter(directory__in=directories)
> addresses.values('directory__id', ..... *some other relevent fields*)
>
> this certainly allows me to select the related sets of addresses for
> each directory however i need to be able to relate each address object
> back to its directory (by id would be great) trying to get the
> directory 'id' packaged in the values() gives me an error saying
> invalid field even though 'directory' is listed as a valid field.

I'm not 100% sure, but AFAIR you can only take values of fields
contained directly in models.

> If i could do that then i could iterate through each dictionary and
> zip related items together based on their directory id's or something
> nice like that.
>
> "
> Re those VIEWs, they are SELECTs with JOINs, which effectively
> produce up to a few rows for single object (e.g. if you have person
> with 3 phone numbers,
> you're getting 3 rows), but it turns out to be much more efficient to
> process/merge
> that in Python code than to issue hundreds of SQL queries.
> "
>
> Yes this seems to be the best way, do you have any links where i can
> see how various people have implemented this? Would be good to write a
> 'pythonic' solution
>

I wrote simple function that works for my purposes, I suspect that
you use-case also doesn't require anything fancy, but I agree
that 'pythonic' solutions are good for morale :-)

koenb

unread,
Jan 6, 2010, 7:07:15 AM1/6/10
to Django users
> On Thu, Dec 31, 2009 at 2:41 AM, Nick Arnett <nick.arn...@gmail.com> wrote:
>
> > On Wed, Dec 30, 2009 at 7:15 AM, Adam Playford <adam.playf...@gmail.com>

> > wrote:
>
> >> I'm not an expert on this, but a few thoughts.
>
> >> First, if I'm reading your message right, it sounds like your problem
> >> probably isn't with the query, but with how many times you're running
> >> it.
>
> > I'll echo that... the problem is not the database - the queries are as good
> > as it gets.  The problem is running them repeatedly.
>
> > If all else fails, I'd replace those queries that execute 500 times with raw
> > SQL that uses the IN operator to get the required rows.
>
> > E.g.: SELECT `common_addresstype`.`id`, `common_addresstype`.`adrtype` FROM
> > `common_addresstype` WHERE `common_addresstype`.`id` IN (1,6,8,52,173)
>
> > I imagine there's an ORM query that will do the same thing, but I know MySQL
> > far better than I know Django.
>
> > Nick
>

You can take a look at apps like django-selectreverse [1] or django-
batch-select [2] for some ideas to make this kind of optimisations
easier.

Koen

[1] http://code.google.com/p/django-selectreverse/
[2] http://github.com/lilspikey/django-batch-select

Sam Walters

unread,
Jan 7, 2010, 5:39:17 AM1/7/10
to django...@googlegroups.com
Thanks Koen
I had suspected such things would exist but couldnt find them.
I will take a look at this. Looks like a neat addon which would
greatly help my project turnaround time.

cheers
Sam

Philippe Raoult

unread,
Jan 7, 2010, 8:18:34 AM1/7/10
to Django users
Hi Sam,

I've read your description and it doesn't mention how often your data
changes. If it's not too often, you might want to consider simply
caching the whole results or parts of it. You'll then have to add a
bit of code to regenerate the cache every time the data changes. It
looks to me like you have one query for selecting which entries to
print, then many repetitive queries to find information about each
entry. If you had a per-entry cache then you would probably be down to
one main query plus a lot of cache lookups.

Also you were on the right track when using addresses =
Address.objects.filter(directory__in=directories). I've often used
that kind of queries to reduce (1 + many queries) down to (2 + lots of
dictionnary operations)

Hope that helps.

Regards,
Philippe


On 7 jan, 11:39, Sam Walters <mr.sam...@gmail.com> wrote:
> Thanks Koen
> I had suspected such things would exist but couldnt find them.
> I will take a look at this. Looks like a neat addon which would
> greatly help my project turnaround time.
>
> cheers
> Sam
>

Sam Walters

unread,
Mar 25, 2010, 10:14:24 PM3/25/10
to django...@googlegroups.com
I also read a good article. As it turns out im using a lot of inner joins:

http://www.caktusgroup.com/blog/2009/09/

This was quite handy.
My processis usually build it to be as fast as possible in raw SQL
then go back in and try and makre it with the ORM.

"I've read your description and it doesn't mention how often your data
changes. If it's not too often, you might want to consider simply
caching the whole results or parts of it. You'll then have to add a
bit of code to regenerate the cache every time the data changes."

Yes actually in some of the scenarios the data is quite static,
however im doing a lot of stuff with lat/long coordinates now so it
means i have a bunch of keys in the results dict returned which are
all dynamic, per click values and need to be recalculated a lot.
Note: I have taken your advice on board for some of my views Philippe.

Cheers

Sam_W

Reply all
Reply to author
Forward
0 new messages