DB optimization docs

22 views
Skip to first unread message

Luke Plant

unread,
Jan 8, 2010, 9:04:43 AM1/8/10
to django-d...@googlegroups.com
Hi all,

I was prompted by this post:

http://it.toolbox.com/blogs/database-soup/stuff-id-love-to-see-from-
django-36278

to add some notes about some DB access optimizations (essentially the
things I mentioned in my comment on that page), but then thought that
even if I add them, people are unlikely to find them. It's also not
possible to put admonitions everywhere in the docs to stop people
doing potentially expensive things. Rather, we need a list of tips
for someone looking to optimize DB queries. It could link to all the
relevant documentation (select_related() etc), as well as having misc
other tips.

Good idea/bad idea? At the moment, the information is mainly all
there, but spread out in many places - I think we need an "optimize DB
access" topic.

Luke

--
"My capacity for happiness you could fit into a matchbox without
taking out the matches first." (Marvin the paranoid android)

Luke Plant || http://lukeplant.me.uk/

Mat Clayton

unread,
Jan 8, 2010, 9:12:56 AM1/8/10
to django-d...@googlegroups.com
As someone going through this pain right now, this would be very helpful.

Mat

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.






--
--
Matthew Clayton | Founder/CEO
Wakari Limited

twitter http://www.twitter.com/matclayton

email m...@wakari.co.uk
mobile +44 7872007851

skype matclayton

Russell Keith-Magee

unread,
Jan 8, 2010, 9:30:39 AM1/8/10
to django-d...@googlegroups.com
On Fri, Jan 8, 2010 at 10:04 PM, Luke Plant <L.Pla...@cantab.net> wrote:
> Hi all,
>
> I was prompted by this post:
>
> http://it.toolbox.com/blogs/database-soup/stuff-id-love-to-see-from-
> django-36278
>
> to add some notes about some DB access optimizations (essentially the
> things I mentioned in my comment on that page), but then thought that
> even if I add them, people are unlikely to find them.  It's also not
> possible to put admonitions everywhere in the docs to stop people
> doing potentially expensive things.  Rather, we need a list of tips
> for someone looking to optimize DB queries.  It could link to all the
> relevant documentation (select_related() etc), as well as having misc
> other tips.
>
> Good idea/bad idea? At the moment, the information is mainly all
> there, but spread out in many places - I think we need an "optimize DB
> access" topic.

Sounds like a great idea to me. +1.

Russ %-)

Luke Plant

unread,
Jan 8, 2010, 10:33:56 AM1/8/10
to django-d...@googlegroups.com
On Friday 08 January 2010 14:12:56 Mat Clayton wrote:
> As someone going through this pain right now, this would be very
> helpful.
>
> Mat

I don't know when I'll have more time to work on this, but I've
committed the beginnings of draft of docs/topics/db/optimization.txt
to my hg repos.

http://bitbucket.org/spookylukey/django-trunk-
lukeplant/src/tip/docs/topics/db/optimization.txt

It needs to be hyperlinked to all the right bits, and filled out. But
it might be of some help to you in its current state.

If anyone wants to take this on as a mini-project, it would be
gratefully received :-) I'm not planning on doing any more on this
for at least a few days.

Mat Clayton

unread,
Jan 8, 2010, 12:35:54 PM1/8/10
to django-d...@googlegroups.com
Thanks, much aooreciated

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.



Christopher Petrilli

unread,
Jan 8, 2010, 1:02:53 PM1/8/10
to django-d...@googlegroups.com
Luke,

I wonder if it's the right place, but one thing I have noticed most
people don't realize is that they need to run something to do pooling
between Django and the DB. I run pgpool, even when there is only one
of each since it does the connection pooling for you. This can be a
HUGE win in many cases.

Chris

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>
>
>

--
| Chris Petrilli
| petr...@amber.org

Luke Plant

unread,
Jan 15, 2010, 10:32:04 PM1/15/10
to django-d...@googlegroups.com
I've added these docs now, or at least a good first stab at them.
Suggestions for improvements are welcome, patches are more welcome, as
always :-)

I backported to 1.1.X, and tried to remove any anachronisms.

Regards,

Luke

--
"Outside of a dog, a book is a man's best friend... inside of a
dog, it's too dark to read."

Luke Plant || http://lukeplant.me.uk/

Mat Clayton

unread,
Jan 16, 2010, 6:52:51 AM1/16/10
to django-d...@googlegroups.com
I know this isn't really related, but figure I would bring it up here. When working on optimizing our django app, it became very clear that one of the biggest issues as usual is Indexes, particularly multi column indexes, which at present django doesnt appear to support.

I know ticket 373 http://code.djangoproject.com/ticket/373 covers multi column primary keys, but are there any intentions to cover multi column indexes as a smaller issue prior to this. The reason I bring this up, is after checking the mysql logs we discovered our most costly queries are actually due to django comments and the lack of multicolumn indexes, after adding the correct index query times went from 300mS to <10mS giving us a nice speed boost. 

If this is likely to be a nice addition to django, we would be willing to commit some resources to trying to get it into core, any thoughts on this? Or does anyone else have a need for it? Or would we be standing on the toes of the 373 ticket?

I was thinking of cloning the unique_together style and adding an index_together Meta option, thoughts?

Mat

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.






--

Russell Keith-Magee

unread,
Jan 16, 2010, 7:27:47 AM1/16/10
to django-d...@googlegroups.com
On Sat, Jan 16, 2010 at 7:52 PM, Mat Clayton <m...@wakari.co.uk> wrote:
> I know this isn't really related, but figure I would bring it up here. When
> working on optimizing our django app, it became very clear that one of the
> biggest issues as usual is Indexes, particularly multi column indexes, which
> at present django doesnt appear to support.
> I know ticket 373 http://code.djangoproject.com/ticket/373 covers multi
> column primary keys, but are there any intentions to cover multi column
> indexes as a smaller issue prior to this. The reason I bring this up, is
> after checking the mysql logs we discovered our most costly queries are
> actually due to django comments and the lack of multicolumn indexes, after
> adding the correct index query times went from 300mS to <10mS giving us a
> nice speed boost.
> If this is likely to be a nice addition to django, we would be willing to
> commit some resources to trying to get it into core, any thoughts on this?

Sounds like a reasonable suggestion to me. I've had similar thoughts
over the last couple of weeks as I've been tuning the performance of
the database at work.

> Or does anyone else have a need for it? Or would we be standing on the toes
> of the 373 ticket?

I don't think it stands on the toes of #373 either. #373 will probably
imply the need for multi-column indexes, but that doesn't remove the
broader need for multi-column indexes.

> I was thinking of cloning the unique_together style and adding an
> index_together Meta option, thoughts?

A Meta option sounds like the right way to define it. Getting out my
bikeshed-painting brush, I'd probably call the Meta option
'db_index_together', so as to maintain the parity between
'unique->unique_together' and 'db_index->db_index_together'.

We have about 10 days until the 1.2 beta freeze; if you can work up a
patch quickly, I can see about getting this in for 1.2. Any
suggestions on multi-column indexes that we should add to shipped
Django models are also welcome (i.e., the indexes that are needed for
comment models).

Yours,
Russ Magee %-)

Mat Clayton

unread,
Jan 16, 2010, 7:56:11 AM1/16/10
to django-d...@googlegroups.com
Great, have to confess I don't know the ORM internals at all, so not sure how long this could take me, shall see what I can do. Also any indications of where/what to start looking for would be appreciated.

'db_index_together' seems fair enough, will be trivial to change later anyway.

With respect to db support I assume all backend's need support in any patch?

Mat

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.



Russell Keith-Magee

unread,
Jan 16, 2010, 8:34:53 AM1/16/10
to django-d...@googlegroups.com
On Sat, Jan 16, 2010 at 8:56 PM, Mat Clayton <m...@wakari.co.uk> wrote:
> Great, have to confess I don't know the ORM internals at all, so not sure
> how long this could take me, shall see what I can do. Also any indications
> of where/what to start looking for would be appreciated.

In the end, db_index_together will be a combination of
unique_together's Meta handling, and db_index's index building. If you
follow the threads that make those two features work, you should be
able to cobble together something.

> 'db_index_together' seems fair enough, will be trivial to change later
> anyway.
>
> With respect to db support I assume all backend's need support in any patch?

All backends will need to be supported before anything hits trunk.

However, if you don't have the facilities to test a specific backends,
it's ok to submit a patch that hasn't been fully tested. As long as
you declare up front anything that you know to be untested, I'll
probably accept a patch that makes allowances for the differences
between backends and "looks right on visual inspection".

Of course, if you *are* able to produce a patch that works for all
backends, that would be excellent, and will dramatically improve the
chances of this getting into trunk for 1.2. The more work I have to do
to finish the patch, the less likely it is I will have the time to
complete the work.

Regardless of whether you attempt the patch or not, you should open a
ticket so that the idea isn't forgotten.

Yours,
Russ Magee %-)

Mat Clayton

unread,
Jan 16, 2010, 8:55:44 AM1/16/10
to django-d...@googlegroups.com
Thanks, I've looked over both sets of code, seem's fine to me. The only complex stuff is that Indexes can only be created on certain field types, so additional checks may need doing. Also different index types are available on different storage engines, for MySQL. Would people want support for selecting the index type and direction? 

Finally char field index lengths need to be considered, I'm assuming the Meta field would have to take these into account as well.

Also have reopened a ticket http://code.djangoproject.com/ticket/5805 to cover this

Mat

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.



Russell Keith-Magee

unread,
Jan 16, 2010, 9:03:44 AM1/16/10
to django-d...@googlegroups.com
On Sat, Jan 16, 2010 at 9:55 PM, Mat Clayton <m...@wakari.co.uk> wrote:
> Thanks, I've looked over both sets of code, seem's fine to me. The only
> complex stuff is that Indexes can only be created on certain field types, so
> additional checks may need doing. Also different index types are available
> on different storage engines, for MySQL. Would people want support for
> selecting the index type and direction?

I'd say don't bother. If you *really* want to fine tune your database,
you aren't going to use the ORM - you're going to use initial SQL
defintions, or leave it up to your handy neighborhood db admin. On top
of that, anything beyond a simple "CREATE INDEX foo ON bar (attr,
attr)" is going to start walking into areas that are specific to
individual databases.

db_index=True has been sufficient for indexes of all types so far; I'm
happy to continue the trend here.

> Finally char field index lengths need to be considered, I'm assuming the
> Meta field would have to take these into account as well.
> Also have reopened a ticket http://code.djangoproject.com/ticket/5805 to
> cover this

Cheers. I should have known this had been suggested before :-)

Yours,
Russ Magee %-)

Mat Clayton

unread,
Jan 17, 2010, 1:03:35 PM1/17/10
to django-d...@googlegroups.com
Thrown together a first attempt at adding multi column indexes and just want some comments on the syntax used, for example.

from django.db import models

class Poll(models.Model):
    question = models.CharField(max_length=200, db_index=True)
    pub_date = models.DateTimeField('date published')

    class Meta:
        db_index_together = ('id', '-question:1',),

will create an extra index on the Poll model using

BEGIN;
CREATE INDEX `app_poll_question` ON `app_poll` (`question`);
CREATE INDEX `app_poll_3d2b7686` ON `app_poll` (`id`, `question`(1) DESC);
COMMIT;

As you can see I decided to implement the ASC/DESC using the conventional '-' sign to indicate a DESC, although MySQL accepts this at the moment, it currently is ignored by the parser, but other backends including sqlite3 can use it. Also I've gone with syntax which allows the index length to be defined. The index name is also a hash of the various fields, preventing it from becoming too long.

What isn't demo'ed here is db_index_together also accepts a tuple or tuple of tuple's to match the interface to unique_together.

Anyone got any comments or feedback on the interface?

Ive not had time to put together docs/tests yet, but don't expect the doc's to take too long. Does anyone have any recommendation on how to approach the tests? Can't say i'm big on tests, and not sure where to start with these.

Mat

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.



Russell Keith-Magee

unread,
Jan 17, 2010, 9:35:18 PM1/17/10
to django-d...@googlegroups.com
On Mon, Jan 18, 2010 at 2:03 AM, Mat Clayton <m...@wakari.co.uk> wrote:
> Thrown together a first attempt at adding multi column indexes and just want
> some comments on the syntax used, for example.
> from django.db import models
> class Poll(models.Model):
>     question = models.CharField(max_length=200, db_index=True)
>     pub_date = models.DateTimeField('date published')
>     class Meta:
>         db_index_together = ('id', '-question:1',),
> will create an extra index on the Poll model using
> BEGIN;
> CREATE INDEX `app_poll_question` ON `app_poll` (`question`);
> CREATE INDEX `app_poll_3d2b7686` ON `app_poll` (`id`, `question`(1) DESC);
> COMMIT;
> As you can see I decided to implement the ASC/DESC using the conventional
> '-' sign to indicate a DESC, although MySQL accepts this at the moment, it
> currently is ignored by the parser, but other backends including sqlite3 can
> use it. Also I've gone with syntax which allows the index length to be
> defined. The index name is also a hash of the various fields, preventing it
> from becoming too long.
> What isn't demo'ed here is db_index_together also accepts a tuple or tuple
> of tuple's to match the interface to unique_together.
> Anyone got any comments or feedback on the interface?

I'm not really keen on the :1 syntax. Like I said last time, if you're
serious about database optimization, you don't use the ORM, you do it
by hand. Django provides support for this sort of manual optimization
with initial SQL files. I'd be perfectly happy to just allow full
content indexes for text/char fields, and leave partial indexes as a
hand-optimization activity.

The DESC syntax is almost in the same boat for me, but given that it's
fairly simple to add, and consistent with order_by/ordering, I can
live with it.

> Ive not had time to put together docs/tests yet, but don't expect the doc's
> to take too long. Does anyone have any recommendation on how to approach the
> tests? Can't say i'm big on tests, and not sure where to start with these.

I would suggest that there are two test groups that are requried:

Firstly, error handling. A bad db_index_together definition (e.g.,
referencing a column that doesn't exist) should raise a
ValidationError. You should be able to do this in the invalid_models
test - define a bad model, and check that the validation error is
returned as expected. As an aside, it looks like we're not actually
testing unique_together in this way, so while you're in the area...
:-)

Secondly, that a well defined db_index_together statement will sync,
and doesn't break queries. To do this, create an app with a couple of
tables that have various db_index_together definitions that span the
interesting combinations. Then do some simple queries that should hit
the index. It doesn't really matter what the input is or the results
are - all you're looking to test is that a model with a
db_index_together definition will create tables and not break.

Strictly, this doesn't actually prove that the index is working - a
no-op for index handling would pass the same test - but there isn't
much we can to to validate query plans or table definitions. In this
case, we have to rely on eyeballing that indexes are actually created,
and then rely on the fact that the tests pass as a measure that what
is created is valid.

Yours,
Russ Magee %-)

Reply all
Reply to author
Forward
0 new messages