[Django] #22438: Slow INNER JOIN in MySQL can be fixed in Django ORM, but should it?

307 views
Skip to first unread message

Django

unread,
Apr 14, 2014, 1:40:42 PM4/14/14
to django-...@googlegroups.com
#22438: Slow INNER JOIN in MySQL can be fixed in Django ORM, but should it?
-------------------------------------+-------------------------------------
Reporter: frol | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) | Keywords: mysql, orm, slow query,
Severity: Normal | join
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
There is the bug in MySQL INNER JOIN optimization, but it can be fixed in
Django ORM by replacing INNER JOIN with STRAIGHT_JOIN. In short, ordered
inner join selections take 3+ seconds, where if replace INNER JOIN with
STRAIGHT_JOIN, we get result in 0.001.

Here are steps to reproduce:
1) We need a model with ForeignKey:
{{{
from django.conf import settings
from django.db import models

class Child(models.Model):
name = models.CharField("Name", max_length=255)
owner = models.ForeignKey(settings.AUTH_USER_MODEL)
}}}

2) Initialize data:
{{{
from random import choice
from django.contrib.auth import get_user_model
from qq.models import *

User = get_user_model()
users = User.objects.all()
Child.objects.bulk_create(Child(name='child_%d' % i, owner=choice(users))
for i in xrange(400000))
}}}

3) Query data with join (the bug appears only if order by is applied):
{{{
Child.objects.all().order_by('-id').select_related('owner')[:2]
}}}
The resulting query would be:
{{{
{u'sql': u'SELECT `qq_child`.`id`, `qq_child`.`name`,
`qq_child`.`other_field`, `qq_child`.`owner_id`,
`auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`,
`auth_user`.`is_superuser`,
`auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`,
`auth_user`.`email`,
`auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`
FROM `qq_child`
INNER JOIN `auth_user` ON ( `qq_child`.`owner_id` = `auth_user`.`id` )
ORDER BY `qq_child`.`id` ASC
LIMIT 2',
u'time': u'4.608'},
}}}

(MySQL caches result until you update the table so the next time it would
take 0.001, but not in production if your table updates frequently)

If I replace INNER JOIN with STRAIGHT_JOIN:
{{{
list(Child.objects.raw(
Child.objects.all().order_by('-id').select_related('owner')[:2]\
.query.sql_with_params()[0].replace('INNER JOIN', 'STRAIGHT_JOIN')
))
}}}

I get this result:
{{{
{u'sql': u'SELECT `qq_child`.`id`, `qq_child`.`name`,
`qq_child`.`other_field`, `qq_child`.`owner_id`,
`auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`,
`auth_user`.`is_superuser`,
`auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`,
`auth_user`.`email`,
`auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`
FROM `qq_child`
STRAIGHT_JOIN `auth_user` ON ( `qq_child`.`owner_id` = `auth_user`.`id` )
ORDER BY `qq_child`.`id` DESC
LIMIT 2',
u'time': u'0.001'}]
}}}

'''4.6 seconds vs 0.001. Really?'''

'''Solutions:'''
1) SQLAlchemy implemented .with_prefix() method for query:
http://stackoverflow.com/a/16743228
2) Replace INNER JOIN for MySQL backend with STRAIGHT_JOIN, but it might
be not a good idea. I will investigate this today.

'''Wrong attempts to fix this:'''
Since select_related is used in django admin change_list if you ask for
related fields in columns, devs try to override it with .raw() -
http://stackoverflow.com/a/15978732/1178806 (I answered in comments why it
doesn't work).

--
Ticket URL: <https://code.djangoproject.com/ticket/22438>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Apr 14, 2014, 9:51:35 PM4/14/14
to django-...@googlegroups.com
#22438: Slow INNER JOIN in MySQL can be fixed in Django ORM, but should it?
-------------------------------------+-------------------------------------
Reporter: frol | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: mysql, orm, slow | Unreviewed
query, join | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by frol):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

I've tried completely replace INNER JOIN with STRAIGHT_JOIN and run tests.
There were some fails, but only because those tests tried to search 'INNER
JOIN' string in queries. Execution time of testing without changes was 561
and after I changed INNER JOIN to STRAIGHT_JOIN, overall execution time
was 574. It might be just random fluctuation, but also a regression.

Thus I'm going to create separate project on github with 'enchanced' MySQL
backend and ask people to try it.

--
Ticket URL: <https://code.djangoproject.com/ticket/22438#comment:1>

Django

unread,
Apr 15, 2014, 3:50:58 PM4/15/14
to django-...@googlegroups.com
#22438: Slow INNER JOIN in MySQL can be fixed in Django ORM, but should it?
-------------------------------------+-------------------------------------
Reporter: frol | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: master
(models, ORM) | Resolution:
Severity: Normal | Triage Stage:
Keywords: mysql, orm, slow | Unreviewed
query, join | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by frol):

I made a workaround for SQLCompiler and pushed it to GitHub:
https://github.com/frol/django-mysql-fix
Good news: It works! It passes all Django-test without changes in Django.
Bad news: It looks awful because it manipulates with JoinInfo objects and
tables order.

It has to be tested in real projects...

--
Ticket URL: <https://code.djangoproject.com/ticket/22438#comment:2>

Django

unread,
Apr 29, 2014, 2:04:53 PM4/29/14
to django-...@googlegroups.com
#22438: Slow INNER JOIN in MySQL can be fixed in Django ORM, but should it?
-------------------------------------+-------------------------------------
Reporter: frol | Owner: nobody
Type: Uncategorized | Status: closed

Component: Database layer | Version: master
(models, ORM) | Resolution: wontfix

Severity: Normal | Triage Stage:
Keywords: mysql, orm, slow | Unreviewed
query, join | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by timo):

* status: new => closed
* resolution: => wontfix


Comment:

I'm not sure, but I found http://stackoverflow.com/a/516720 which
indicates this change may not be a good idea. Could you reopen the ticket
if you have information suggesting otherwise? If you still think the
change is a good one, it would probably be a good topic to bring up on the
django-developers mailing list.

--
Ticket URL: <https://code.djangoproject.com/ticket/22438#comment:3>

Django

unread,
May 17, 2014, 10:51:48 PM5/17/14
to django-...@googlegroups.com
#22438: Slow INNER JOIN in MySQL can be fixed in Django ORM, but should it?
-------------------------------------+-------------------------------------
Reporter: frol | Owner: nobody
Type: Uncategorized | Status: closed

Component: Database layer | Version: master
(models, ORM) | Resolution: wontfix
Severity: Normal | Triage Stage:
Keywords: mysql, orm, slow | Unreviewed
query, join | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by frol):

I've found another workaround. Once you specify null=True to the
ForeignKey field, Django will use LEFT OUTER JOINs instead of INNER JOIN.
LEFT OUTER JOIN works without performance issues in this case, but you may
face other issues that I'm not aware of yet.

@timo I've seen that comment, but, since Django tests passed and I
released the project for everybody, I've expected to hear real complains.

--
Ticket URL: <https://code.djangoproject.com/ticket/22438#comment:4>

Reply all
Reply to author
Forward
0 new messages