[Django] #27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN

16 views
Skip to first unread message

Django

unread,
Sep 22, 2016, 7:16:30 AM9/22/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
----------------------------------------------+--------------------
Reporter: Sven R. Kunze | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Database layer (models, ORM) | Version: 1.8
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
Original Query:

{{{
BigTable.objects.filter(
Q(tablea__item_id__in=my_items) | Q(tableb__item_id__in=my_items)
)
}}}

resulting in:

{{{
SELECT *
FROM "big_table"
LEFT OUTER JOIN "tablea" ON ("big_table"."id" = "tablea"."big_table_id")
LEFT OUTER JOIN "tableb" ON ("big_table"."id" = "tableb"."big_table_id")
WHERE ("tablea"."item_id" IN (<handful of items>) OR
"tableb"."item_id" IN (<handful of items>))
}}}

The {{{LEFT OUTER JOIN}}} combined with a growing number of rows in
big_table results in very slow execution of the query.
We temporarily rewrote the query by using two separate queries and merging
the sets via Python:


{{{
BigTable.objects.filter(
Q(tablea__item_id__in=my_items)
)
}}}
{{{
SELECT *
FROM "big_table"
INNER JOIN "tablea" ON ("big_table"."id" = "tablea"."big_table_id")
WHERE "tablea"."item_id" IN (<handful of items>)
}}}

and

{{{
BigTable.objects.filter(
Q(tableb__item_id__in=my_items)
)
}}}
{{{
SELECT *
FROM "big_table"
INNER JOIN "tableb" ON ("big_table"."id" = "tableb"."big_table_id")
WHERE "tableb"."item_id" IN (<handful of items>)
}}}

Ideally, we would like to use the original query because it is clearer and
less code to maintain. If you want to see the models, I can post them as
well. Ideally again, we don't actually expect models to influence the type
of the joins.

We want BigTable to be filtered, so, if there's no appropriate match on
the TableA or TableB, the BigTable item simply does not appear in the
result set. However, {{{LEFT OUTER JOIN}}} would result in such a row
which then needs to be removed afterwards laboriously (at least PostgreSQL
struggles to do so). If the original query had used {{{INNER JOIN}}}, the
query would have been fast.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Sep 22, 2016, 9:12:57 AM9/22/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Please also provide your models.

I'm not certain whether or not your proposed changes are correct. If you
can provide a patch so we can check whether or not it breaks existing
behavior, it'll be more easily to evaluate the idea.

Looking for possibly related tickets, I found #8439.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:1>

Django

unread,
Sep 22, 2016, 9:34:05 AM9/22/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Sven R. Kunze):

Replying to [comment:1 Tim Graham]:


> Please also provide your models.

{{{
class BigTable:
# a lot of unrelated fields
}}}
{{{
class TableA:
big_table = models.OneToOneField(BigTable, related_name='...',
primary_key=True)
item_id = models.CharField(max_length=1024)
}}}
{{{
class TableB:
big_table = models.ForeignKey(BigTable, related_name='...')
item_id = models.CharField(max_length=1024)
}}}

> I'm not certain whether or not your proposed changes are correct. If you
can provide a patch so we can check whether or not it breaks existing
behavior, it'll be more easily to evaluate the idea.

Neither are we. Maybe, a {{{UNION}}} could also work. The real issue we
have is the slow execution of the original query.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:2>

Django

unread,
Sep 22, 2016, 10:02:10 AM9/22/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

FYI, if you're not sure it's a bug, you should
[wiki:TicketClosingReasons/UseSupportChannels use our support channels]
before opening a ticket.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:3>

Django

unread,
Sep 22, 2016, 12:30:50 PM9/22/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: closed

Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* status: new => closed
* resolution: => invalid


Comment:

I think [https://gist.github.com/akaariai/a4c1acbfedac0f2cbf3e Ansii's
write up on join promotion] explains why `LEFT OUTER JOIN` is required
here.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:4>

Django

unread,
Sep 23, 2016, 2:40:30 AM9/23/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Sven R. Kunze):

Thanks, Tim. You are right, that {{{OUTER LEFT JOIN}}} is required here.

However, I might state again that our real problem is the performance
issue. I simultaneously posted a question on the PostgreSQL mailing list:

https://www.postgresql.org/message-id/flat/97dbd6f9-a289-94d1-325f-
997df87c672a%40mail.de#97dbd6f9-a289-94d1...@mail.de

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:5>

Django

unread,
Sep 23, 2016, 8:26:41 AM9/23/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

I haven't worked with the ORM enough to know if changing the query
operator is feasible.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:6>

Django

unread,
Sep 29, 2016, 8:33:35 AM9/29/16
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Sven R. Kunze):

@Tim

On the one hand, I somehow doubt that it's Django responsibility to do it
(cf. https://www.postgresql.org/message-id/8ccdd027-f4a0-2e0d-
1e26-2a403fed8d17%40mail.de ).

On the other hand, there's already an asymmetry between a single "filter"
and an "OR filter". The former uses "INNER JOIN", the latter uses "OUTER
JOIN".

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:7>

Django

unread,
Mar 13, 2017, 11:49:47 AM3/13/17
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Thomas Güttler):

Just for the records. We have performance issues on postgres because some
SQL statements have up to six OUTER joins.

Union and intersection would be easier to read for me (human being) and
maybe better to optimize for the postgres query planer.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:8>

Django

unread,
Mar 13, 2017, 12:04:36 PM3/13/17
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

Support for `QuerySet.union()`, `intersection()`, `difference()` was added
in #27718.

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:9>

Django

unread,
Mar 15, 2017, 5:43:20 AM3/15/17
to django-...@googlegroups.com
#27260: Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
-------------------------------------+-------------------------------------

Reporter: Sven R. Kunze | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Database layer | Version: 1.8
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Thomas Güttler):

Just for the records, we use this pattern until we update to Django 1.11:

Use case: Select users which have permission "some_app.permission_name"
and which are in group "special_group"
{{{
user_ids_with_group =
User.objects.filter(groups__in=[special_group]).distinct().values_list('pk',
flat=True)
user_ids_with_permission =
User.objects.filter(authutils.users_with_perm_q('some_app.permission_name')).distinct().values_list('pk',
flat=True)
qs =
User.objects.filter(pk__in=set(user_ids_with_group).intersection(user_ids_with_permission)).distinct().order_by('last_name',
'first_name')


# authutils
def users_with_perm_q(perm, is_active=True, is_superuser=True):
try:
app_label, codename = perm.split('.')
except ValueError:
raise ValueError('Permission name should be in the form
"app_label.perm_name".')

permission = Permission.objects.get(codename=codename,
content_type__app_label=app_label)
user_q = Q(user_permissions=permission)
group_q = Q(groups__permissions=permission)
return Q(is_active=is_active) & (Q(is_superuser=is_superuser) | user_q
| group_q)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/27260#comment:10>

Reply all
Reply to author
Forward
0 new messages