[Django] #18785: Extra-Join Regression from ticket #15316

19 views
Skip to first unread message

Django

unread,
Aug 17, 2012, 6:28:43 PM8/17/12
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
----------------------------------------------+--------------------
Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer (models, ORM) | Version: 1.4
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
----------------------------------------------+--------------------
The change to prevent over-trimming in trim_joins() for
https://code.djangoproject.com/ticket/15316 appears to have introduced
new, unneeded joins (LEFT OUTERs) as a side effect. The result is that in
certain cases Django 1.4+ creates less efficient SQL than Django 1.3.

The following test passes in 1.3.x and fails w/ #15316's change. The
failure is due to a new LEFT OUTER JOIN appearing against Document's
delete_log.

queries/models.py
{{{
class AuditLog(models.Model):
info = models.CharField(max_length=50)

class Person(models.Model):
name = models.CharField(max_length=50)
delete_log = models.ForeignKey(AuditLog)

class Document(models.Model):
status = models.CharField(max_length=50)
delete_log = models.ForeignKey(AuditLog)
person = models.ForeignKey(Person)
}}}

queries/tests:
{{{
class OneFourRegressionProof(unittest.TestCase):
"""
Should evolve this into a better test case that proves something
positive, but for now this
is attempting to provide evidence that for some queries we got less
efficient in Django 1.4
"""
def setUp(self):
pass

def test_it(self):
"""
Ideally, this would only create on INNER join on person.
"""
qs =
Document.objects.exclude(delete_log__isnull=False).filter(status='something',
person__delete_log__isnull=True)
self.assertEquals(1, str(qs.query).count('INNER JOIN'))
self.assertEquals(0, str(qs.query).count('OUTER JOIN'))
}}}

git bisect results:
{{{
(django-base)~/projects/elation/django-fork ((2e56066...)|BISECTING)
philltornroth → git bisect bad
2e56066a5b93b528fbce37285bac591b44bc6ed7 is the first bad commit
commit 2e56066a5b93b528fbce37285bac591b44bc6ed7
Author: Malcolm Tredinnick <malcolm.t...@gmail.com>
Date: Tue Aug 23 03:38:42 2011 +0000

Fixed an isnull=False filtering edge-case. Fixes #15316.

The bulk of this patch is due to some fine analysis from Aleksandra
Sendecka.

git-svn-id: http://code.djangoproject.com/svn/django/trunk@16656
bcc190cf-cafb-0310-a4f2-bffc1f526a37

:100644 100644 5635f2141d5ed22bfcc03df7fd5456a48f7f051e
38c7601579c5b366a5027358a3806b24baeaddfb M AUTHORS
:040000 040000 33a384a63b8e8d7e3b325640339f13a14babc315
fe57cf0e7bc932f15d181d3a186bffe453d6bb2a M django
:040000 040000 afb2570e431195108c3d8b0b634e4b30eac42168
e1213b550270d389ffdca827116642b036855a6b M tests
}}}

It's worth noting that this is effectively an exacerbation of
https://code.djangoproject.com/ticket/10790 . I began the discussion on
that ticket and a patch is starting to take shape, but it looks too
drastic to be a 1.4 merge candidate. I wanted to stand up this related
ticket so that a more surgical fix to 1.4 could be considered, if
reasonable.

Anecdotally, I'm seeing queries in our system where this change added 3-6
new joins, making an upgrade to 1.4 a complete non-starter for us unless
there's a workaround that I haven't turned up yet.

--
Ticket URL: <https://code.djangoproject.com/ticket/18785>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Aug 25, 2012, 7:50:32 AM8/25/12
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------
Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by akaariai):

* needs_better_patch: => 0
* stage: Unreviewed => Accepted
* needs_tests: => 0
* needs_docs: => 0


Comment:

I am accepting this as a regression in 1.4. To get this fixed in 1.4 will
need a simple patch. I have hopes that 1.5 will contain a more thorough
setup_joins() refactoring from #10790, but as said, that will not help
this ticket as the changes are way too large to backpatch.

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:1>

Django

unread,
Aug 25, 2012, 12:41:33 PM8/25/12
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------
Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by famousactress@…):

Awesome! Thanks much for the support on this ticket and #10790. It's
definitely appreciated. I'm still not well enough versed in the ORM code
to be much help putting a patch together in short order, but I'd be more
than willing to write more/better unit tests that verify the
regression/fix... at this point I'm plenty familiar with the queries
regression tests :)

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:2>

Django

unread,
Aug 25, 2012, 2:04:38 PM8/25/12
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------
Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

I think you will need two thing here:
- better pruning logic in trim_joins(). You will need to have better
information about which joins are direct foreign keys and thus trimmable.
- you will likely need to do similar improvement to Query.combine() as
was done here:
https://github.com/akaariai/django/blob/d301010fb92d6588d75fb3f9ecfe95ca6042cfed/django/db/models/sql/query.py#L480,
that is, do not skip unused aliases, just unref them after addition.

I think you should try to avoid changing existing return value of
setup_joins() or adding new mandatory parameters to trim_joins(). Breaking
these APIs in a minor release doesn't look nice, even if they are private
APIs.

Alternatively you might also want to check if you could solve #15316 in a
way that doesn't introduce the regression. It seems that handling
nonnull_check in the way it is done currently will simply not work.

If you can test #10790 and see if it solves this regression for you, that
would be very welcome.

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:3>

Django

unread,
Aug 26, 2012, 8:14:33 AM8/26/12
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------
Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

I have a pretty good understanding of what is going on. The trim joins
logic should go as follows: if you have joins to forward direction (you
are following a foreign key), and the value you want is the thing the
foreign key points to (usually the PK of the foreign model), then you can
trim the join. This will always be valid, as the value of the foreign key
_must_ be the same it is one the referenced model, and there can only be
one value on the other side (the other side must be unique). If the
foreign key is nullable and contains a null, the join wouldn't produce
anything anyways.

On the other hand if you are following a foreign key to the reverse
direction (usually from some model's PK to the foreign key in the
referenced model), then you can _never_ trim the join. The reason for this
is that you can not know if there is anything on the other side, that is,
even if you know the value the foreign key must have, you can't know if
there is anything on the other side. As an example:
{{{
table a table b
id 1 a_id 1
id 2
}}}
Now, if you make the assumption that if you know the value of id in table
a, then you know the value of a_id in table b then you are wrong, as the
id=2 value doesn't have any matching row on the other side.

Current Django code makes the assumption you can trim joins in the reverse
case. This was blocked for the !OneToOneKey case in #15316 by the
nonnull_check, but I don't believe that to be the correct fix. Just block
all trimming in the reverse case and you have your solution. The question
is how do you pass the direction information from setup_joins to
trim_joins with minimal (internal) API breakages. You might want to add a
completely new variable to the Query class ("join_directions" or something
like that), which tracks the join directions for each alias generated and
nothing else. This way you need only internal changes to setup_joins,
trim_joins, `Query.__init__` and !Query.clone(). This solution isn't too
beautiful, but I don't expect it to bite us. In any case it seems obvious
code changes in this are will not be backportable, as the API is going to
change a lot in 1.5.

I might be overcautious above. Another solution is to just add the
direction to alias_map items and be done with it. This will break 3rd
party code that uses alias_map. As this is totally private API one could
claim that they had it coming all along... Still, it seems we can avoid
this break somewhat easily, so why be evil...

You can get the directions from the patch in #10790, names_to_path().

If you still keep the left outer joins in place in trim_joins(), I believe
you don't need any changes to !Query.combine(). In #10790 the left joins
are trimmed, too, as I can't see any reason why they are non-trimmable.

The reverse join trimming issue isn't commonly seen, as it is somewhat
rare to have reverse filters which are trimmable to begin with. The most
common case is !OneToOneKey, or doing something like
`A.objects.filter(b__a_id=1)`, but that usually doesn't make sense,
instead you would just do A.objects.filter(id=1).

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:4>

Django

unread,
Sep 27, 2012, 10:06:20 AM9/27/12
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------

Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by trbs):

* cc: trbs@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:5>

Django

unread,
May 31, 2013, 4:59:22 PM5/31/13
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------

Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by famousactress@…):

This regression from 1.4 still appears broken in the last 1.4 and 1.5
builds. Are there any plans to fix it? Will improvements in 1.6 resolve
this issue? Our project is orphaned on 1.3 because this change causes a
huge query spike and abysmal performance.

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:6>

Django

unread,
May 31, 2013, 5:21:42 PM5/31/13
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------

Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.4
(models, ORM) | Resolution:
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by famousactress@…):

I should point out though, the problem looks fixed in 1.6. I confirmed
that our project unit test is now failing in an awesome direction using
the newest bits.. the join count for the little test I wrote to remind us
to look out for this error is 2 joins in v1.3, 4 joins in 1.4 & 1.5, and
1 join in 1.6. Huge thanks to Anssi who's work on
https://code.djangoproject.com/ticket/10790 I believe is responsible for
the major improvements. This ought to mean that basically everyone with a
non-trivial Django project just got a really significant performance
boost. If a 1.4 or 1.5 patch is too hairy, I suppose we can jump past them
to 1.6 when it's stable.

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:7>

Django

unread,
May 31, 2013, 5:29:13 PM5/31/13
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------
Reporter: famousactress@… | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 1.4
(models, ORM) | Resolution: fixed

Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------
Changes (by Anssi Kääriäinen <akaariai@…>):

* status: new => closed
* resolution: => fixed


Comment:

In [changeset:"84909377f2f2eea07ab8bf398dafc69e2d736b50"]:
{{{
#!CommitTicketReference repository=""
revision="84909377f2f2eea07ab8bf398dafc69e2d736b50"
Fixed #18785 -- Added Test join trimming regression

The regression was caused by patch to ticket #15316 and was fixed by a
patch to #10790.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:8>

Django

unread,
May 31, 2013, 5:32:52 PM5/31/13
to django-...@googlegroups.com
#18785: Extra-Join Regression from ticket #15316
-------------------------------------+-------------------------------------
Reporter: famousactress@… | Owner: nobody

Type: Bug | Status: closed
Component: Database layer | Version: 1.4
(models, ORM) | Resolution: fixed
Severity: Normal | Triage Stage: Accepted
Keywords: | Needs documentation: 0
Has patch: 0 | Patch needs improvement: 0
Needs tests: 0 | UI/UX: 0
Easy pickings: 0 |
-------------------------------------+-------------------------------------

Comment (by akaariai):

Backpatching of all the ORM changes to 1.5 can't be done. Way too much
risk, I am sure there are still regressions hiding in there. If a very
targeted patch for this issue is written then that might be a candidate to
1.5 as a regression fix.

--
Ticket URL: <https://code.djangoproject.com/ticket/18785#comment:9>

Reply all
Reply to author
Forward
0 new messages