Querying across FK produces too many OUTER JOINs (1.6) or FieldError (1.7)

721 views
Skip to first unread message

Jon Dufresne

unread,
Jul 4, 2014, 5:42:39 PM7/4/14
to django-d...@googlegroups.com
Suppose I have the following models:

---
class Container(models.Model):
pass

class Item(models.Model):
container = models.ForeignKey(Container)
previous = models.ForeignKey('self', null=True)
current = models.BooleanField()
flag = models.BooleanField()
---

Item represents a chain of items, all grouped by a container (like a
linked list). The field "current" represents the most recent item
(front of the list). The field "flag" is simply something to query on.

Suppose I perform the following query:

---
Item.objects.filter(current=True,
container__item__previous__isnull=True, container__item__flag=True)
---

That is, I'm looking for all current items such that the first item in
the chain has flag = true. Django 1.6 produces the following SQL for
this query:

---
SELECT ...
FROM "myapp_item"
INNER JOIN "myapp_container"
ON ( "myapp_item"."container_id" = "myapp_container"."id" )
LEFT OUTER JOIN "myapp_item" T3
ON ( "myapp_container"."id" = T3."container_id" )
WHERE ("myapp_item"."current" = True AND T3."previous_id" IS NULL AND
T3."flag" = True )
---

The OUTER JOIN is the problem. Why is this not a more efficient INNER
JOIN? This query is very inefficient as the the database gets larger.
This causes slow downs on pages.

Interestingly, if I try this on 1.7, this query is apparently not even
supported. I get the following error:

---
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/jon/djtest/djtest/test.py", line 3, in <module>
qs = Item.objects.filter(current=True,
container__item__previous__isnull=True, container__item__flag=True)
File "/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/manager.py",
line 92, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/query.py",
line 689, in filter
return self._filter_or_exclude(False, *args, **kwargs)
File "/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/query.py",
line 707, in _filter_or_exclude
clone.query.add_q(Q(*args, **kwargs))
File "/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/sql/query.py",
line 1287, in add_q
clause, require_inner = self._add_q(where_part, self.used_aliases)
File "/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/sql/query.py",
line 1314, in _add_q
current_negated=current_negated, connector=connector)
File "/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/sql/query.py",
line 1181, in build_filter
lookups, value)
File "/home/jon/djtest/venv/lib/python2.7/site-packages/django/db/models/fields/related.py",
line 1506, in get_lookup_constraint
raise exceptions.FieldError('Relation fields do not support nested lookups')
FieldError: Relation fields do not support nested lookups
---

A slow query is better than no query at all. Any particular reason
this was removed? How would one model and query what I'm trying to
achieve moving forward to avoid inefficient queries and exceptions?

Should I file this as a bug?

Cheers,
Jon

Javier Guerra Giraldez

unread,
Jul 4, 2014, 5:54:49 PM7/4/14
to django-d...@googlegroups.com
On Fri, Jul 4, 2014 at 4:42 PM, Jon Dufresne <jon.du...@gmail.com> wrote:
> Item.objects.filter(current=True,
> container__item__previous__isnull=True, container__item__flag=True)
> ---
>
> That is, I'm looking for all current items such that the first item in
> the chain has flag = true. Django 1.6 produces the following SQL for
> this query:


what purpose does the "container__item__previous__isnull=True"
argument serve here? i think it means "an item that belongs to a
container that has an item with no 'previous'", which if it's a linked
list, then any non-empty container would comply, and since you start
the query from the item, then the container is non-empty by
definition.

btw, a linked list in a database? can you elaborate on that idea?

--
Javier

Jon Dufresne

unread,
Jul 4, 2014, 6:08:09 PM7/4/14
to django-d...@googlegroups.com
On Fri, Jul 4, 2014 at 2:54 PM, Javier Guerra Giraldez
<jav...@guerrag.com> wrote:
> what purpose does the "container__item__previous__isnull=True"
> argument serve here?

To filter on the initial item in the list. I'm looking for flag=True
*only* on the first item in the list. flag=True could be set on other
items in the same chain, but I don't care about those. The goal of the
query is:

I'm looking for all *current* items such that the *first item* in the
chain has *flag = true*.

In raw SQL I want the query to be:

SELECT myapp_item.*
FROM "myapp_item"
INNER JOIN "myapp_container"
ON ( "myapp_item"."container_id" = "myapp_container"."id" )
INNER JOIN "myapp_item" T3
ON ( "myapp_container"."id" = T3."container_id" )
WHERE ("myapp_item"."current" = True AND T3."previous_id" IS NULL AND
T3."flag" = True )

So the WHERE condition flag=True, only applies to the initial item.

>i think it means "an item that belongs to a container that has an item with no 'previous'"

Correct, but not just any container, but the same container as the
item being filtered.

> which if it's a linked list, then any non-empty container would comply

Right, except then the "container__item__flag=True" should influence it further.

> and since you start the query from the item, then the container is non-empty by definition.

Yes, correct.

> btw, a linked list in a database? can you elaborate on that idea?

It is just one way to think about it. It is not really a linked list.
I simply meant the items are linked together by the "previous" point.

In my actual application item represent a step with history (previous
being that history). Container represents the process of the step (has
actual fields). So each step has a previous step in the process.
Sometimes I want to query for the current step in the process based on
the initial step.

Cheers,
Jon

Stephen J. Butler

unread,
Jul 4, 2014, 6:26:44 PM7/4/14
to django-d...@googlegroups.com
On Fri, Jul 4, 2014 at 4:42 PM, Jon Dufresne <jon.du...@gmail.com> wrote:
Suppose I have the following models:

---
class Container(models.Model):
    pass

class Item(models.Model):
    container = models.ForeignKey(Container)
    previous = models.ForeignKey('self', null=True)
    current = models.BooleanField()
    flag = models.BooleanField()
---

[snip]
 
A slow query is better than no query at all. Any particular reason
this was removed? How would one model and query what I'm trying to
achieve moving forward to avoid inefficient queries and exceptions?

Two ways come to mind:

* If it's only "flag" on the first item in a sequence then move "flag" to be on present on Container also. A "first_flag" field. Could manage in a signal or override Item.save(). That would let you not nest the query, and could be more efficient.

* Build a list of relevant containers first: Container.objects.filter(item__previous__isnull=True, item__flag=True).values_list('pk', flat=True). Then Items.objects.filter(current=True, container__in=flagged_containers). Not a problem as long as your query doesn't grow too large for your DB (that is, too many containers that might be considered)

Javier Guerra Giraldez

unread,
Jul 4, 2014, 6:37:37 PM7/4/14
to django-d...@googlegroups.com
On Fri, Jul 4, 2014 at 5:23 PM, Stephen J. Butler
<stephen...@gmail.com> wrote:
> * Build a list of relevant containers first:
> Container.objects.filter(item__previous__isnull=True,
> item__flag=True).values_list('pk', flat=True). Then
> Items.objects.filter(current=True, container__in=flagged_containers). Not a
> problem as long as your query doesn't grow too large for your DB (that is,
> too many containers that might be considered)


i guess Items.objects.filter(current=True,
container__in=Container.objects.filter(item__previous__isnull=True,
item__flag=True)) would be better, it gives the the opportunity to not
get the container list in Python.

... and this is more appropriate on django-users

--
Javier

Jon Dufresne

unread,
Jul 4, 2014, 6:47:30 PM7/4/14
to django-d...@googlegroups.com
> ... and this is more appropriate on django-users

Sorry, but I felt like I was reporting information interesting to developers.

1.) Wrong JOIN type (OUTER vs INNER) producing inefficient queries
2.) FieldError in 1.7 where there wasn't one in 1.6

If these are of no interest, I will not continue the discussion.

Cheers,
Jon

Anssi Kääriäinen

unread,
Jul 4, 2014, 10:44:02 PM7/4/14
to django-d...@googlegroups.com

This have been fixed in 1.7. Pre-1.7 there was no join "demotion". That is, when the ORM generated a LEFT JOIN expression it was never changed back to INNER JOIN. Here the container__item__previous__isnull=True generates LEFT JOIN for the item join.

In 1.7 there exists some new code to do join demotion. In this case it works because container__item__flag=True can produce results only when the item join produces results => the query never produces results with LEFT JOIN of item => the existing LEFT JOIN for alias T3 can be "demoted" to INNER JOIN.

Interestingly, if I try this on 1.7, this query is apparently not even
supported. I get the following error:

I wasn't able to reproduce this - are you sure the query is exactly the same both on 1.6 and 1.7? Are models are set up correctly before generating the query? If after double-checking you still get the error, could you post a sample project somewhere?

 - Anssi

Anssi Kääriäinen

unread,
Jul 4, 2014, 10:46:34 PM7/4/14
to django-d...@googlegroups.com


On Saturday, July 5, 2014 1:47:30 AM UTC+3, Jon Dufresne wrote:
Sorry, but I felt like I was reporting information interesting to developers.

1.) Wrong JOIN type (OUTER vs INNER) producing inefficient queries
2.) FieldError in 1.7 where there wasn't one in 1.6

If these are of no interest, I will not continue the discussion.

This is the right list, especially concerning 2). Even if 2) turns out to be a false alarm, I much rather get a couple of false alarms than a regression in final release of 1.7.

 - Anssi

Jon Dufresne

unread,
Jul 5, 2014, 2:11:00 PM7/5/14
to django-d...@googlegroups.com
On Fri, Jul 4, 2014 at 7:44 PM, Anssi Kääriäinen
<anssi.ka...@thl.fi> wrote:
> I wasn't able to reproduce this - are you sure the query is exactly the same
> both on 1.6 and 1.7? Are models are set up correctly before generating the
> query? If after double-checking you still get the error, could you post a
> sample project somewhere?

I have created a test repository at
<https://github.com/jdufresne/django-test-field-error>

Upon recreating the test scenario, I have discovered that this is a
false alarm. In my original testing I didn't add my app with the
models to the list of INSALLED_APPS. Without this step (which is
clearly a bug in my test application) the FieldError occurs.

Testing, I see that the OUTER JOIN in 1.6 did in fact change to an
INNER JOIN in 1.7! Thanks for that! Sorry about the confusion and my
mistakes.

Cheers,
Jon
Reply all
Reply to author
Forward
0 new messages