anti-join query in Django

94 views
Skip to first unread message

dmitry b

unread,
Nov 18, 2011, 7:41:15 PM11/18/11
to Django users
Is there a recommended approach to anti-join queries? Here's the
query I'm having an issue with:

Branch.objects.filter(branchgroup__isnull=True)[:1]

where branchgroup is a ManytoMany relationship to another model. I
want to get a set of Branch objects that aren't mapped to any
BranchGroups. This query gets translated to the following (a
simplified equivalent):


SELECT "scm_branch"."id"
FROM "scm_branch"
LEFT OUTER JOIN "scm_branchgroup_branches"
ON ("scm_branch"."id" =
"scm_branchgroup_branches"."branch_id")
LEFT OUTER JOIN "scm_branchgroup"
ON ("scm_branchgroup_branches"."branchgroup_id" =
"scm_branchgroup"."id")
WHERE "scm_branchgroup"."id" IS NULL
LIMIT 1

This query is very slow: Limit (cost=1072479.36..6256437.79 rows=1
width=145)

However, a slightly modified, but functionally equivalent query:

SELECT "scm_branch"."id"
FROM "scm_branch"
LEFT OUTER JOIN "scm_branchgroup_branches"
ON ("scm_branch"."id" =
"scm_branchgroup_branches"."branch_id")
LEFT OUTER JOIN "scm_branchgroup"
ON ("scm_branchgroup_branches"."branchgroup_id" =
"scm_branchgroup"."id")
WHERE "scm_branchgroup_branches"."branch_id" IS NULL
LIMIT 1

Is orders of magnitude faster: Limit (cost=1518.71..1533.35 rows=1
width=145)

The difference is with the WHERE clause. Django generates WHERE
"scm_branchgroup"."id" IS NULL, but a properly optimized query should
use WHERE "scm_branchgroup_branches"."branch_id" IS NULL. This is
because Postgres recognizes the second query as a anti-join query and
can do a lot of optimization.

The basic question is: How do I make Django generate a faster query?


Thanks
Dmitry

Russell Keith-Magee

unread,
Nov 19, 2011, 5:36:35 AM11/19/11
to django...@googlegroups.com
On Sat, Nov 19, 2011 at 8:41 AM, dmitry b <dmitry...@gmail.com> wrote:
> Is there a recommended approach to anti-join queries?  Here's the
> query I'm having an issue with:
>
> Branch.objects.filter(branchgroup__isnull=True)[:1]
>
> where branchgroup is a ManytoMany relationship to another model.  I
> want to get a set of Branch objects that aren't mapped to any
> BranchGroups.

Hi Dmitry,

There isn't any specific way to force the non-creation of a join
condition; Django doesn't expose the internals of query construction,
and should be choosing the optimal SQL for the ORM query.

The sort of optimization you describe (i.e., don't do the join if you
can just check the primary key value on the local model) is an
optimization that Django is definitely capable of performing. If the
ORM isn't performing this optimization in this case, it's possible
you've found a bug.

There are some cases where the optimization *isn't* possible, but
without seeing the models for your test case, it's impossible to say
for certain.

If you think you've found a way that Django could optimize it's
queries better, the best way forward is to open a ticket, with a set
of sample models and the expected/actual query that is generated for
those specific models.

Yours,
Russ Magee %-)

dmitry b

unread,
Nov 21, 2011, 12:39:35 PM11/21/11
to Django users
Hi Russ,

Thanks for a quick response.

I don't think Django needs to make the decision of whether or not to
do a join. That can be done by the underlying database as long as
Django generates a query that the database can understand and optimize
properly. In this case, it would imply that Django would still need
to generate a join, but check the null condition not on the primary
key field but on one of the fields the join condition.

Regardless, I'm going to file a ticket on this.

Thanks again.
D.

On Nov 19, 2:36 am, Russell Keith-Magee <russ...@keith-magee.com>
wrote:

Russell Keith-Magee

unread,
Nov 21, 2011, 8:53:20 PM11/21/11
to django...@googlegroups.com
On Tue, Nov 22, 2011 at 1:39 AM, dmitry b <dmitry...@gmail.com> wrote:
> Hi Russ,
>
> Thanks for a quick response.
>
> I don't think Django needs to make the decision of whether or not to
> do a join.  That can be done by the underlying database as long as
> Django generates a query that the database can understand and optimize
> properly.

This assumes that databases are equally good at performing
optimizations of this kind. It's true that PostgreSQL does, and I
believe that Oracle does as well; however, MySQL and SQLite are not
equally blessed (to the best of my knowledge).

Yours,
Russ Magee %-)

Reply all
Reply to author
Forward
0 new messages