I have a code snippet like this:
{{{
qs = Order.objects.exclude(
payment__isnull=True
)
}}}
Which creates the following SQL:
{{{
SELECT "order"."id",
...
FROM "order"
WHERE NOT (EXISTS
(SELECT 1 AS "a"
FROM "order" U0
LEFT OUTER JOIN "payment" U1 ON (U0."id" = U1."payable_id"
AND
(U1."payable_type_id" = 30))
WHERE (U1."id" IS NULL
AND U0."id" = ("order"."id"))
LIMIT 1))
LIMIT 10
}}}
(This is part of original query)
In my large DB, this section increases response time about 10s.
By changing `exclude` to `filter`, we will not have subquery anymore:
{{{
qs = Order.objects.filter(
payment__isnull=False
)
}}}
And the SQL:
{{{
SELECT "order"."id",
...
FROM "order"
INNER JOIN "payment" ON ("order"."id" = "payment"."payable_id"
AND
("payment"."payable_type_id" = 30))
WHERE "payment"."id" IS NOT NULL
LIMIT 10
}}}
And with this, I don't face high response time anymore and everything is
fine.
Why Django uses subquery when using `exclude` ?
I just wanted to make sure it was intended.
--
Ticket URL: <https://code.djangoproject.com/ticket/34845>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Comment (by David Sanders):
Hello,
Trac isn't the right place to get help using Django, please refer to one
of our friendly communities where someone will be able to help you:
https://www.djangoproject.com/community/
Thanks!
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:1>
* status: new => closed
* resolution: => invalid
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:2>
Comment (by Amin Aminian):
Replying to [comment:1 David Sanders]:
> Hello,
>
> Trac isn't the right place to get help using Django, please refer to one
of our friendly communities where someone will be able to help you:
https://www.djangoproject.com/community/
>
> Thanks!
Thanks for your response David.
But I wasn't looking for help, I think it can be a performance issue ...
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:3>
Comment (by David Sanders):
So there's a couple of things here:
You asked:
> Why Django uses subquery when using exclude ?
Questions are usually interpreted as seeking help ;) Please refer all
questions to the community and not Trac ;)
> I just wanted to make sure it was intended.
We're aware of some possible performance issues related to the way Django
renders `NOT EXISTS`, eg: https://code.djangoproject.com/ticket/34597
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:4>
Comment (by David Sanders):
PS: Please read the thread for #34597 as there may be some useful
information for you. Be aware that there was a small patch for that which
looks like it will be released in the upcoming 5.0 release.
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:5>
* resolution: invalid => duplicate
Comment:
Possible duplicate of #34597
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:6>
Comment (by Simon Charette):
Amin, you haven't provided the full picture here.
I'm not sure if you're using `FilteredRelation`, `extra`, or a third party
application that does so explicitly for you but the ORM won't insert a
`AND ("payment"."payable_type_id" = 30)` in the `JOIN` from a simple
{{{#!python
class Order(models.Model):
pass
class Payment(models.Model):
order = models.ForeignKey(Order)
qs = Order.objects.exclude(
payment__isnull=True
)
}}}
Something else is at play here and I'm pretty sure you are running into a
lack of split exclude handling optimisations for `__null` when dealing
with `FilteredRelation` or something else (e.g. third-party app) that
makes use of it.
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:7>
Comment (by Simon Charette):
To add to the above, assuming you are using
`annotate(payment=FilteredRelation("payments",
Q(payments__payable_type=payable_type)))` or an equivalent, the reason why
the ORM cannot simply use a `LEFT OUTER JOIN` instead of `NOT EXISTS` when
doing `exclude(payment__isnull=False)` is that it lacks introspection
capabilities into `FilteredRelation.condition` to be sure that the
condition added to the `JOIN` wouldn't interfere.
In a sense
{{{#!python
INNER JOIN "payment" ON ("order"."id" = "payment"."payable_id"
AND
("payment"."payable_type_id" = 30))
WHERE "payment"."id" IS NOT NULL
}}}
is a ''tautology''. The `WHERE "payment"."id" IS NOT NULL` part is
unnecessary given the usage of an `INNER JOIN` already ensures that
`"payment"."id" IS NOT NULL` since primary keys cannot be null.
Normally the ORM would be able to ''know'' that `"payment"."id"` cannot be
null as it's a primary key but I suspect the usage of a `FilteredRelation`
prevents the ORM from knowing.
What #34597 did is switch the query from using `"order"."id" NOT IN
(subquery using LEFT JOIN)` to `NOT EXISTS(subquery using LEFT JOIN
referring to "order"."id" )` so if you can confirm that your query is
faster with `NOT IN` than with `EXISTS` it would be great to have another
data point added to #34597.
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:8>
Comment (by Amin Aminian):
Replying to [comment:7 Simon Charette]:
> Amin, you haven't provided the full picture here.
>
> I'm not sure if you're using `FilteredRelation`, `extra`, or a third
party application that does so implicitly for you but the ORM won't insert
a `AND ("payment"."payable_type_id" = 30)` in the `JOIN` from a simple
Hi Simon,
You are right, I've not provided full detail.
My models is sth like this:
{{{
class Order(models.Model);
...
payment = GenericRelation(
"Payment",
related_query_name="payment",
object_id_field="payable_id",
content_type_field="payable_type",
)
class Payment(models.Model):
...
payable_type = models.ForeignKey(
ContentType, on_delete=models.PROTECT, verbose_name=_("payable
type")
)
payable_id = models.PositiveIntegerField(verbose_name=_("payable id"))
payable_object = GenericForeignKey("payable_type", "payable_id")
}}}
Also note that the ORM the I provided earlier, is not the full ORM,
because I don't know if I'm allowed to share that here or not (it's for
company).
The thing was that I had a slow query, which was taking about 15s, and
fixed that just by using `filter` instead of `exclude`, so I just provided
that section of the query.
--
Ticket URL: <https://code.djangoproject.com/ticket/34845#comment:9>