Unfortunately the problem is a bit deeper in the ORM. In general, when
filtering against a multivalued relation (reverse foreign key or m2m
relation) Django uses joins instead of subqueries. This can result in
duplicating rows if multiple related rows match the filter. The fix
recommended in the docs is to apply .distinct() to the query. Using
distinct can result in slow query execution speed, and many DBAs see
this kind of usage of distinct as a surefire sign that the query is
written poorly (and I agree here).
We really should be using IN/EXISTS queries for filters spanning
multivalued relations, but for historical reasons we don't. We do that
for exclude and negated filters already.
There is also a technical problem which needs to be solved before
changing how __isnull=False works. If we want to make __isnull=False to
do EXISTS/IN query, then the additional icontains filter in the query
qs.filter(somefield__isnull=False, somefield__othercol__icontains='a')
must be pushed to the same IN/EXISTS clause. This isn't easy to do
correctly (and we don't do it correctly for exclude queries currently).
In addition, there are cases where subqueries aren't wanted, older
versions of MySQL for example choke on subqueries.
So, there are a couple of complications for this idea:
- Support for pushing multiple filter conditions to the same subquery
is needed even if we consider support for only isnull=False filter.
- We should consider using IN/EXISTS + subquery for multivalued
relations in filters. This is challenging both technically and from
backwards compatibility perspective.
- Anssi
> it, send an email to django-developers
> +
unsub...@googlegroups.com.