.filter() and .exclude() don't add up

65 views
Skip to first unread message

Daniele Procida

unread,
Jul 26, 2013, 11:57:38 AM7/26/13
to Django Users
How is this possible?

# we start with a queryset actual_events

# get forthcoming_events using filter()

forthcoming_events = actual_events.filter(
Q(single_day_event = True, date__gte = datetime.now()) | \
Q(single_day_event = False, end_date__gte = datetime.now())
)

# get previous_events using exclude() and exactly the same terms as above

previous_events = actual_events.exclude(
Q(single_day_event = True, date__gte = datetime.now()) | \
Q(single_day_event = False, end_date__gte = datetime.now())
)

# And now:

# actual_events.count(): 467
# forthcoming_events.count(): 24
# previous_events.count(): 442

SInce I have run .filter() and .exclude() with identical terms, should they not between them contain all the items in the queryset they acted upon, *whatever* the terms used?

Daniele

Steve McConville

unread,
Jul 26, 2013, 1:00:07 PM7/26/13
to django...@googlegroups.com
Firstly (and I don't think this is the cause of the problem) you're
calling datetime.now() four times, which will give you four different
datetimes (ie. the queries will not be completely identical). Secondly
SQL uses a 3-valued logic (with null) so if any of the fields you're
filtering on are nullable you may not be able to rely on the law of
the excluded middle:

https://en.wikipedia.org/wiki/Null_(SQL)#Law_of_the_excluded_fourth_.28in_WHERE_clauses.29
> --
> You received this message because you are subscribed to the Google Groups "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
> To post to this group, send email to django...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-users.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>



--
steve
http://stevemcconville.com/

Daniele Procida

unread,
Jul 26, 2013, 5:28:38 PM7/26/13
to django...@googlegroups.com
On Fri, Jul 26, 2013, Steve McConville <mcconvil...@gmail.com> wrote:

>Firstly (and I don't think this is the cause of the problem) you're
>calling datetime.now() four times, which will give you four different
>datetimes (ie. the queries will not be completely identical).

Good point, I will address that.

> Secondly
>SQL uses a 3-valued logic (with null) so if any of the fields you're
>filtering on are nullable you may not be able to rely on the law of
>the excluded middle

So, if one of the fields can be Null, then *neither*:

queryset.filter(field=value)

queryset.exclude(field=value)

will match a record where it's Null?

In that case, is there a better - more reliable - way than using both .filter() and .exclude() with the same terms to split a queryset into all those items that match a filter, and all those that don't?

Thanks,

Daniele

Bill Freeman

unread,
Jul 26, 2013, 5:36:59 PM7/26/13
to django-users
You really should figure out which record isn't showing up in either sub case and look at it in detail to see if NULLs are involved before you spend time trying to fix a problem that you don't have.

You could, for example collect all the ids from the several queries into python sets, union the sub queries, and take the difference of that from the total query.


Steve McConville

unread,
Jul 26, 2013, 5:50:05 PM7/26/13
to django...@googlegroups.com
> So, if one of the fields can be Null, then *neither*:
>
> queryset.filter(field=value)
>
> queryset.exclude(field=value)
>
> will match a record where it's Null?

As I understand it, this is correct - it's certainly the way SQL was designed.

> In that case, is there a better - more reliable - way than using both .filter() and .exclude() with the same terms to split a queryset into all those items that match a filter, and all those that don't?

As well as doing the queries above, you can get the nulls by doing

queryset.filter(field__isnull=True)

and then it's just a matter of deciding which of your two sets the
nulls should be included in (probably with a logical OR). There are a
number of other approaches as well, but somewhere we'll still have to
decide what the meaning of null. My bias is towards making that
explicit. The other obvious approach is to find queryset.all() and
then the (null-less) set queryset.filter(field=value) and work with
them.

--
steve
http://stevemcconville.com/

Daniele Procida

unread,
Jul 27, 2013, 4:09:59 AM7/27/13
to django...@googlegroups.com
On Fri, Jul 26, 2013, Bill Freeman <ke1...@gmail.com> wrote:

>You really should figure out which record isn't showing up in either sub
>case and look at it in detail to see if NULLs are involved before you spend
>time trying to fix a problem that you don't have.
>
>You could, for example collect all the ids from the several queries into
>python sets, union the sub queries, and take the difference of that from
>the total query.

Thanks, after a bit I got hold of the field=Null items, which were appearing unexpectedly in an earlier queryset.

Danie

Daniele Procida

unread,
Jul 27, 2013, 11:52:31 AM7/27/13
to django...@googlegroups.com
On Fri, Jul 26, 2013, Steve McConville <mcconvil...@gmail.com> wrote:

>> So, if one of the fields can be Null, then *neither*:
>>
>> queryset.filter(field=value)
>>
>> queryset.exclude(field=value)
>>
>> will match a record where it's Null?
>
>As I understand it, this is correct - it's certainly the way SQL was designed.
>
>> In that case, is there a better - more reliable - way than using
>both .filter() and .exclude() with the same terms to split a queryset
>into all those items that match a filter, and all those that don't?
>
>As well as doing the queries above, you can get the nulls by doing
>
>queryset.filter(field__isnull=True)

I meant is there a more general way of doing something like:

red_things = queryset.filter(colour="red") # get red things

non_red_things = queryset - red_things # get all other things

Maybe this simply isn't possible in SQL.

Daniele

Steve McConville

unread,
Jul 27, 2013, 1:46:58 PM7/27/13
to django...@googlegroups.com
Perhaps I'm not sure exactly what you mean by "more general", but I
was recommending something like

red_things = queryset.filter(Q(color="red"))
non_red_things = queryset.filter(~Q(color="red") | Q(color__isnull=True)

This will produce SQL like

SELECT * FROM queryset WHERE color IS 'red';
SELECT * FROM queryset WHERE color IS NOT 'red' OR color IS NULL;

The set non_red_things will be the complement of red_things.

--
steve
http://stevemcconville.com/

Daniele Procida

unread,
Jul 28, 2013, 3:44:49 AM7/28/13
to django...@googlegroups.com
I understood that part. But by "more general" I mean one that will work for any case, without having to know where the Nulls might be.

So given queryset A, and its subset queryset B, we can place B against A and obtain its complement.

Or to put it another way: give me all the items in A that are not in B.

Daniele



akaariai

unread,
Jul 29, 2013, 3:36:55 AM7/29/13
to django...@googlegroups.com

You can do this with a subquery in Django. non_red_things = queryset.exclude(pk__in=red_things). If this performs well is a different thing.

I think that in SQL one can use WHERE (original_condition) is not true; which will match both unknown (null comparison's result) and false in the original condition.

 - Anssi

Daniele Procida

unread,
Jul 30, 2013, 11:26:47 AM7/30/13
to django...@googlegroups.com
On Mon, Jul 29, 2013, akaariai <akaa...@gmail.com> wrote:

>> I understood that part. But by "more general" I mean one that will work
>> for any case, without having to know where the Nulls might be.
>>
>> So given queryset A, and its subset queryset B, we can place B against A
>> and obtain its complement.
>>
>> Or to put it another way: give me all the items in A that are not in B.
>>
>
>You can do this with a subquery in Django. non_red_things =
>queryset.exclude(pk__in=red_things). If this performs well is a different
>thing.

It seems to take about twice as long to execute, so no, it doesn't perform very well.

>I think that in SQL one can use WHERE (original_condition) is not true;
>which will match both unknown (null comparison's result) and false in the
>original condition.

But this isn't available as a Django query, without using raw SQL?

Daniele

akaariai

unread,
Jul 31, 2013, 2:56:21 AM7/31/13
to django...@googlegroups.com
No it isn't. Writing a patch that adds QuerySet.negate() operation would be fairly straightforward. If such an operation will be accepted to Django is a different question. In my opinion the main question is if queries written as "WHERE (original_condition) is not true" will perform well enough. If not, then adding the operation isn't a good idea, but if it generally performs well, then addition of it seems like a good idea to me.

 - Anssi
Reply all
Reply to author
Forward
0 new messages