Is a Complex filter with __in and __exact possible?

7,315 views
Skip to first unread message

Alfonso

unread,
Mar 4, 2009, 12:58:36 PM3/4/09
to Django users
I've set up a simple filter to grab a queryset of objects matching a
list of (many to many) categories like this:

Book.objects.filter(categories__in=[1,2,3]).order_by('name').distinct
()

It's good, it works... it returns any books in those three
categories.But I'd like to return ONLY Books with those three
categories?

I thought something like categories__exact__in would work but clearly
not.

Any ideas?

Thanks

Daniel Hepper

unread,
Mar 4, 2009, 6:06:58 PM3/4/09
to Django users
You can try this query:

Book.objects.filter(categories=1,categories=2,categories=3)

Hope that helps

-- Daniel

Malcolm Tredinnick

unread,
Mar 4, 2009, 10:14:40 PM3/4/09
to django...@googlegroups.com
On Wed, 2009-03-04 at 15:06 -0800, Daniel Hepper wrote:
> You can try this query:
>
> Book.objects.filter(categories=1,categories=2,categories=3)
>
> Hope that helps

It won't. A filter() method is a normal Python function or method call.
You can only specify each keyword argument exactly once. Because of the
way Django implements queryset methods, only the last keyword argument
will be used (no error will be raised), but it's still not what is
expected.

Regards,
Malcolm

Malcolm Tredinnick

unread,
Mar 4, 2009, 10:19:27 PM3/4/09
to django...@googlegroups.com
On Wed, 2009-03-04 at 09:58 -0800, Alfonso wrote:
> I've set up a simple filter to grab a queryset of objects matching a
> list of (many to many) categories like this:
>
> Book.objects.filter(categories__in=[1,2,3]).order_by('name').distinct
> ()
>
> It's good, it works... it returns any books in those three
> categories.But I'd like to return ONLY Books with those three
> categories?

There are two different versions of this question. One of them is
possible, the other probably not just yet (at least, I haven't worked
out a neat answer).

First version: I want books that are in all three categories (plus
possibly some extra ones):

Answer: Relies on using Django trunk (or the upcoming Django 1.1), since
it uses aggregation/annotations.

from django.db.models import Count

Book.objects.filter(categories__in=[1, 2, 3]).annotate(num_cats=Count('categories').filter(num_cats=3)

Second version: I want books that are in all three categories, but just
those three categories and *no others*.

Answer: hmmm... :-(

This one is possible in raw SQL, but it's a fairly fiddly query. I can't
make it work with annotate() yet, although I just played around for
about 30 minutes to see what was going on. It might be a subtle
limitation in annotate(), but that's only a new feature, so we can
extend it in the future. Basically, the count I do in the first answer
counts the number of categories that have been selected, not the total
number of categories available for each book (which would require some
extra table joins). I'd also want to filter on the total number of
categories available in this version of the question, so I get stuck.

Still, I suspect you want the first version of the question, which has
the "simple" solution I've given.

Regards,
Malcolm


Malcolm Tredinnick

unread,
Mar 4, 2009, 10:26:16 PM3/4/09
to django...@googlegroups.com
On Thu, 2009-03-05 at 14:19 +1100, Malcolm Tredinnick wrote:
[...]

> Second version: I want books that are in all three categories, but just
> those three categories and *no others*.
>
> Answer: hmmm... :-(
>
> This one is possible in raw SQL, but it's a fairly fiddly query. I can't
> make it work with annotate() yet, although I just played around for
> about 30 minutes to see what was going on. It might be a subtle
> limitation in annotate(), but that's only a new feature, so we can
> extend it in the future. Basically, the count I do in the first answer
> counts the number of categories that have been selected, not the total
> number of categories available for each book (which would require some
> extra table joins). I'd also want to filter on the total number of
> categories available in this version of the question, so I get stuck.

So, naturally, as soon as I hit "send", the solution occurs to me. Not
short, but it works. Here's how to get all books that are in precisely
those three categories: no more and no less (line breaks inserted for
readability, hopefully):

Book.objects.
filter(categories__in=[1, 2, 3]).
annotate(num_cats=Count('categories').
filter(num_cats=3).
exclude(id__in=
Book.objects.annotate(all_cats=Count('categories')).
filter(all_cats__gt=3))


It's a pretty ugly SQL query and I haven't put in the time to work out
if it's the most efficient we can do. But it does the right thing.
Again, requires Django 1.1-alpha because it relies on annotations and
nested querysets.

Regards,
Malcolm

Daniel Hepper

unread,
Mar 5, 2009, 10:13:11 AM3/5/09
to Django users
Yes, this does obviously not work as expected. Sorry for the
misinformation.

But would it work if every condition was encapsulated in a Q-object?

Book.objects.filter(Q(categories=1), Q(categories=2), Q(categories=3))

-- Daniel

On Mar 5, 4:14 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Malcolm Tredinnick

unread,
Mar 5, 2009, 6:29:49 PM3/5/09
to django...@googlegroups.com
On Thu, 2009-03-05 at 07:13 -0800, Daniel Hepper wrote:
> Yes, this does obviously not work as expected. Sorry for the
> misinformation.
>
> But would it work if every condition was encapsulated in a Q-object?
>
> Book.objects.filter(Q(categories=1), Q(categories=2), Q(categories=3))

Not if you were trying to solve the original poster's question. Your
query is exactly the same as what he tried to do originally.

Regards,
Malcolm


Daniel Hepper

unread,
Mar 6, 2009, 3:11:43 AM3/6/09
to Django users
> > Book.objects.filter(Q(categories=1), Q(categories=2), Q(categories=3))
>
> Not if you were trying to solve the original poster's question. Your
> query is exactly the same as what he tried to do originally.

I played a bit with the query and just wanted to clarify that it is
not exactly the same. (Note that the Q-objects are "AND"ed, not
"OR"ed).

>>> Publication.objects.filter(article__in=[1,2,3])
[<Publication: Highlights for Children>, <Publication: Science News>,
<Publication: Science Weekly>, <Publication: The Python Journal>,
<Publication: The Python Journal>]

>>> Publication.objects.filter(Q(article=1), Q(article=2), Q(article=3))
[]

(I've used the models from the Many-To-Many-Example _1.)

The second query will always return an empty result. This is the
generated SQL:

{'sql': u'SELECT "bookstore_publication"."id",
"bookstore_publication"."title" FROM "bookstore_publication" INNER
JOIN "bookstore_article_publications" ON ("bookstore_publication"."id"
= "bookstore_article_publications"."publication_id") WHERE
("bookstore_article_publications"."article_id" = 1 AND
"bookstore_article_publications"."article_id" = 2 AND
"bookstore_article_publications"."article_id" = 3 ) ORDER BY
"bookstore_publication"."title" ASC LIMIT 21',
'time': '0.000'}

Thinking about it, this makes perfect sense, as the condition is on
the article, not on the set.

It would be nice to have special lookups for Many-To-Many-Fields or
related sets, which would allow something like
Publication.objects.filter(article_set__exact=[1,2,3])

-- Daniel

1_ http://www.djangoproject.com/documentation/models/many_to_many/

Malcolm Tredinnick

unread,
Mar 6, 2009, 8:03:09 PM3/6/09
to django...@googlegroups.com
On Fri, 2009-03-06 at 00:11 -0800, Daniel Hepper wrote:
> > > Book.objects.filter(Q(categories=1), Q(categories=2), Q(categories=3))
> >
> > Not if you were trying to solve the original poster's question. Your
> > query is exactly the same as what he tried to do originally.
>
> I played a bit with the query and just wanted to clarify that it is
> not exactly the same. (Note that the Q-objects are "AND"ed, not
> "OR"ed).
>
> >>> Publication.objects.filter(article__in=[1,2,3])
> [<Publication: Highlights for Children>, <Publication: Science News>,
> <Publication: Science Weekly>, <Publication: The Python Journal>,
> <Publication: The Python Journal>]
>
> >>> Publication.objects.filter(Q(article=1), Q(article=2), Q(article=3))
> []
>
> (I've used the models from the Many-To-Many-Example _1.)

Yes, you're completely correct. My apologies (and thanks for not just
believe me)! I was being lazy and didn't think it through. The second
query tries to find a single article objects that simultaneously has pk
values of 1, 2 and 3, as you notice in your SQL. Unsurprisingly, that
doesn't exist.

>
> The second query will always return an empty result. This is the
> generated SQL:
>
> {'sql': u'SELECT "bookstore_publication"."id",
> "bookstore_publication"."title" FROM "bookstore_publication" INNER
> JOIN "bookstore_article_publications" ON ("bookstore_publication"."id"
> = "bookstore_article_publications"."publication_id") WHERE
> ("bookstore_article_publications"."article_id" = 1 AND
> "bookstore_article_publications"."article_id" = 2 AND
> "bookstore_article_publications"."article_id" = 3 ) ORDER BY
> "bookstore_publication"."title" ASC LIMIT 21',
> 'time': '0.000'}
>
> Thinking about it, this makes perfect sense, as the condition is on
> the article, not on the set.
>
> It would be nice to have special lookups for Many-To-Many-Fields or
> related sets, which would allow something like
> Publication.objects.filter(article_set__exact=[1,2,3])

That can be done already. Although, as I mentioned in my first reply to
the original poster, the question is ambiguous.

I also realised last night that my first reply on this topic also had a
stupid mistake caused by me trying to answer both questions at once.
I've already shown how to get all objects that are in precisely those
categories and no others. Making that easier (e.g. two straight filters
or something similar) is actually fairly hard internally, since the API
shouldn't suck (I've spent a couple of hours thinking about it and
poking at the code, so I'm not just guessing here).

However, find things that are simultaneously in all those categories can
be done without all the annotation nonsense I posted. Simply

Publication.objects.filter(article=1).filter(article=2).filter(article=3)

It's a short loop to build up such a set dynamically. The difference
between that (3 filter calls) and you Q-object version is described in
the documentation:
http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

Regards,
Malcolm

Malcolm Tredinnick

unread,
Mar 7, 2009, 12:00:55 AM3/7/09
to django...@googlegroups.com
One day I'll stop posting in this thread. Really.

On Sat, 2009-03-07 at 12:03 +1100, Malcolm Tredinnick wrote:

> However, find things that are simultaneously in all those categories can
> be done without all the annotation nonsense I posted. Simply
>
> Publication.objects.filter(article=1).filter(article=2).filter(article=3)
>
> It's a short loop to build up such a set dynamically. The difference
> between that (3 filter calls) and you Q-object version is described in
> the documentation:
> http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

I forgot to mention that this approach is probably only practical for a
small number of article instances. Each filter() adds a new table join,
so at some point the database will complain. Either literally, about too
many joins, or by taking ages to attempt to optimise and execute the
query.

For a large number of article instances, the annotation-based solution I
gave earlier is more appropriate.

Regards,
Malcolm

Tassilo Karge

unread,
Apr 9, 2018, 6:52:52 PM4/9/18
to Django users
Probably I should not post into a ten-year old thread, 
but it was this thread which helped me in my problem, so I want to share
my solution to the problem in case anyone else finds this.


Am Samstag, 7. März 2009 06:00:55 UTC+1 schrieb Malcolm Tredinnick:
One day I'll stop posting in this thread. Really.

On Sat, 2009-03-07 at 12:03 +1100, Malcolm Tredinnick wrote:

> However, find things that are simultaneously in all those categories can
> be done without all the annotation nonsense I posted. Simply
>
>         Publication.objects.filter(article=1).filter(article=2).filter(article=3)

 
I tried that solution, and it works, but as you said is slow. One thing to note is
that it includes tables with at least those three articles, so another query is needed
to exclude the remaining ones (if an exact matching is what we are looking for).

>        
> It's a short loop to build up such a set dynamically. The difference
> between that (3 filter calls) and you Q-object version is described in
> the documentation:
> http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

I forgot to mention that this approach is probably only practical for a
small number of article instances. Each filter() adds a new table join,
so at some point the database will complain. Either literally, about too
many joins, or by taking ages to attempt to optimise and execute the
query.


Exactly, in my tests I found a large difference even with only about 20 joins
and 300 objects in the database. Definitely nothing for performance.
 

For a large number of article instances, the annotation-based solution I
gave earlier is more appropriate.


It does not filter exactly however: If one of the articles is contained, and
the num_cats is coincidentially exactly 3, it will include the article.
But I wrote, based on your annotation suggestion, a working solution
for exact matching of a many-to-many field:

def many_to_many_exact(manager, fieldname, foreign_keys, foreign_manager):
from django.db.models import Count
result = manager
filter_dict = {fieldname + '__in': foreign_keys}
result = result.filter(**filter_dict)
count_name = 'count_' + fieldname
annotate_dict = {count_name: Count(fieldname)}
filter_dict = {count_name: len(foreign_keys)}
result = result.annotate(**annotate_dict).filter(**filter_dict)
exclude_dict = {fieldname + '__in': foreign_manager.exclude(pk__in=foreign_keys)}
return result.exclude(**exclude_dict)
 
As for the example, this would be the same as the following code:

Book.objects
.filter(categories__in=[1, 2, 3])
   .annotate(num_cats=Count('categories'))
.filter(num_cats=3)
.exclude(categories__in=Categories.objects.exclude(pk__in=[1, 2, 3]))

All of the above code, including the categories for exclude, are executed in one query.
Probably the categories of a book are way less than the number
of existing categories, thus the database system will be smart and check
them against an index in the exclusion part.
The resulting query has the following structure:

SELECT table.*, COUNT(many_to_many_table.related_id) AS "count_related" 
FROM table INNER JOIN many_to_many_table
ON (table.id = many_to_many_table.table_id)
WHERE (many_to_many_table.related_id IN ( ...(e.g. 1,2,3) )
AND NOT (table.id IN
(SELECT V1.table_id FROM many_to_many_table V1
WHERE V1.related_id IN
(SELECT U0.id FROM related U0
WHERE NOT (U0.id IN ( ...(same as above) ))))))
GROUP BY table.id
HAVING COUNT(many_to_many_table.related_id) = ...(e.g. 3)

Of course there are two nestings, but I am afraid 
it is not to be solved more efficiently without custom sql.

Best regards,
Tassilo

Reply all
Reply to author
Forward
0 new messages