It won't. A filter() method is a normal Python function or method call.
You can only specify each keyword argument exactly once. Because of the
way Django implements queryset methods, only the last keyword argument
will be used (no error will be raised), but it's still not what is
expected.
Regards,
Malcolm
There are two different versions of this question. One of them is
possible, the other probably not just yet (at least, I haven't worked
out a neat answer).
First version: I want books that are in all three categories (plus
possibly some extra ones):
Answer: Relies on using Django trunk (or the upcoming Django 1.1), since
it uses aggregation/annotations.
from django.db.models import Count
Book.objects.filter(categories__in=[1, 2, 3]).annotate(num_cats=Count('categories').filter(num_cats=3)
Second version: I want books that are in all three categories, but just
those three categories and *no others*.
Answer: hmmm... :-(
This one is possible in raw SQL, but it's a fairly fiddly query. I can't
make it work with annotate() yet, although I just played around for
about 30 minutes to see what was going on. It might be a subtle
limitation in annotate(), but that's only a new feature, so we can
extend it in the future. Basically, the count I do in the first answer
counts the number of categories that have been selected, not the total
number of categories available for each book (which would require some
extra table joins). I'd also want to filter on the total number of
categories available in this version of the question, so I get stuck.
Still, I suspect you want the first version of the question, which has
the "simple" solution I've given.
Regards,
Malcolm
So, naturally, as soon as I hit "send", the solution occurs to me. Not
short, but it works. Here's how to get all books that are in precisely
those three categories: no more and no less (line breaks inserted for
readability, hopefully):
Book.objects.
filter(categories__in=[1, 2, 3]).
annotate(num_cats=Count('categories').
filter(num_cats=3).
exclude(id__in=
Book.objects.annotate(all_cats=Count('categories')).
filter(all_cats__gt=3))
It's a pretty ugly SQL query and I haven't put in the time to work out
if it's the most efficient we can do. But it does the right thing.
Again, requires Django 1.1-alpha because it relies on annotations and
nested querysets.
Regards,
Malcolm
Not if you were trying to solve the original poster's question. Your
query is exactly the same as what he tried to do originally.
Regards,
Malcolm
Yes, you're completely correct. My apologies (and thanks for not just
believe me)! I was being lazy and didn't think it through. The second
query tries to find a single article objects that simultaneously has pk
values of 1, 2 and 3, as you notice in your SQL. Unsurprisingly, that
doesn't exist.
>
> The second query will always return an empty result. This is the
> generated SQL:
>
> {'sql': u'SELECT "bookstore_publication"."id",
> "bookstore_publication"."title" FROM "bookstore_publication" INNER
> JOIN "bookstore_article_publications" ON ("bookstore_publication"."id"
> = "bookstore_article_publications"."publication_id") WHERE
> ("bookstore_article_publications"."article_id" = 1 AND
> "bookstore_article_publications"."article_id" = 2 AND
> "bookstore_article_publications"."article_id" = 3 ) ORDER BY
> "bookstore_publication"."title" ASC LIMIT 21',
> 'time': '0.000'}
>
> Thinking about it, this makes perfect sense, as the condition is on
> the article, not on the set.
>
> It would be nice to have special lookups for Many-To-Many-Fields or
> related sets, which would allow something like
> Publication.objects.filter(article_set__exact=[1,2,3])
That can be done already. Although, as I mentioned in my first reply to
the original poster, the question is ambiguous.
I also realised last night that my first reply on this topic also had a
stupid mistake caused by me trying to answer both questions at once.
I've already shown how to get all objects that are in precisely those
categories and no others. Making that easier (e.g. two straight filters
or something similar) is actually fairly hard internally, since the API
shouldn't suck (I've spent a couple of hours thinking about it and
poking at the code, so I'm not just guessing here).
However, find things that are simultaneously in all those categories can
be done without all the annotation nonsense I posted. Simply
Publication.objects.filter(article=1).filter(article=2).filter(article=3)
It's a short loop to build up such a set dynamically. The difference
between that (3 filter calls) and you Q-object version is described in
the documentation:
http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Regards,
Malcolm
On Sat, 2009-03-07 at 12:03 +1100, Malcolm Tredinnick wrote:
> However, find things that are simultaneously in all those categories can
> be done without all the annotation nonsense I posted. Simply
>
> Publication.objects.filter(article=1).filter(article=2).filter(article=3)
>
> It's a short loop to build up such a set dynamically. The difference
> between that (3 filter calls) and you Q-object version is described in
> the documentation:
> http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
I forgot to mention that this approach is probably only practical for a
small number of article instances. Each filter() adds a new table join,
so at some point the database will complain. Either literally, about too
many joins, or by taking ages to attempt to optimise and execute the
query.
For a large number of article instances, the annotation-based solution I
gave earlier is more appropriate.
Regards,
Malcolm
One day I'll stop posting in this thread. Really.On Sat, 2009-03-07 at 12:03 +1100, Malcolm Tredinnick wrote:
> However, find things that are simultaneously in all those categories can
> be done without all the annotation nonsense I posted. Simply
>
> Publication.objects.filter(article=1).filter(article=2).filter(article=3)
>
> It's a short loop to build up such a set dynamically. The difference
> between that (3 filter calls) and you Q-object version is described in
> the documentation:
> http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationshipsI forgot to mention that this approach is probably only practical for a
small number of article instances. Each filter() adds a new table join,
so at some point the database will complain. Either literally, about too
many joins, or by taking ages to attempt to optimise and execute the
query.
For a large number of article instances, the annotation-based solution I
gave earlier is more appropriate.
def many_to_many_exact(manager, fieldname, foreign_keys, foreign_manager):
from django.db.models import Count
result = manager
filter_dict = {fieldname + '__in': foreign_keys}
result = result.filter(**filter_dict)
count_name = 'count_' + fieldname
annotate_dict = {count_name: Count(fieldname)}
filter_dict = {count_name: len(foreign_keys)}
result = result.annotate(**annotate_dict).filter(**filter_dict)
exclude_dict = {fieldname + '__in': foreign_manager.exclude(pk__in=foreign_keys)}
return result.exclude(**exclude_dict)
Book.objects
.filter(categories__in=[1, 2, 3])
.annotate(num_cats=Count('categories'))
.filter(num_cats=3)
.exclude(categories__in=Categories.objects.exclude(pk__in=[1, 2, 3]))
SELECT table.*, COUNT(many_to_many_table.related_id) AS "count_related"
FROM table INNER JOIN many_to_many_table
ON (table.id = many_to_many_table.table_id)
WHERE (many_to_many_table.related_id IN ( ...(e.g. 1,2,3) )
AND NOT (table.id IN
(SELECT V1.table_id FROM many_to_many_table V1
WHERE V1.related_id IN
(SELECT U0.id FROM related U0
WHERE NOT (U0.id IN ( ...(same as above) ))))))
GROUP BY table.id
HAVING COUNT(many_to_many_table.related_id) = ...(e.g. 3)