Django ORM Handling of Reverse Relationships and Multi-Value Relationships

1,416 views
Skip to first unread message

Andrew Standley

unread,
Mar 29, 2018, 5:26:44 PM3/29/18
to Django developers (Contributions to Django itself)
I have recently become acquainted with some ORM behaviour for reverse relationships that "makes no sense", and I'm hoping someone can explain the justification for the current behaviour. 

This specifically relates to `filter` behaviour referenced in 29271, and 16554 which seems tangentially related to several issues with `exclude` (244211464517315) and aggregate expressions (1660319415)

Most of the confusion about 'intended' behaviour and confirmed 'bugged' behaviour seems to relate to the ORM's use of joins for reverse relationships.
I think my personal confusion boils down to two questions.

1) Is there some fundamental limitation in the ORM that prevents reducing the number of joins? Several of these tickets indicate how the ORM could potentially produce similar results with queries that did not use multiple joins. Why is that not desirable behaviour?

2) Why is the current behaviour of `filter` for multi-value relationships 'intended'? I'm hoping I am missing something obvious but it seems to me that `Q` objects would support the type of behaviour suggested in the spanning multi-valued relationships documentation in a much more inituative manner. In a test case with models

class Related(models.Model):
    field
= models.CharField(max_length=100)

class Main(models.Model):
    field_one
= models.CharField(max_length=100)
    field_two
= models.CharField(max_length=100)
    related
= models.ForeignKey(Related, on_delete=models.CASCADE)

both

>>> Related.objects.filter(Q(main__field_two='2')|Q(main__field_one='1'))

SQL:
SELECT "test_app_related"."id", "test_app_related"."field" FROM "test_app_related" INNER JOIN "test_app_main" ON ("test_app_related"."id"= "test_app_main"."related_id") WHERE ("test_app_main"."field_two" = "2" OR "test_app_main"."field_one" = "1")

and

>>> Related.objects.filter(main__field_two='2').filter(main__field_one='1')

SQL:
SELECT "test_app_related"."id", "test_app_related"."field" FROM "test_app_related" INNER JOIN "test_app_main" ON ("test_app_related"."id"= "test_app_main"."related_id") INNER JOIN "test_app_main" T3 ON ("test_app_related"."id" = T3."related_id") WHERE ("test_app_main"."field_two" = "two" AND T3."field_one" = "one")

Produce exactly the same results but the second seems to have an unnecessary extra join, and directly contradicts the behaviour of filter with non multi-valued fields.



In short, could someone be kind enough to explain the justification for all this weird behaviour with multi-value relationships?


Cheers,
  Andrew

Ryan Hiebert

unread,
Mar 29, 2018, 5:57:39 PM3/29/18
to django-d...@googlegroups.com
It's a subtle difference between how a single filter works and two filters work together over to-many relationships. Here's a writeup that I found helpful: https://blog.ionelmc.ro/2014/05/10/django-sticky-queryset-filters/

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/22234c13-3152-473f-86bd-04b41efd9203%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Curtis Maloney

unread,
Mar 29, 2018, 6:54:42 PM3/29/18
to django-d...@googlegroups.com
On 03/30/2018 08:57 AM, Ryan Hiebert wrote:
> It's a subtle difference between how a single filter works and two
> filters work together over to-many relationships. Here's a writeup that
> I found helpful:
> https://blog.ionelmc.ro/2014/05/10/django-sticky-queryset-filters/

It's also addressed in the documentation:

https://docs.djangoproject.com/en/2.0/topics/db/queries/#spanning-multi-valued-relationships

--
Curtis

Josh Smeaton

unread,
Mar 29, 2018, 7:06:49 PM3/29/18
to Django developers (Contributions to Django itself)
It sounds like you understand what the current behaviour is, and you think that it would be better modelled with Q objects, is that the case? I can see where you're coming from, as even the docs explain the difference loosely in terms of AND and OR.

Q(entry__headline__contains='Lennon') & Q(entry__pubdate__year=2008) would be the filter(cond_a, cond_b).
Q(entry__headline__contains='Lennon') | Q(entry__pubdate__year=2008) would be the filter(cond_a).filter(cond_b).

The justification for **not** doing it this way, I suppose, would be that Q objects may not have existed at the time, or this was just never considered. The justification for not changing it to be this way now would be because it'd break user code everywhere.

I agree that the Q &/| logic would be more intuitive, but it's not something we can change at this point.

Marten Kenbeek

unread,
Mar 29, 2018, 7:31:23 PM3/29/18
to Django developers (Contributions to Django itself)
These queries are actually not equivalent. Consider the following code:

>>> r = Related.objects.create(field='related')
>>> r.main_set.create(field_one='1', field_two='3')
<Main: Main object>
>>> r.main_set.create(field_one='2', field_two='4')
<Main: Main object>

>>> Related.objects.filter(Q(main__field_two='2')|Q(main__field_one='1'))
<QuerySet [<Related: Related object>]>

>>> Related.objects.filter(main__field_two='2').filter(main__field_one='1')
<QuerySet []>

Here the first queryset matches the Related instance, because it satisfies Q(main__field_one='1'), so one of the OR conditions is satisfied. The second queryset doesn't match the Related instance because .filter(main__field_two='2') is not true. Further filters can only reduce the data returned by the queryset.

The difference between one or two filter calls is more subtle. Consider the following queries:

>>> Related.objects.filter(main__field_one='1').filter(main__field_two='4')
<QuerySet [<Related: Related object>]>
>>> Related.objects.filter(main__field_one='1', main__field_two='4')
<QuerySet []>

Here the first query returns the Related instance because it has a related Main instance with field_one='1', and it has a (different) related Main instance with field_two='4'. Thus it satisfies both conditions, and the AND condition is fulfilled. These can be different objects because of the two joins, but both conditions still need to be satisfied. The second query does not return the Related instance, because it does not have a single related Main instance that has both field_one='1' and field_two='4'. In this case there's only a single join, so the same Main instance has to satisfy both conditions.

Andrew Standley

unread,
Mar 29, 2018, 9:51:26 PM3/29/18
to Django developers (Contributions to Django itself)
Thank you all for the replies.

@Josh Smeaton 
Essentially yes; specifically I was wondering whether I was failing to consider behaviour that couldn't be modeled via a Q object, since as you mention the current https://docs.djangoproject.com/en/2.0/topics/db/queries/#spanning-multi-valued-relationships documentation uses AND vs OR to justify the behaviour.

@Marten Kenbeek
Got it. Thank you sincerely for the example. I feel a bit foolish now it's clicked as I realize the documentation says nearly the same thing, but for some reason even after reading through the spanning multi-valued relationship documentation multiple times I was completely missing the point. 

So would I be correct in stating that all the reported weirdness with objects appearing multiple times in QuerySets is mostly a side effect of implementing the AND condition on any related objects?  
Reply all
Reply to author
Forward
0 new messages