Possible bug with Q objects

22 visualizações
Pular para a primeira mensagem não lida

Simon Van De Keer

não lida,
19 de mai. de 2022, 13:54:0419/05/2022
para Django users
Hi,

I ran into an issue when querying using Q objects where I got some behaviour which I didn't expect, and I'd like to get some input on whether this behaviour is intended or not.

Assume an app testcase with the following code in a models.py file:
from django.db import models

class Item(models.Model):
    some_field = models.CharField(blank=True, max_length=20)

class TreeNode(models.Model):
    parent = models.ForeignKey('self', null=True, blank=True, related_name='children', on_delete=models.CASCADE)
    items = models.ManyToManyField(Item, related_name='tree_nodes')


Assume we have created the following objects:
tree_node_top = TreeNode.objects.create()
tree_node_child_1 = TreeNode.objects.create(parent=tree_node_top)
tree_node_child_2 = TreeNode.objects.create(parent=tree_node_top)
item_1 = Item.objects.create(some_field="123")
item_2 = Item.objects.create(some_field="456")
item_3 = Item.objects.create(some_field="123456")
tree_node_child_1.items.add(*[item_1, item_2])
tree_node_child_2.items.add(*[item_2, item_3])


When executing the following query:
from django.db.models import Q
test_query = TreeNode.objects.filter(
    Q(children__items__some_field__contains="123") &
    ~Q(children__items__some_field__contains="456")
).values("id", "children__items__some_field")

I get no results. 

However when executing this query:
test_query_2 = TreeNode.objects.filter(
    Q(items__some_field__contains="123") &
    ~Q(items__some_field__contains="456")
).values("id", "items__id")


I do get 1 result, namely tree_node_child_1 with item_1.

My expectation would be that both queries yield a single result, where the first one would result in tree_node_top with item_1. I.e. I would expect that in both cases both the Q object filters would be executed against each (and the same) item row.

The first query results in the following SQL:
SELECT 
       "testcase_treenode"."id",
       "testcase_item"."some_field"
FROM "testcase_treenode"
INNER JOIN "testcase_treenode" T2 ON ("testcase_treenode"."id" = T2."parent_id")
INNER JOIN "testcase_treenode_items" ON (T2."id" = "testcase_treenode_items"."treenode_id")
INNER JOIN "testcase_item" ON ("testcase_treenode_items"."item_id" = "testcase_item"."id")
WHERE (
       "testcase_item"."some_field" LIKE %123% ESCAPE '\'
       AND NOT (
              EXISTS(
                     SELECT (1) AS "a" FROM "testcase_treenode" U1
                     INNER JOIN "testcase_treenode_items" U2 ON (U1."id" = U2."treenode_id")
                     INNER JOIN "testcase_item" U3 ON (U2."item_id" = U3."id")
                     WHERE (U3."some_field" LIKE %456% ESCAPE '\'
                                   AND U1."id" = (T2."id")
                                   AND T2."parent_id" = ("testcase_treenode"."id")
                     ) LIMIT 1
              )
       )
)


The second query results in this SQL:
SELECT
       "testcase_treenode"."id",
       "testcase_item"."some_field"
FROM "testcase_treenode"
INNER JOIN "testcase_treenode_items" ON ("testcase_treenode"."id" = "testcase_treenode_items"."treenode_id")
INNER JOIN "testcase_item" ON ("testcase_treenode_items"."item_id" = "testcase_item"."id")
WHERE (
       "testcase_item"."some_field" LIKE %123% ESCAPE '\'
       AND NOT (
              EXISTS(
                     SELECT (1) AS "a" FROM "testcase_treenode_items" U1
                     INNER JOIN "testcase_item" U2 ON (U1."item_id" = U2."id")
                     WHERE (
                            U2."some_field" LIKE %456% ESCAPE '\'
                            AND U1."id" = ("testcase_treenode_items"."id")
                            AND "testcase_treenode_items"."treenode_id" = ("testcase_treenode"."id")
                     ) LIMIT 1
              )
       )
)

I am confused about the WHERE statement in both EXISTS queries. The first one joins with the outer query based on the treenode, thus causing the EXISTS to yield a result if one of the items of that tree node has "456" in its some_field value. 
The second one joins with the outer query based on the individual treenode_items links, thus checking the filter for each individual item as well, hence keeping individual items that do have "123", but don't have "456" in their some_field value.

The expectation that both of the above queries would yield 1 result stems from the fact that if you would have a query with Q(items__some_field="123") & Q(items__some_field="456") or Q(children__items__some_field="123") & Q(children__items__some_field="456") (so no negation of the second Q object)both Q objects' filter will be validating against each (and the same) item row. 
You could also argue that both queries should yield an empty result depending on how you interpret the filter statement.

I was wondering if this is intended and expected behaviour, or if this might be a bug with how the filter with Q objects is translated into an SQL query. 

Thanks,
Simon
Responder a todos
Responder ao autor
Encaminhar
0 nova mensagem