[Django] #28297: Same queryset result in two different queries on ORM

已查看 37 次
跳至第一个未读帖子

Django

未读,
2017年6月10日 12:50:322017/6/10
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus | Owner: nobody
Renno |
Type: | Status: new
Uncategorized |
Component: Database | Version: 1.11
layer (models, ORM) |
Severity: Normal | Keywords: join, annotation, F
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Sometimes when I run a set of filter/annotation the result is different
for the same variables.

This is the queryset command:


{{{
ingredients = ['tomato']
self.queryset = Recipe.objects.all()
self.queryset =
self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset =
self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
self.queryset =
self.queryset.annotate(available=Count('steps__ingredients',
distinct=True))
self.queryset = self.queryset.filter(total=F('available'))
}}}


This is the wrong query result that comes often times:


{{{
SELECT recipebook_recipe.id, recipebook_recipe.name,
recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) AS total,
COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS
available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps
ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER
JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id =
recipebook_steprecipe.id) LEFT OUTER JOIN
recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id =
recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN
recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER
JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN
recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER
JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id)
INNER JOIN recipebook_ingredient ON (T9.ingredient_id =
recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato")
GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id)) ORDER BY NULL
}}}


And this is the right query that shows up sometimes:


{{{
SELECT recipebook_recipe.id, recipebook_recipe.name,
recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) AS total,
COUNT(DISTINCT T8.recipeingredient_id) AS available FROM recipebook_recipe
LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id =
recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe
ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT
OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id
= recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN
recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER
JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN
recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER
JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id)
INNER JOIN recipebook_ingredient ON (T9.ingredient_id =
recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato")
GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT
T8.recipeingredient_id)) ORDER BY NULL
}}}


As you can see they are different. The wrong one does not set the variable
'available' appropriately. I wonder if this is something on the method
`.query.join()`

--
Ticket URL: <https://code.djangoproject.com/ticket/28297>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

未读,
2017年6月10日 12:52:312017/6/10
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:

Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Marcus Renno:

Old description:

New description:

--

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:1>

Django

未读,
2017年6月10日 20:04:322017/6/10
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Hello Marcus,

You'll have to provide a minimal models setup to allow us to identify the
issue as it's almost impossible to figure out what could be wrong it.
Please do so and reopen this ticket with the details.

Also, did you reproduce against Django 1.11 and the master branch?

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:2>

Django

未读,
2017年6月10日 20:04:412017/6/10
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Uncategorized | Status: closed

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution: needsinfo

Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* status: new => closed
* resolution: => needsinfo


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:3>

Django

未读,
2017年6月10日 23:26:212017/6/10
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Uncategorized | Status: new

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Marcus Renno):

* status: closed => new
* resolution: needsinfo =>


Old description:

New description:

Sometimes when I run a set of filter/annotation the result is different

for the same variables. I'm using Django 1.11.2

The models used (simplified)


{{{
class Recipe(models.Model):
name = models.CharField(max_length=50)
steps = models.ManyToManyField(StepRecipe)

class StepRecipe(models.Model):
ingredients = models.ManyToManyField(RecipeIngredient)

class RecipeIngredient(models.Model):
ingredient = models.ForeignKey(Ingredient)

class Ingredient(models.Model):
name = models.CharField(max_length=50)

}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:4>

Django

未读,
2017年6月11日 00:24:512017/6/11
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Marcus Renno:

Old description:

> Sometimes when I run a set of filter/annotation the result is different


> for the same variables. I'm using Django 1.11.2
>
> The models used (simplified)
>

> {{{
> class Recipe(models.Model):
> name = models.CharField(max_length=50)
> steps = models.ManyToManyField(StepRecipe)
>
> class StepRecipe(models.Model):
> ingredients = models.ManyToManyField(RecipeIngredient)
>
> class RecipeIngredient(models.Model):
> ingredient = models.ForeignKey(Ingredient)
>
> class Ingredient(models.Model):
> name = models.CharField(max_length=50)
>
> }}}
>

New description:

Sometimes when I run a set of filter/annotation the result is different

for the same variables. I'm using Django 1.11.2

The models used (simplified)


{{{
class Recipe(models.Model):
name = models.CharField(max_length=50)
steps = models.ManyToManyField(StepRecipe)

class StepRecipe(models.Model):
ingredients = models.ManyToManyField(RecipeIngredient)

class RecipeIngredient(models.Model):
ingredient = models.ForeignKey(Ingredient)

class Ingredient(models.Model):
name = models.CharField(max_length=50)

}}}

This is the queryset command:


{{{
ingredients = ['tomato']
self.queryset = Recipe.objects.all()
self.queryset = self.queryset.annotate(total=Count('steps__ingredients',
distinct=True))
self.queryset =
self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
self.queryset =
self.queryset.annotate(available=Count('steps__ingredients',
distinct=True))
self.queryset = self.queryset.filter(total=F('available'))

#I feel like this order_by 'pk' or 'id' is bugging the query. If I remove
this order_by I couldn't replicate the problem yet. As I said it doesn't
happen 100%.
return self.queryset.order_by('pk')
}}}

--

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:5>

Django

未读,
2017年6月11日 00:25:542017/6/11
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Marcus Renno:

Old description:

> Sometimes when I run a set of filter/annotation the result is different


> for the same variables. I'm using Django 1.11.2
>
> The models used (simplified)
>

> {{{
> class Recipe(models.Model):
> name = models.CharField(max_length=50)
> steps = models.ManyToManyField(StepRecipe)
>
> class StepRecipe(models.Model):
> ingredients = models.ManyToManyField(RecipeIngredient)
>
> class RecipeIngredient(models.Model):
> ingredient = models.ForeignKey(Ingredient)
>
> class Ingredient(models.Model):
> name = models.CharField(max_length=50)
>
> }}}
>

> This is the queryset command:
>

> {{{
> ingredients = ['tomato']
> self.queryset = Recipe.objects.all()
> self.queryset = self.queryset.annotate(total=Count('steps__ingredients',
> distinct=True))
> self.queryset =
> self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
> self.queryset =
> self.queryset.annotate(available=Count('steps__ingredients',
> distinct=True))
> self.queryset = self.queryset.filter(total=F('available'))

> #I feel like this order_by 'pk' or 'id' is bugging the query. If I remove
> this order_by I couldn't replicate the problem yet. As I said it doesn't
> happen 100%.
> return self.queryset.order_by('pk')
> }}}
>

> This is the wrong query result that comes often times:
>

New description:

Sometimes when I run a set of filter/annotation the result is different

for the same variables. I'm using Django 1.11.2

The models used (simplified)


{{{
class Recipe(models.Model):
name = models.CharField(max_length=50)
steps = models.ManyToManyField(StepRecipe)

class StepRecipe(models.Model):
ingredients = models.ManyToManyField(RecipeIngredient)

class RecipeIngredient(models.Model):
ingredient = models.ForeignKey(Ingredient)

class Ingredient(models.Model):
name = models.CharField(max_length=50)

}}}

This is the queryset command:

--

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:6>

Django

未读,
2017年6月11日 02:23:222017/6/11
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Uncategorized | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Todor Velichkov):

I think annotating + filter on M2M field is the issue here.

Please check this thread on django-users:
https://groups.google.com/forum/#!topic/django-users/RPU-PnygbLg

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:7>

Django

未读,
2017年6月12日 09:50:072017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Marcus Renno):

* type: Uncategorized => Bug


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:8>

Django

未读,
2017年6月12日 10:48:062017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tom):

This appears to be related to a dictionary iteration somewhere. If you set
a fixed `PYTHONHASHSEED` value (e.g PYTHONHASHSEED=1) then the query is
stable, otherwise it does indeed switch between two types of SQL.

In my experiments `PYTHONHASHSEED=1` produces the wrong query, whereas
`PYTHONHASHSEED=2` produces the correct one.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:9>

Django

未读,
2017年6月12日 11:11:082017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tom):

* owner: nobody => Tom
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:10>

Django

未读,
2017年6月12日 11:24:352017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tom):

It seems to boil down to this line:
https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927

The `alias_map.items()` isn't stable, with `PYTHONHASHSEED=1` then
`reuse[0]` is `recepie_steps`, which is incorrect, but with it set to `2`
then `reuse[0]` is (correctly) `T6`.

If anyone knows how to fix this then please let me know, else I will try
and find a suitable fix.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:11>

Django

未读,
2017年6月12日 11:56:272017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marcus Renno):

That seems to be right, Tom. `join()` is relying on the order of the
dictionary order resulting from `.items()`.

It's weird that setting the hash seed to 2 would solve this problem,
though. I feel like the framework should not rely on the user changing the
hashing seed to a static number to fix this problem, because it breaks the
purpose (security) of the the hashing being random.

Replying to [comment:11 Tom]:


> It seems to boil down to this line:
https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927
>
> The `alias_map.items()` isn't stable, with `PYTHONHASHSEED=1` then
`reuse[0]` is `recepie_steps`, which is incorrect, but with it set to `2`
then `reuse[0]` is (correctly) `T6`.
>
> If anyone knows how to fix this then please let me know, else I will try
and find a suitable fix.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:12>

Django

未读,
2017年6月12日 12:23:142017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz):

* stage: Unreviewed => Accepted


Comment:

Query results depending on dict ordering is a bug.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:13>

Django

未读,
2017年6月12日 12:26:232017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Claude Paroz):

* cc: akaariai@… (added)


Comment:

Looks like this was introduced in
[ab89414f40db1598364a7fe4cfac1766cacd2668].

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:14>

Django

未读,
2017年6月12日 12:49:322017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tom):

I wasn't suggesting setting the hash seed value as a fix, merely as a
debugging aide for anyone else looking at the ticket.

I've made a MR with what I think is the correct fix:
https://github.com/django/django/pull/8631

The issue was that there where two joins, with differing `join_types`,
that where considered equal in the `Join.__eq__` method, and so randomly
one would be chosen, which was sometimes incorrect.

Replying to [comment:12 Marcus Renno]:


> That seems to be right, Tom. `join()` is relying on the order of the

dictionary resulting from `.items()`.


>
> It's weird that setting the hash seed to 2 would solve this problem,
though. I feel like the framework should not rely on the user changing the
hashing seed to a static number to fix this problem, because it breaks the
purpose (security) of the the hashing being random.
>
> Replying to [comment:11 Tom]:
> > It seems to boil down to this line:
https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927
> >
> > The `alias_map.items()` isn't stable, with `PYTHONHASHSEED=1` then
`reuse[0]` is `recepie_steps`, which is incorrect, but with it set to `2`
then `reuse[0]` is (correctly) `T6`.
> >
> > If anyone knows how to fix this then please let me know, else I will
try and find a suitable fix.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:15>

Django

未读,
2017年6月12日 12:50:062017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tom):

* has_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:16>

Django

未读,
2017年6月12日 14:00:582017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* needs_better_patch: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:17>

Django

未读,
2017年6月12日 14:49:512017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

This looks a bit similar to #26522 which made `alias_map` an `OrderedDict`
to prevent
[https://github.com/django/django/commit/9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9
such deterministic failures].

I wonder if it wouldn't be better to figure out what's adding values to
`self.alias_map` in a non-deterministic order instead as it seems to be
the underlying issue here.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:18>

Django

未读,
2017年6月12日 16:03:412017/6/12
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tom):

Hmm, you're right Simon. I missed the OrderedDict change, and tested
initially on 1.11 (where it's not applied). With this, on master, the
results are added deterministically, but the first item is always the
incorrect join.

So I don't think my ticket fixes the issue in a convincing way. If you
extend the example in the ticket with another annotate and filter:

{{{


self.queryset = self.queryset.annotate(total=Count('steps__ingredients',
distinct=True))

self.queryset = self.queryset.filter(steps__ingredients__pk=1)


self.queryset =
self.queryset.annotate(available=Count('steps__ingredients',
distinct=True))

self.queryset = self.queryset.filter(steps__ingredients__pk=2)
self.queryset =
self.queryset.annotate(available2=Count('steps__ingredients',
distinct=True))
self.queryset =
self.queryset.filter(total=F('available')).filter(total=F('available2'))
}}}

Then `reuse` will have *three* aliases found: `[('test_app_recipe_steps',
'LEFT OUTER JOIN'), ('T6', 'INNER JOIN'), ('T10', 'INNER JOIN')]`

In this case my patch will fetch T6, and we have the same issue.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:19>

Django

未读,
2017年6月13日 07:42:392017/6/13
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Tom
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tom):

I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think
there is a pretty big bug lurking here that is out of my league. Or maybe
it's a feature we don't support?

As far as I can tell, if you do:

{{{
Publisher.objects
.annotate(num_books=Count('book', distinct=True))
.filter(book__rating__gt=3.0)
.annotate(num_rated=Count('book', distinct=True))
.filter(num_books=F('num_rated'))

}}}

Then the resulting SQL will have the following `HAVING` clause:

`HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT
"publisher"."id"))`

Am I correct in thinking that this should be:

`HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id"))`,
where `T3` is the correct join?

In any case, either the generated query is incorrect or the documentation
needs to be updated.

I think, at least in the case of a m2m join, one of the issues are that
there seems to be no way to resolve which alias is used by two identical
annotations which are affected by a previous `.filter()` in the
`query.join` method.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:20>

Django

未读,
2017年6月13日 07:43:032017/6/13
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: (none)
Type: Bug | Status: new

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tom):

* owner: Tom => (none)
* status: assigned => new


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:21>

Django

未读,
2017年6月13日 10:54:502017/6/13
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marcus Renno):

Tom, I agree with you that it should be the second query and I believe
that the feature should be supported. It's just a bug and we will fix it
;)

I'll try to work on this issue when I get home. I think you did a
fantastic job narrowing down the problem and it's close to be solved.


Replying to [comment:20 Tom]:


> I've dug into this a fair bit and I'm sorry to say I'm stuck. I also
think there is a pretty big bug lurking here that is out of my league. Or
maybe it's a feature we don't support?
>

> As far as I can tell, if you annotate over foreign keys as well as m2m:


>
> {{{
> Publisher.objects
> .annotate(num_books=Count('book', distinct=True))
> .filter(book__rating__gt=3.0)
> .annotate(num_rated=Count('book', distinct=True))
> .filter(num_books=F('num_rated'))
>
> }}}
>
> Then the resulting SQL will have the following `HAVING` clause:
>
> `HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT
"publisher"."id"))`
>
> Am I correct in thinking that this should be:
>
> `HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id"))`,
where `T3` is the correct join?
>
> In any case, either the generated query is incorrect or the
documentation needs to be updated.
>
> I think, at least in the case of a m2m join, one of the issues are that
there seems to be no way to resolve which alias is used by two identical
annotations which are affected by a previous `.filter()` in the
`query.join` method.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:22>

Django

未读,
2017年6月13日 17:03:262017/6/13
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tom):

* cc: tom@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:23>

Django

未读,
2017年6月14日 08:52:302017/6/14
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Marcus Renno):

* owner: (none) => Marcus Renno


* status: new => assigned


Comment:

First patch I have ever done to Django here:
https://github.com/django/django/pull/8640

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:24>

Django

未读,
2017年6月14日 08:58:242017/6/14
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Marcus Renno):

* needs_better_patch: 1 => 0
* stage: Accepted => Unreviewed


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:25>

Django

未读,
2017年6月14日 09:20:002017/6/14
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted

Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* stage: Unreviewed => Accepted


Comment:

"Triage Stage" isn't reset when adding a patch, see
[https://docs.djangoproject.com/en/dev/internals/contributing/triaging-
tickets/ Triaging Tickets].

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:26>

Django

未读,
2017年6月14日 09:58:132017/6/14
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marcus Renno):

Sorry for that, Tim. Thanks for the useful information!

Replying to [comment:26 Tim Graham]:


> "Triage Stage" isn't reset when adding a patch, see
[https://docs.djangoproject.com/en/dev/internals/contributing/triaging-
tickets/ Triaging Tickets].

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:27>

Django

未读,
2017年6月26日 04:12:372017/6/26
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jeppe Vesterbæk):

* cc: jeppe@… (added)


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:28>

Django

未读,
2017年6月26日 15:19:562017/6/26
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* needs_better_patch: 0 => 1


Comment:

As noted on the PR, I'm not sure that the current approach is ideal.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:29>

Django

未读,
2017年6月27日 17:38:452017/6/27
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marcus Renno):

I updated the code in the PR. If someone has some time to check, please go
ahead and share your comments.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:30>

Django

未读,
2017年6月28日 01:53:022017/6/28
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by holvianssi):

To me it seems that pre
https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668
we always used the *first* join, not the last join possible in join(). I
believe this has been that way for a long time, so changing that should be
considered carefully.

I'm not exactly sure why do we get random order for self.alias_map in
join. It's an OrderedDict, so that shouldn't be the case...

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:31>

Django

未读,
2017年6月28日 06:00:202017/6/28
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marcus Renno):

Replying to [comment:31 holvianssi]:


> To me it seems that pre
https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668
we always used the *first* join, not the last join possible in join(). I
believe this has been that way for a long time, so changing that should be
considered carefully.
>
> I'm not exactly sure why do we get random order for self.alias_map in
join. It's an OrderedDict, so that shouldn't be the case...

Hm... I always had the impression it was the last join. For me, it makes
sense to use the tables in sequence, because you use actions (aka
operations) in sequence. If it always grab the first joined table this
wouldn't be possible. From what I understood, here in this ticket, until
recent changes the order was random.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:32>

Django

未读,
2017年6月29日 07:33:222017/6/29
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Marcus Renno):

* needs_better_patch: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:33>

Django

未读,
2017年7月8日 13:38:512017/7/8
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marcus Renno):

Hi, any updates regarding this ticket?

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:34>

Django

未读,
2017年7月19日 16:16:292017/7/19
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM
-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: assigned
Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution:
Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Tim Graham):

* needs_better_patch: 0 => 1


Comment:

As noted on the PR, the change introduces a non-deterministic test
failure.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:35>

Django

未读,
2020年1月15日 04:07:232020/1/15
收件人 django-...@googlegroups.com
#28297: Same queryset result in two different queries on ORM.

-------------------------------------+-------------------------------------
Reporter: Marcus Renno | Owner: Marcus
| Renno
Type: Bug | Status: closed

Component: Database layer | Version: 1.11
(models, ORM) |
Severity: Normal | Resolution: duplicate

Keywords: join, annotation, F | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* status: assigned => closed


* needs_better_patch: 1 => 0

* has_patch: 1 => 0
* resolution: => duplicate


Comment:

Duplicate of #29530. Fixed in 0e64e046a481c345358b1040df7c89480ad198e8.

--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:36>

回复全部
回复作者
转发
0 个新帖子