This is the queryset command:
{{{
ingredients = ['tomato']
self.queryset = Recipe.objects.all()
self.queryset =
self.queryset.annotate(total=Count('steps__ingredients', distinct=True))
self.queryset =
self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
self.queryset =
self.queryset.annotate(available=Count('steps__ingredients',
distinct=True))
self.queryset = self.queryset.filter(total=F('available'))
}}}
This is the wrong query result that comes often times:
{{{
SELECT recipebook_recipe.id, recipebook_recipe.name,
recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) AS total,
COUNT(DISTINCT recipebook_steprecipe_ingredients.recipeingredient_id) AS
available FROM recipebook_recipe LEFT OUTER JOIN recipebook_recipe_steps
ON (recipebook_recipe.id = recipebook_recipe_steps.recipe_id) LEFT OUTER
JOIN recipebook_steprecipe ON (recipebook_recipe_steps.steprecipe_id =
recipebook_steprecipe.id) LEFT OUTER JOIN
recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id =
recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN
recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER
JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN
recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER
JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id)
INNER JOIN recipebook_ingredient ON (T9.ingredient_id =
recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato")
GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id)) ORDER BY NULL
}}}
And this is the right query that shows up sometimes:
{{{
SELECT recipebook_recipe.id, recipebook_recipe.name,
recipebook_recipe.dificulty, recipebook_recipe.duration, COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) AS total,
COUNT(DISTINCT T8.recipeingredient_id) AS available FROM recipebook_recipe
LEFT OUTER JOIN recipebook_recipe_steps ON (recipebook_recipe.id =
recipebook_recipe_steps.recipe_id) LEFT OUTER JOIN recipebook_steprecipe
ON (recipebook_recipe_steps.steprecipe_id = recipebook_steprecipe.id) LEFT
OUTER JOIN recipebook_steprecipe_ingredients ON (recipebook_steprecipe.id
= recipebook_steprecipe_ingredients.steprecipe_id) INNER JOIN
recipebook_recipe_steps T6 ON (recipebook_recipe.id = T6.recipe_id) INNER
JOIN recipebook_steprecipe T7 ON (T6.steprecipe_id = T7.id) INNER JOIN
recipebook_steprecipe_ingredients T8 ON (T7.id = T8.steprecipe_id) INNER
JOIN recipebook_recipeingredient T9 ON (T8.recipeingredient_id = T9.id)
INNER JOIN recipebook_ingredient ON (T9.ingredient_id =
recipebook_ingredient.id) WHERE recipebook_ingredient.name IN ("tomato")
GROUP BY recipebook_recipe.id HAVING COUNT(DISTINCT
recipebook_steprecipe_ingredients.recipeingredient_id) = (COUNT(DISTINCT
T8.recipeingredient_id)) ORDER BY NULL
}}}
As you can see they are different. The wrong one does not set the variable
'available' appropriately. I wonder if this is something on the method
`.query.join()`
--
Ticket URL: <https://code.djangoproject.com/ticket/28297>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
New description:
--
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:1>
Comment (by Simon Charette):
Hello Marcus,
You'll have to provide a minimal models setup to allow us to identify the
issue as it's almost impossible to figure out what could be wrong it.
Please do so and reopen this ticket with the details.
Also, did you reproduce against Django 1.11 and the master branch?
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:2>
* status: new => closed
* resolution: => needsinfo
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:3>
* status: closed => new
* resolution: needsinfo =>
Old description:
New description:
Sometimes when I run a set of filter/annotation the result is different
for the same variables. I'm using Django 1.11.2
The models used (simplified)
{{{
class Recipe(models.Model):
name = models.CharField(max_length=50)
steps = models.ManyToManyField(StepRecipe)
class StepRecipe(models.Model):
ingredients = models.ManyToManyField(RecipeIngredient)
class RecipeIngredient(models.Model):
ingredient = models.ForeignKey(Ingredient)
class Ingredient(models.Model):
name = models.CharField(max_length=50)
}}}
--
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:4>
Old description:
> Sometimes when I run a set of filter/annotation the result is different
> for the same variables. I'm using Django 1.11.2
>
> The models used (simplified)
>
> {{{
> class Recipe(models.Model):
> name = models.CharField(max_length=50)
> steps = models.ManyToManyField(StepRecipe)
>
> class StepRecipe(models.Model):
> ingredients = models.ManyToManyField(RecipeIngredient)
>
> class RecipeIngredient(models.Model):
> ingredient = models.ForeignKey(Ingredient)
>
> class Ingredient(models.Model):
> name = models.CharField(max_length=50)
>
> }}}
>
New description:
Sometimes when I run a set of filter/annotation the result is different
for the same variables. I'm using Django 1.11.2
The models used (simplified)
{{{
class Recipe(models.Model):
name = models.CharField(max_length=50)
steps = models.ManyToManyField(StepRecipe)
class StepRecipe(models.Model):
ingredients = models.ManyToManyField(RecipeIngredient)
class RecipeIngredient(models.Model):
ingredient = models.ForeignKey(Ingredient)
class Ingredient(models.Model):
name = models.CharField(max_length=50)
}}}
This is the queryset command:
{{{
ingredients = ['tomato']
self.queryset = Recipe.objects.all()
self.queryset = self.queryset.annotate(total=Count('steps__ingredients',
distinct=True))
self.queryset =
self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
self.queryset =
self.queryset.annotate(available=Count('steps__ingredients',
distinct=True))
self.queryset = self.queryset.filter(total=F('available'))
#I feel like this order_by 'pk' or 'id' is bugging the query. If I remove
this order_by I couldn't replicate the problem yet. As I said it doesn't
happen 100%.
return self.queryset.order_by('pk')
}}}
--
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:5>
Old description:
> Sometimes when I run a set of filter/annotation the result is different
> for the same variables. I'm using Django 1.11.2
>
> The models used (simplified)
>
> {{{
> class Recipe(models.Model):
> name = models.CharField(max_length=50)
> steps = models.ManyToManyField(StepRecipe)
>
> class StepRecipe(models.Model):
> ingredients = models.ManyToManyField(RecipeIngredient)
>
> class RecipeIngredient(models.Model):
> ingredient = models.ForeignKey(Ingredient)
>
> class Ingredient(models.Model):
> name = models.CharField(max_length=50)
>
> }}}
>
> This is the queryset command:
>
> {{{
> ingredients = ['tomato']
> self.queryset = Recipe.objects.all()
> self.queryset = self.queryset.annotate(total=Count('steps__ingredients',
> distinct=True))
> self.queryset =
> self.queryset.filter(steps__ingredients__ingredient__name__in=ingredients)
> self.queryset =
> self.queryset.annotate(available=Count('steps__ingredients',
> distinct=True))
> self.queryset = self.queryset.filter(total=F('available'))
> #I feel like this order_by 'pk' or 'id' is bugging the query. If I remove
> this order_by I couldn't replicate the problem yet. As I said it doesn't
> happen 100%.
> return self.queryset.order_by('pk')
> }}}
>
> This is the wrong query result that comes often times:
>
New description:
Sometimes when I run a set of filter/annotation the result is different
for the same variables. I'm using Django 1.11.2
The models used (simplified)
{{{
class Recipe(models.Model):
name = models.CharField(max_length=50)
steps = models.ManyToManyField(StepRecipe)
class StepRecipe(models.Model):
ingredients = models.ManyToManyField(RecipeIngredient)
class RecipeIngredient(models.Model):
ingredient = models.ForeignKey(Ingredient)
class Ingredient(models.Model):
name = models.CharField(max_length=50)
}}}
This is the queryset command:
--
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:6>
Comment (by Todor Velichkov):
I think annotating + filter on M2M field is the issue here.
Please check this thread on django-users:
https://groups.google.com/forum/#!topic/django-users/RPU-PnygbLg
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:7>
* type: Uncategorized => Bug
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:8>
Comment (by Tom):
This appears to be related to a dictionary iteration somewhere. If you set
a fixed `PYTHONHASHSEED` value (e.g PYTHONHASHSEED=1) then the query is
stable, otherwise it does indeed switch between two types of SQL.
In my experiments `PYTHONHASHSEED=1` produces the wrong query, whereas
`PYTHONHASHSEED=2` produces the correct one.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:9>
* owner: nobody => Tom
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:10>
Comment (by Tom):
It seems to boil down to this line:
https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927
The `alias_map.items()` isn't stable, with `PYTHONHASHSEED=1` then
`reuse[0]` is `recepie_steps`, which is incorrect, but with it set to `2`
then `reuse[0]` is (correctly) `T6`.
If anyone knows how to fix this then please let me know, else I will try
and find a suitable fix.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:11>
Comment (by Marcus Renno):
That seems to be right, Tom. `join()` is relying on the order of the
dictionary order resulting from `.items()`.
It's weird that setting the hash seed to 2 would solve this problem,
though. I feel like the framework should not rely on the user changing the
hashing seed to a static number to fix this problem, because it breaks the
purpose (security) of the the hashing being random.
Replying to [comment:11 Tom]:
> It seems to boil down to this line:
https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927
>
> The `alias_map.items()` isn't stable, with `PYTHONHASHSEED=1` then
`reuse[0]` is `recepie_steps`, which is incorrect, but with it set to `2`
then `reuse[0]` is (correctly) `T6`.
>
> If anyone knows how to fix this then please let me know, else I will try
and find a suitable fix.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:12>
* stage: Unreviewed => Accepted
Comment:
Query results depending on dict ordering is a bug.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:13>
* cc: akaariai@… (added)
Comment:
Looks like this was introduced in
[ab89414f40db1598364a7fe4cfac1766cacd2668].
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:14>
Comment (by Tom):
I wasn't suggesting setting the hash seed value as a fix, merely as a
debugging aide for anyone else looking at the ticket.
I've made a MR with what I think is the correct fix:
https://github.com/django/django/pull/8631
The issue was that there where two joins, with differing `join_types`,
that where considered equal in the `Join.__eq__` method, and so randomly
one would be chosen, which was sometimes incorrect.
Replying to [comment:12 Marcus Renno]:
> That seems to be right, Tom. `join()` is relying on the order of the
dictionary resulting from `.items()`.
>
> It's weird that setting the hash seed to 2 would solve this problem,
though. I feel like the framework should not rely on the user changing the
hashing seed to a static number to fix this problem, because it breaks the
purpose (security) of the the hashing being random.
>
> Replying to [comment:11 Tom]:
> > It seems to boil down to this line:
https://github.com/django/django/blob/stable/1.11.x/django/db/models/sql/query.py#L927
> >
> > The `alias_map.items()` isn't stable, with `PYTHONHASHSEED=1` then
`reuse[0]` is `recepie_steps`, which is incorrect, but with it set to `2`
then `reuse[0]` is (correctly) `T6`.
> >
> > If anyone knows how to fix this then please let me know, else I will
try and find a suitable fix.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:15>
* has_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:16>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:17>
Comment (by Simon Charette):
This looks a bit similar to #26522 which made `alias_map` an `OrderedDict`
to prevent
[https://github.com/django/django/commit/9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9
such deterministic failures].
I wonder if it wouldn't be better to figure out what's adding values to
`self.alias_map` in a non-deterministic order instead as it seems to be
the underlying issue here.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:18>
Comment (by Tom):
Hmm, you're right Simon. I missed the OrderedDict change, and tested
initially on 1.11 (where it's not applied). With this, on master, the
results are added deterministically, but the first item is always the
incorrect join.
So I don't think my ticket fixes the issue in a convincing way. If you
extend the example in the ticket with another annotate and filter:
{{{
self.queryset = self.queryset.annotate(total=Count('steps__ingredients',
distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=1)
self.queryset =
self.queryset.annotate(available=Count('steps__ingredients',
distinct=True))
self.queryset = self.queryset.filter(steps__ingredients__pk=2)
self.queryset =
self.queryset.annotate(available2=Count('steps__ingredients',
distinct=True))
self.queryset =
self.queryset.filter(total=F('available')).filter(total=F('available2'))
}}}
Then `reuse` will have *three* aliases found: `[('test_app_recipe_steps',
'LEFT OUTER JOIN'), ('T6', 'INNER JOIN'), ('T10', 'INNER JOIN')]`
In this case my patch will fetch T6, and we have the same issue.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:19>
Comment (by Tom):
I've dug into this a fair bit and I'm sorry to say I'm stuck. I also think
there is a pretty big bug lurking here that is out of my league. Or maybe
it's a feature we don't support?
As far as I can tell, if you do:
{{{
Publisher.objects
.annotate(num_books=Count('book', distinct=True))
.filter(book__rating__gt=3.0)
.annotate(num_rated=Count('book', distinct=True))
.filter(num_books=F('num_rated'))
}}}
Then the resulting SQL will have the following `HAVING` clause:
`HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT
"publisher"."id"))`
Am I correct in thinking that this should be:
`HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id"))`,
where `T3` is the correct join?
In any case, either the generated query is incorrect or the documentation
needs to be updated.
I think, at least in the case of a m2m join, one of the issues are that
there seems to be no way to resolve which alias is used by two identical
annotations which are affected by a previous `.filter()` in the
`query.join` method.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:20>
* owner: Tom => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:21>
Comment (by Marcus Renno):
Tom, I agree with you that it should be the second query and I believe
that the feature should be supported. It's just a bug and we will fix it
;)
I'll try to work on this issue when I get home. I think you did a
fantastic job narrowing down the problem and it's close to be solved.
Replying to [comment:20 Tom]:
> I've dug into this a fair bit and I'm sorry to say I'm stuck. I also
think there is a pretty big bug lurking here that is out of my league. Or
maybe it's a feature we don't support?
>
> As far as I can tell, if you annotate over foreign keys as well as m2m:
>
> {{{
> Publisher.objects
> .annotate(num_books=Count('book', distinct=True))
> .filter(book__rating__gt=3.0)
> .annotate(num_rated=Count('book', distinct=True))
> .filter(num_books=F('num_rated'))
>
> }}}
>
> Then the resulting SQL will have the following `HAVING` clause:
>
> `HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT
"publisher"."id"))`
>
> Am I correct in thinking that this should be:
>
> `HAVING COUNT(DISTINCT "publisher"."id") = (COUNT(DISTINCT "T3"."id"))`,
where `T3` is the correct join?
>
> In any case, either the generated query is incorrect or the
documentation needs to be updated.
>
> I think, at least in the case of a m2m join, one of the issues are that
there seems to be no way to resolve which alias is used by two identical
annotations which are affected by a previous `.filter()` in the
`query.join` method.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:22>
* cc: tom@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:23>
* owner: (none) => Marcus Renno
* status: new => assigned
Comment:
First patch I have ever done to Django here:
https://github.com/django/django/pull/8640
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:24>
* needs_better_patch: 1 => 0
* stage: Accepted => Unreviewed
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:25>
* stage: Unreviewed => Accepted
Comment:
"Triage Stage" isn't reset when adding a patch, see
[https://docs.djangoproject.com/en/dev/internals/contributing/triaging-
tickets/ Triaging Tickets].
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:26>
Comment (by Marcus Renno):
Sorry for that, Tim. Thanks for the useful information!
Replying to [comment:26 Tim Graham]:
> "Triage Stage" isn't reset when adding a patch, see
[https://docs.djangoproject.com/en/dev/internals/contributing/triaging-
tickets/ Triaging Tickets].
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:27>
* cc: jeppe@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:28>
* needs_better_patch: 0 => 1
Comment:
As noted on the PR, I'm not sure that the current approach is ideal.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:29>
Comment (by Marcus Renno):
I updated the code in the PR. If someone has some time to check, please go
ahead and share your comments.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:30>
Comment (by holvianssi):
To me it seems that pre
https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668
we always used the *first* join, not the last join possible in join(). I
believe this has been that way for a long time, so changing that should be
considered carefully.
I'm not exactly sure why do we get random order for self.alias_map in
join. It's an OrderedDict, so that shouldn't be the case...
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:31>
Comment (by Marcus Renno):
Replying to [comment:31 holvianssi]:
> To me it seems that pre
https://code.djangoproject.com/changeset/ab89414f40db1598364a7fe4cfac1766cacd2668
we always used the *first* join, not the last join possible in join(). I
believe this has been that way for a long time, so changing that should be
considered carefully.
>
> I'm not exactly sure why do we get random order for self.alias_map in
join. It's an OrderedDict, so that shouldn't be the case...
Hm... I always had the impression it was the last join. For me, it makes
sense to use the tables in sequence, because you use actions (aka
operations) in sequence. If it always grab the first joined table this
wouldn't be possible. From what I understood, here in this ticket, until
recent changes the order was random.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:32>
* needs_better_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:33>
Comment (by Marcus Renno):
Hi, any updates regarding this ticket?
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:34>
* needs_better_patch: 0 => 1
Comment:
As noted on the PR, the change introduces a non-deterministic test
failure.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:35>
* status: assigned => closed
* needs_better_patch: 1 => 0
* has_patch: 1 => 0
* resolution: => duplicate
Comment:
Duplicate of #29530. Fixed in 0e64e046a481c345358b1040df7c89480ad198e8.
--
Ticket URL: <https://code.djangoproject.com/ticket/28297#comment:36>