There's a logical error somewhere in Query.combine or friends.
I have a pretty complicated query with many-to-many self-joins and ORs,
and after the upgrade to Django 1.9 and Python 3 combine() sometimes, but
not always, crashes with the assertion error
{{{
assert set(change_map.keys()).intersection(set(change_map.values())) ==
set()
}}}
Inspecting the change_map, it does indeed contain a circular reference
(BTW shouldn't you use "not" to test for an empty set rather than
comparing with set()?).
At first, I didn't understand how it could crash sometimes and sometimes
not - we're talking about the same query being executed through a view.
But when I examine change_map, its content does indeed change from reload
to reload - I suppose this may have something to do with the order of
dicts/sets the combine() algorithm is using.
Here comes some code, I cut out a bunch of unused stuff, but otherwise
it's the same:
{{{
class Invoice(models.Model):
customer = models.ForeignKey(Customer)
date_created = models.DateField(default=datetime.date.today,
db_index=True)
date_sent = models.DateField(null=True, blank=True)
date_due = models.DateField(null=True, blank=True)
date_paid = models.DateField(null=True, blank=True)
date_credited = models.DateField(null=True, blank=True)
date_collect = models.DateField(null=True, blank=True)
invoice_type = models.CharField(default="invoice", max_length=32)
reminders = models.ManyToManyField("Invoice",
related_name="reminded_set", blank=True)
reminder_counter = models.IntegerField(null=True, blank=True)
}}}
That's the model, and in the view:
{{{
import datetime
from django.db.models import Q
date = datetime.datetime.now()
invoices = Invoice.objects.filter(
Q(date_created__lte=date),
Q(date_paid__gt=date) | Q(date_paid=None),
Q(date_credited__gt=date) | Q(date_credited=None),
customer=1,
)
filtered_invoices = Invoice.objects.none()
not_due = Q(date_due__gte=date) | Q(date_due=None)
not_reminded_yet = ~Q(reminders__date_created__lte=date)
not_collected = Q(date_collect__gt=date) | Q(date_collect=None)
filtered_invoices |= invoices.filter(not_due, not_collected,
date_sent__lte=date, invoice_type="invoice")
filtered_invoices |= invoices.filter(not_collected, not_reminded_yet,
date_sent__lte=date, date_due__lt=date, invoice_type="invoice")
for r in [1, 2, 3]:
qs = invoices.filter(not_collected,
reminders__date_created__lte=date, reminders__reminder_counter=r,
invoice_type="invoice")
for i in range(r + 1, 3 + 1):
qs = qs.filter(~Q(reminders__reminder_counter=i) |
Q(reminders__reminder_counter=i, reminders__date_created__gt=date))
filtered_invoices |= qs
}}}
I realize it's pretty complicated but I'm not sure what's essential and
what's not. I hope someone knowledgeable of how combine() works can figure
out why ordering in some cases matters.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
Oh, I forgot to mention - it crashes on the very last line {{{
filtered_invoices |= qs }}} in the above with r = 2. So after having
constructed a pretty complicated thing.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:1>
Comment (by timgraham):
Could you [https://docs.djangoproject.com/en/dev/internals/contributing
/triaging-tickets/#bisecting-a-regression bisect the regression]? If you
can provide the test app that you use for that process, it'll be helpful.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:2>
* Attachment "combinebug.tar.gz" added.
Test project for reproducing the problem
Comment (by OleLaursen):
I've attached a test project. Run it with
{{{
cd combinebug
DJANGO_SETTINGS_MODULE=combinebug.settings PYTHONPATH=../Django-1.7.5:.
python combinebug/triggerbug.py
}}}
If it crashes, congrats, you've reproduced it. If not, try running it
multiple times.
I tested it with Python 2.7 and Django 1.7.5 and could reproduce the crash
and Python 3 and Django 1.9.4 and could also reproduce it.
1.6.2 seems to be working - that was the version I upgraded the whole
project from and it's been stable.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:3>
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:4>
Comment (by timgraham):
#26959 looks like a duplicate and includes a more minimal model to
reproduce.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:5>
* has_patch: 0 => 1
Comment:
PR at https://github.com/django/django/pull/8139
I ran into this bug yesterday and spent some time trying to figure out
what's going on. It seems like the circular reference arises when
`alias_map` is iterated in `Query.join` in order to find the next alias to
reuse:
{{{
reuse = [a for a, j in self.alias_map.items()
if (reuse is None or a in reuse) and j == join]
if reuse:
self.ref_alias(reuse[0])
return reuse[0]
}}}
Because `alias_map` is a dict, the alias to reuse is chosen non-
deterministically. Using an `OrderedDict` for `alias_map` solves this
problem.
It can also by solved by changing the reuse code to
{{{
reuse = [a for a, j in self.alias_map.items()
if (reuse is None or a in reuse) and j == join]
if reuse:
if join.table_alias in reuse:
to_use = join.table_alias
else:
to_use = reuse[0]
self.ref_alias(to_use)
return to_use
}}}
But the `OrderedDict` seems cleaner to me.
I don't understand much of the join internals here so I'd love to hear if
someone has a better solution.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:6>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:7>
* status: new => closed
* resolution: => fixed
Comment:
In [changeset:"9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9" 9bbb6e2d]:
{{{
#!CommitTicketReference repository=""
revision="9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9"
Fixed #26522 -- Fixed a nondeterministic AssertionError in QuerySet
combining.
Thanks Andrew Brown for the test case.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:8>
Comment (by Gwildor Sok):
I just ran into this problem as well after updating a project from 1.8 to
1.11.2. Unfortunately, implementing the fix by changing `alias_map` to be
an OrderedDict did not help. I managed to work around it by just rewriting
the query, (because it was doing some dumb things which were obfuscated by
chaining manager methods, as you can see below), but I thought I would
share the smallest test case I could reproduce it with, just in case
someone else runs into it as well.
My models:
{{{
class Assessment(models.Model):
pass
class AnswerGroup(models.Model):
assessment = models.ForeignKey(Assessment)
author = models.ForeignKey(
User, related_name='created_answergroups')
about = models.ForeignKey(
User, blank=True, null=True, related_name='subject_answergroups)
}}}
And my test code where I get the same AssertionError:
{{{
u = User.objects.get(pk=1)
assessments = Assessment.objects.all()
groups = AnswerGroup.objects.filter(assessment__in=assessments)
groups_1 = groups.filter(author=u, about=u)
groups_2 = groups.filter(author=u).exclude(about=None).exclude(about=u)
list(groups_1 | groups_2)
}}}
Note that when I remove the `assessment__in` filter, the code works fine.
Also note that `change_map` is something like `{u'auth_user': 'T4', 'T4':
u'auth_user'}` both with and without the filter, and thus both when an
error occurs and when it does not.
I've worked around it by just using one filter and ditching the combine by
just using `filter(author=u).exclude(about=None)`, but of course not
everyone has the option to rewrite their query that easily.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:9>
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:10>
Comment (by Tim Graham <timograham@…>):
In [changeset:"f7f5edd50d03e8482f8a6da5fb5202b895d68cd6" f7f5edd]:
{{{
#!CommitTicketReference repository=""
revision="f7f5edd50d03e8482f8a6da5fb5202b895d68cd6"
Removed obsolete Query.tables attribute.
Obsolete since Query.alias_map became an OrderedDict (refs #26522).
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:11>
Comment (by Mikalai Radchuk):
It's a very tricky issue. I think, it can damage data in some cases.
In my case Django ORM didn't fail with assertion error: it generated wrong
SQL query so I got wrong data from my DB. On development environment I
didn't get this issue because it appears randomly. I'm a lucky person - in
my case I query for some data for UI, but if you perform some calculations
using data from a queryset that silently returns wrong data - you are in
trouble. It's quite possible that 4/5 of your processes work properly, but
one gives wrong results.
Here is simplified version of my issue.
Models:
{{{
#!python
from django.db import models
class Category(models.Model):
title = models.CharField(max_length=255, unique=True)
class Meta:
verbose_name_plural = 'categories'
def __str__(self):
return self.title
class Document(models.Model):
title = models.CharField(blank=True, max_length=255)
categories = models.ManyToManyField('Category', blank=True,
related_name='+')
def __str__(self):
return self.title
}}}
Build a query set that could return wrong data:
{{{
#!python
from app_with_models.models import Document, Category
queryset = Document.objects.all()
# Exclude documents without taxonomy (Category) tags
queryset = queryset.filter(categories__in=Category.objects.all())
# Apply predefined taxonomy filters
category_predefined = [1] # Should be in fixtures
queryset = queryset.filter(categories__in=category_predefined)
queryset = queryset.distinct()
used_category_ids = queryset.values_list('categories__pk', flat=True)
print(used_category_ids.query)
}}}
This code should generate the following SQL query (formatted):
{{{
#!sql
SELECT DISTINCT
"app_with_models_document_categories"."category_id"
FROM "app_with_models_document"
LEFT OUTER JOIN "app_with_models_document_categories" ON
("app_with_models_document"."id" =
"app_with_models_document_categories"."document_id")
INNER JOIN "app_with_models_document_categories" T4 ON
("app_with_models_document"."id" = T4."document_id")
WHERE (
"app_with_models_document_categories"."category_id" IN (SELECT U0."id"
AS Col1 FROM "app_with_models_category" U0)
AND T4."category_id" IN (1)
);
}}}
But in some cases it generates this:
{{{
#!sql
SELECT DISTINCT
T4."category_id"
FROM "app_with_models_document"
LEFT OUTER JOIN "app_with_models_document_categories" ON
("app_with_models_document"."id" =
"app_with_models_document_categories"."document_id")
INNER JOIN "app_with_models_document_categories" T4 ON
("app_with_models_document"."id" = T4."document_id")
WHERE (
"app_with_models_document_categories"."category_id" IN (SELECT U0."id"
AS Col1 FROM "app_with_models_category" U0)
AND T4."category_id" IN (1)
);
}}}
This query generates absolutely wrong results. In my example it will
always return `1` as `category_id`.
Some more details:
* Can be reproduced on `Django==1.10.7`, `Django==1.11.4` and, most
likely, on order versions too. Fixed in
9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9
* Can be reproduced on Python 3.5, but not on Python 3.6 (at least, I
didn't manage to do that). I guess it's because of
[https://docs.python.org/3/whatsnew/3.6.html#new-dict-implementation new
dict implementation].
* I've tested with Postgres 9.6.3 and `psycopg2==2.6.1` and
`psycopg2==2.7.3.1`, but it looks like the issue is not DB engine/adapter
specific.
Hope it helps.
Can it be backported into supported (or at least into LTS)? I would really
appreciate it. Me an my colleague have spent 4 days debugging our code an
I'm afraid that it could cause more serious issues for other people.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:12>
Comment (by Tim Graham):
I guess it's okay to backport to 1.11. There's some indication based on
the comments that this may be a regression.
[https://github.com/django/django/pull/9003 PR] for the stable/1.11.x
branch.
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:13>
Comment (by Tim Graham <timograham@…>):
In [changeset:"8d66bffbae8e5a230da51c7638d24fdbd327a96b" 8d66bff]:
{{{
#!CommitTicketReference repository=""
revision="8d66bffbae8e5a230da51c7638d24fdbd327a96b"
[1.11.x] Fixed #26522 -- Fixed a nondeterministic AssertionError in
QuerySet combining.
Thanks Andrew Brown for the test case.
Backport of 9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9 from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:14>
Comment (by Tim Graham <timograham@…>):
In [changeset:"04050bff306c0c609d58a8916a96245aa85c2c31" 04050bff]:
{{{
#!CommitTicketReference repository=""
revision="04050bff306c0c609d58a8916a96245aa85c2c31"
Refs #26522 -- Forwardported 1.11.5 release note.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:15>