{{{
class Recursive(models.Model):
name = models.CharField(max_length=10)
link = models.IntegerField()
parent = models.ForeignKey('self', models.CASCADE, null=True)
rec1 = Recursive.objects.create(name="r1", link=1)
rec2 = Recursive.objects.create(name="r2", link=1)
rec3 = Recursive.objects.create(name="r11", parent=rec1, link=2)
latest_parent =
Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')
Recursive.objects.filter(parent__isnull=False) \
.annotate(parent_link=F('parent__link')) \
.annotate(p=Subquery(latest_parent.values('pk')[:1])) \
.update(parent_id=F('p'))
}}}
{{{
Traceback (most recent call last):
File "tests.py", line 88, in
test_update_annotated_using_related_queryset
.update(parent_id=F('p'))
File "/git/django/django/db/models/query.py", line 647, in update
rows = query.get_compiler(self.db).execute_sql(CURSOR)
File "/git/django/django/db/models/sql/compiler.py", line 1204, in
execute_sql
cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
File "/git/django/django/db/models/sql/compiler.py", line 899, in
execute_sql
raise original_exception
OperationalError: no such column: T2.link
}}}
I've tried this in SQLite and MySQL in Django 1.11.11 and Django 2.0 using
Python 3.6.0. All raising similar errors.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
Old description:
New description:
An SQL-error is raised when updating a queryset which includes an
annotation created using a subquery, which in turn also uses an
annotation.
{{{
class Recursive(models.Model):
name = models.CharField(max_length=10)
link = models.IntegerField()
parent = models.ForeignKey('self', models.CASCADE, null=True)
from django.db.models import OuterRef, Subquery, F
--
Comment:
Reproduced at feb683c4c2c5ecfb61e4cb490c3e357450c0c0e8.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:1>
Comment (by Carlton Gibson):
Related: #29542
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:2>
* cc: felixxm (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:3>
Comment (by Oskar Persson):
Just tried this with the fix for #29542 and the problem remains.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:4>
Comment (by Simon Charette):
#30009 was a duplicate that wasn't relying on `update()`.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:5>
Comment (by Simon Charette):
As mentioned in #30009 this has something to do with alias quoting and is
probably related to `Subquery.resolve_expression`'s alteration of
`external_alias` as a comment mentions it's done to
[https://github.com/django/django/blob/c5568340a525ab9c6898ed02c257394cc47285d7/django/db/models/expressions.py#L1031-L1034
prevent quoting].
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:6>
* version: 1.11 =>
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:7>
* version: => master
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:8>
Comment (by Chetan Khanna):
Replying to [comment:5 Simon Charette]:
> #30009 was a duplicate that wasn't relying on `update()`.
I don't think this issue is strictly related to `annotate`. Possibly, its
the `update()` clause. If we remove the `update()` clause, the query runs
just fine.
{{{
latest_parent =
Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')
Recursive.objects.filter(parent__isnull=False) \
.annotate(parent_link=F('parent__link')) \
.annotate(p=Subquery(latest_parent.values('pk')[:1])) \
.values()
}}}
which produces the following SQL:
{{{
SELECT "recursive_model"."id",
"recursive_model"."name",
"recursive_model"."link",
"recursive_model"."parent_id",
T2."link" AS "parent_link",
(
SELECT U0."id"
FROM "recursive_model" U0
WHERE U0."link" = T2."link"
ORDER BY U0."id" ASC
LIMIT 1
) AS "p"
FROM "recursive_model"
INNER JOIN "recursive_model" T2
ON ("recursive_model"."parent_id" = T2."id")
WHERE "recursive_model"."parent_id" IS NOT NULL
LIMIT 21
}}}
Also the query on https://code.djangoproject.com/ticket/30009 seems to run
just fine on current master. (I had to tweak it a bit since the otherwise
the ORM complained about multiple columns being returned)
query:
{{{
Task.objects.annotate(
top_case_id=Coalesce(F('case__parent_case__parent_case_id'),
F('case__parent_case_id'), F('case_id')),
subject=Subquery(Subject.objects.filter(case_id=OuterRef('top_case_id')).values('id'),
output_field=IntegerField())
).all()
}}}
corresponding SQL:
{{{
SELECT "ticket29214_task"."id",
"ticket29214_task"."num",
"ticket29214_task"."case_id",
COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id",
"ticket29214_task"."case_id") AS "top_case_id",
(
SELECT U0."id"
FROM "ticket29214_subject" U0
WHERE U0."case_id" = COALESCE(T3."parent_case_id",
"ticket29214_case"."parent_case_id", "ticket29214_task"."case_id")
) AS "subject"
FROM "ticket29214_task"
LEFT OUTER JOIN "ticket29214_case"
ON ("ticket29214_task"."case_id" = "ticket29214_case"."id")
LEFT OUTER JOIN "ticket29214_case" T3
ON ("ticket29214_case"."parent_case_id" = T3."id")
LIMIT 21
}}}
(No extra quoting aroung "T3" as mentioned in the description)
Database used was PostgreSQL and SQLs are picked from `shell_plus` of
`django_extensions`.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:9>
* cc: Simon Charette (added)
Comment:
Chetan did you reproduce the `update()` crash against latest master as
well? There has been a few tweaks to subquery aliases quoting in the past
weeks.
Could you provide the generated problematic `UPDATE` SQL since it's
missing from the ticket?
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:10>
* cc: Chetan Khanna (added)
Comment:
Yes. On running the same query, I get the following SQL and traceback:
{{{
latest_parent =
Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id')
Recursive.objects.filter(parent__isnull=False) \
.annotate(parent_link=F('parent__link')) \
.annotate(p=Subquery(latest_parent.values('pk')[:1])) \
.update(parent_id=F('p'))
}}}
SQL:
{{{
UPDATE "recursive_model"
SET "parent_id" = (
SELECT U0."id"
FROM "recursive_model" U0
WHERE U0."link" = T2."link"
ORDER BY U0."id" ASC
LIMIT 1
)
WHERE "recursive_model"."id" IN (
SELECT V0."id"
FROM "recursive_model" V0
INNER JOIN "recursive_model" V1
ON (V0."parent_id" = V1."id")
WHERE V0."parent_id" IS NOT NULL
)
}}}
Traceback:
{{{
ProgrammingError: missing FROM-clause entry for table "t2"
LINE 1: ...."id" FROM "recursive_model" U0 WHERE U0."link" = T2."link" ...
}}}
Unfortunately I haven't looked deeper, but I am planning to during this
weekend. I'll report if I get something else.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:11>
Comment (by Simon Charette):
Thanks!
Something suspicious here is the usage of `F('parent__link')` which should
be disallowed per https://code.djangoproject.com/ticket/14104#comment:1.
We don't support `JOIN`s in `UPDATE` unless I'm mistaken so I suspect the
`JOIN`'ed `T2` table is simply get pruned by `SQLUpdateCompiler`.
The only way to add support for such query would be to make
`OuterRef('parent__link')` result in a `JOIN` within the subquery, notice
how the `filter(parent__isnull=False)` predicate results in a subquery
instead of an `INNER JOIN` like it does when `.update` is not used in
comment:9.
I assume we'd want do something similar for `OuterRef` that include a `__`
so the resulting query is
{{{#!python
latest_parent =
Recursive.objects.filter(link=OuterRef('parent__link')).order_by('id')
Recursive.objects.filter(
parent__isnull=False
).update(parent_id=Subquery(latest_parent.values('pk')[:1])
}}}
{{{#!sql
UPDATE "recursive_model"
SET "parent_id" = (
SELECT U0."id"
FROM "recursive_model" U0
WHERE U0."link" IN (
SELECT U1."link"
FROM "recursive_model" U1
WHERE U1."id" = "recursive_model"."parent_id"
)
ORDER BY U0."id" ASC
LIMIT 1
)
WHERE "recursive_model"."id" IN (
SELECT V0."id"
FROM "recursive_model" V0
INNER JOIN "recursive_model" V1
ON (V0."parent_id" = V1."id")
WHERE V0."parent_id" IS NOT NULL
)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:12>
Comment (by Chetan Khanna):
That comment was really insightful, thank you! :)
If you don't mind confirming, then this is what I could get from the
comment:
Whenever there is a `filter()` clause, preceding an `update()` clause, and
containing fields from related models joined via `__`, the ORM gives back
a SQL with an `INNER JOIN` inside a subquery. And for this issue, we want
to have a similar functionality in the `OuterRef` that contain `__`.
If so, I think I have created a test from our test suite that might serve
as a starting point.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:13>
* Attachment "ticket29214.diff" added.
Test file
Comment (by Simon Charette):
Chetan, almost correct. The query will not necessarily involve generating
an `INNER JOIN` but it happens to do so in this case.
Most of the logic lives in
[https://github.com/django/django/blob/a6b3938afc0204093b5356ade2be30b461a698c5/django/db/models/sql/compiler.py#L1513-L1553
SQLUpdateCompiler] but the idea is that if the backend supports it the
whole filtering part of the queryset will be pushed down to a subquery,
`JOIN`s included, and filtered against using `pk IN`.
This will be way trickier to implement for subqueries and I'm starting to
wonder if we didn't get it wrong in the first place by making
`OuteRef('outerqs__lookup')` result in a new `JOIN` in the outer query
instead of encapsulating it in the subquery. The way we do it right now
breaks aggregation in subtle ways #29214 and happens to break update as
well as we came to discover.
If the `JOIN` was always encapsulated in the subquery in the first place
it would solve this ticket, #29214 and I suspect it would make
implementing #28296 way easier.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:14>
* owner: nobody => Chetan Khanna
* status: new => assigned
* has_patch: 0 => 1
Comment:
Yes, it sure is tricky!
https://github.com/django/django/pull/12546
I have prepared a patch that should work for the update case at least. And
sorry for prolonging this, I had my university exams in the middle and the
implementation didn't come to me at once but I still wanted to give it a
shot. I've tried to follow whatever was said in
https://code.djangoproject.com/ticket/29214#comment:12 as closely as
possible. Awaiting review :)
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:15>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:16>
Comment (by Chetan Khanna):
Sorry, I forgot that. I got occupied with my GSoC application since the
submissions have already started.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:17>
Comment (by Mariusz Felisiak):
#32839 is a duplicate with `FilteredRelation` annotations.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:18>
Comment (by Bálint Balina):
Is there a way I can help with this issue? I have just reported
[https://code.djangoproject.com/ticket/32839] and I am eager to resolve
the problem, but I can see here that there is already a work in progress
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:19>
Comment (by Jacob Walls):
Absolutely! If you want to use the previous patch as a starting point, be
sure to credit the original author with `Co-authored-by: Name <email>` in
the commit message. Assign yourself the ticket when you intend to get
going.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:20>
Comment (by Bálint Balina):
Ok, I have super simple solution, but I would like to pre check if this is
a possible way to go. Can we use lower case table aliases?
I have applied the following monkeypatch, and everything works fine for my
case:
{{{
Query.alias_prefix = 't'
Query.subq_aliases = frozenset(['t'])
def bump_prefix(self, outer_query):
...
alphabet = ascii_lowercase
...
Query.bump_prefix = bump_prefix
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:21>
Comment (by Simon Charette):
No it's just working around the issue and will break on backends that
follow the SQL standard with regards to unquoted aliases (e.g. Oracle).
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:22>
* owner: Chetan Khanna => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:23>
Well, I came here from #32839 and believe that it better explains the
issue. I also see a 'missing FROM-clause entry for table "T5" ' error when
using an annotated field in a subquery with OuterRef which is joined
multiple times in the query and gets the name of T5. When I checked the
raw query I saw that the table joined as T5 but used in the subquery as
"T5" and then raised an error.
--
Ticket URL: <https://code.djangoproject.com/ticket/29214#comment:24>