[Django] #34091: Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from multiple threads

15 views
Skip to first unread message

Django

unread,
Oct 13, 2022, 9:03:55 AM10/13/22
to django-...@googlegroups.com
#34091: Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from
multiple threads
-------------------------------------+-------------------------------------
Reporter: Marti | Owner: nobody
Raudsepp |
Type: Bug | Status: new
Component: Database | Version: 4.1
layer (models, ORM) | Keywords: race-condition,
Severity: Normal | orm, threading, SQLCompiler
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I am experiencing very rare cases, where Django generates invalid SQL for
`select_related()` tables. This occurs in a place where multiple threads
are trying to execute the same QuerySet at the same time. (The access from
multiple threads was unintentional, but I believe this deserves to be
fixed in Django anyway)

When this occurs, tables/joins in the FROM clause can be missing entirely,
but they are still referenced in the SELECT clause, and the query fails,
for example `sqlite3.OperationalError: no such column: app_fk18.id` or
`ORA-00904: "xxx"."xxx": invalid identifier`

I could not reproduce this with vanilla Django, but after patching Django
to add `time.sleep(0)` in `SQLCompiler.compile()`, I can reliably
reproduce this.

The reproducer and longer explanation is here: https://github.com/intgr
/bug-reports/tree/main/django-query-race-condition

I suspect some mutation of the `Query` object is going on during SQL
compilation, but by efforts to narrow down the mutation have been
unsuccessful so far.

--
Ticket URL: <https://code.djangoproject.com/ticket/34091>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Oct 13, 2022, 10:51:19 AM10/13/22
to django-...@googlegroups.com
#34091: Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from
multiple threads
-------------------------------------+-------------------------------------
Reporter: Marti Raudsepp | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: race-condition, | Triage Stage:
orm, threading, SQLCompiler | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

> The access from multiple threads was unintentional, but I believe this
deserves to be fixed in Django anyway

Thank you for your report and investigation but I disagree with your
conclusion.

Django doesn't provide any thread safety guarantees for ORM objects (model
instances, querysets) and it's your responsibility to ensure that copies
of these objects are used in threads instead of sharing them. That's the
reason why thread safe built-in abstractions that interact with the ORM
and share references to objects (e.g. generic model views) make sure
[https://github.com/django/django/blob/da2621c3dfef934000c479750e0276d992025542/django/views/generic/list.py#L32
they create copies of the references before manipulating them].

Even if we were to have the compiler make
[https://github.com/django/django/blob/da2621c3dfef934000c479750e0276d992025542/django/db/models/sql/compiler.py#L709-L926
a copy of its query before altering its alias counts] and
[https://github.com/django/django/blob/da2621c3dfef934000c479750e0276d992025542/django/db/models/sql/compiler.py#L924-L926
then setting it back] it's likely only one of the locations that make
query compilation non-thread safe.

We can't accept this ticket without committing to making the ORM entirely
thread-safe with the performance and complexity penalties it incurs. If
you disagree with my conclusion feel free to open a thread on
[https://docs.djangoproject.com/en/4.1/internals/mailing-lists/#django-
developers the developers mailing list] to discuss the subject further.

--
Ticket URL: <https://code.djangoproject.com/ticket/34091#comment:1>

Django

unread,
Oct 13, 2022, 10:51:25 AM10/13/22
to django-...@googlegroups.com
#34091: Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from
multiple threads
-------------------------------------+-------------------------------------
Reporter: Marti Raudsepp | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: wontfix

Keywords: race-condition, | Triage Stage:
orm, threading, SQLCompiler | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* status: new => closed
* resolution: => wontfix


--
Ticket URL: <https://code.djangoproject.com/ticket/34091#comment:2>

Django

unread,
Oct 13, 2022, 11:34:46 AM10/13/22
to django-...@googlegroups.com
#34091: Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from
multiple threads
-------------------------------------+-------------------------------------
Reporter: Marti Raudsepp | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: race-condition, | Triage Stage:
orm, threading, SQLCompiler | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Marti Raudsepp):

> Django doesn't provide any thread safety guarantees for ORM objects
(model instances, querysets) and it's your responsibility

Do document this! Most Django APIs seem to be thread-safe, how is a user
supposed to know, where the limits of thread (un)safety are?

> make sure ​they create copies of the references before manipulating
them.

So `QuerySet.all()` is thread-safe, but ''iterating'' over QuerySet is
not? How is one supposed to know?

Before investigating this, I explicitly searched to figure out whether
QuerySets are thread safe, the only result that was sort-of authoritative
was https://code.djangoproject.com/ticket/11906 and since it's marked as
"fixed", it looked like this API aims to support thread safety.

--
Ticket URL: <https://code.djangoproject.com/ticket/34091#comment:3>

Django

unread,
Oct 13, 2022, 4:39:02 PM10/13/22
to django-...@googlegroups.com
#34091: Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from
multiple threads
-------------------------------------+-------------------------------------
Reporter: Marti Raudsepp | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: wontfix
Keywords: race-condition, | Triage Stage:
orm, threading, SQLCompiler | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

> Do document this! Most Django APIs seem to be thread-safe, how is a user
supposed to know, where the limits of thread (un)safety are?

I was under the impression that this was formally documented but just like
multi-processing gotchas when dealing with connections it doesn't seem to
be the case.

> So QuerySet.all() is thread-safe, but iterating over QuerySet is not?
How is one supposed to know?

Accessing `shared_qs: QuerySet` in a thread and calling `thread_qs =
shared_qs.all()` creates a copy of the queryset
[https://docs.djangoproject.com/en/4.1/ref/models/querysets/#all as
document] hence threads don't share the same instance.

> he only result that was sort-of authoritative was #11906 and since it's


marked as "fixed", it looked like this API aims to support thread safety.

I again don't agree with your conclusion here. If you read the ticket's
evolution it was marked as ''fixed'' because `QuerySet._fill_cache` was
removed during a refactor, not because thread safety guarantees for ORM
objects changed. There's even a comment in there that points to a page
that is now long gone and stated

> However, QuerySets are known not to be thread-safe, see #11906. Usually
that does not pose problems as '''they are (or should be) not shared
between threads in Django'''. '''The exception to that rule is the use of
exotic global/class-level/shared instance variable querysets in your own
code''' (e.g. when using the ORM outside of the Django dispatch system),
where '''you are assumed to know what you are doing and protect them
appropriately anyway'''.

--
Ticket URL: <https://code.djangoproject.com/ticket/34091#comment:4>

Reply all
Reply to author
Forward
0 new messages