[Django] #29215: Document potential change to behaviour of QuerySet methods when upgrading to Python 3.6

Django

Mar 13, 2018, 10:20:15 PM

to django-...@googlegroups.com

#29215: Document potential change to behaviour of QuerySet methods when upgrading
to Python 3.6
-------------------------------------+-------------------------------------
Reporter: Matt Fisher | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 2.0
Severity: Normal | Keywords: documentation
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Python 3.6 introduces
[https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep468 a change]
that preserves the ordering of keyword arguments -
[https://www.python.org/dev/peps/pep-0468/ PEP 468: Preserving Keyword
Argument Order]

Since this is not a difference between Python 2 and 3 per se, it is not
mentioned in the [https://docs.djangoproject.com/en/1.11/topics/python3/
“Porting to Python 3”] Django 1.11 topic page.

Preserving keyword argument ordering can change the SQL produced by
QuerySet methods that accept `**kwargs`, specifically .get, .filter, and
.exclude, and potentially also .values, .annotate, .aggregate, .create,
.get_or_create, .update, and .update_or_create.

As an example,
`.filter(b__in=some_qs, a=1)`
could produce
`... WHERE (a=1 AND b IN (SELECT …))` in Python < 3.6 and
`... WHERE (b IN (SELECT …) AND a=1)` in Python >= 3.6.
The two queries are semantically equivalent, but the PostgreSQL query
planner can produce dramatically different query plans, and therefore
dramatically different performance, if the query has a significant number
of joins.
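
To illustrate the mechanism only, here is a minimal sketch. `build_where` is a hypothetical stand-in, not Django's query-building code; it simply joins one placeholder condition per keyword argument in the order the `**kwargs` mapping is iterated. On Python 3.6+ that order follows the call site (PEP 468), so swapping the arguments changes the generated text even though the set of conditions is the same.

```python
def build_where(**kwargs):
    """Hypothetical stand-in for a query builder; NOT Django's implementation.

    Emits one placeholder condition per keyword argument, in the order
    the kwargs mapping is iterated.
    """
    return " AND ".join("{} = %s".format(name) for name in kwargs)


# On Python 3.6+ kwargs keep call-site order, so the two strings differ
# even though they contain exactly the same conditions.
first = build_where(b__in="subquery", a=1)
second = build_where(a=1, b__in="subquery")

print(first)
print(second)
print(sorted(first.split(" AND ")) == sorted(second.split(" AND ")))
```

On Python versions before 3.6 the order of the two conditions would not be guaranteed to match the call site.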

We had one particular frequently-run query that ran in 2-3 seconds in
Python 2, and 80+ seconds in Python 3.6. The problem was only apparent in
production, and presented as severely degraded responsiveness from the
PostgreSQL RDS database. The additional load on the database made most
other queries take longer as well, which made the cause difficult to track
down because it was not obvious that any queries could have changed. To
compound the problem, the different query execution caused many more temp
block writes on the RDS instance, which burned through our IOPS burst
balance and further degraded site performance. Because the query didn’t
change semantically, all our automated tests passed and we didn’t see the
issue until it was in production under full load. Googling the problem
produced no good results, which suggests that it is uncommon but that a
useful reference is still needed.

People may be affected by this change when upgrading from Python 2.7 to
Python 3.6+ using Django 1.x, in which case it would be helpful to have a
warning in the [https://docs.djangoproject.com/en/1.11/topics/python3/
1.11 porting topic]. It will also occur when people upgrade from
Python 3.0-3.5 to 3.6+, potentially while using Django 2, in which case the
porting topic is no longer in the current docs. I am not sure what would be
an appropriate place for a warning in this case.

--
Ticket URL: <https://code.djangoproject.com/ticket/29215>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Mar 14, 2018, 12:03:59 AM

to django-...@googlegroups.com

Changes (by Simon Charette):

* cc: Simon Charette (added)

Comment:

This behavior has little to do with Python 3.6 itself.

If you had a query performing better on Python 2.7 than on Python 3, it's
probably because you weren't using
[https://docs.python.org/2/using/cmdline.html?highlight=pythonhashseed#cmdoption-r
a random hash seed], which is the default on Python 3. In other words, you
were relying on a stable hash seed to build queries in a way that the
PostgreSQL query planner would optimize correctly. If you had updated
from 2.7 to Python 3.5 instead, you would have hit the slow query issue
from time to time, whenever the interpreter chose a hash seed that made
`list({'b__in', 'a'}) == ['b__in', 'a']`. In a way, Python 3.6 allows
you to reliably build your `WHERE` clause through kwargs ordering, which
wasn't possible before.
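
The seed dependence can be observed directly. The sketch below is only an illustration: `set_order_for_seed` is a hypothetical helper name, while `PYTHONHASHSEED` is the standard environment variable. It starts a fresh interpreter per seed and prints the iteration order of the same two-element set. A given seed always yields the same order, but different seeds may yield different orders.

```python
import os
import subprocess
import sys

# The expression whose ordering depends on string hashing.
SNIPPET = "print(list({'b__in', 'a'}))"


def set_order_for_seed(seed):
    """Run a fresh interpreter with a fixed PYTHONHASHSEED.

    Returns the printed iteration order of the set as a string.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Each seed is deterministic; the order may vary between seeds.
    for seed in range(1, 11):
        print(seed, set_order_for_seed(seed))
```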

I guess we could add a section about the fact that Python 3 has random hash
seeding on by default, which may break code that makes assumptions about
the ordering of unordered data structures. Maybe we should also suggest
running Python 2.7 with the `-R` flag on when porting code?

--
Ticket URL: <https://code.djangoproject.com/ticket/29215#comment:1>

Django

Mar 14, 2018, 11:54:26 AM

to django-...@googlegroups.com

Comment (by Tim Graham):

I think the documentation suggests that `.filter(b__in=some_qs, a=1)` is
equivalent to `.filter(a=1, b__in=some_qs)`. I'm not sure if we should
document the subtleties you brought up about the actual SQL that's
produced. I doubt there are any tests for this "feature." If we don't
document it, then documenting a possible change regarding it doesn't make
much sense. Well, it's now informally documented with this ticket.

By the way, we're no longer updating the Django 1.11 documentation, which
includes the "porting to Python 3" document. The Django 1.11 documentation
[https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/#python-options
recommends] using `python -R` for a random hash seed when using Python 2.
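
As a quick way to see which mode an interpreter is running in, something like the following could be used. It is a sketch: `describe_hash_seed` is a hypothetical helper name, while `sys.flags.hash_randomization` and the `PYTHONHASHSEED` environment variable are standard.

```python
import os
import sys


def describe_hash_seed():
    """Summarise the hash-seed settings of the running interpreter."""
    return {
        # Truthy when hash randomization is active for this process.
        "hash_randomization": bool(sys.flags.hash_randomization),
        # None when the variable is unset.
        "PYTHONHASHSEED": os.environ.get("PYTHONHASHSEED"),
    }


print(describe_hash_seed())
```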

--
Ticket URL: <https://code.djangoproject.com/ticket/29215#comment:2>

Django

Mar 22, 2018, 10:24:59 AM

to django-...@googlegroups.com

Changes (by Carlton Gibson):

* status: new => closed
* resolution: => invalid

Comment:

There's not much we can realistically do here.

I'd say it would be borderline whether to include it in the 1.11
porting topic page, but since that's no longer maintained, the question is
moot.

I feel the issue itself is out of scope for Django: we're generating correct
SQL, which is meant to be declarative, i.e. independent of how the RDBMS
implements the query.
This falls in with any number of backend-specific quirks that we just can't
get into. (It would be interesting to hear what the Postgres folks would
say about it.)

--
Ticket URL: <https://code.djangoproject.com/ticket/29215#comment:3>
