[Django] #28939: QuerySet used by prefetch_related() does not use expected connection.

29 views
Skip to first unread message

Django

unread,
Dec 18, 2017, 1:09:48 PM12/18/17
to django-...@googlegroups.com
#28939: QuerySet used by prefetch_related() does not use expected connection.
-------------------------------------+-------------------------------------
Reporter: Nick Pope | Owner: nobody
Type: Bug | Status: new
Component: Database | Version: master
layer (models, ORM) | Keywords: prefetch,
Severity: Normal | prefetch_related, using, connection
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I've run into an case with prefetching where the connection used for the
prefetch queries is not the one I would expect:

{{{#!python
# When using the default database connection:

In [1]: from django.db import connections

In [2]: connections['default'].queries
Out[2]: []

In [3]: connections['readonly'].queries
Out[3]: []

In [4]: User.objects.prefetch_related('operators').first()
Out[4]: <User: example>

In [5]: connections['default'].queries
Out[5]:
[{'sql': 'SELECT ... FROM "auth_user" ORDER BY "auth_user"."id" ASC LIMIT
1', 'time': '0.205'},
{'sql': 'SELECT ... AS "_prefetch_related_val_user_id", ... FROM
"operator" INNER JOIN "operator_users" ON ("operator"."id" =
"operator_users"."operator_id") WHERE "operator_users"."user_id" IN (1)
ORDER BY "operator"."code" ASC', 'time': '0.011'}]

In [6]: connections['readonly'].queries
Out[6]: []

# When specifying the database connection to use:

In [1]: from django.db import connections

In [2]: connections['default'].queries
Out[2]: []

In [3]: connections['readonly'].queries
Out[3]: []

In [4]:
User.objects.using('readonly').prefetch_related('operators').first()
Out[4]: <User: example>

In [5]: connections['default'].queries
Out[5]:
[{'sql': 'SELECT ... AS "_prefetch_related_val_user_id", ... FROM
"operator" INNER JOIN "operator_users" ON ("operator"."id" =
"operator_users"."operator_id") WHERE "operator_users"."user_id" IN (1)
ORDER BY "operator"."code" ASC', 'time': '0.010'}]

In [6]: connections['readonly'].queries
Out[6]:
[{'sql': 'SELECT ... FROM "auth_user" ORDER BY "auth_user"."id" ASC LIMIT
1', 'time': '0.002'}]
}}}

In the second case I would have expected all queries to be executed on the
{{{readonly}}} connection by default.

This can be achieved by using {{{Prefetch('operators',
queryset=Operator.objects.using('readonly'))}}}, but it means that plain
strings cannot be used and often a lot of nested {{{Prefetch()}}} objects
can be required.

A solution to this could be to do the following:

1. Use the connection from the original {{{QuerySet}}} that called
{{{.prefetch_related()}}} by default.
2. ~~Possibly add {{{QuerySet.prefetch_related(using=...)}}} to allow
overriding for all prefetch queries.~~ (Doesn't fit nicely with the API.)
3. Possibly add {{{Prefetch(using=...)}}} to allow overriding for a
specific branch of prefetching. (This would propagate down.)

--
Ticket URL: <https://code.djangoproject.com/ticket/28939>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Dec 18, 2017, 2:15:50 PM12/18/17
to django-...@googlegroups.com
#28939: QuerySet used by prefetch_related() does not use expected connection.
-------------------------------------+-------------------------------------
Reporter: Nick Pope | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage: Accepted
prefetch_related, using, |
connection |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* stage: Unreviewed => Accepted


Comment:

I think 1. would make the most sense here unfortunately I'm afraid this
might break backward compatibility with regards to database router.

For example, given the following scenario

{{{#!python
class Foo(models.Model):
pass

class Bar(models.Model):
foo = models.ForeignKey(Foo, related_name='bars')

class Router(object):
def db_for_read(self, model):
if model is Bar:
return 'other'
return 'default'
}}}

It would change the behaviour of
`Foo.objects.using('default').prefetch_related('bars')` because `using()`
has usually precedence over the routers suggestion. We could special case
prefetches to only default to the base queryset's `using` if no router
provide a `db_for_read` suggestion but that would be counter intuitive IMO
and add even more complexity to the read database alias selection logic.

So, I think that 1. is favourable but would require a deprecation cycle
for the case where both a base query alias is forced though `using` and
that `db_for_read` suggests a database. In the mean time I think that 3.
is easier to implement and could be useful even in a future where 1. would
get fixed.

--
Ticket URL: <https://code.djangoproject.com/ticket/28939#comment:1>

Django

unread,
Dec 19, 2017, 5:54:41 AM12/19/17
to django-...@googlegroups.com
#28939: QuerySet used by prefetch_related() does not use expected connection.
-------------------------------------+-------------------------------------
Reporter: Nick Pope | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Documentation | Version: master

Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage: Accepted
prefetch_related, using, |
connection |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Nick Pope):

* type: Bug => Cleanup/optimization
* component: Database layer (models, ORM) => Documentation


Old description:

New description:

**Update:** I should have mentioned that I was using a custom database
router. I now know this issue occurred due to a bug in that router.

----

--

Comment:

Hi Simon,

Sorry - false alarm.

I did some digging and found
{{{tests.prefetch_related.tests.MultiDbTests}}} which seems to show my
scenario working correctly.

I've subsequently realised that I wasn't handling {{{instance._state.db}}}
correctly in {{{db_for_read()}}} in my router.

It may be worth improving the documentation on database routers and/or
{{{prefetch_related()}}} warning of this pitfall.

I have left this ticket open, but feel free to close it if you don't think
the documentation needs improving.

Here is (an example of) my fixed database router:

{{{#!python
class Router(object):

def __init__(self):
self.mapping = {'special': 'other'}

def db_for_read(self, model, **hints):
db = self.mapping.get(model._meta.app_label, 'default')
instance = hints.get('instance')
return instance._state.db or db if instance else db

def db_for_write(self, model, **hints):
return self.mapping.get(model._meta.app_label, 'default')

def allow_relation(self, obj1, obj2, **hints):
db1 = self.mapping.get(obj1._meta.app_label, 'default')
db2 = self.mapping.get(obj2._meta.app_label, 'default')
return db1 == db2

def allow_migrate(self, db, app_label, model_name=None, **hints):
return db == self.mapping.get(app_label, 'default')
}}}

Beforehand, {{{db_for_read()}}} was the same as {{{db_for_write()}}}.

Essentially there are two databases ({{{default}}} and {{{other}}}) which
also both have read-only connections to a replica ({{{default-ro}}} and
{{{other-ro}}}).
This is an application-routing router - everything goes to {{{default}}}
unless it is from the {{{special}}} application which goes to {{{other}}}.
Typically we opt-in to using the read-only connections via
{{{QuerySet.using()}}} and this is where things then broke due to not
handling {{{instance._state.db}}}.

--
Ticket URL: <https://code.djangoproject.com/ticket/28939#comment:2>

Django

unread,
Dec 30, 2017, 10:13:55 PM12/30/17
to django-...@googlegroups.com
#28939: Document which ORM methods provide an instance hint to database routers

-------------------------------------+-------------------------------------
Reporter: Nick Pope | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Documentation | Version: master
Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage: Accepted
prefetch_related, using, |
connection |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

Hey Nick!

> It may be worth improving the documentation on database routers and/or
prefetch_related() warning of this pitfall.

I'm not sure how we could improve the current routers documentation given
we link to [https://docs.djangoproject.com/en/2.0/topics/db/multi-
db/#hints hints] in both `db_for_read` and `db_for_write` section but I
guess the `prefetch_related` one and all the other database methods that
provide an `instance` hint to routers could be improved.

I think there would also be benefit in eventually enhancing the routing
context with additional hints such as `operation`, `field`,
`related_instance`, `cardinality` and such. I've had instances in the past
where being able to differentiate between heavy (e.g.
`prefetch_related()`) and lightweight (e.g. `refresh_from_db()`, `get()`)
reads would have been quite useful.

--
Ticket URL: <https://code.djangoproject.com/ticket/28939#comment:3>

Django

unread,
Sep 17, 2020, 8:26:51 AM9/17/20
to django-...@googlegroups.com
#28939: Document which ORM methods provide an instance hint to database routers
-------------------------------------+-------------------------------------
Reporter: Nick Pope | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Documentation | Version: master
Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage: Accepted
prefetch_related, using, |
connection |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson <carlton@…>):

In [changeset:"8c0794ba0da2b5d668a7eb1c167e54beb7f40890" 8c0794ba]:
{{{
#!CommitTicketReference repository=""
revision="8c0794ba0da2b5d668a7eb1c167e54beb7f40890"
Refs #28939 -- Doc’d Prefetch behavior with multiple DBs.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28939#comment:4>

Django

unread,
Sep 17, 2020, 8:28:13 AM9/17/20
to django-...@googlegroups.com
#28939: Document which ORM methods provide an instance hint to database routers
-------------------------------------+-------------------------------------
Reporter: Nick Pope | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Documentation | Version: master
Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage: Accepted
prefetch_related, using, |
connection |
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson <carlton.gibson@…>):

In [changeset:"17a408207800ec3ceaf9c29fbb2c42629fd0ff76" 17a40820]:
{{{
#!CommitTicketReference repository=""
revision="17a408207800ec3ceaf9c29fbb2c42629fd0ff76"
[3.1.x] Refs #28939 -- Doc’d Prefetch behavior with multiple DBs.

Backport of 8c0794ba0da2b5d668a7eb1c167e54beb7f40890 from master
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/28939#comment:5>

Reply all
Reply to author
Forward
0 new messages