[Django] #30079: Prefetch cache should be aware of database source and .using() should not always trigger a new query

10 views
Skip to first unread message

Django

unread,
Jan 5, 2019, 11:46:14 AM1/5/19
to django-...@googlegroups.com
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
Reporter: Mike | Owner: nobody
Lissner |
Type: Bug | Status: new
Component: Database | Version: 2.1
layer (models, ORM) | Keywords: prefetch,
Severity: Normal | multidatabase
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I've been poking around the darker edges of multidatabase prefetching and
I discovered a strange pattern. When you have cached prefetch values,
they're not aware of which database they came from:

{{{
# This runs zero queries
In [62]: pizzas =
Pizza.objects.filter(pk=17).prefetch_related('toppings').using('replica')

# This runs two, as expected
In [63]: pizza0 = pizzas[0]

# This runs zero, even though it normally would come from "default" and
the cache came from "replica"
In [64]: pizza0.toppings.all()
Out[64]: [<Topping 1: Cheese>, <Topping 2: Bacon>]

# This runs one query, even though this data should already be populated.
In [65]: pizza0.toppings.all().using('replica')
Out[65]: [<Topping 1: Cheese>, <Topping 2: Bacon>]
}}}

I was pretty surprised and confused that adding the using() method on the
last line above busted the cache even when the database selected was the
same as the one used to pre-populate the cache.

I was also surprised (though less so) at the opposite, that *not*
including the non-default DB (on line 64) *was* able to hit the cache
(which was populated by "replica") instead of hitting the default database
(as the query would normally do).

On IRC chatting about this, the question was: "Well, should filter hit the
DB or use the cache? What about exclude and other methods? Where do you
draw the line?" I think the simple line to draw is whether something
changes the SQL query. using() methods don't change SQL, so they should
work properly on a cached result. filter() and exclude() do change the
SQL, so they *shouldn't* use the cache.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jan 16, 2019, 6:11:09 AM1/16/19
to django-...@googlegroups.com
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage:
multidatabase | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* version: 2.1 => master
* type: Bug => Cleanup/optimization


Comment:

Hi Mike. Thanks for the report.

On the basis of the difference between `ln64` and `ln65` I'm inclined to
accept this. `pizza0` came from `replica` and the `ln64` call just goes
with that. (That seems right/expected to me...) I see your point that
adding the `using()` shouldn't change that.

At first glance the ''"if you didn't add a `filter()`/`exclude()`"'' idea
seems OK too.

Two questions though:

1. What's the use-case for re-adding the `using()` call in `ln65`? I can't
quite see why you'd do that...?
2. Have you looked at all at what a patch might look like? (If it's simple
enough, that helps...)

Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:1>

Django

unread,
Jan 16, 2019, 1:39:58 PM1/16/19
to django-...@googlegroups.com
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage:
multidatabase | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mike Lissner):

Replying to [comment:1 Carlton Gibson]:


> On the basis of the difference between `ln64` and `ln65` I'm inclined to
accept this. `pizza0` came from `replica` and the `ln64` call just goes
with that. (That seems right/expected to me...) I see your point that
adding the `using()` shouldn't change that.

Good! I made myself clear. Always a good thing.

> At first glance the ''"if you didn't add a `filter()`/`exclude()`"''
idea seems OK too.

That was my analysis too: "This seems to be a reasonable line in the
sand."

> Two questions though:
>
> 1. What's the use-case for re-adding the `using()` call in `ln65`? I

can't quite see why you'd do that...? (Another way of looking at `using()`
might be "go back to **this** DB"...)

I can't make a really clear argument for my use case, but basically it
was, "I'm doing weird things with prefetching and I want to be sure Django
pulls this data properly from the right DB/cache." In other words, it was
me trying to be explicit about where the caches were *supposed* come from,
and then getting bit by having the caches busted. From my reading, ln64 is
totally unclear which DB it would use since it would normally use the
"default" DB, but had a cached value, so which wins? My solution was to be
more explicit to make sure it used the cache, but that backfired (luckily,
I was watching the query logs).

> 2. Have you looked at all at what a patch might look like? (If it's
simple enough, that helps...)

No, not at all, unfortunately. I've looked into the Django code from time
to time, but mostly I stay on my side of the API, lest I get in trouble.
Especially when it comes to the ORM bits.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:2>

Django

unread,
Jan 22, 2019, 4:54:23 AM1/22/19
to django-...@googlegroups.com
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: prefetch, | Triage Stage:
multidatabase | Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* Attachment "ticket_30079.patch" added.

Adjustment of admonition wording.

Django

unread,
Jan 22, 2019, 5:02:16 AM1/22/19
to django-...@googlegroups.com
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Documentation | Version: master
Severity: Normal | Resolution: wontfix

Keywords: prefetch, | Triage Stage:
multidatabase | Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* status: new => closed
* resolution: => wontfix
* has_patch: 0 => 1
* component: Database layer (models, ORM) => Documentation


Comment:

OK, at best, this is a cleanup on the documentation. The admonition note
in the
[https://docs.djangoproject.com/en/2.1/ref/models/querysets/#django.db.models.query.QuerySet.prefetch_related
`prefech_related` docs] already covers this:

> Remember that, as always with **QuerySets**, any subsequent chained
methods which imply a different database query will ignore previously
cached results, and retrieve data using a fresh database query.

`using()` is a ''subsequent chained method''. It returns a new QuerySet,
and the results and prefetch caches are not transferred.

Arguably, `using()`, whilst returning a new QuerySet, doesn't actually
imply a different query.

As such I drafted the attached patch.

> ... as always with ``QuerySets``, any subsequent chained methods which
return a new ``QuerySet``, normally implying a different database query
...

In my opinion, the patch unnecessarily complicates the meaning. We know
from our use of QuerySets that all the
[https://docs.djangoproject.com/en/2.1/ref/models/querysets/#methods-that-
return-new-querysets Methods that return new QuerySets] behave this way,
even if we might forget it occasionally. Thus `wontfix`.

Thanks for the report Mike!

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:3>

Django

unread,
Jan 22, 2019, 12:02:05 PM1/22/19
to django-...@googlegroups.com
#30079: Prefetch cache should be aware of database source and .using() should not
always trigger a new query
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: nobody
Type: | Status: closed
Cleanup/optimization |
Component: Documentation | Version: master
Severity: Normal | Resolution: wontfix
Keywords: prefetch, | Triage Stage:
multidatabase | Unreviewed
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mike Lissner):

Whew, that's pretty subtle in both the old and the new versions of the
documentation. I have to confess that I don't think of chaining as the
trigger for a new queryset. I think about the query as a new trigger for a
queryset. I guess I never read this part of the documentation or didn't
take it to heart.

I don't know, I don't think we can document our way out of this behavior,
but I have no idea how difficult this is to fix technically.

--
Ticket URL: <https://code.djangoproject.com/ticket/30079#comment:4>

Reply all
Reply to author
Forward
0 new messages