[Django] #32772: Database cache counts the DB size twice at a performance penalty

21 views
Skip to first unread message

Django

unread,
May 19, 2021, 6:17:46 PM5/19/21
to django-...@googlegroups.com
#32772: Database cache counts the DB size twice at a performance penalty
-----------------------------------------+------------------------
Reporter: Mike Lissner | Owner: nobody
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 3.2
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-----------------------------------------+------------------------
We have a lot of entries in the DB cache, and I've noticed that the
following query shows up in my slow query log kind of a lot (Postgresql is
slow at counting things):

{{{
SELECT COUNT(*) FROM cache_table;
}}}

This query is being run by the DB cache **twice** for every cache update
in order to determine if culling is needed. First, in the cache setting
code, it runs:

{{{
cursor.execute("SELECT COUNT(*) FROM %s" % table)
num = cursor.fetchone()[0]
now = timezone.now()
now = now.replace(microsecond=0)
if num > self._max_entries:
self._cull(db, cursor, now)
}}}

(https://github.com/django/django/blob/d06c5b358149c02a62da8a5469264d05f29ac659/django/core/cache/backends/db.py#L120-L131)

Then in self._cull (the last line above) it runs:

{{{
cursor.execute("DELETE FROM %s WHERE expires < %%s" % table,
[connection.ops.adapt_datetimefield_value(now)])
cursor.execute("SELECT COUNT(*) FROM %s" % table)
num = cursor.fetchone()[0]
if num > self._max_entries:
# Do culling routine here...
}}}

(https://github.com/django/django/blob/d06c5b358149c02a62da8a5469264d05f29ac659/django/core/cache/backends/db.py#L254-L260)

The idea is that if the MAX_ENTRIES setting is exceeded, it'll cull the DB
cache down by some percentage so it doesn't grow forever.

I think that's fine, but given that the SELECT COUNT(*) query is slow, I
wonder two things:

1. Would a refactor to remove the second query be a good idea? If you pass
the count from the first query into the `_cull` method, you can then do:

{{{
def _cull(self, db, cursor, now, count):
...
cursor.execute("DELETE FROM %s WHERE expires < %%s" % table,
[connection.ops.adapt_datetimefield_value(now)])
deleted_count = cursor.rowcount
num = count - deleted_count
if num > self._max_entries:
# Do culling routine here...
}}}

That seems like a simple win.

2. Is it reasonable to not run the culling code *every* time that we set a
value? Like, could we run it every tenth time or every 100th time or
something?

If this is a good idea, does anybody have a proposal for how to count
this? I'd be happy just doing it on a mod of the current millisecond, but
there's probably a better way (randint?).

Would a setting be a good idea here? We already have MAX_ENTRIES and
CULL_FREQUENCY. CULL_FREQUENCY is "the fraction of entries that are culled
when ``MAX_ENTRIES`` is reached." That sounds more like it should have
been named CULL_RATIO (regrets!), but maybe a new setting for this could
be called "CULL_EVERY_X"?

I think the first change is a no-brainer, but both changes seem like wins
to me. Happy to implement either or both of these, but wanted buy-in
first.

--
Ticket URL: <https://code.djangoproject.com/ticket/32772>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 19, 2021, 6:21:58 PM5/19/21
to django-...@googlegroups.com
#32772: Database cache counts the DB size twice at a performance penalty
-------------------------------------+-------------------------------------

Reporter: Mike Lissner | Owner: nobody
Type: Uncategorized | Status: new
Component: Core (Cache system) | Version: dev
Severity: Normal | Resolution:
Keywords: cache, database | Triage Stage:

| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mike Lissner):

* keywords: => cache, database
* version: 3.2 => dev
* component: Uncategorized => Core (Cache system)


--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:1>

Django

unread,
May 20, 2021, 12:54:56 AM5/20/21
to django-...@googlegroups.com
#32772: Database cache counts the DB size twice at a performance penalty
--------------------------------------+------------------------------------

Reporter: Mike Lissner | Owner: nobody
Type: Cleanup/optimization | Status: new

Component: Core (Cache system) | Version: dev
Severity: Normal | Resolution:
Keywords: cache, database | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Simon Charette):

* type: Uncategorized => Cleanup/optimization
* stage: Unreviewed => Accepted


Comment:

I'm not sure how widely used this backend is for large dataset but I agree
with you that both of these changes are no brainers.

I like the idea of having a cache option control the culling frequency and
I don't have a strong opinion on the implementation itself. Could we
possibly focus this ticket on the `COUNT` reduction off `cursor.rowcount`
and have another one track the culling frequency changes though?

FWIW this backend also suffers from other limitations that could be worth
fixing (#23326) if you're interested in doing that.

--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:2>

Django

unread,
May 25, 2021, 12:37:45 PM5/25/21
to django-...@googlegroups.com
#32772: Database cache counts the DB size twice at a performance penalty
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: Mike
Type: | Lissner
Cleanup/optimization | Status: assigned

Component: Core (Cache system) | Version: dev
Severity: Normal | Resolution:
Keywords: cache, database | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mike Lissner):

* owner: nobody => Mike Lissner
* status: new => assigned
* has_patch: 0 => 1


Comment:

I put together a PR for this here:
https://github.com/django/django/pull/14447

(I'm setting the has_patch flag. Hopefully this is right.)

--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:3>

Django

unread,
May 25, 2021, 4:43:52 PM5/25/21
to django-...@googlegroups.com
#32772: Database cache counts the DB size twice at a performance penalty
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: Mike
Type: | Lissner
Cleanup/optimization | Status: assigned
Component: Core (Cache system) | Version: dev
Severity: Normal | Resolution:
Keywords: cache, database | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

That's looking great, thanks! It's now showing up
[https://code.djangoproject.com/query?status=!closed&needs_better_patch=0&needs_tests=0&needs_docs=0&has_patch=1&stage=Accepted&desc=1&order=changetime
in the review queue] so it will eventually be picked up.

--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:4>

Django

unread,
May 26, 2021, 5:23:58 AM5/26/21
to django-...@googlegroups.com
#32772: Database cache counts the DB size twice at a performance penalty
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: Mike
Type: | Lissner
Cleanup/optimization | Status: assigned
Component: Core (Cache system) | Version: dev
Severity: Normal | Resolution:
Keywords: cache, database | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:5>

Django

unread,
May 26, 2021, 5:50:38 AM5/26/21
to django-...@googlegroups.com
#32772: Database cache counts the DB size twice at a performance penalty
-------------------------------------+-------------------------------------
Reporter: Mike Lissner | Owner: Mike
Type: | Lissner
Cleanup/optimization | Status: closed

Component: Core (Cache system) | Version: dev
Severity: Normal | Resolution: fixed

Keywords: cache, database | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"5a8e8f80bb82a867eab7e4d9d099f21d0a976d22" 5a8e8f80]:
{{{
#!CommitTicketReference repository=""
revision="5a8e8f80bb82a867eab7e4d9d099f21d0a976d22"
Fixed #32772 -- Made database cache count size once per set.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:6>

Reply all
Reply to author
Forward
0 new messages