{{{
SELECT COUNT(*) FROM cache_table;
}}}
This query is being run by the DB cache **twice** for every cache update
in order to determine if culling is needed. First, in the cache setting
code, it runs:
{{{
cursor.execute("SELECT COUNT(*) FROM %s" % table)
num = cursor.fetchone()[0]
now = timezone.now()
now = now.replace(microsecond=0)
if num > self._max_entries:
self._cull(db, cursor, now)
}}}
Then in self._cull (the last line above) it runs:
{{{
cursor.execute("DELETE FROM %s WHERE expires < %%s" % table,
[connection.ops.adapt_datetimefield_value(now)])
cursor.execute("SELECT COUNT(*) FROM %s" % table)
num = cursor.fetchone()[0]
if num > self._max_entries:
# Do culling routine here...
}}}
The idea is that if the MAX_ENTRIES setting is exceeded, it'll cull the DB
cache down by some percentage so it doesn't grow forever.
I think that's fine, but given that the SELECT COUNT(*) query is slow, I
wonder two things:
1. Would a refactor to remove the second query be a good idea? If you pass
the count from the first query into the `_cull` method, you can then do:
{{{
def _cull(self, db, cursor, now, count):
...
cursor.execute("DELETE FROM %s WHERE expires < %%s" % table,
[connection.ops.adapt_datetimefield_value(now)])
deleted_count = cursor.rowcount
num = count - deleted_count
if num > self._max_entries:
# Do culling routine here...
}}}
That seems like a simple win.
2. Is it reasonable to not run the culling code *every* time that we set a
value? Like, could we run it every tenth time or every 100th time or
something?
If this is a good idea, does anybody have a proposal for how to count
this? I'd be happy just doing it on a mod of the current millisecond, but
there's probably a better way (randint?).
Would a setting be a good idea here? We already have MAX_ENTRIES and
CULL_FREQUENCY. CULL_FREQUENCY is "the fraction of entries that are culled
when ``MAX_ENTRIES`` is reached." That sounds more like it should have
been named CULL_RATIO (regrets!), but maybe a new setting for this could
be called "CULL_EVERY_X"?
I think the first change is a no-brainer, but both changes seem like wins
to me. Happy to implement either or both of these, but wanted buy-in
first.
--
Ticket URL: <https://code.djangoproject.com/ticket/32772>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* keywords: => cache, database
* version: 3.2 => dev
* component: Uncategorized => Core (Cache system)
--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:1>
* type: Uncategorized => Cleanup/optimization
* stage: Unreviewed => Accepted
Comment:
I'm not sure how widely used this backend is for large dataset but I agree
with you that both of these changes are no brainers.
I like the idea of having a cache option control the culling frequency and
I don't have a strong opinion on the implementation itself. Could we
possibly focus this ticket on the `COUNT` reduction off `cursor.rowcount`
and have another one track the culling frequency changes though?
FWIW this backend also suffers from other limitations that could be worth
fixing (#23326) if you're interested in doing that.
--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:2>
* owner: nobody => Mike Lissner
* status: new => assigned
* has_patch: 0 => 1
Comment:
I put together a PR for this here:
https://github.com/django/django/pull/14447
(I'm setting the has_patch flag. Hopefully this is right.)
--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:3>
Comment (by Simon Charette):
That's looking great, thanks! It's now showing up
[https://code.djangoproject.com/query?status=!closed&needs_better_patch=0&needs_tests=0&needs_docs=0&has_patch=1&stage=Accepted&desc=1&order=changetime
in the review queue] so it will eventually be picked up.
--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:4>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:5>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"5a8e8f80bb82a867eab7e4d9d099f21d0a976d22" 5a8e8f80]:
{{{
#!CommitTicketReference repository=""
revision="5a8e8f80bb82a867eab7e4d9d099f21d0a976d22"
Fixed #32772 -- Made database cache count size once per set.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32772#comment:6>