Cache framework


Jure Erznožnik

Aug 17, 2020, 4:30:24 AM
to django-d...@googlegroups.com
Apologies for the wall of text, but I'm raising a complex issue. Hopefully you will see something you have thought about in the recent past. I have also indented the rationalisation part, so if it's TL;DR, don't read it. Well, on with it:

I've lately had some dealings with the Django caching framework that left me somewhat wanting. To be specific, these things stood out:

  1. There is no listing function: it is impossible to enumerate keys matching a pattern.
  2. There are no synchronisation functions.
  3. There is no default Redis support.

Proceeding with rationalisation:

#1: Sometimes I just need records in the cache that would be easily iterable, easily insertable, but both with minimum locking / racing. An enumeration function would allow for easy retrieval of thusly inserted objects.

An alternative to this proposal would be support for atomic list operations, such as cache.append_to_list(key, items) and cache.pop_from_list(key, num_of_items). Such operations could be supported using #2.

Another possible approach might be a cache.pop(key) function that atomically retrieves the current value and deletes the cache entry at the same time. Such an operation could also be supported using #2.

#2: With complex data comes a need for synchronisation functions. A mutex is the easiest of these to build with the existing commands, and we have implemented a Mutex class supporting the "with Mutex('mutex_name'):" syntax.
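
For illustration only (this is not the Mutex class mentioned above, which was not posted): a minimal sketch of how such a context manager could be built on an atomic add-if-absent call, which is what Django's cache.add() provides on backends such as memcached. DictCache is a stand-in so the example runs without Django; the "mutex:" key prefix, the timeouts and the polling interval are all assumptions.

```python
import threading
import time
import uuid


class DictCache:
    """Stand-in exposing the add/get/delete subset of Django's cache API."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def add(self, key, value, timeout=None):
        # Like cache.add(): store only if the key is absent, report success.
        # Expiry is ignored in this stand-in.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        with self._lock:
            return self._data.pop(key, None) is not None


class Mutex:
    """Hypothetical cache-backed mutex using add() as test-and-set."""

    def __init__(self, name, cache, lock_timeout=30, wait_timeout=5.0, poll=0.01):
        self.key = f"mutex:{name}"
        self.cache = cache
        self.lock_timeout = lock_timeout  # expiry, so a crashed holder cannot block forever
        self.wait_timeout = wait_timeout  # how long to keep trying to acquire
        self.poll = poll
        self.token = uuid.uuid4().hex     # identifies this holder

    def __enter__(self):
        deadline = time.monotonic() + self.wait_timeout
        while not self.cache.add(self.key, self.token, self.lock_timeout):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {self.key}")
            time.sleep(self.poll)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Release only if we still hold the lock (it may have expired).
        # This get-then-delete pair is itself not atomic.
        if self.cache.get(self.key) == self.token:
            self.cache.delete(self.key)
        return False
```

Usage would then look like "with Mutex('mutex_name', cache): ...". The guarantee is only as strong as the backend's add(): it holds against a single shared cache server, not across independent ones.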

There was some debate within our team on what to do when even the cache is distributed among multiple cache servers, but we reached no consensus. It would definitely require a (separate) shared cache server, so maybe it's not really a huge problem as Django caching framework already supports that.

#3: Current cache implementations in Django don't have Redis support.

I don't know whether that should be included or not, but even django-channels implements the channels backend with redis (and only redis), not with memcached. TBH I do not recall why we went with redis and not memcached when we were deciding infrastructure for our servers, but it seems the rationale we used led to a similar conclusion as that of the django-channels developers.

There was also a recent discussion debating the obsolescence of the memcached implementation (I think, but did not verify just now). IIRC the problem was the use of a library that is no longer maintained while an up-to-date, well-maintained alternative was available.

Furthermore the django-channels redis backend is implemented such that it requires redis server 5.0 or greater (use of BZPOPMIN command). Ubuntu 18.04 LTS does not have this version in its apt archives. This I only mention as a curiosity as it's pretty easy to solve, but still, it was a nuisance requiring a workaround.

I find it interesting that django-channels does not implement its channels backend with the standard Django caching framework, but it seems the framework was inadequate at the time (hint: the redis 5.0 requirement). Perhaps I should look up the discussion threads where the channels developers were deciding this, to see what made them choose a custom redis approach over the Django cache.

TBH I find this one the hardest: there is third-party support for Redis caching in django. There must be a policy behind what cache backends should be supported and for some reason that policy has so far excluded Redis. However, I find it conflicting that django-channels would then only support Redis for some of its functionality. I hope I'm not going into politics too much here.


To finish up, is this something that Django core team would be willing to consider for improvement? If yes, I would be willing to provide relevant PRs (Django only), but I wanted to feel the general attitude towards the problem first.

To be clear, this is what I'm proposing to implement:

  1. cache.list(pattern) returning matching cache entries. I would like the pattern to be regex compatible, but I have not done any analysis on listing support for existing backends yet.
  2. hand over the Mutex class we implemented
  3. quite frankly, I don't know what to do about this one. We're using django-redis package for our Redis backend, so I would hate to write something up from scratch. What is the standard approach with this, if support is decided by the core team?
  4. While at it, I have seen that various backends now implement their own pickling. With the exception of memcached, they all implement the same thing. Perhaps it would be beneficial to move pickling functionality to BaseCache?
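
To make item 1 concrete, here is a rough sketch of what a regex-based listing could look like on an in-process store. ListableCache and its list() method are hypothetical, not an existing Django API. Note that it scans every key, so the cost grows with the size of the cache and with the complexity of the pattern.

```python
import re


class ListableCache:
    """In-process stand-in; list() is the hypothetical method from item 1."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def list(self, pattern):
        # Full scan: every stored key is tested against the whole regex.
        regex = re.compile(pattern)
        return {k: v for k, v in self._data.items() if regex.fullmatch(k)}
```

For example, after storing "job:1", "job:2" and "user:1", calling list(r"job:\d+") would return only the two job entries.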

Thank you for your consideration,
Jure

Adam Johnson

Aug 17, 2020, 5:33:18 AM
to django-d...@googlegroups.com
Hi Jure

The caching framework exists mostly to provide a memcached-like API. Memcached does not support either of your first two options - there's no efficient way to list keys, nor does it make any atomicity guarantees (outside of a single key). I think for most applications you'll find it easier and more reliable to use a database model if you need such operations. The overhead of using a database is not much compared to the stronger guarantees you want - I made an efficient MySQL-based cache backend which comes in at only ~50% of the speed of memcached, despite writing to disk ( https://adamj.eu/tech/2015/05/17/building-a-better-databasecache-for-django-on-mysql/ ). (Additionally your proposal to use a regex can *never* perform well in the general case, since some regexes have terrible performance characteristics.)

In terms of the third option, I think most people using redis with the caching framework happily use django-redis. It's well maintained and has 1.8k stars on GitHub. I don't see any compelling reason to merge it to core right now, plus that always slows down a project's progression. Channels uses redis through its channels-redis package, and it needs redis pub-sub operations, which are quite different to caching. This is a fundamentally different problem to caching, and it only happens that Redis supports both. Channels can use other channel layer backends - I'm not sure of existing alternatives, but they could use e.g. Postgres' NOTIFY, Google PubSub, AWS SNS, etc.

There are some memcached features which the cache API doesn't currently support, such as compare-and-set. I'd be interested in seeing those added to core.
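
As a rough illustration of compare-and-set semantics (not a proposed Django API): memcached's "gets" returns a value together with a token, and "cas" writes only if the token is still current. The sketch below mimics that with an in-memory store and a version counter; CasCache, its method names and the increment() helper are assumptions made for the example.

```python
import threading


class CasCache:
    """In-memory stand-in mimicking memcached-style gets/cas."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._lock = threading.Lock()

    def set(self, key, value):
        # Unconditional write; bumps the version.
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            self._data[key] = (value, version + 1)

    def gets(self, key):
        # Return (value, version); a missing key reads as (None, 0).
        with self._lock:
            return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        # Write only if nobody has written since `version` was read.
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != version:
                return False
            self._data[key] = (value, current + 1)
            return True


def increment(cache, key, retries=10):
    """Optimistic read-modify-write: retry when another writer got in first."""
    for _ in range(retries):
        value, version = cache.gets(key)
        new_value = (value or 0) + 1
        if cache.cas(key, new_value, version):
            return new_value
    raise RuntimeError("too much contention")
```

The retry loop in increment() is the usual pattern: no lock is held between the read and the write, and a stale write is simply rejected.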

Thanks,

Adam

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/8fc9b51b-a0d9-363e-1e5a-7dfd196327cf%40gmail.com.


--
Adam

Carlton Gibson

Aug 17, 2020, 5:36:20 AM
to django-d...@googlegroups.com


On 17 Aug 2020, at 11:32, Adam Johnson <m...@adamj.eu> wrote:

> Channels can use other channel layer backends - I'm not sure of existing alternatives, but they could use e.g. Postgres' NOTIFY, Google PubSub, AWS SNS, etc.

The only other active channel layer backend that I’m aware of is for RabbitMQ: 

Roger Gammans

Aug 17, 2020, 5:46:58 AM
to django-d...@googlegroups.com
On Mon, 2020-08-17 at 10:30 +0200, Jure Erznožnik wrote:
> 4. While at it, I have seen that various backends now implement their own pickling. With the exception of memcached, they all implement the same thing. Perhaps it would be beneficial to move pickling functionality to BaseCache?



I've noticed in my applications that when I've used the Django cache in performance-sensitive hot paths, it often shifts the slowest portion from the database (or whatever I'm caching) to the unpickler.

This makes me wonder if there is an argument for abstracting the pickling, so that applications can do something special for their use case.
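
One way such an abstraction could look, as a sketch only: the cache takes a serializer object with dumps()/loads(), defaulting to pickle, and an application can swap in something cheaper for its hot path. All class names here are hypothetical, and the dict-backed store stands in for a real backend.

```python
import json
import pickle


class PickleSerializer:
    """Default: handles arbitrary Python objects."""

    def dumps(self, value):
        return pickle.dumps(value, pickle.HIGHEST_PROTOCOL)

    def loads(self, data):
        return pickle.loads(data)


class JSONSerializer:
    """Alternative for plain data (dicts, lists, strings, numbers)."""

    def dumps(self, value):
        return json.dumps(value, separators=(",", ":")).encode("utf-8")

    def loads(self, data):
        return json.loads(data.decode("utf-8"))


class SerializingCache:
    """Dict-backed stand-in that delegates encoding to a pluggable serializer."""

    def __init__(self, serializer=None):
        self._serializer = serializer or PickleSerializer()
        self._data = {}

    def set(self, key, value):
        self._data[key] = self._serializer.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else self._serializer.loads(raw)
```

Whether JSON (or anything else) is actually faster than pickle depends on the data, so this would need measuring per application.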

-- 
Roger Gammans <rgam...@gammascience.co.uk>
Gamma Science Ltd. (GB Nr. 07356014 )

Carlton Gibson

Aug 18, 2020, 9:33:56 AM
to Django developers (Contributions to Django itself)
I think we SHOULD bring a redis cache backend into core.

The recent survey showed 70% of our users using redis for caching, with 20%+ using memcached.

Nick is working on
https://github.com/django/django/pull/13310 now.

This’ll give us three memcached backends and zero for redis.
That doesn’t seem right.

The cache backend API is small and redis is stable. I think we can offer that, relieving third-party packages of some work so they can do the more interesting things.

C.

Tobias McNulty

Aug 18, 2020, 9:54:34 AM
to django-developers
+1, even as someone who does not self-identify as a Redis enthusiast.

Tobias McNulty
Chief Executive Officer

tob...@caktusgroup.com
www.caktusgroup.com

Adam Johnson

Aug 18, 2020, 11:18:50 AM
to django-d...@googlegroups.com
> The recent survey showed 70% of our users using redis for caching, with 20%+ using memcached.

Oh I see you brought data to an opinion fight :)

Okay if those are even within 20% of the truth, it completely makes sense to add Redis support! +1

--
Adam

René Fleschenberg

Aug 18, 2020, 6:11:54 PM
to django-d...@googlegroups.com
Hi,

On 8/18/20 3:33 PM, Carlton Gibson wrote:
> I think we SHOULD bring a redis cache backend into core.
Would this also mean supporting Redis-specific APIs? I'm thinking of
listing keys in particular. Would Django then throw an exception on
backends that don't support this? Or would the built-in cache backend
just not expose those APIs?

--
René Fleschenberg

Carlton Gibson

Aug 19, 2020, 12:36:14 AM
to Django developers (Contributions to Django itself)


On 19 Aug 2020, at 00:11, René Fleschenberg <re...@fleschenberg.net> wrote:

> Or would the built-in cache backend
> just not expose those APIs?

That one. 

All I’m proposing is a backend with the current API. 

Over time that API has grown slightly, so maybe there’d be a case for additional methods, but that would be a separate issue. 

C.

Jure Erznožnik

Aug 19, 2020, 2:25:20 AM
to django-d...@googlegroups.com
May I ask where that survey was conducted? I totally missed it.

LP,
Jure

Carlton Gibson

Aug 19, 2020, 2:27:52 AM
to django-d...@googlegroups.com
Here’s the blog post: 


It includes links to charts and raw results. 
