Hello all,
I have an application that calculates and tells you whether a specific crop at a specific piece of land needs to be irrigated, and how much. The calculation lasts for a few seconds, so I'm doing it offline with Celery. Every two hours new meteorological data comes in and all the pieces of land are recalculated.
The question is where to store the results of the calculation. I thought that since they are re-creatable, the cache would be the appropriate place. However, there is a difference with the more common use of the cache: they are re-creatable, but they are also necessary. You can't just go and delete any item in the cache. This will cripple the website, which expects to find the calculation results in the cache. Viewing something on the site will never trigger a recalculation (and if I make it trigger, it will be a safety procedure for edge cases and not the normal way of doing things). The results must also survive reboots, so I chose the file-based cache.
I didn't know about culling, so when the pieces of land grew to 100, and the items in the cache to 400 (4 items need to be stored for each piece of land), I spent a few hours trying to find out what the heck was going on. I solved the problem by tweaking the culling parameters. However, all this has raised a few issues:
"the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious. If you need a cache capable of holding 100000 items, I strongly recommend you look at memcache. If you insist on using the filesystem as a cache, it isn't hard to subclass and extend the existing cache."
If these comments are correct, then the documentation needs some fixing, because not only does it not say that the filesystem cache is not for serious use, but it implies the opposite:
"Without a really compelling reason, ... you should stick to the cache backends included with Django. They’ve been well-tested and are easy to use."
Is Russell not entirely correct perhaps, or is the documentation? Or am I missing something?
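For anyone who hits the same wall: the culling parameters mentioned above live under OPTIONS in the CACHES setting. Below is a rough sketch of such a configuration, not my actual settings; the LOCATION path and the numbers are made up for illustration. MAX_ENTRIES defaults to 300, which would be consistent with trouble starting somewhere before 400 items.

```python
# settings.py fragment (sketch only; path and numbers are hypothetical)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/cache/myproject",  # hypothetical directory
        "TIMEOUT": None,  # None: entries never expire on their own
        "OPTIONS": {
            "MAX_ENTRIES": 10000,  # Django's default is 300
            "CULL_FREQUENCY": 3,   # when culling, about 1/3 of the entries go
        },
    }
}
```

Raising MAX_ENTRIES only moves the threshold; entries can still be culled once the new limit is reached.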
-- Antonis Christofides http://djangodeployment.com
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/a5a8d1ab-f4e0-a6b5-b1da-acc9dc2dbf9d%40djangodeployment.com.
For more options, visit https://groups.google.com/d/optout.
On Saturday 27 May 2017 12:25:17 Antonis Christofides wrote:
> The question is where to store the results of the calculation. I
> thought that since they are re-creatable, the cache would be the
> appropriate place. However, there is a difference with the more
> common use of the cache: they are re-creatable, but they are also
> necessary. You can't just go and delete any item in the cache. This
> will cripple the website, which expects to find the calculation
> results in the cache.
What you're describing is not a cache, but a key/value store. A cache knows how to obtain content that is not in there.
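To make the distinction concrete, here is a toy sketch in plain Python (no Django involved). The class name ReadThroughCache and the function compute_irrigation are invented for the example; the latter merely stands in for the slow calculation.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Toy cache: on a miss it calls a loader to rebuild the value."""

    def __init__(self, loader: Callable[[str], float]) -> None:
        self._loader = loader
        self._data: Dict[str, float] = {}

    def get(self, key: str) -> float:
        if key not in self._data:
            self._data[key] = self._loader(key)  # miss: recompute and keep
        return self._data[key]

    def evict(self, key: str) -> None:
        self._data.pop(key, None)


def compute_irrigation(field_id: str) -> float:
    # Hypothetical stand-in for the real, slow calculation.
    return float(len(field_id)) * 1.5


cache = ReadThroughCache(compute_irrigation)
first = cache.get("field-17")
cache.evict("field-17")           # harmless: the next get() recomputes
second = cache.get("field-17")

store: Dict[str, float] = {}      # a plain key/value store
missing: Optional[float] = store.get("field-17")  # None: nothing refills it
```

Evicting from the first structure costs only time; evicting from the second loses the data until some outside process writes it again, which is the situation described in the original post.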
Since you're offloading the calculation, you should not delete the old contents until the new contents can be written. Swapping the contents should be atomic, should block reads while it happens, and should be fast.
Another strategy is to keep two copies of each entry. When updating the first, lock it; asynchronous readers go for the second. Then unlock the first, lock and write the second, and you're done.
This can be done with one store and 2 keys or - in a larger environment - with replicated stores, where things pretty much work automagically.
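A minimal in-process sketch of the two-copies idea, assuming threads and threading.Lock; the class DoubleEntry is made up for illustration, and a file-based or networked store would need its own locking primitive instead.

```python
import threading
from typing import Any, List


class DoubleEntry:
    """Two copies of one value; readers use whichever copy is not locked."""

    def __init__(self, value: Any) -> None:
        self._values: List[Any] = [value, value]
        self._locks = [threading.Lock(), threading.Lock()]
        self._writer = threading.Lock()  # one writer at a time

    def read(self) -> Any:
        while True:
            for i in (0, 1):
                # Never wait on a locked copy; try the other one instead.
                if self._locks[i].acquire(blocking=False):
                    try:
                        return self._values[i]
                    finally:
                        self._locks[i].release()
            # Both copies were briefly held by others; loop and try again.

    def write(self, value: Any) -> None:
        with self._writer:
            for i in (0, 1):
                # Only one copy is locked at any moment.
                with self._locks[i]:
                    self._values[i] = value


entry = DoubleEntry({"irrigate": False})
entry.write({"irrigate": True, "mm": 12})
result = entry.read()
```

While the first copy is being written, readers still see the complete old value from the second copy, so the old contents are never missing.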
--
Melvyn Sopacua
Hello,
thanks to everyone who replied. Here are some conclusions of mine:
Today's filebased-cache code seems to be suffering from the same problems it was suffering 7 years ago. Every time you .set() the cache it asks the OS to provide a list of files, just for counting them (for the purpose of culling). This is slow. The culling strategy is to delete a random sample of cache entries. So Russell's comment seems valid today, at least with respect to culling. Of Django's included cache backends, apparently only memcached is suitable for a large cache in production. Redis could be a good idea for adding persistence, but it is non-standard (not included with Django).
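To illustrate what I mean, here is a toy model of the count-then-cull-at-random behaviour. It is a simplification written for this message, not Django's actual code: a dict stands in for the cache directory, and the class name CullingStore is invented.

```python
import random
from typing import Dict


class CullingStore:
    """Toy model of count-then-cull-at-random; not Django's actual code."""

    def __init__(self, max_entries: int = 300, cull_frequency: int = 3) -> None:
        self.max_entries = max_entries
        self.cull_frequency = cull_frequency
        self.data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._cull()  # runs on every write, so every write pays for the count
        self.data[key] = value

    def _cull(self) -> None:
        keys = list(self.data)  # stands in for listing the cache directory
        if len(keys) < self.max_entries:
            return
        # Delete a random sample: needed entries go as readily as stale ones.
        for victim in random.sample(keys, len(keys) // self.cull_frequency):
            del self.data[victim]


store = CullingStore()
for field in range(100):       # 100 pieces of land
    for item in range(4):      # 4 items each
        store.set(f"field{field}:item{item}", "result")

surviving = len(store.data)    # fewer than the 400 entries written
```

With these numbers some of the 400 results are silently gone, and which ones is a matter of chance.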
In any case, Redis is not appropriate for my use case: I don't need the speed, so storing the information in RAM, which costs more than the filesystem, is suboptimal.
The fact that a cache knows how to get the information if it
doesn't have it is an interesting observation that I hadn't
thought about, but appears to be true for most uses of "cache"
that I can think of (it doesn't apply to write caches). Therefore
I'm using the cache for a different purpose than the one for which
it was designed, which can create all sorts of problems (such as a
new administrator—or even an old one—not knowing or forgetting
they can't just delete the cache). However, I will take my chances
and continue using it for a while, as for these two small projects
implementing a more complicated solution, or adding another
component and thus raising the bar for other people to replace me,
isn't worth it.
Antonis Christofides http://djangodeployment.com