Caching some application data in memcache vs. in the Django process itself


Subodh Nijsure

Aug 19, 2014, 4:27:46 PM
to django...@googlegroups.com
Hello,

I have an application that serves time-series data for 10+ units. This
data can get big, and I don't want to retrieve 5-6 MB worth of data
from Cassandra and serve it out as JSON to the client every time.

I want to maintain a cache of this time-series data. I had the horrible
thought of just storing this data in a local Python dictionary instead
of doing something exotic like a memcache server.

What are the pluses and minuses of just storing this cache in the process
running Django? Of course, this doesn't scale well for 1000+ units.

-Subodh

Russell Keith-Magee

Aug 19, 2014, 7:54:46 PM
to Django Users
This approach will probably work, but it's a bad idea.

Firstly, if you store it in-process, the process is the web server. So - your web server will have a huge memory footprint. If your web server has multiple processes - which would be a common configuration for handling load - then the memory requirements will be duplicated for every process.
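Django's own local-memory cache backend behaves like the per-process dictionary described here; its documentation notes that each process has its own private cache instance. A settings sketch (the `LOCATION` value is an arbitrary, hypothetical name):

```python
# settings.py fragment: Django's in-process cache backend.
# Each worker process holds its own private copy, so N workers means
# N copies of the cached time-series data in RAM.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        # Hypothetical name; it only distinguishes multiple locmem caches.
        "LOCATION": "timeseries",
    }
}
```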

Secondly, web processes aren't long lived. Depending on your web server configuration, each web server process will serve a certain number of requests, and then restart itself. This means that every N requests, the process will need to restart, and reload the in-process memory cache. This will obviously take time, and that's time the end-user is going to experience as a delay in loading a page.

So - you'll be much better served using memcache (or similar) - this keeps the "data I need to keep readily available in memory" separate from "data needed to keep the web server running". The memcache image is shared across processes - even across machines if necessary - and only requires an initial warming (which can be done out of the request-response cycle, if necessary). 
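The "initial warming" can be sketched as a plain function that runs outside the request-response cycle, for example from a cron job or a management command (both hypothetical here). It assumes a Django-style cache object offering `set_many(data, timeout=...)`, such as `django.core.cache.cache`, plus hypothetical `unit_ids` and `load_series` inputs:

```python
from typing import Any, Callable, Iterable


def warm_cache(
    cache: Any,
    unit_ids: Iterable[str],
    load_series: Callable[[str], bytes],
    timeout: int = 3600,
) -> int:
    """Pre-load every unit's series so user requests start with a warm cache.

    `cache` is assumed to provide Django's set_many(data, timeout=...) method.
    Call this from a scheduled job, not from inside a view.
    """
    data = {f"series:{unit_id}": load_series(unit_id) for unit_id in unit_ids}
    cache.set_many(data, timeout=timeout)  # store all units in one batched call
    return len(data)
```

Because the cache lives outside the web server processes, worker restarts do not throw this work away.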

Yours,
Russ Magee %-)

Cal Leeming [iops.io]

Aug 19, 2014, 8:08:11 PM
to django...@googlegroups.com
On Wed, Aug 20, 2014 at 12:54 AM, Russell Keith-Magee <rus...@keith-magee.com> wrote:

> This approach will probably work, but it's a bad idea.
>
> Firstly, if you store it in-process, the process is the web server. So - your web server will have a huge memory footprint. If your web server has multiple processes - which would be a common configuration for handling load - then the memory requirements will be duplicated for every process.

Not 100% true.

uWSGI comes with a "shared memory" feature which lets you store a large chunk of data in memory that is shared by all workers.

There is a higher-level abstraction on top of this, aptly named "caching".

That said, the benefits of storing data in a separate key/value store (such as Redis or memcache) versus using uWSGI shared memory are debatable, and it really depends on your use case.

This is, of course, assuming you are using uWSGI as your WSGI application server.
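A minimal sketch of using the uWSGI caching framework from Python, assuming uWSGI was started with a cache named `series` (e.g. `--cache2 name=series,items=100`; the name and sizes are hypothetical, and multi-megabyte values may also need a larger `blocksize`). The `uwsgi` module is only importable inside a uWSGI worker, so the sketch falls back to a plain per-process dict elsewhere (for instance under `manage.py runserver`). uWSGI cache values are bytes, so the JSON is serialized once at store time; check the uWSGI caching docs for the exact expiry semantics:

```python
import json
from typing import Any, Dict, Optional

try:
    import uwsgi  # only importable inside a uWSGI worker process
except ImportError:
    uwsgi = None

CACHE_NAME = "series"  # hypothetical: must match the --cache2 name=... option
_fallback: Dict[str, bytes] = {}  # per-process stand-in outside uWSGI


def store_series(unit_id: str, series: Any, expires: int = 0) -> None:
    """Serialize once and store the bytes in the shared uWSGI cache."""
    payload = json.dumps(series).encode("utf-8")
    if uwsgi is not None:
        # cache_update replaces an existing key; expires is in seconds (0 = no expiry).
        uwsgi.cache_update(unit_id, payload, expires, CACHE_NAME)
    else:
        _fallback[unit_id] = payload


def load_series_json(unit_id: str) -> Optional[bytes]:
    """Return the stored JSON bytes, or None if the key is missing."""
    if uwsgi is not None:
        return uwsgi.cache_get(unit_id, CACHE_NAME)
    return _fallback.get(unit_id)
```

Storing ready-to-serve bytes keeps per-request work to a lookup, which matters because the shared area itself offers no richer data structures.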


> Secondly, web processes aren't long lived. Depending on your web server configuration, each web server process will serve a certain number of requests, and then restart itself. This means that every N requests, the process will need to restart, and reload the in-process memory cache. This will obviously take time, and that's time the end-user is going to experience as a delay in loading a page.

This wouldn't be much of a concern with the shared memory feature. However, you may run into issues if your data requires a lot of processing on each query, mostly because "shared memory" doesn't have any of the niceties of Redis/memcache. Again, it's very use-case specific.
 

> So - you'll be much better served using memcache (or similar) - this keeps the "data I need to keep readily available in memory" separate from "data needed to keep the web server running". The memcache image is shared across processes - even across machines if necessary - and only requires an initial warming (which can be done out of the request-response cycle, if necessary).

Memcache would almost certainly be the easiest way to do this. Of course, you do have to consider what happens if your memcache server goes down, and make sure it's given the same love/attention as your application server in terms of monitoring, configuration, maintenance, etc. You could also run into eviction issues: if too much RAM is used, memcache will start to remove cached entries, and one of those might be your data. There is also additional latency, depending on how many queries you're pushing (assuming 1 ms per query).
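One way to cope with both an evicted entry and a memcache server that is down is to treat either as a cache miss and go back to the source (the cache-aside pattern). A sketch, assuming a Django-style cache object with `get(key)` and `set(key, value, timeout)` and a hypothetical `load_series()` that queries Cassandra. Which exceptions a real backend raises, if any, depends on the client library, so catching `Exception` here is a deliberately broad placeholder:

```python
import logging
from typing import Any, Callable

log = logging.getLogger(__name__)


def get_series_cached(
    cache: Any,
    unit_id: str,
    load_series: Callable[[str], bytes],
    timeout: int = 300,
) -> bytes:
    """Cache-aside read: try the cache, fall back to the source on miss or error."""
    key = f"series:{unit_id}"
    try:
        cached = cache.get(key)
    except Exception:  # placeholder: real client errors vary by backend
        log.warning("cache unavailable, reading %s from source", key)
        return load_series(unit_id)  # skip set(); the cache is likely down
    if cached is not None:
        return cached
    payload = load_series(unit_id)  # miss: never cached, expired, or evicted
    try:
        cache.set(key, payload, timeout)
    except Exception:
        log.warning("could not repopulate %s", key)
    return payload
```

Every miss costs a full source query, so frequent evictions show up directly as extra Cassandra load.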

Of course, if you're not using a "shared memory" feature like the one in uWSGI, then all the comments by Russ are completely valid.
 


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/CAJxq848G8MMfCtR5rWFbj%2Bz2rH7-5KZBJRq5efkPz7EhZNe6mg%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.

Cal Leeming [iops.io]

Aug 19, 2014, 8:16:53 PM
to Cal Leeming [iops.io], django...@googlegroups.com
Sorry, forgot to add one more thing.

For what it's worth, it does sound like a key/val store (such as memcache/redis) would be a better fit for this.
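Pointing Django at such a store is a settings change. A sketch with hypothetical host/port values; the backend class paths are those of current Django releases (`PyMemcacheCache` since Django 3.2, the built-in `RedisCache` since Django 4.0), which postdate this 2014 thread:

```python
# settings.py fragment (hypothetical addresses). Two aliases are shown;
# a deployment would normally need only one of them.
CACHES = {
    # memcached via the pymemcache client library
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    },
    # Redis via Django's built-in backend (requires the redis-py package)
    "redis": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    },
}
```

Note that memcached's default maximum item size is 1 MB (adjustable with its `-I` option), which matters for payloads in the 5-6 MB range.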

Shared memory areas are great for storing huge lookup tables in high-performance deployments, and it's always good to know these things for the future; you never know when they might come in handy :)

Cal

Subodh Nijsure

Aug 20, 2014, 1:36:15 AM
to django...@googlegroups.com, Cal Leeming [iops.io]
Thank you so much for all your suggestions. I am using uWSGI, so I will
certainly look into that, and then Redis for caching.


-Subodh