caching - memcached


Laurent Savaete

May 14, 2012, 9:00:05 PM
to ductus-d...@googlegroups.com
Following up on the cached template loaders, I had a look into caching
further data.

Using django's cache system, we can cache on several levels:
- site level
- view level
- parts of templates (between 2 {% cache %} tags)
- random bits of code/data
They all ask for a cache timeout (in seconds), but the last option
also provides a function to invalidate a specific cache key.

From profiling data, the heaviest operations are things like:
- get_resource_object()
- rendering a flashcard deck template

We know that their result won't change until someone edits the wiki
(in fact, urns won't ever change, but the user friendly result will),
so I'm keen to believe that we'd get good results by caching those
using something like the resource urn as a key (plus UI language for
good measure), and a very long cache timeout (say a month), along with
a cache invalidation when an edit is made.
Caching entire views may be possible, but we would end up caching
views per user/language, which would probably end up filling the
cache, and lead to lots of cache misses. I don't have any data to
support this, though.

I tried a simple setup of memcached as the default cache backend for
django, and added a few cache.get()/cache.set() around the code
mentioned above. The amount of work involved is minimal, and results
are quite impressive: on my laptop, rendering the fcd drops from
600-700ms to ~1ms; numbers are very similar for get_resource_object().
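
A rough sketch of the cache.get()/cache.set() wrapping described above. To stay runnable without Django or memcached, it uses a small in-memory stand-in (`DictCache`) that mimics the `get`/`set`/`delete` calls of Django's low-level cache API. `load_resource`, `cached_resource`, `invalidate_on_edit` and the key layout are hypothetical names for illustration, not ductus code.

```python
import time

ONE_MONTH = 30 * 24 * 3600  # "very long" timeout, in seconds


class DictCache:
    """In-memory stand-in for the get/set/delete calls of a cache backend."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is not None and expires_at < time.time():
            del self._data[key]  # entry has expired
            return default
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, time.time() + timeout)

    def delete(self, key):
        self._data.pop(key, None)


cache = DictCache()
load_calls = []  # records every run of the expensive path


def load_resource(urn):
    """Hypothetical stand-in for an expensive call like get_resource_object()."""
    load_calls.append(urn)
    return {"urn": urn, "rows": 330}


def cache_key(urn, language):
    # Assumed key layout: resource urn plus UI language.
    return "resource:%s:%s" % (urn, language)


def cached_resource(urn, language):
    key = cache_key(urn, language)
    resource = cache.get(key)
    if resource is None:  # cache miss: do the expensive work once
        resource = load_resource(urn)
        cache.set(key, resource, ONE_MONTH)
    return resource


def invalidate_on_edit(urn, language):
    """Drop the cached entry when an edit is saved."""
    cache.delete(cache_key(urn, language))
```

With a real backend, `cache` would instead be the object imported from `django.core.cache`; the wrapping logic stays the same.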

Also, about the server setup:
A few figures, from opening a 330-row FSI podcast view on devbox
(using strace), with cachedLoader turned on:
- the page loads in about 4.3s (onLoad in browser, including roundtrip
Sweden-US) fairly constantly (DNS+connection is ~1.2s of the total).
Server-side profiling shows about 1.5-2s.
- about 60k system calls are generated, of which 57k seem filesystem related
- about 29k of them are filesystem calls related to staticfiles cache,
so I anticipate (most of) those would disappear by using memcached as
opposed to file based caching.

I'll give all this a try on devbox tomorrow, to get an impression of
the actual improvements we can get.

Regarding using a reverse proxy in front of the server (varnish or
so...), I wonder how much of an improvement we would get, considering
those would only cache the entire result of an HTTP request. We'd have
to vary according to ui language and user at least. It would probably
be worth it for non logged in users, as the page content only depends
on ui language. And this is likely to be the bulk of the traffic.

Jim Garrison

May 16, 2012, 3:42:33 AM
to ductus-d...@googlegroups.com
On 05/14/12 18:00, Laurent Savaete wrote:
> Following up on the cached template loaders, I had a look into caching
> further data.
>
> Using django's cache system, we can cache on several levels:
> - site level
> - view level
> - parts of templates (between 2 {% cache %} tags)
> - random bits of code/data

Another place we can cache is directly in the subviews framework. Since
each subview works directly on a urn, its contents should be constant
unless ductus is upgraded. (We should keep in mind that any subviews
that return rendered html depend on the language context, as well.)

Since some subviews will need to query the cache many times rather than
just a few, we should figure out a way to use memcache's "multiget" here.
I've thought of a way to do this before, but haven't implemented it yet.

> They all ask for a cache timeout (in seconds), but the last option
> also provides a function to invalidate a specific cache key.

Often a conceptually cleaner alternative to invalidation is to simply
use a key that contains all information that might be relevant, and to
allow outdated things to simply expire automatically. This will also
remove state from the cache system, and things won't go haywire if some
invalidation fails. (Plus, we will never need to code a loop to
invalidate several things based on one change.)
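
A minimal sketch of the "put everything relevant in the key" idea, assuming the cached value depends only on the resource urn, the UI language and the deployed code version. `make_cache_key` and `CODE_VERSION` are hypothetical names. Since an edit produces a new urn, it also produces a new key, so the old entry is simply never asked for again and can expire on its own. The key is hashed to keep it short and free of whitespace, which memcached keys do not allow.

```python
import hashlib

CODE_VERSION = "example-rev-1"  # hypothetical: changes with each ductus upgrade


def make_cache_key(prefix, urn, language, code_version=CODE_VERSION):
    """Build a key from everything the cached value depends on."""
    raw = "|".join([prefix, code_version, language, urn])
    digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()
    return "%s:%s" % (prefix, digest)
```

Changing any one input (urn, language or code version) yields a different key, so no explicit invalidation call is needed.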

> From profiling data, the heaviest operations are things like:
> - get_resource_object()
> - rendering a flashcard deck template
>
> We know that their result won't change until someone edits the wiki
> (in fact, urns won't ever change, but the user friendly result will),

The user-friendly result for a URN is mostly invariant.

> so I'm keen to believe that we'd get good results by caching those
> using something like the resource urn as a key (plus UI language for
> good measure)

See the varied_etag and unvaried_etag in ductus/wiki/views.py :). If we
simply add a layer of varnish over the entire site, these things will go
into effect immediately.

And we can use the same guidelines to cache pieces of the site in
memcache as well.

>, and a very long cache timeout (say a month), along with
> a cache invalidation when an edit is made.
> Caching entire views may be possible, but we would end up caching
> views per user/language, which would probably end up filling the
> cache, and lead to lots of cache misses. I don't have any data to
> support this, though.

Yes, this is a good reason not to rely on varnish entirely. But it
would be a good first line of defense against thousands of people
getting linked to a single lesson on our site, for instance.

>
> I tried a simple setup of memcached as the default cache backend for
> django, and added a few cache.get()/cache.set() around the code
> mentioned above. The amount of work involved is minimal, and results
> are quite impressive: on my laptop, rendering the fcd drops from
> 6-700ms to ~1ms, numbers are very similar for get_resource_object().

I'm curious: how did you cache the results of get_resource_object()? It
returns an object, not a string...

>
> Also, about the server setup:
> A few figures, from opening a 330 rows long FSI podcast view on devbox
> (using strace), with cachedLoader turned on:
> - the page loads in about 4.3s (onLoad in browser, including roundtrip
> Sweden-US) fairly constantly (DNS+connection is ~1.2s of the total).
> Server-side profiling shows about 1.5-2s.
> - about 60k system calls are generated, of which 57k seem filesystem related
> - about 29k of them are filesystem calls related to staticfiles cache,
> so I anticipate (most of) those would disappear by using memcached as
> opposed to file based caching.

hopefully it doesn't just end up sending thousands of requests to
memcache on each request, either...

> I'll give all this a try on devbox tomorrow, to get an impression of
> the actual improvements we can get.
>
> Regarding using a reverse proxy in front of the server (varnish or
> so...), I wonder how much of an improvement we would get, considering
> those would only cache the entire result of an HTTP request. We'd have
> to vary according to ui language and user at least. It would probably
> be worth it for non logged in users, as the page content only depends
> on ui language. And this is likely to be the bulk of the traffic.

Exactly, especially if we are in the middle of a traffic spike with lots
of anonymous views.

Laurent Savaete

May 16, 2012, 5:12:10 AM
to ductus-d...@googlegroups.com
> Another place we can cache is directly in the subviews framework.  Since
> each subview works directly on a urn, its contents should be constant
> unless ductus is upgraded.  (We should keep in mind that any subviews
> that return rendered html depend on the language context, as well.)

yes, that was what I was getting at further down (and that's what I
tested), so it looks like we agree on the approach :)

> Since some subviews will need to query the cache several times instead
> of a few, we should figure out a way to use memcache's "multiget" here.
>  I've thought of a way to do this before, but haven't implemented it yet.

The way I tried it was to brutally cache the entire flashcard deck
subview, since that's the expensive call. Once cached, we don't
actually need to pull the components of that view anymore, so caching
sublevels is probably useless (at least until we heavily reuse bits
from lesson to lesson, which won't happen until indexing with full
text search works, and even then...).
It works well with a single top-level cache call, so I'm not sure I
understand where you'd want the multiget. Can you detail a bit?

>> They all ask for a cache timeout (in seconds), but the last option
>> also provides a function to invalidate a specific cache key.
>
> Often a conceptually cleaner alternative to invalidation is to simply
> use a key that contains all information that might be relevant, and to
> allow outdated things to simply expire automatically.  This will also
> remove state from the cache system, and things won't go haywire if some
> invalidation fails.  (Plus, we will never need to code a loop to
> invalidate several things based on one change.)

I see your point. My suggestion was based on the assumption that we'd
only cache one toplevel item per view, so no elaborate multi
invalidation involved. Then I thought that managing the cache
ourselves (with extremely long expiry times) meant we'd favour
"useful" stuff in the cache as opposed to whatever hasn't expired yet
(but maybe that's the same stuff in the end :)

>> so I'm keen to believe that we'd get good results by caching those
>> using something like the resource urn as a key (plus UI language for
>> good measure)
>
> See the varied_etag and unvaried_etag in ductus/wiki/views.py :).  If we
> simply add a layer of varnish over the entire site, these things will go
> into effect immediately.

Cookie contains a session key (along with a piwik id and other unique
stuff like that). Will that bust varnish?


>> I tried a simple setup of memcached as the default cache backend for
>> django, and added a few cache.get()/cache.set() around the code
>> mentioned above. The amount of work involved is minimal, and results
>> are quite impressive: on my laptop, rendering the fcd drops from
>> 6-700ms to ~1ms, numbers are very similar for get_resource_object().
>
> I'm curious: how did you cache the results of get_resource_object()?  It
> returns an object, not a string...

From https://docs.djangoproject.com/en/dev/topics/cache/#the-low-level-cache-api,
"You can cache any Python object that can be pickled safely: strings,
dictionaries, lists of model objects, and so forth. (Most common
Python objects can be pickled...)"
so I just threw the resource object in there, and it seems to work well!
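
For illustration, this is roughly the round trip a pickling cache backend performs when handed an object rather than a string. It is a sketch using only the standard `pickle` module; the `resource` dict is a made-up stand-in, not the real resource object.

```python
import pickle

# Made-up stand-in for a resource object (plain built-in types only).
resource = {
    "urn": "urn:example:deck1",
    "rows": [{"phrase": "bonjour"}, {"phrase": "merci"}],
}

# What gets stored: a byte string.
blob = pickle.dumps(resource, protocol=pickle.HIGHEST_PROTOCOL)

# What a cache hit hands back: an equal, but separate, object.
restored = pickle.loads(blob)
```

The caller never sees `blob`, which is why no explicit pickling step is needed in the calling code.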

> hopefully it doesn't just end up sending thousands of requests to
> memcache on each request, either...

good point, we'll have to check that.

>> I'll give all this a try on devbox tomorrow, to get an impression of
>> the actual improvements we can get.

So here are some more findings, from profiling the code on devbox,
with an actual podcast view (the same 330-row one) and caching in
memcached.

With an empty cache, view_wikipage runs in 2.025s, of which 1.4s is
spent in get_resource_object() calls (680 calls, which didn't make
much sense to me at first).
With a primed cache (for both fcd.get_resource_object() and fcd
subview rendering), view_wikipage is now running in 0.76s.
BUT get_resource_object() is still called 332 times, ... from
_get_audio_urns_in_column() (in podcast()). That takes .69s.
In other words, we are running through the entire list of rows twice
to render a podcast static view :(

So we could of course cache the results of both calls (that's the
quick & easy solution), but it seems to me that a smarter way to deal
with it would be to build that list of rows only once (and cache that
single result). On a cache miss, we'd then only do half the work.
A lot of the time is spent waiting for IO from disk: when I load
tested that page (using siege), the server collapsed after a few page
views, I'm guessing because of concurrent access to the disk for all
these audio urns.
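
A sketch of the "build the list of rows only once" idea: one pass loads each row a single time and collects both the row data for rendering and the audio urns for the podcast. `load_row` and `build_podcast_data` are hypothetical names, and the row layout is invented for the example.

```python
def load_row(row_urn):
    """Hypothetical stand-in for one get_resource_object() call per row."""
    load_row.calls += 1
    return {
        "urn": row_urn,
        "phrase": "phrase for %s" % row_urn,
        "audio_urn": "%s/audio" % row_urn,
    }


load_row.calls = 0  # counts how many rows were actually loaded


def build_podcast_data(row_urns):
    """Walk the rows once, collecting what both consumers need."""
    rows, audio_urns = [], []
    for row_urn in row_urns:
        row = load_row(row_urn)
        rows.append(row)
        audio_urns.append(row["audio_urn"])
    return {"rows": rows, "audio_urns": audio_urns}
```

The returned dict is the single result that could then be cached, so a cache miss costs one walk over the rows instead of two.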

Without this double data access (or caching), I'm anticipating that
view_wikipage would return in about 80ms with a primed cache. With a
fresh cache, the view would probably take about 1.4s to build.

And then we still have the question of building the podcast...
(note: all these figures must be looked at relatively, they're only
valid for devbox as it is configured now)

Laurent Savaete

May 22, 2012, 1:45:04 PM
to ductus-d...@googlegroups.com
a thought that's been on my mind for a few days, and only just became clear.

the resource_json and rendered html (restricted to the part that
doesn't depend on user) for a urn will never change (except if we
update the code, which django can handle with cache versions). So why
calculate them more than once?

we could have 2 levels of caching:
- one on disk that is related to wiki changes, created/invalidated
when a user edits content. The cached file is created upon edit (as
the user is redirected to the result of their edit) and just stays
there. So we render a urn once (at least for the big bits like
podcast, choice lessons, textwiki). End of story. (this cache can use
the code rev hash or something similar to the DNS serial number in its
key to play nicely with revisions) Disk is cheap, we can potentially
cache everything, even old revs as this doesn't need backup.

- one in memcached: which serves content way faster than disk and we
let it expire after 30 min or an hour. That's the one that will save
us during traffic peaks. But the server doesn't collapse under
hundreds of disk accesses for a single page view on a cache miss,
because we just fetch one file on disk. And then maybe this one can be
cached at the view level...
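
A minimal sketch of the two-level lookup, under stated assumptions: a plain dict stands in for memcached (no expiry handling), a temporary directory stands in for the on-disk cache, and `render`, `get_rendered` and `CODE_REV` are hypothetical names.

```python
import hashlib
import os
import tempfile

CODE_REV = "rev1"  # hypothetical code revision marker, part of every key
DISK_DIR = tempfile.mkdtemp(prefix="render-cache-")
memory_tier = {}   # stand-in for memcached
render_calls = []  # records every full render


def render(urn, language):
    """Hypothetical expensive render that would touch many small files."""
    render_calls.append(urn)
    return "<div>%s in %s</div>" % (urn, language)


def _key(urn, language):
    raw = "%s|%s|%s" % (CODE_REV, language, urn)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def get_rendered(urn, language):
    key = _key(urn, language)
    if key in memory_tier:  # tier 1: memory
        return memory_tier[key]
    path = os.path.join(DISK_DIR, key + ".html")
    if os.path.exists(path):  # tier 2: a single file read
        with open(path, encoding="utf-8") as f:
            html = f.read()
    else:  # full miss: render once and persist
        html = render(urn, language)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)
    memory_tier[key] = html
    return html
```

After the memory tier loses an entry, the next request costs one file read rather than a full render.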

Jim Garrison

Jun 5, 2012, 10:52:08 PM
to ductus-d...@googlegroups.com
On 05/16/12 02:12, Laurent Savaete wrote:
>> Since some subviews will need to query the cache several times instead
>> of a few, we should figure out a way to use memcache's "multiget" here.
>> I've thought of a way to do this before, but haven't implemented it yet.
>
> The way I tried it was to brutally cache the entire flashcard deck
> subview, since that's the expensive call. Once cached, we don't
> actually need to pull the components of that view anymore, so caching
> sublevels is probably useless (at least until we heavily reuse bits
> from lesson to lesson, which won't happen until indexing with full
> text search works, and even then...).
> It works well with a single top-level cache call, so i'm not sure I
> understand where you'd want the multiget. Can you detail a bit?

There are some recursive subviews that go up through parents
indefinitely (e.g. to determine all the contributors), so once we have a
nontrivial number of revisions for some of our more popular lessons,
multiget might save us significantly. Or we might want to pre-compute
and cache these things directly on disk.
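
One possible shape for this, as a sketch only (it is not necessarily the approach alluded to earlier): walk the revision graph level by level and fetch each level with a single batched lookup. `CountingCache` is a stand-in whose `get_many()` mirrors the call of the same name in Django's cache API; the revision layout (`author`, `parents`) is invented for the example.

```python
class CountingCache:
    """Stand-in cache exposing get_many() and counting round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get_many(self, keys):
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def collect_contributors(cache, head_urn):
    """Walk the revision graph breadth-first, one get_many() per level."""
    contributors = set()
    seen = {head_urn}
    frontier = [head_urn]
    while frontier:
        revisions = cache.get_many(frontier)
        next_frontier = []
        for urn in frontier:
            rev = revisions.get(urn)
            if rev is None:
                continue  # a real implementation would fall back to disk here
            contributors.add(rev["author"])
            for parent in rev["parents"]:
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return contributors
```

Note the limitation: for a purely linear history each level holds one revision, so batching per level only saves round trips where revisions have several parents. Pre-computing the contributor list would avoid the walk entirely.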

>>> so I'm keen to believe that we'd get good results by caching those
>>> using something like the resource urn as a key (plus UI language for
>>> good measure)
>>
>> See the varied_etag and unvaried_etag in ductus/wiki/views.py :). If we
>> simply add a layer of varnish over the entire site, these things will go
>> into effect immediately.
>
> Cookie contains a session key (along with a piwik id and other unique
> stuff like that). Will that bust varnish?

Piwik isn't a concern since we can always store the cookie on a
different domain (as we are indeed doing now). Users only get a session
key when they log in; if we wanted to we could delete it when they log
out (and maybe we ought to be doing this anyway). But the real
annoyance is the csrftoken cookie, which people receive immediately upon
coming to the site and which must in turn be rendered by each page. And
that makes Varnish fairly useless I think :(

>>> I tried a simple setup of memcached as the default cache backend for
>>> django, and added a few cache.get()/cache.set() around the code
>>> mentioned above. The amount of work involved is minimal, and results
>>> are quite impressive: on my laptop, rendering the fcd drops from
>>> 6-700ms to ~1ms, numbers are very similar for get_resource_object().
>>
>> I'm curious: how did you cache the results of get_resource_object()? It
>> returns an object, not a string...
>
> From https://docs.djangoproject.com/en/dev/topics/cache/#the-low-level-cache-api,
> "You can cache any Python object that can be pickled safely: strings,
> dictionaries, lists of model objects, and so forth. (Most common
> Python objects can be pickled"
> so I just threw the resource object in there, and it seems to work well!

Aha, I figured you must have been pickling it.

In general it is suboptimal to cache something that has just been read
from disk, as it adds no value (let me explain). Here, it is improving
things because it either (i) reduces the number of disk seeks or (ii)
reduces parsing time (and probably a bit of both). The first point is
not sustainable, as it requires an amount of RAM dedicated to memcached
that grows linearly with the corpus of content, and that concerns me.
Surely our RAM can be used for more intelligent things.

See "avoid useless marshal dump" in the following presentation; I think
they are getting at a similar thing.

> 42. 3) Memcache
> • Used more raw memcache objects
> • Avoid useless marshal dump
> • Yajl + Snappy + raw memcache = Win Combo
> • Removed huge get_multi (100+ items)
> • It can be slower than the sql query equivalent!
> • Tuned memcached options
>
> (from http://www.slideshare.net/kwi/rails-performance-at-justintv-guillaume-luccisano)

Regarding the second point, it is greatly desirable to speed up resource
loading from the processor's perspective (i.e. assuming no i/o wait,
which I will address below). With this, we can get a speedup even if
there is a memcached cache miss. It might be difficult to compete with
the raw execution speed of cPickle, but lxml shouldn't be too far behind
it, and the eventual migration to pypy should speed up the construction
of the DuctModel objects.

We should find ways to prevent get_resource_object() from doing work
unnecessarily, redundantly, or with a poor algorithm. If we can speed
these things up, the benefits will pay off all over the place.

I recently made a commit that speeds up resource loading a bit, but I
haven't profiled it to see whether it actually makes a significant
difference.

>>> I'll give all this a try on devbox tomorrow, to get an impression of
>>> the actual improvements we can get.
>
> So here are some more findings, from profiling the code on devbox,
> with an actual podcast view (the same one 330 rows) and caching in
> memcached.
>
> With an empty cache, view_wikipage runs in 2.025s, of which 1.4s is
> spent in get_resource_object() calls (680 calls, which didn't make
> much sense to me at first).
> With a primed cache (for both fcd.get_resource_object() and fcd
> subview rendering), view_wikipage is now running in 0.76s.
> BUT get_resource_object() is still called 332 times, ... from
> _get_audio_urns_in_column() (in podcast()). That takes .69s.
> In other words, we are running through the entire list of rows twice
> to render a podcast static view :(
>
> So we could of course cache the results of both calls (that's the
> quick & easy solution), but it seems to me that a smarter way to deal
> with it would be to build that list of rows only once (and cache that
> single result). On a cache miss, we'd then only do half the work.
> A lot of the time is spent waiting for IO from disk: when I load
> tested that page (using siege), the server collapsed after a few page
> views, I'm guessing because of concurrent access to the disk for all
> these audio urns.

I've spent quite a bit of time (and have been a bit troubled) thinking
about disk seek issues. Assuming we don't want to always store the
entire XML portion of the resourcedatabase in RAM, the only way we can
fix this problem is to minimize the number of disk seeks per request.
See e.g.:

http://developer.gnome.org/optimization-guide/stable/id397971.html.en
https://gist.github.com/2841832

From here forward, we should find a way to never require more than O(log
N) disk seeks to render a page.

How can we do that? Currently a FlashcardDeck is stored in a file,
which contains links to each row, and each row contains links to each of
its elements. This means that rendering a single flashcard deck
requires loading N files, where N is the number of non-empty cells.
Already we have lessons where this is over 1000. And all these disk
seeks are currently occurring in sequence.

The benefits to storing things this way are largely imagined. Indeed,
there is even storage overhead with linking to a bunch of small XML
files within a FlashcardDeck. All in all, the design of this was a bit
short-sighted on my part.

We should modify the format of FlashcardDeck so that everything
necessary to render the flashcard-deck is stored directly within the
toplevel FlashcardDeck XML file. This means all phrases, as well as the
blob link and MIME type for both image and audio objects.
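
To make the proposal concrete, here is a sketch of what reading such a self-contained deck could look like. The XML layout (`flashcard_deck`, `row`, `phrase`, `audio`, `picture`, `href`, `mime_type`) is invented for the example and is not the actual ductus format. It uses the standard library's ElementTree rather than lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical inlined format: every row's content lives in the one document.
DECK_XML = """\
<flashcard_deck>
  <row>
    <phrase>bonjour</phrase>
    <audio href="urn:example:audio1" mime_type="audio/ogg"/>
  </row>
  <row>
    <phrase>merci</phrase>
    <picture href="urn:example:img1" mime_type="image/jpeg"/>
  </row>
</flashcard_deck>
"""


def parse_deck(xml_text):
    """Return one dict per row, using only the single top-level document."""
    rows = []
    for row_el in ET.fromstring(xml_text).findall("row"):
        row = {"phrase": row_el.findtext("phrase")}
        for kind in ("audio", "picture"):
            el = row_el.find(kind)
            if el is not None:
                # Blob link and MIME type are enough to render the cell.
                row[kind] = (el.get("href"), el.get("mime_type"))
        rows.append(row)
    return rows
```

Everything needed to render the deck comes from one file read; the blobs themselves are only referenced, not loaded.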

One question to answer going forward: Do we care about row-level
revision history? Currently it is being stored. It might be useful to
have in the future in case somebody copies rows from one lesson to
another. Then we can give credit where it is due, easily. The downside
is, having row-level revision history in this new model may require a
little bit of thought, but I think I have thought of a way to make it work.

So going forward, the plan is:

* we enable memcache caching within get_resource_object() as a stop-gap
measure (by the way, I don't think I ever saw your commit for this)
* I work on re-doing the data model of FlashcardDeck so that only one
file must be read from the filesystem to render it
* once that is done, we disable memcache caching in
get_resource_object() and further optimize things if we find that it is
necessary.

Cheers,
Jim

Laurent Savaete

Jun 7, 2012, 10:23:41 AM
to ductus-d...@googlegroups.com
> There are some recursive subviews that go up through parents
> indefinitely (e.g. to determine all the contributors), so once we have a
> nontrivial number of revisions for some of our more popular lessons,
> multiget might save us significantly.  Or we might want to pre-compute
> and cache these things directly on disk.

I forgot those views...

>>> I'm curious: how did you cache the results of get_resource_object()?  It
>>> returns an object, not a string...

> Aha, I figured you must have been pickling it.

I didn't even pickle them :)

> In general it is suboptimal to cache something that has just been read
> from disk, as it adds no value (let me explain).  Here, it is improving
> things because it either (i) reduces the number of disk seeks or (ii)
> reduces parsing time (and probably a bit of both).  The first point is
> not sustainable as it requires the amount of RAM dedicated to memcached
> that grows linearly with the corpus of content, and that concerns me.
> Surely our RAM can be used for more intelligent things.

totally agree with that.

> We should find ways to prevent get_resource_object() from doing work
> unnecessarily, redundantly, or with a poor algorithm.  If we can speed
> these things up, the benefits will pay off all over the place.

I guess that's the core of the problem: unoptimised code as it is now
loads some resources from disk up to 3 times for one page view.

> We should modify the format of FlashcardDeck so that everything
> necessary to render the flashcard-deck is stored directly within the
> toplevel FlashcardDeck XML file.  This means all phrases, as well as the
> blob link and MIME type for both image and audio objects

This might be silly but couldn't we imagine storing data in mongoDB
rather than in tons of small files?

> One question to answer going forward: Do we care about row-level
> revision history?  Currently it is being stored.  It might be useful to
> have in the future in case somebody copies rows from one lesson to
> another.  Then we can give credit where it is due, easily.  The downside
> is, having row-level revision history in this new model may require a
> little bit of thought, but I think I have thought of a way to make it work.

I'm not quite sure if we need row level revision history, but I'm
convinced we need a way to easily reuse a row from one fcd to another.
Since we'll most likely do this in the UI, I suppose how the backend
stores data doesn't make much difference.

> So going forward, the plan is:
>
>  * we enable memcache caching within get_resource_object() as a stop-gap
> measure (by the way, I don't think I ever saw your commit for this)

I think it's in branch profiling_podcast_view on my gitorious. I
cached stuff not inside get_resource_object, but at the fcd level, to
avoid caching lots of small files (see your comment above). The code
is pretty trivial, feel free to relocate the caching call if you think
that will do better.

>  * I work on re-doing the data model of FlashcardDeck so that only one
> file must be read from the filesystem to render it
>  * once that is done, we disable memcache caching in
> get_resource_object() and further optimize things if we find that it is
> necessary.

sounds good.

Jim Garrison

Jun 11, 2012, 11:42:49 PM
to ductus-d...@googlegroups.com
On 06/07/12 07:23, Laurent Savaete wrote:
> This might be silly but couldn't we imagine storing data in mongoDB
> rather than in tons of small files?

We have a gridfs storage backend already, which I considered using.
Indeed, <http://www.mongodb.org/display/DOCS/When+to+use+GridFS>
mentions that gridfs handles a large number of static files quite well.

However, we would still need to do a gridfs call (which will result in a
seek if the data is not cached in RAM) for each cell of a flashcard
deck, which is still not sustainable as flashcard decks grow.

So we may still use gridfs one day, but it's only going to really help
us be able to do horizontal scaling of load. It won't help the fact
that the number of seeks currently required to render a lesson scales
linearly with the length of the lesson.