On 05/16/12 02:12, Laurent Savaete wrote:
>> Since some subviews will need to query the cache several times instead
>> of a few, we should figure out a way to use memcache's "multiget" here.
>> I've thought of a way to do this before, but haven't implemented it yet.
>
> The way I tried it was to brutally cache the entire flashcard deck
> subview, since that's the expensive call. Once cached, we don't
> actually need to pull the components of that view anymore, so caching
> sublevels is probably useless (at least until we heavily reuse bits
> from lesson to lesson, which won't happen until indexing with full
> text search works, and even then...).
> It works well with a single top-level cache call, so I'm not sure I
> understand where you'd want the multiget. Can you detail a bit?
There are some recursive subviews that go up through parents
indefinitely (e.g. to determine all the contributors), so once we have a
nontrivial number of revisions for some of our more popular lessons,
multiget might save us significantly. Or we might want to pre-compute
and cache these things directly on disk.
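To make the multiget idea concrete, here is a rough sketch of walking up the
parents one generation per cache round trip. Everything in it is an
assumption for illustration: FakeCache is an in-memory stand-in that only
mimics the get_many()/set_many() calls of Django's low-level cache API, and
the "author" and "parents" fields are hypothetical, not our actual data model.

```python
class FakeCache:
    """In-memory stand-in exposing get_many/set_many (hypothetical helper)."""

    def __init__(self):
        self._data = {}
        self.round_trips = 0  # counts simulated requests to the cache server

    def get_many(self, keys):
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}

    def set_many(self, mapping):
        self._data.update(mapping)


def collect_contributors(cache, urn):
    """Walk up the revision graph, fetching one whole generation per request."""
    contributors = set()
    seen = set()
    frontier = [urn]
    while frontier:
        found = cache.get_many(frontier)  # one multiget for this generation
        seen.update(frontier)
        next_frontier = []
        for key in frontier:
            entry = found.get(key)
            if entry is None:
                continue  # cache miss: a real version would fall back to disk
            contributors.add(entry["author"])
            for parent in entry["parents"]:
                if parent not in seen and parent not in next_frontier:
                    next_frontier.append(parent)
        frontier = next_frontier
    return contributors


cache = FakeCache()
cache.set_many({
    "urn:a3": {"author": "alice", "parents": ["urn:a2", "urn:b1"]},
    "urn:a2": {"author": "bob", "parents": ["urn:a1"]},
    "urn:b1": {"author": "carol", "parents": ["urn:a1"]},
    "urn:a1": {"author": "alice", "parents": []},
})
contributors = collect_contributors(cache, "urn:a3")
```

Note that this only saves round trips when a generation has more than one
parent; a strictly linear history still costs one request per revision,
which is one reason pre-computing the result may be the better option.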
>>> so I'm keen to believe that we'd get good results by caching those
>>> using something like the resource urn as a key (plus UI language for
>>> good measure)
>>
>> See the varied_etag and unvaried_etag in ductus/wiki/views.py :). If we
>> simply add a layer of varnish over the entire site, these things will go
>> into effect immediately.
>
> Cookie contains a session key (along with a piwik id and other unique
> stuff like that). Will that bust varnish?
Piwik isn't a concern since we can always store the cookie on a
different domain (as we are indeed doing now). Users only get a session
key when they log in; if we wanted to we could delete it when they log
out (and maybe we ought to be doing this anyway). But the real
annoyance is the csrftoken cookie, which people receive immediately upon
coming to the site and which must in turn be rendered by each page. And
that makes Varnish fairly useless I think :(
>>> I tried a simple setup of memcached as the default cache backend for
>>> django, and added a few cache.get()/cache.set() around the code
>>> mentioned above. The amount of work involved is minimal, and results
>>> are quite impressive: on my laptop, rendering the fcd drops from
>>> 6-700ms to ~1ms, numbers are very similar for get_resource_object().
>>
>> I'm curious: how did you cache the results of get_resource_object()? It
>> returns an object, not a string...
>
> From https://docs.djangoproject.com/en/dev/topics/cache/#the-low-level-cache-api,
> "You can cache any Python object that can be pickled safely: strings,
> dictionaries, lists of model objects, and so forth. (Most common
> Python objects can be pickled"
> so I just threw the resource object in there, and it seems to work well!
Aha, I figured you must have been pickling it.
In general it is suboptimal to cache something that has just been read
from disk, as it adds no value (let me explain). Here, it is improving
things because it either (i) reduces the number of disk seeks or (ii)
reduces parsing time (and probably a bit of both). The first point is
not sustainable, as it requires an amount of RAM dedicated to memcached
that grows linearly with the corpus of content, and that concerns me.
Surely our RAM can be used for more intelligent things.
See "avoid useless marshal dump" in the following presentation; I think
they are getting at a similar thing.
> 42. 3) Memcache
> • Used more raw memcache objects
> • Avoid useless marshal dump
> • Yajl + Snappy + raw memcache = Win Combo
> • Removed huge get_multi (100+ items)
> • It can be slower than the sql query equivalent!
> • Tuned memcached options
>
> (from http://www.slideshare.net/kwi/rails-performance-at-justintv-guillaume-luccisano)
Regarding the second point, it is greatly desirable to speed up resource
loading from the processor's perspective (i.e. assuming no i/o wait,
which I will address below). With this, we can get a speedup even if
there is a memcached cache miss. It might be difficult to compete with
the raw execution speed of cPickle, but lxml shouldn't be too far behind
it, and the eventual migration to pypy should speed up the construction
of the DuctModel objects.
We should find ways to prevent get_resource_object() from doing work
unnecessarily, redundantly, or with a poor algorithm. If we can speed
these things up, the benefits will pay off all over the place.
I recently made a commit that speeds up resource loading a bit, but I
haven't profiled it to see whether it actually makes a significant
difference.
>>> I'll give all this a try on devbox tomorrow, to get an impression of
>>> the actual improvements we can get.
>
> So here are some more findings, from profiling the code on devbox,
> with an actual podcast view (the same one, 330 rows) and caching in
> memcached.
>
> With an empty cache, view_wikipage runs in 2.025s, of which 1.4s is
> spent in get_resource_object() calls (680 calls, which didn't make
> much sense to me at first).
> With a primed cache (for both fcd.get_resource_object() and fcd
> subview rendering), view_wikipage is now running in 0.76s.
> BUT get_resource_object() is still called 332 times, ... from
> _get_audio_urns_in_column() (in podcast()). That takes .69s.
> In other words, we are running through the entire list of rows twice
> to render a podcast static view :(
>
> So we could of course cache the results of both calls (that's the
> quick & easy solution), but it seems to me that a smarter way to deal
> with it would be to build that list of rows only once (and cache that
> single result). When we get a cache miss, we'd only do half the work.
> A lot of the time is spent waiting for IO from disk. When I load
> tested that page (using siege), the server collapsed after a few page
> views, I'm guessing because of concurrent access to the disk for all
> these audio urns.
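For the "build that list of rows only once" idea, a minimal sketch might
look like the following. The names load_resource(), build_rows(),
render_rows() and audio_urns_in_column() are hypothetical stand-ins, not the
actual functions in our code; the point is only that both consumers share
one pass over the rows.

```python
load_calls = []


def load_resource(urn):
    # Hypothetical stand-in for the expensive load; records each call.
    load_calls.append(urn)
    return {"urn": urn, "cells": ["phrase for " + urn, "urn:audio:" + urn]}


def build_rows(row_urns):
    """Load every row exactly once; both consumers below share the result."""
    return [load_resource(urn) for urn in row_urns]


def render_rows(rows):
    # First consumer: what the deck rendering would need.
    return [row["cells"][0] for row in rows]


def audio_urns_in_column(rows, column):
    # Second consumer: what the podcast view would need.
    return [row["cells"][column] for row in rows]


row_urns = ["row1", "row2", "row3"]
rows = build_rows(row_urns)
rendered = render_rows(rows)
audio = audio_urns_in_column(rows, 1)
```

With this shape, a cache miss costs one load per row rather than two, and
the single row list is also the natural thing to cache.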
I've spent quite a bit of time (and have been a bit troubled) thinking
about disk seek issues. Assuming we don't want to always store the
entire XML portion of the resourcedatabase in RAM, the only way we can
fix this problem is to minimize the number of disk seeks per request.
See e.g.:
http://developer.gnome.org/optimization-guide/stable/id397971.html.en
https://gist.github.com/2841832
From here forward, we should find a way to never require more than O(log
N) disk seeks to render a page.
How can we do that? Currently a FlashcardDeck is stored in a file,
which contains links to each row, and each row contains links to each of
its elements. This means that rendering a single flashcard deck
requires loading N files, where N is the number of non-empty cells.
Already we have lessons where this is over 1000. And all these disk
seeks are currently occurring in sequence.
The benefits to storing things this way are largely imagined. Indeed,
there is even storage overhead with linking to a bunch of small XML
files within a FlashcardDeck. All in all, the design of this was a bit
short-sighted on my part.
We should modify the format of FlashcardDeck so that everything
necessary to render the flashcard-deck is stored directly within the
toplevel FlashcardDeck XML file. This means all phrases, as well as the
blob link and mime type for both image and audio objects.
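As a very rough illustration of what a self-contained deck could look like,
here is a sketch. The element and attribute names (flashcard_deck, row,
phrase, audio, picture, blob, mime_type) and the blob URNs are invented for
this example and are not a proposal for the final format. It uses the
standard library's ElementTree rather than lxml only to keep the example
self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-file deck: every cell is stored inline.
DECK_XML = """\
<flashcard_deck>
  <row>
    <phrase column="0">bonjour</phrase>
    <phrase column="1">hello</phrase>
    <audio column="2" blob="urn:sha384:EXAMPLE1" mime_type="audio/ogg"/>
  </row>
  <row>
    <phrase column="0">merci</phrase>
    <phrase column="1">thank you</phrase>
    <picture column="2" blob="urn:sha384:EXAMPLE2" mime_type="image/jpeg"/>
  </row>
</flashcard_deck>
"""


def parse_deck(xml_text):
    """Return rows as lists of cell dicts from one parse, with no further reads."""
    deck = ET.fromstring(xml_text)
    rows = []
    for row in deck.findall("row"):
        cells = []
        for cell in row:
            if cell.tag == "phrase":
                cells.append({"type": "phrase", "text": cell.text})
            else:
                # Media cells carry only a link to the blob plus its mime type.
                cells.append({
                    "type": cell.tag,
                    "blob": cell.get("blob"),
                    "mime_type": cell.get("mime_type"),
                })
        rows.append(cells)
    return rows


rows = parse_deck(DECK_XML)
```

Rendering then needs one file read and one parse, regardless of how many
rows or cells the deck has; the blobs themselves are only fetched when the
browser asks for them.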
One question to answer going forward: Do we care about row-level
revision history? Currently it is being stored. It might be useful to
have in the future in case somebody copies rows from one lesson to
another. Then we can give credit where it is due, easily. The downside
is, having row-level revision history in this new model may require a
little bit of thought, but I think I have thought of a way to make it work.
So going forward, the plan is:
* we enable memcache caching within get_resource_object() as a stop-gap
measure (by the way, I don't think I ever saw your commit for this)
* I work on re-doing the data model of FlashcardDeck so that only one
file must be read from the filesystem to render it
* once that is done, we disable memcache caching in
get_resource_object() and further optimize things if we find that it is
necessary.
Cheers,
Jim