Caching

Luca de Alfaro

Aug 3, 2009, 10:23:05 PM
to wikitru...@googlegroups.com
In particular to Ian:

what should we do about caching of the colored revisions HTML?
This is what we would like, ideally (correct me if I am wrong):
  • We want to cache the trust information, when correctly computed, for efficiency's sake.
  • When someone votes on a revision, we need to invalidate the cache for that revision; otherwise, the new trust coloring is not displayed.
  • When the coloring of a revision is NOT correctly computed, i.e., when the system shows the "The trust information for this revision has not been computed yet" message, we should avoid caching that revision.
So ideally, if MediaWiki gave us a way to clear the cache for a particular revision, this is what we should be doing:
  • When someone votes on a revision, clear the cache for that revision (via AJAX?  Via what? No idea.  Suggestions?).
  • When the trust information is not available, and the message "The ... not available yet" is displayed, we tell MediaWiki not to cache that page.
  • Otherwise, we leave caching enabled.
Can Ian take a look at whether this is doable?
Also, the problem, especially at WMF, will be what happens if there are Squid servers between the users and MediaWiki.  Can we make sure that the right cache headers are generated?
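
As a rough illustration of the cache-header question (not code from the WikiTrust tree or from MediaWiki): the response for a colored page could carry a different Cache-Control value depending on whether the trust information was computed. The function name `cache_control_for` and the ten-minute lifetime are assumptions made up for this sketch; what the WMF Squid setup actually expects would still need to be checked.

```python
# Illustrative sketch only: choose a Cache-Control value for a colored page.
# cache_control_for and SHARED_MAX_AGE are hypothetical names, not MediaWiki code.

SHARED_MAX_AGE = 600  # assumed lifetime (seconds) for shared caches such as Squid


def cache_control_for(trust_computed: bool) -> str:
    """Return a Cache-Control value for a colored-revision response."""
    if not trust_computed:
        # The "not computed yet" placeholder should not be stored by any cache.
        return "private, no-store, max-age=0"
    # Shared caches may keep the page; browsers revalidate on every view.
    return f"s-maxage={SHARED_MAX_AGE}, must-revalidate, max-age=0"


print(cache_control_for(False))
print(cache_control_for(True))
```

The point of the split is that `s-maxage` only addresses shared caches, so a proxy can hold a correctly colored page while the placeholder page is never stored anywhere.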

To alleviate the problems, I will lift the limit of 100 revisions processed each time eval_online_wiki is called.  Instead, I will just have eval_online_wiki bring the whole wiki up to date.

Let me know if you have comments and, above all, if you have better ideas.

Luca

Ian Pye

Aug 4, 2009, 2:05:15 PM
to wikitru...@googlegroups.com
For this type of caching, I think we should be using memcached. This
is something which is built into MW, and I believe is all set up
already to play well with Squid. We can be caching all of the colored
text which is generated, giving a unique key to each colored page (on
a per-page basis, not per-revision). Whenever someone votes, in
addition to recording the vote, it's easy enough to tell memcached to
invalidate the given key.
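
A minimal sketch of that idea, under stated assumptions: a plain dict stands in for the memcached client, and the class name, key format, and render callback are all hypothetical (this is not the code in the Remote class).

```python
# Illustrative sketch: per-page cache of colored HTML, invalidated on a vote.
# A dict stands in for memcached; every name here is hypothetical.
from typing import Callable, Dict, List


class ColoredPageCache:
    def __init__(self, render: Callable[[int], str]) -> None:
        self._render = render              # produces colored HTML for a page id
        self._store: Dict[str, str] = {}   # stand-in for the memcached client

    @staticmethod
    def key_for(page_id: int) -> str:
        # One key per page (not per revision), as proposed above.
        return f"wikitrust:colored:{page_id}"

    def get(self, page_id: int) -> str:
        key = self.key_for(page_id)
        if key not in self._store:
            self._store[key] = self._render(page_id)
        return self._store[key]

    def record_vote(self, page_id: int) -> None:
        # Storing the vote itself is omitted; only the invalidation is shown.
        self._store.pop(self.key_for(page_id), None)


calls: List[int] = []


def fake_render(page_id: int) -> str:
    calls.append(page_id)
    return f"<p>colored text for page {page_id}, render #{len(calls)}</p>"


cache = ColoredPageCache(fake_render)
first = cache.get(7)
second = cache.get(7)   # served from the cache, no new render
cache.record_vote(7)    # the vote drops the cached copy
third = cache.get(7)    # rendered again with the new coloring
print(first == second, first == third, len(calls))
```

With a real memcached client the dict operations would become the client's get, set, and delete calls; the key scheme and the invalidate-on-vote step are the parts this is meant to show.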

I've got some code somewhere which does this already. I'll dig it out
and see if I can get it going.

Ian

Bo Adler

Aug 4, 2009, 2:13:15 PM
to wikitru...@googlegroups.com
Just to clarify: Luca's msg is in reference to yesterday's thread
about how MW is _already_ caching our colored results. What's
necessary here is code to _disable_ the caching (both MW and Squid
based) when we return an error result.

Also, doesn't MW support other caching schemes besides memcache?
There must be some generic interface to the caching from within MW.
(I hope.)

-Bo

Luca de Alfaro

Aug 4, 2009, 2:15:25 PM
to wikitru...@googlegroups.com
I don't know very much, so perhaps what I am saying is all wrong.
But MediaWiki has its own caching, correct? So even if we use memcached, we would have to make sure that colored pages are not cached, correct? (Just so that I understand.)

What you propose sounds good,  but I see now the message from Bo.

Perhaps we should aim first at a simple, correct approach, then talk to the Wikimedia folks to find out what they want for their setting?

Luca

Ian Pye

Aug 4, 2009, 2:23:37 PM
to wikitru...@googlegroups.com
Ahhh:

Yes -- there's a generic cache interface, includes/HTMLFileCache.php.
The global wgOut also manages the client cache headers. I think it's a
matter of using wgOut to update the last modified headers correctly,
and also making sure we play nicely with the FileCache object.
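
To make the last-modified idea concrete, here is a small sketch of how a conditional request could behave if a vote bumps a per-page timestamp. It is an assumption-laden illustration, not MediaWiki code: `PageState` and `respond` are hypothetical names, and the real logic would go through wgOut.

```python
# Illustrative sketch: conditional-request check driven by a per-page timestamp.
# PageState and respond are hypothetical names; this is not MediaWiki code.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


@dataclass
class PageState:
    last_changed: datetime  # bumped on edits and, in this sketch, on votes

    def record_vote(self, when: datetime) -> None:
        self.last_changed = max(self.last_changed, when)


def respond(page: PageState, if_modified_since: Optional[str]) -> Tuple[int, str]:
    """Return (status code, Last-Modified header value) for one request."""
    last_modified = format_datetime(page.last_changed, usegmt=True)
    if if_modified_since is not None:
        client_time = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds first.
        if page.last_changed.replace(microsecond=0) <= client_time:
            return 304, last_modified  # the client's copy is still valid
    return 200, last_modified


t0 = datetime(2009, 8, 4, 18, 0, 0, tzinfo=timezone.utc)
page = PageState(last_changed=t0)
status_first, stamp = respond(page, None)     # first view: full response
status_again, _ = respond(page, stamp)        # nothing changed since then
page.record_vote(t0 + timedelta(minutes=5))   # a vote changes the coloring
status_after_vote, _ = respond(page, stamp)   # the old copy is now stale
print(status_first, status_again, status_after_vote)
```

The design choice illustrated is that a vote counts as a modification of the page for caching purposes, even though the wiki text itself did not change.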

Bo Adler

Aug 4, 2009, 2:23:41 PM
to wikitru...@googlegroups.com
FYI - the code you are referring to should be in the Remote class in
the current codebase.

-Bo

Luca de Alfaro

Aug 4, 2009, 2:28:42 PM
to wikitru...@googlegroups.com
Why the remote class?
The caching issue is also present in local mode.  Or do you mean that we solved the issue for remote mode, so the code can be copied from there?

Luca

Ian Pye

Aug 4, 2009, 2:31:16 PM
to wikitru...@googlegroups.com
As an experiment, I added some caching code to the remote class. Note:
we are talking about 2 types of cache issues:

1) Making sure that our code is not incorrectly cached by some other part of MW

2) Making rendered colored pages load faster by explicitly caching
them ourselves.

#2 is what the code in the remote branch does. #1 is the bug we need
to fix right now for local.

I

Bo Adler

Aug 4, 2009, 2:31:04 PM
to wikitru...@googlegroups.com
Oh, sorry -- what I meant was that the previous work on caching (used
with the Firefox extension) is currently residing in the Remote class.
I didn't get to refactor that class yet, but that needs to be done at
some point.

-Bo

Bo Adler

Aug 4, 2009, 2:33:39 PM
to wikitru...@googlegroups.com
On #2, I think we should look at that more carefully, now that we have
evidence suggesting that MW is already caching our results for us. If
MW is already doing it, I think it would be better to stick with the
default caching policy.

I suspect that previously we had the issue because it was going
through the AJAX interface -- which I guess MW doesn't cache? That
would explain why we needed it before.

-Bo

Luca de Alfaro

Aug 4, 2009, 2:35:13 PM
to wikitru...@googlegroups.com
Ian, perfect, this is exactly correct.  #1 is the top priority, so we can have a version we give out (the rest, as far as I know, works).
#2 is a nice optimization, but we can do it:
  • either when we know how to do it in a generic way that works with many (all?) caching mechanisms
  • or when we know which specific clients (i.e., WMF) need it.
Do you all agree?
Luca

Ian Pye

Aug 4, 2009, 2:36:48 PM
to wikitru...@googlegroups.com
On Tue, Aug 4, 2009 at 11:33 AM, Bo Adler<thu...@alumni.caltech.edu> wrote:
>
> On #2, I think we should look at that more carefully, now that we have
> evidence suggesting that MW is already caching our results for us.  If
> MW is already doing it, I think it would be better to stick with the
> default caching policy.
>
> I suspect that previously we had the issue because it was going
> through the AJAX interface -- which I guess MW doesn't cache?  That
> would explain why we needed it before.

Exactly. I think #2 is really needed for when/if we run the remote
mode, so that our servers can handle the load via AJAX.

Luca -- I agree #1 is something we need to solve so that our extension
plays nicely with MW.

Luca de Alfaro

Aug 4, 2009, 2:39:51 PM
to wikitru...@googlegroups.com
So let's recap: we need to avoid colored results being incorrectly cached by MediaWiki.
The simplest approach is never to cache colored results.
The better approach is to have colored results cached, except when:
  • someone votes
  • the colored page is invalid (it says "no trust info available").
Luca