What's going on with CloudFlare and caching and such.


Robin Lee Powell

Feb 6, 2020, 1:29:23 AM
to loj...@googlegroups.com
Some of you have noticed problems with dynamic content on lojban.org
now that we've started using CloudFlare. I haven't been able to
figure out how to fix this, so here's what's going on; maybe
somebody else will have ideas.

So the goal of CloudFlare, primarily, was to make it so that if my
server went down, the site would be basically fine. This we have
achieved.

However, to do so, I had to use brute force. Here's our CloudFlare
page rules:

*lojban.org/*&* Cache Level: Bypass
*lojban.org/*edit* Cache Level: Bypass
*lojban.org/*Special:* Cache Level: Bypass
*lojban.org/*Talk:* Cache Level: Bypass
*lojban.org/* Browser Cache TTL: 30 minutes, Always Online: On, Cache Level: Cache Everything, Edge Cache TTL: 2 hours, Origin Cache Control: Off

That last one is a very large hammer that says "just cache the hell
out of everything".
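
To illustrate how these rules interact: as far as I understand it, Cloudflare applies only the first page rule (in priority order) whose pattern matches the URL, with `*` standing for any run of characters. Here is a minimal Python sketch of that evaluation. It assumes the rules are evaluated in the order listed above; the function names and the short action labels are made up for illustration.

```python
import re

# Page rules in priority order, as listed above; the labels are shorthand.
PAGE_RULES = [
    ("*lojban.org/*&*", "bypass"),
    ("*lojban.org/*edit*", "bypass"),
    ("*lojban.org/*Special:*", "bypass"),
    ("*lojban.org/*Talk:*", "bypass"),
    ("*lojban.org/*", "cache everything"),
]


def pattern_to_regex(pattern):
    """Turn a page-rule pattern into a regex: '*' matches any run of characters."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")


def matching_rule(url, rules=PAGE_RULES):
    """Return the action of the first rule whose pattern matches, or None."""
    for pattern, action in rules:
        if pattern_to_regex(pattern).match(url):
            return action
    return None


print(matching_rule("mw.lojban.org/papri/Lojban"))
print(matching_rule("mw.lojban.org/index.php?title=Lojban&action=edit"))
print(matching_rule("mw.lojban.org/papri/Special:RecentChanges"))
```

Note that any page whose URL merely contains "edit" would also hit the second bypass rule, so the catch-all only applies to URLs that none of the narrower patterns match.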

The reason I need that hammer is that mediawiki is returning
absurdly wrong caching headers. Here's an example that entirely
bypasses CloudFlare:

$ curl -k -v -H 'Host: mw.lojban.org' -L https://jukni.lojban.org/papri/pronunciation 2>&1 | less
[snip]
< HTTP/1.1 200 OK
< Date: Thu, 06 Feb 2020 06:26:31 GMT
< Server: Apache/2.4.38 (Debian)
< X-Powered-By: PHP/7.3.14
< X-Content-Type-Options: nosniff
< Content-language: en
< Vary: Accept-Encoding,Cookie
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Cache-Control: private, must-revalidate, max-age=0

^^ That. That Cache-Control line is absurd, and effectively
completely disables CloudFlare.
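
To spell out why that header stops a shared cache: `private` tells any shared cache (such as a CDN edge) not to store the response, and `max-age=0` makes it stale immediately. Here is a small Python sketch of that check. It assumes a deliberately simplified reading of the directives; the function names are illustrative, and real caches apply more rules than this.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of lowercase directives."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_can_serve(value):
    """Simplified check: would a shared cache keep serving this response on its own?"""
    d = parse_cache_control(value)
    if "private" in d or "no-store" in d or "no-cache" in d:
        return False
    # s-maxage, when present, takes precedence over max-age for shared caches.
    lifetime = d.get("s-maxage") or d.get("max-age")
    return lifetime is not None and lifetime.isdigit() and int(lifetime) > 0


# The header MediaWiki is sending above:
print(shared_cache_can_serve("private, must-revalidate, max-age=0"))
# A hypothetical header that a shared cache could use:
print(shared_cache_can_serve("public, s-maxage=7200, max-age=1800"))
```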

I have tested this by talking *directly* to the mediawiki server,
no SSL, no proxies, no nothing; it's the same.

Our LocalSettings.php file is attached.

I don't care if the solution is on the mediawiki side or the
CloudFlare side, but at this point I've sunk more time into this
than I can afford and I've come up empty, so I'd appreciate any
ideas you might have.

Having said that, if you're going to point me at
https://www.mediawiki.org/wiki/Manual:CloudFlare#Integration_with_MediaWiki
, you'll need to tell me exactly which part you think is relevant,
and why. Most of that page is about making X-Forwarded-For: do the
right thing, which is totally irrelevant to this problem.

Thanks for any help.
[Attachment: ls.php.txt]

Robin Lee Powell

Feb 6, 2020, 2:49:37 AM
to loj...@googlegroups.com
FWIW, I turned the test site on; see, for example,
http://mw-test.lojban.org/papri/Pronunciation (which is a very
simple, static page).

If you look at the headers there you'll see:

< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache

And below is the entire LocalSettings.php file for the test site; as
you can see it's absurdly simple.

So, again, I have no idea why mediawiki is saying not to cache the
pages ever.

Also, $wgUseSquid = true; makes no difference.

------------

$ cat /var/www/mediawiki/LocalSettings.php | grep -v '^#'
<?php

if ( !defined( 'MEDIAWIKI' ) ) {
exit;
}


$wgSitename = "Lojban";
$wgMetaNamespace = "Lojban";

$wgScriptPath = "";
$wgScriptExtension = ".php";
$wgArticlePath = "/papri/$1";
//$wgArticlePath = "/wiki/$1"; # Virtual path. This directory MUST be different from the one used in $wgScriptPath

$wgServer = "http://mw-test.lojban.org";
$wgCanonicalServer = "http://mw-test.lojban.org";

$wgStylePath = "$wgScriptPath/skins";

$wgDBtype = "mysql";
$wgDBserver = "jukni:11036"; # Not 'localhost'; that will try to do a socket connection, instead of TCP
$wgDBname = "mediawiki";
$wgDBuser = "root";
$wgDBpassword = "[snip]";

$wgDBprefix = "";

$wgDBTableOptions = "ENGINE=InnoDB, DEFAULT CHARSET=binary";

$wgDBmysql5 = true;

------------

Mike S.

Feb 7, 2020, 7:59:22 AM
to loj...@googlegroups.com
On Thu, Feb 6, 2020 at 1:29 AM Robin Lee Powell <rlpo...@digitalkingdom.org> wrote:
> Some of you have noticed problems with dynamic content on lojban.org
> now that we've started using CloudFlare. I haven't been able to
> figure out how to fix this, so here's what's going on; maybe
> somebody else will have ideas.
>
> So the goal of CloudFlare, primarily, was to make it so that if my
> server went down, the site would be basically fine. This we have
> achieved.

Judging from what I saw during the last outage, I believe Cloudflare is caching only pages that have been requested at least once. Assuming I am right about that, I don't know if there is a way to make Cloudflare cache the whole site without clicking through every page on the Wiki.  By the way, I don't care about this issue since the site is up most of the time anyway, but since the whole point of the exercise is to cache pages and that doesn't necessarily happen, I thought I'd mention it! 

As far as the main issue, I can only guess.  It might have something to do with an extension hijacking the Cache-Control header, but I don't know.

At any rate, your bypass rule for talk pages seems to be working okay.  Since we don't know what the problem is, I suggest the following for a temp fix --  Please add the following rules in the proper location:

*lojban.org/*LFK*          Cache Level: Bypass
*lojban.org/*[bypasscache]*          Cache Level: Bypass

The first line will help the new committee work on their pages.  The second line will allow a person to work on an arbitrary page without the cache.  When the page is finished, it can then be moved to the correct location.

I *believe* these rules will be sufficient for the time being and get you off the hook.  Your help is very much appreciated!

-Mike


 
--
You received this message because you are subscribed to the Google Groups "lojban" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lojban+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lojban/20200206062919.GU26741%40stodi.digitalkingdom.org.

la .eris.

Feb 7, 2020, 12:35:35 PM
to lojban
Personally, this is seeming like more trouble than it's worth. Pages keep displaying improperly (looking like I'm not logged in, or like someone else is logged in). I'd prefer to just have the site down when your server is down and have everything work properly the rest of the time.

Incidentally, how do the Wikimedia Foundation wikis handle caching?

-Aris (aka. bookofportals)

Robin Lee Powell

Feb 11, 2020, 1:59:14 AM
to loj...@googlegroups.com


On Fri, Feb 07, 2020 at 07:58:58AM -0500, Mike S. wrote:
> On Thu, Feb 6, 2020 at 1:29 AM Robin Lee Powell
> <rlpo...@digitalkingdom.org> wrote:
>
> > Some of you have noticed problems with dynamic content on
> > lojban.org now that we've started using CloudFlare. I haven't
> > been able to figure out how to fix this, so here's what's going
> > on; maybe somebody else will have ideas.
> >
> > So the goal of CloudFlare, primarily, was to make it so that if
> > my server went down, the site would be basically fine. This we
> > have achieved.
> >
>
> Judging from what I saw during the last outage, I believe
> Cloudflare is caching only pages that have been requested at least
> once. Assuming I am right about that, I don't know if there is a
> way to make Cloudflare cache the whole site without clicking
> through every page on the Wiki. By the way, I don't care about
> this issue since the site is up most of the time anyway, but since
> the whole point of the exercise is to cache pages and that doesn't
> necessarily happen, I thought I'd mention it!

Yeah, you're absolutely right about that. For me the point is to
make it so that the *basics* are up no matter what.

The specific complaints I was receiving were that the site being
down *entirely* tells people unfamiliar with the project that it's
dead. Which is not unreasonable: if I'd just
heard about a super-nerdy language and I went to the website and it
was down, I would in fact assume that the project was dead,
personally.

> At any rate, your bypass rule for talk pages seems to be working okay.
> Since we don't know what the problem is, I suggest the following for a temp
> fix -- Please add the following rules in the proper location:
>
> *lojban.org/*LFK* Cache Level: Bypass
> *lojban.org/*[bypasscache]* Cache Level: Bypass
>
> The first line will help the new committee work on their pages. The second
> line will allow a person to work on an arbitrary page without the cache.
> When the page is finished, it can then be moved to the correct location.
>
> I *believe* these rules will be sufficient for the time being and get you
> off the hook. Your help is very much appreciated!

I'm actually wondering if we might want to go the other way:
explicitly cache the very static stuff (javascript, css, etc) and
then list specific pages that we don't mind caching long-term, like
the front page.

Robin Lee Powell

Feb 11, 2020, 2:08:00 AM
to loj...@googlegroups.com


On Fri, Feb 07, 2020 at 09:35:34AM -0800, la .eris. wrote:
> Personally, this is seeming like more trouble than it's worth.
> Pages keep displaying improperly (looking like I'm not logged in,
> or like someone else is logged in). I'd prefer to just have the
> site down when your server is down and have everything work
> properly the rest of the time.

I've gotten a *lot* of negative feedback about outages, and I'm
pretty sick of it; the complaints about the current state have been
quite mild by comparison.

> Incidentally, how do the Wikimedia Foundation wikis handle
> caching?

Well, presumably they have their mediawiki tuned to not tell abject
lies to the upstream systems, but I imagine they have their own
caching servers that they can control directly. We could do that
too if we didn't mind paying for it, but always-on cloud-based web
servers are typically not very cheap, which is why I've always run
stuff out of my house for "free".

Oh, turns out their architecture is described at
https://www.mediawiki.org/wiki/Manual:MediaWiki_architecture

Speaking in general, there are many ways to handle this sort of
situation, most of which I'm personally familiar with (as this is my
day job), but all of them require considerable time spent or
considerable money or both.

If people (or the LLG) want to pitch in to spend money for always-on
Squid caching proxy servers that aren't at my house, that's
certainly an option, but for that to actually work properly we'd
*still* need to figure out why mediawiki is telling lies in the
cache control headers.

Robin Lee Powell

Feb 11, 2020, 2:18:59 AM
to loj...@googlegroups.com
I added those rules anyway, by the way, although I didn't require the
square brackets; anything containing "bypasscache" is fine.

Mike S.

Feb 11, 2020, 4:05:15 PM
to loj...@googlegroups.com
On Tue, Feb 11, 2020 at 1:59 AM Robin Lee Powell <rlpo...@digitalkingdom.org> wrote:
>
>
>
> On Fri, Feb 07, 2020 at 07:58:58AM -0500, Mike S. wrote:
> > On Thu, Feb 6, 2020 at 1:29 AM Robin Lee Powell
> > <rlpo...@digitalkingdom.org> wrote:
> >
> > > Some of you have noticed problems with dynamic content on
> > > lojban.org now that we've started using CloudFlare.  I haven't
> > > been able to figure out how to fix this, so here's what's going
> > > on; maybe somebody else will have ideas.
> > >
> > > So the goal of CloudFlare, primarily, was to make it so that if
> > > my server went down, the site would be basically fine.  This we
> > > have achieved.
> > >
> >
> > Judging from what I saw during the last outage, I believe
> > Cloudflare is caching only pages that have been requested at least
> > once. Assuming I am right about that, I don't know if there is a
> > way to make Cloudflare cache the whole site without clicking
> > through every page on the Wiki.  By the way, I don't care about
> > this issue since the site is up most of the time anyway, but since
> > the whole point of the exercise is to cache pages and that doesn't
> > necessarily happen, I thought I'd mention it!
>
> Yeah, you're absolutely right about that.  For me the point is to
> make it so that the *basics* are up no matter what.
>

As a side note, it dawns on me one could run a website-ripping tool to hit all the pages.  
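
A rough sketch of what such a cache-warming pass could look like, in Python. This is an assumption about the approach rather than a tested tool: the names `LinkCollector`, `fetch`, and `warm_cache` are made up for illustration, and a real run would want rate limiting and should skip edit and Special: links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page through its public URL, so the CDN sees the request."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def warm_cache(start_url, fetch_page=fetch, limit=500):
    """Visit same-host pages breadth-first; return the list of URLs requested."""
    host = urlparse(start_url).netloc
    queue, seen = [start_url], {start_url}
    visited = []
    while queue and len(visited) < limit:
        url = queue.pop(0)
        visited.append(url)
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to load
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments before comparing.
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Calling `warm_cache` with the wiki's front page URL would request each reachable same-host page once, which is what gets a page into the edge cache in the first place.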




> The specific complaints that I was receiving were about that the
> site being down *entirely* sends a message to people unfamiliar with
> the project that it's dead.  Which is not unreasonable: if I'd just
> heard about a super-nerdy language and I went to the website and it
> was down, I would in fact assume that the project was dead,
> personally.
>
> > At any rate, your bypass rule for talk pages seems to be working okay.
> > Since we don't know what the problem is, I suggest the following for a temp
> > fix --  Please add the following rules in the proper location:
> >
> > *lojban.org/*LFK*          Cache Level: Bypass
> > *lojban.org/*[bypasscache]*          Cache Level: Bypass
> >
> > The first line will help the new committee work on their pages.  The second
> > line will allow a person to work on an arbitrary page without the cache.
> > When the page is finished, it can then be moved to the correct location.
> >
> > I *believe* these rules will be sufficient for the time being and get you
> > off the hook.  Your help is very much appreciated!
>
> I'm actually wondering if we might want to go the other way:
> explicitly cache the very static stuff (javascript, css, etc) and
> then list specific pages that we don't mind caching long-term, like
> the front page.
>

I was sorta wondering why it was done the way it was done.  Are you using CloudFlare partly to minimize the load on your server?  

Also, does adding a "Bypass" rule for a page deactivate the cache entirely for that page? (That would obviously be very bad for some pages.)

If the answer to both questions is no, why not bypass all dynamically generated pages like so:

*lojban.org/papri/*          Cache Level: Bypass
*lojban.org/*php*          Cache Level: Bypass

*lojban.org/*           Browser Cache TTL: 30 minutes, Always Online: On, Cache Level: Cache Everything, Edge Cache TTL: 2 hours, Origin Cache Control: Off

... assuming, again, the caching mechanism continues to work, which we could test in order to be sure.

The first line makes the whole Wiki live when accessed through the short URL.  The second bypasses any pages directly invoking php, which includes page edits (which need to be bypassed).

^Just a thought.  The LFK bypass rule is working (I know because I tested it) which makes the situation a lot better.  Thanks.


Mike S.

Feb 11, 2020, 6:44:05 PM
to loj...@googlegroups.com

On Tue, Feb 11, 2020 at 4:04 PM Mike S. <mai...@gmail.com> wrote:
>
>
> I was sorta wondering why it was done the way it was done.  Are you using CloudFlare partly to minimize the load on your server?  
>
> Also, does adding a "Bypass" rule for a page deactivate the cache entirely for that page? (That would obviously be very bad for some pages.)
>
> If the answer to both questions is no, why not bypass all dynamically generated pages like so:
>
> *lojban.org/papri/*          Cache Level: Bypass
> *lojban.org/*php*          Cache Level: Bypass
> *lojban.org/*           Browser Cache TTL: 30 minutes, Always Online: On, Cache Level: Cache Everything, Edge Cache TTL: 2 hours, Origin Cache Control: Off
>
> ... assuming, again, the caching mechanism continues to work, which we could test in order to be sure.
>
> The first line makes the whole Wiki live when accessed through the short URL.  The second bypasses any pages directly invoking php, which includes page edits (which need to be bypassed).

I don't think this setup is going to work based on my reading of the docs at https://support.cloudflare.com/hc/en-us/articles/218411427-Understanding-and-Configuring-Cloudflare-Page-Rules-Page-Rules-Tutorial- which says pretty plainly

Cache Level - Bypass - Cloudflare does not cache.

There must be a way to do it though.  It's an obvious use case.  I'll read the docs more a little later.  Are you using the free plan or a paid one?

-Mike



Robin Lee Powell

Feb 26, 2020, 2:15:42 AM
to loj...@googlegroups.com
How's the behaviour now? You should be seeing that logging in gets
you the live site without caching.
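
One way to check this from the outside is Cloudflare's `CF-Cache-Status` response header. Here is a hedged Python sketch: the set of "served from cache" values reflects my reading of Cloudflare's documentation, the helper names are made up, and the cookie string would have to be copied from a real logged-in browser session (no cookie name is assumed here).

```python
from urllib.request import Request, urlopen

# Status values that (as I understand it) mean the edge answered from its cache.
CACHED_STATUSES = {"HIT", "STALE", "UPDATING", "REVALIDATED"}


def served_from_cache(headers):
    """Return True if the CF-Cache-Status header indicates a cached response."""
    for name, value in headers.items():
        if name.lower() == "cf-cache-status":
            return value.strip().upper() in CACHED_STATUSES
    return False


def check(url, cookie=None):
    """Fetch url (optionally with a Cookie header) and report whether it was cached."""
    request = Request(url)
    if cookie:
        request.add_header("Cookie", cookie)
    with urlopen(request, timeout=30) as response:
        return served_from_cache(dict(response.headers.items()))
```

If the setup works as described, `check` on a wiki page should tend to return True without a cookie (once the page has been requested before) and False when given a logged-in session's cookie.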


Mike S.

Feb 27, 2020, 5:40:01 PM
to loj...@googlegroups.com
I ran some tests three days ago and it all seems good so far!
-Mike


