Mod_pagespeed cache folder contains hundreds and thousands of strange folders


Jay Gao

Jul 14, 2016, 10:15:34 AM
to mod-pagespeed-discuss
Hi,
First of all thanks in advance for your help.

We use SiteGround cloud hosting with Joomla 3.5.1 and mod_pagespeed. Overall, PageSpeed has improved our site's loading speed, which is great. However, once or twice a day, for up to 2 hours each time, the server's CPU and disk I/O peak and the site slows down. After some investigation we found that a cache folder on our server, such as /.pagespeed_cache/DOMAIN/prop_page/https,3A/,2FDOMAIN/, contains hundreds if not thousands of subfolders that we do not recognise, with names like axis, bitweaver, bugs, buildbot, bugzilla, cfanywhere, helpdesk, livechat, DMC, dokuwiki, eGroupware, elite, firestats and so on.

We are a small business website with a simple setup, and this doesn't make sense to me. I recognise that some of the folder names look like apps in cPanel, but that doesn't explain why they are there.

We then found that those folders and files were all created or updated (judging by their timestamps) within a specific window of an hour or two, and that window coincides with the peaks in CPU and disk reads.

My questions are:
1. Do you also have these folders in your cache? Does PageSpeed cache everything on the server, such as server system apps, rather than just the website?
2. What do you think could have caused this?

Any help is appreciated.
Jay

Otto van der Schaaf

Jul 15, 2016, 10:52:52 AM
to mod-pagesp...@googlegroups.com
Re: unexpected domain names in the file cache directory:

Entries like /pagespeed_cache/DOMAIN are created when pagespeed processes a request for DOMAIN (or, more precisely, the hostname) and needs to remember something about that URL.
It looks like your webserver is listening and responding to all incoming hostnames, and someone or something (bots) is sending requests to your server with the hostnames you are seeing in the file cache directory.

To change that you could:
- Change Apache's configuration to only listen to hostnames that you want to serve (explicitly listing your site's domain name(s)).
- Modify ModPagespeed's configuration to explicitly list what you want to allow. For example (Apache directives take no trailing semicolon, and the patterns are matched against the full URL):

        ModPagespeedDisallow "*"
        ModPagespeedAllow "http*://*.yourdomain.com/*"

Either of these options should stop the unexpected hostnames from showing up in the configured file cache directory.
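
A minimal sketch of the first option above, assuming Apache 2.4 with name-based virtual hosts and access to the vhost configuration (which managed hosting may not give you). The hostnames and DocumentRoot are placeholders. Apache hands requests with an unmatched Host header to the first virtual host defined for the address and port, so a catch-all block placed first absorbs them; HTTPS on port 443 would need an equivalent pair of blocks with the usual certificate directives:

        # Catch-all: listed first, so it receives requests for any hostname
        # that no other virtual host claims.
        <VirtualHost *:80>
            ServerName catchall.invalid
            # Keep pagespeed from processing (and caching) these requests.
            ModPagespeed off
            # Answer every path with 404.
            Redirect 404 /
        </VirtualHost>

        # The real site (placeholder names and path).
        <VirtualHost *:80>
            ServerName yourdomain.com
            ServerAlias www.yourdomain.com
            DocumentRoot /var/www/yourdomain
        </VirtualHost>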

Re: Peaking CPU and disk I/O

I think it would be good to scan the access logs:
- Is there a large increase in traffic during the times when CPU and I/O peak?
- If so, what do the user agents and hostnames look like?

I would also recommend enabling the pagespeed admin pages and statistics logging.

Specifically, the console and statistics pages could help you diagnose whether there is a mod_pagespeed side to this, and having the raw logged stats file may help with further debugging if necessary.
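
As a rough sketch of what enabling these can look like, assuming Apache 2.4 and a log directory the server can write to (the /var/log/pagespeed path and the localhost-only access rule are assumptions to adapt to your setup; on Apache 2.2 the access rule would use Order/Allow instead of Require):

        # Collect statistics and write them to a log for the console graphs.
        ModPagespeedStatistics on
        ModPagespeedStatisticsLogging on
        ModPagespeedLogDir /var/log/pagespeed

        # Expose the admin pages (console, statistics, messages) locally only.
        <Location /pagespeed_admin>
            Require local
            SetHandler pagespeed_admin
        </Location>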

Otto


Jay Gao

Jul 26, 2016, 10:56:07 AM
to mod-pagespeed-discuss

Thanks for your suggestions Otto.

We found that all those entries in the pagespeed cache were caused by bots requesting non-existent pages and by a security scan of our site probing for weak points, hence all the references to software we never actually use. We’ve now modified our .htaccess to block the most aggressive bots. After flushing the pagespeed cache, we have a much smaller set of cached resources than before.

We’ll speak with SiteGround about turning on, or getting access to, the admin pages and statistics logging, because we are seeing other issues with pagespeed right now.
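
A minimal sketch of what such an .htaccess rule could look like, assuming mod_rewrite is available and .htaccess overrides are allowed. The user-agent names are hypothetical placeholders, not the actual bots involved; pick the real ones from your own access logs:

        RewriteEngine On
        # Match hypothetical bot names anywhere in the User-Agent header,
        # case-insensitively.
        RewriteCond %{HTTP_USER_AGENT} (ExampleBot|ExampleScanner) [NC]
        # Refuse the request with 403 Forbidden, whatever the path.
        RewriteRule ^ - [F]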



