Ngx_pagespeed_cache hacked?


Jacob Share

Nov 23, 2014, 3:11:09 AM
to ngx-pagesp...@googlegroups.com
Early this morning, Munin started emailing me warnings like this:

myserver.com :: vps.myserver.com :: Inode usage in percent
        WARNINGs: /dev is 92.11 (outside range [:92]).
        OKs: / is 26.12, /run/lock is 0.00, /run/shm is 0.00, /run is 0.39.

Googling 'reduce inode usage' I found this:


So I did:

{{{
sudo find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n
}}}

Which pointed at /dev/ngx_pagespeed_cache.

I ran the search again directly in the cache directory and discovered subdirectories for domains I don't own, with .hk and .cn TLDs. I then deleted them and rebooted the server, without first checking when they were created :( Since I have ngx_pagespeed_cache in a tmpfs directory, a reboot was probably all I needed to do.
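
Before deleting unfamiliar cache directories, it can help to record when they were last modified, since a tmpfs loses that information on reboot. A minimal sketch, assuming Linux with `find` and a `stat` that supports `-c` (GNU coreutils or BusyBox). The function name `list_dirs_by_mtime` is made up for illustration, and the times shown are modification times, not creation times:

```shell
# List the immediate subdirectories of a cache directory, oldest
# modification time first. Read-only: nothing is deleted or changed.
# Usage: list_dirs_by_mtime /dev/ngx_pagespeed_cache
list_dirs_by_mtime() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -exec stat -c '%y %n' {} + | sort
}
```

Redirecting the output to a file outside the tmpfs keeps a record that survives a reboot.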

Munin's inode usage graphs showed me that usage started skyrocketing on Thursday Nov. 20th. The deletion + server reboot solved the problem, bringing inode usage back to where it was earlier.

I checked logwatch for that day and there was nothing unusual. My server's firewall is set up to only allow SSH connections from my IP and one other, and logwatch showed a login from my IP only.

I don't remember SSH-ing in that day, but I might have. Regardless, it looks like the only way those illicitly-cached directories could have made it onto the server was via my computer.

Am I right, or is this related to ngx_pagespeed_cache?

How can I debug further?

Thanks

Jacob


Joshua Marantz

Nov 23, 2014, 7:37:36 AM
to ngx-pagesp...@googlegroups.com
There are two questions here I think:

1. ngx_pagespeed is using too many inodes.  Can we make it stop?
2. ngx_pagespeed has inode entries for other domains not in my control.  Is that bad?

The answer to the first question is 'yes'.  We don't examine the filesystem currently to determine its physical capacity so you have to help us by limiting it.  See https://developers.google.com/speed/pagespeed/module/system#file_cache e.g.:
    pagespeed FileCacheInodeLimit        500000;

The answer to the second question is: "not necessarily."  The cache contains entries for URLs that PageSpeed wants to remember things about.  Depending on your settings, PageSpeed might attempt to load resources (images, css, and javascript) from other domains.  One feature that comes to mind is "Inlining resources without explicit authorization:"


If you enable that feature, PageSpeed may wind up writing information about any domain referenced by your HTML into its cache.  I'm not recalling offhand what other features might do this.

Another situation where you'd see this is if you set up a forward proxy in your browser, and add "pagespeed Domain *;" to your configuration.

Before I go further it might be helpful to see your pagespeed.conf.


--
You received this message because you are subscribed to the Google Groups "ngx-pagespeed-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ngx-pagespeed-di...@googlegroups.com.
Visit this group at http://groups.google.com/group/ngx-pagespeed-discuss.
For more options, visit https://groups.google.com/d/optout.

Jacob Share

Nov 23, 2014, 8:33:04 AM
to ngx-pagesp...@googlegroups.com
I don't use a pagespeed.conf; some of the settings are directly in my nginx.conf, and others (mainly location settings, such as for the console) are in my main domain's conf file.

From nginx.conf:

  ## Pagespeed
  pagespeed on;
  pagespeed FileCachePath /dev/ngx_pagespeed_cache;
  pagespeed Statistics on;
  pagespeed StatisticsLogging on;
  pagespeed LogDir /var/log/pagespeed;
  pagespeed LRUCacheKbPerProcess     8192;
  pagespeed LRUCacheByteLimit        16384;

  ## Pagespeed tools
  pagespeed StatisticsPath /ngx_pagespeed_statistics;
  pagespeed GlobalStatisticsPath /ngx_pagespeed_global_statistics;
  pagespeed MessagesPath /ngx_pagespeed_message;
  pagespeed ConsolePath /pagespeed_console;
  pagespeed AdminPath /pagespeed_admin;
  pagespeed GlobalAdminPath /pagespeed_global_admin;

  ## Pagespeed filters - Core 
  pagespeed DisableFilters rewrite_images;
  pagespeed DisableFilters add_head;
  pagespeed EnableFilters inline_images,jpeg_subsampling;

  ## Pagespeed filters - non-Core
  pagespeed EnableFilters remove_comments;
  pagespeed EnableFilters collapse_whitespace;
  pagespeed EnableFilters trim_urls;
  pagespeed EnableFilters insert_dns_prefetch;
  pagespeed EnableFilters dedup_inlined_images;

From my main domain's conf:

    # Ensure requests for pagespeed optimized resources go to the pagespeed
    # handler and no extraneous headers get set.
    location /ngx_pagespeed_statistics {
        auth_basic            "Restricted Access";
        auth_basic_user_file  /var/www/myserver.com/.htpasswd;
    }
    location /ngx_pagespeed_global_statistics { allow 127.0.0.1; deny all; }
    location /ngx_pagespeed_message { allow 127.0.0.1; deny all; }
    location /pagespeed_console {
        auth_basic            "Restricted Access";
        auth_basic_user_file  /var/www/myserver.com/.htpasswd;
    }
    location ~ ^/pagespeed_admin { allow 127.0.0.1; deny all; }
    location ~ ^/pagespeed_global_admin { allow 127.0.0.1; deny all; }
    location ~ "^/ngx_pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon$" { }
    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { add_header "" ""; }

The .hk and .cn domains that also appeared in the pagespeed cache aren't domains that any of my sites link to.

Joshua Marantz

Nov 23, 2014, 8:37:56 AM
to ngx-pagesp...@googlegroups.com
Can you give a few of the full pathnames to files in your cache from .hk and .cn?   Also please give the file contents (if small enough) as an attachment.




Jacob Share

Nov 23, 2014, 8:45:29 AM
to ngx-pagesp...@googlegroups.com
The domain caches were:

/dev/ngx_pagespeed_cache/appledaily.com.hk
/dev/ngx_pagespeed_cache/passiontimes.hk
/dev/ngx_pagespeed_cache/url.cn
/dev/ngx_pagespeed_cache/newyorker.com
/dev/ngx_pagespeed_cache/tumblr.com

Unfortunately, I don't think I can do better than that. When I discovered them earlier today, I kind of panicked and deleted them because inode usage was >93% and rising.

But I might not have deleted everything. The remaining directories in /dev/ngx_pagespeed_cache/ are:
  • one for each of my domains on the server
  • one for the server's public IP
  • localhost
  • !clean!time!
  • prop_page
  • rname
Anything unusual there?

Joshua Marantz

Nov 23, 2014, 8:53:15 AM
to ngx-pagesp...@googlegroups.com
None of those are unusual, but I'm looking for a path to a real file, all the way down to the leaf node.  If you put in the inode limit, your system should stay reasonably healthy on its own in terms of inode usage, but some of these directories named for other domains might start to reappear.  Then you can dive into one of these directories in a few days and see what files are in there.

-Josh
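
One way to get a full path down to a leaf file is to let `find` list a handful of regular files under one of the suspicious directories. This is only a sketch: `sample_leaf_files` is a hypothetical helper name, and the default of 10 lines is arbitrary.

```shell
# Print up to N regular files (leaf nodes) found under a cache
# subdirectory, without crossing onto other filesystems.
# Usage: sample_leaf_files /dev/ngx_pagespeed_cache/url.cn 5
sample_leaf_files() {
    find "$1" -xdev -type f | head -n "${2:-10}"
}
```

Running `ls -l` on one of the printed paths then shows the file's size and modification time.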

Jacob Share

Nov 23, 2014, 8:55:14 AM
to ngx-pagesp...@googlegroups.com
Ok, I'll add the inode usage limit and report back if this happens again.

thanks Josh

Jeff Kaufman

Nov 24, 2014, 8:01:00 AM
to ngx-pagesp...@googlegroups.com
This can also happen because of requests to your server with fake Host
headers: https://groups.google.com/forum/#!msg/mod-pagespeed-discuss/zWgCfnAQIkE/JZmILm_jR6YJ


Jacob Share

Nov 25, 2014, 1:24:39 PM
to ngx-pagesp...@googlegroups.com
This just happened again, even though I set the inode limit (perhaps too high).

You can get a path to a real file from this:

root@ip-10-162-77-189:/dev/ngx_pagespeed_cache/url.cn/http,3A/,2F6.url.cn/zc/chs/img# ls -la
total 4
drwxrwxrwx 2 www-data www-data  60 Nov 25 03:53 .
drwxrwxrwx 3 www-data www-data  60 Nov 24 03:11 ..
-rw------- 1 www-data www-data 173 Nov 25 03:53 body.png,

It took 3 hours for inode usage to go from 92% to 98%, so I rebooted the server and now it's back down to 26%.

Joshua Marantz

Nov 25, 2014, 1:43:07 PM
to ngx-pagesp...@googlegroups.com
Sounds like it's worth cutting your inode-limit in half; then repeat until you never get to this situation.


RE that specific file: I am guessing that there is some reference to http://url.cn/zc/chs/img/body.png in your site somewhere.  The content in your file is so small that it's probably just a 404 or 403 or a redirect header.  Fetching that file via wget indicates it's a redirect to an HTML file called 'sorry'.  I don't know what that's about.

Or, as Jeff suggested, someone may be trying to access that file through your server by sending a fake Host header.  I don't know why that would be happening.  If you log referrers in your access log, maybe you can find out what the Referer was for that request.

And if you follow the thread Jeff linked to, he suggests a virtual-host setup that will eliminate this problem.

-Josh



Jacob Share

Nov 27, 2014, 3:56:33 AM
to ngx-pagesp...@googlegroups.com
This is still happening, and at the same inode usage growth rate according to Munin.

Also according to Munin, this started happening on Nov. 20th. However, nothing noticeably changed in the server environment that day: I didn't install any updates of any kind in the 24 hours or so before inode usage started rocketing.

Taking this file that is currently sitting in ngx_pagespeed cache:

root@ip-10-162-77-189:/dev/ngx_pagespeed_cache/url.cn/http,3A/,2F6.url.cn/zc/chs/img# ls -la
total 4
drwxrwxrwx 2 www-data www-data  60 Nov 26 04:20 .
drwxrwxrwx 3 www-data www-data  60 Nov 26 04:20 ..
-rw------- 1 www-data www-data 173 Nov 26 04:20 body.png,


I just did a full text search of my access and error logs for the past 24 hours, and found this:

125.64.35.67 - - [26/Nov/2014:05:50:43 +0200] "GET http://6.url.cn/zc/chs/img/body.png HTTP/1.1" 400 666 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.3; Trident/7.0; .NET4.0E; .NET4.0C; .NET CLR 3.5.3072; .NET CLR 2.0.50727; .NET CLR 3.0.30729; Tablet PC 2.0)"
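
The request line in that log entry carries a full URL (`GET http://6.url.cn/...`) instead of a bare path, which is the form a client uses when talking to a proxy. A rough way to see how many such requests there are, and for which hosts, is sketched below. It assumes the log uses nginx's default `combined` format; the helper name and the log path in the usage comment are assumptions, not something taken from this server.

```shell
# Count requests whose request line carries an absolute URL
# (e.g. "GET http://6.url.cn/... HTTP/1.1"), grouped by host.
# Assumes the request line is the first double-quoted field,
# as in nginx's default "combined" log format.
# Usage: count_absolute_uri_hosts /var/log/nginx/access.log
count_absolute_uri_hosts() {
    awk -F'"' '{
        n = split($2, req, " ")
        if (n >= 2 && req[2] ~ /^https?:\/\//) {
            split(req[2], parts, "/")   # parts[3] is the host
            hosts[parts[3]]++
        }
    } END {
        for (h in hosts) print hosts[h], h
    }' "$1" | sort -rn
}
```

Each output line is a request count followed by a host, busiest host first.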

I'm going to try what Jeff suggested to block unexpected Host headers.


Jacob Share

Dec 3, 2014, 2:49:32 AM
to ngx-pagesp...@googlegroups.com
Jeff's suggestion to block unexpected Host headers solved the problem of strange domains appearing in the cache, but it didn't solve the inode usage problem.

Joshua's suggestion to set a FileCacheInodeLimit has reined in growth to prevent the server from being overwhelmed and necessitating automated cache purges, as you can see the new plateau at the end of the attached Munin graph.

However, these are both bandaids. I still have no clue why inode usage skyrocketed on Nov. 20th from its regular 'jagged' plateau. Week 45 shows an unrelated server reboot, but it's worth noting the slower growth after that reboot in comparison with the 4 manual reboots/troughs during Week 48.

For context: no new software or updates were installed on my server on the 20th, there was no significant spike in traffic to my websites that day, and according to Munin's fairly long list of indicators, no other unexpected behavior took place at the same time.

Any ideas how I can debug this further?
Attachment: df_inode-month.png

Joshua Marantz

Dec 3, 2014, 6:27:20 AM
to ngx-pagesp...@googlegroups.com
I would not say that having pagespeed enforce an inode limit on its cache is a bandaid.  That is exactly how a cache should be tuned to the limits of the underlying hardware.

But I agree it is worthwhile to understand what changed.  My first guess is that your cms or php or javascript has added some kind of cache busting query param to resource urls.

Is that possible?  If so we should review why that is needed and whether the query params can be changed less frequently.

Josh

Joshua Marantz

Dec 3, 2014, 6:29:20 AM
to ngx-pagesp...@googlegroups.com
And it should be possible to debug it by scanning the directory tree for the leaf filenames.  That will tell you all the variants of what is being cached.

Jacob Share

Dec 3, 2014, 10:53:11 AM
to ngx-pagesp...@googlegroups.com
I agree with you, the inode limit is a good practice overall, but in terms of stemming my sudden inode growth problem, it's just holding back the tide (to mix metaphors) where there wasn't any tide at all just a few days ago.

There are some css and javascript files that have cache busting query params, but those only change when the related plugins or theme are updated, and that didn't happen immediately prior to the inode usage explosion. Plus, they rarely all change at the same time since updates are staggered throughout the year, and I've never had this problem before.

To look closer at the 'leafs', I did:

find /dev/ngx_pagespeed_cache/ -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n

And right away the culprit has appeared!

It turns out there was some software that was updated on the server the day that the usage exploded: from Piwik 2.9.0 to Piwik 2.9.1. 

So now my question is what do I need to add to my nginx configs to stop a subdirectory (e.g. Piwik) from being cached by ngx_pagespeed?




To your job search success,

Jacob

--
Jacob Share
Job Search Expert & Professional Blogging Consultant
https://jobmob.co.il/

Follow me on Twitter:
https://twitter.com/jacobshare

Joshua Marantz

Dec 3, 2014, 10:56:32 AM
to ngx-pagesp...@googlegroups.com
Great catch!  Try this:

pagespeed Disallow */subdirectory/*;

-Josh

Jacob Share

Dec 8, 2014, 11:21:36 AM
to ngx-pagesp...@googlegroups.com
Now I'm having a syntax issue.

I've tried a number of variations of the directory path to be disallowed but it's still being cached.

What do you suggest if this is the directory path to the piwik install:


When I use the earlier find command, things like this are being found:

/dev/ngx_pagespeed_cache/prop_page/https,3A/,2Fmyserver.com/piwik/piwik.php,3Faction_name=...
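
The cache paths in this thread look like URLs with some characters escaped. Judging only from the examples quoted here, `,3A` stands for `:`, `,2F` for `/`, and `,3F` for `?`, with a trailing `,` on leaf filenames. Under that assumption (inferred from these paths, not taken from the PageSpeed documentation), a small helper can turn the part of the path after the cache's top-level directory back into a URL. `cache_path_to_url` is a made-up name:

```shell
# Turn a file-cache path fragment into the URL it appears to represent.
# The ",3A" / ",2F" / ",3F" mappings and the trailing "," are inferred
# from the paths quoted in this thread; other escapes are left untouched.
# Usage: cache_path_to_url 'http,3A/,2F6.url.cn/zc/chs/img/body.png,'
cache_path_to_url() {
    printf '%s\n' "$1" | sed -e 's/,$//' -e 's|,3A|:|g' -e 's|,2F|/|g' -e 's|,3F|?|g'
}
```

For the `body.png,` file shown earlier, the usage line above prints `http://6.url.cn/zc/chs/img/body.png`, which matches the URL in the access log.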

Jeff Kaufman

Dec 8, 2014, 11:27:00 AM
to ngx-pagesp...@googlegroups.com
What about:

pagespeed Disallow */piwik/*;

If you used the command exactly as Josh gave it (with
*/subdirectory/*) then it's not surprising it failed to fix the
problem: your bad paths don't contain "/subdirectory/".

Jacob Share

Dec 9, 2014, 8:55:36 AM
to ngx-pagesp...@googlegroups.com
Nope, that didn't do it either. Piwik pages are still being cached.

I pasted this into my nginx.conf:

  pagespeed Disallow */piwik/*;

Any other suggestions?

Jeff Kaufman

Dec 9, 2014, 9:34:02 AM
to ngx-pagesp...@googlegroups.com
Can you give an example piwik file that's still being cached, with
both the url and the path of it in the file cache? What does "ls -l"
on that path give you?