Cache cleanup does not remove empty directories

Cache cleanup does not remove empty directories Mark Breedlove 4/25/12 10:03 AM
I have a site that does wildcard virtual hosting, such that I have a
VirtualHost with:
ServerAlias *.example.com
... and the web application that powers the site interprets the
hostname to determine what content to serve, in this case, pages for
the numerous users of the site.  E.g. person.example.com.

As a result, thousands of cache directories are created under
/var/www/mod_pagespeed/cache.  Cache cleanup seems to work as intended, and
removes old regular files from these directories.  However, it does
not remove the empty directories for the various virtual hosts.  As a
result, the inode count on the server becomes pretty high after a few
days (greater than a million).

Example:
/var/www/mod_pagespeed/cache/http,3A/,2Fusername.example.com/css
... will be empty

Ideally, in my case, where not all of these sites get visited that
often, these directories would be deleted.  I am wondering if the high
directory count is contributing to higher i/o latency than one would
normally like to see on this server.  On this server, the inode count
on the relevant partition can get above 1.5 million in a few days.  It
seems that when I completely purge the cache directory (via shutdown,
mv directory, restart, delete old directory), iowait drops for a day
or two and the inode count drops to the tens of thousands.
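
One rough way to quantify this is to count the empty directories under the cache root. The sketch below is only an illustration: the helper name count_empty_dirs is hypothetical, and it assumes a find that supports -mindepth and -empty (GNU findutils does).

```shell
#!/bin/bash
# Hypothetical helper: count empty directories below a given root.
# -mindepth 1 keeps the root itself out of the count.
count_empty_dirs() {
    find "$1" -mindepth 1 -type d -empty 2>/dev/null | wc -l
}
```

Running count_empty_dirs /var/www/mod_pagespeed/cache next to 'df -i' on the same partition before and after a purge should show how much of the inode usage comes from empty directories.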

Is it safe for me to run a script that periodically removes empty
directories from /var/www/mod_pagespeed/cache?

Should I fill out an issue in the issue tracker to request that old
cache subdirectories be removed?

Thanks in advance!
Re: Cache cleanup does not remove empty directories jmarantz 4/25/12 2:23 PM
I think you've nailed it.  This is a great corner case we didn't think about in our caching design: huge numbers of domains on one server.

Periodically purging the empty directories should be fine.  I'd recommend using "rmdir" rather than "rm -rf" in case something was writing into the directory while you were removing it.

Let us know if this helps resolve the issue, or whether the performance of your system still looks like a problem.  A resolution we could consider is to add a pagespeed.conf option to synthesize some more hierarchy in our cache directories.  I wouldn't want to change our default behavior, as adding the hierarchy may slow down performance when there are fewer domains.

-Josh
Re: Cache cleanup does not remove empty directories Matt Atterbury 4/26/12 5:18 AM
> Periodically purging the empty directories should be fine.  I'd recommend using "rmdir" rather than "rm -rf" in case something was writing into the directory while you were removing it.

+1 ... DO NOT use rm, use rmdir and ignore any error. Doing this from a cron job should be fine.
Something like: find <yourdir> -depth -empty -type d -exec rmdir {} \; 2>/dev/null
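
To illustrate why rmdir is the safer tool here, a small throwaway demonstration follows. It works in a temporary directory rather than the real cache, and the directory and file names are made up for the example.

```shell
#!/bin/bash
# Throwaway sandbox; nothing under /var/www is touched.
demo=$(mktemp -d)
mkdir "$demo/empty" "$demo/busy"
touch "$demo/busy/just-written.css"   # stands in for a file written mid-cleanup

rmdir "$demo/empty"                   # succeeds: the directory is empty

# rmdir fails on a non-empty directory and leaves its contents alone,
# whereas "rm -rf" would have deleted the freshly written file.
rmdir "$demo/busy" 2>/dev/null || echo "rmdir refused: busy is not empty"

ls "$demo/busy"                       # just-written.css is still there
```

The error from the second rmdir is the one that can safely be ignored in a cron job.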

Re: Cache cleanup does not remove empty directories Mark Breedlove 4/26/12 3:28 PM
Thanks, Joshua and Matt!

I have had a script running now on cron for about a day, and all signs
are positive so far.  It is keeping the inode count down to about a
third to a half of what it used to be.  Before I started running it,
iowait would intermittently sit above 20% for hours at a time
throughout the day.  (Would I be guessing correctly to assume that the
mod_pagespeed cleanup thread had extra work to do in traversing all of
those empty directories?)  I have not seen this in the day since
implementing this script.  I will post an update in a few days or a
week to confirm how this is still looking.

My shell script contains the following:

#!/bin/bash
LOCKFILE=/var/tmp/cleanpsdirs.lock
test -e $LOCKFILE && exit;
touch $LOCKFILE;
find /var/www/mod_pagespeed/cache -type d -empty ! -name '*lock' |
head -1000 | xargs rmdir
rm $LOCKFILE

It is run on cron every 2 minutes with:
nice -n 19 /usr/local/etc/cleanpsdirs.sh 2>/dev/null

It avoids removing the lock directories that I occasionally see
directly under /var/www/mod_pagespeed/cache, and only does a measured
amount of work at a time so as not to hog resources.  It seems not to
add anything significant to the load of the system.

By the way, thanks for your work on mod_pagespeed!  It's really
incredibly helpful.
Re: Cache cleanup does not remove empty directories Matt Atterbury 4/26/12 7:06 PM
Hi Mark, thanks for the update, and great to know it's working as hoped.

I have 2 comments about your script though:
a) 'mkdir $LOCKDIR || exit 0' is better than using touch because mkdir is atomic whereas test+touch isn't.
b) You don't need the head before the xargs to limit the command size since xargs does this automatically,
    but if, as I suspect, you're doing that just to limit how much cleanup is done per run, then it's a great idea.

cheers, m.
Re: Cache cleanup does not remove empty directories Mark Breedlove 4/26/12 7:11 PM

On Apr 26, 10:06 pm, Matt Atterbury <matterb...@google.com> wrote:
> a) 'mkdir $LOCKDIR || exit 0' is better than using touch because mkdir is
> atomic whereas test+touch isn't.

Aha, thanks for that tip.  I will do that instead.

> b) You don't need the head before the xargs to limit the command size since
> xargs does this automatically,
>     but if, as I suspect, you're doing that just to limit how much cleanup
> is done per run, then it's a great idea.

Yup, just to limit how much is done per run.
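
For reference, a minimal sketch of how the script might look with the mkdir lock applied and the per-run cap kept. This is not the script actually deployed: the function name, the argument handling, and the -mindepth guard are assumptions added for illustration, and the xargs flags assume GNU findutils.

```shell
#!/bin/bash
# Sketch only. Lock path and batch size of 1000 follow the thread;
# everything else (function name, arguments) is illustrative.
clean_empty_dirs() {
    local cache_dir=$1
    local lock_dir=${2:-/var/tmp/cleanpsdirs.lock}
    local batch=${3:-1000}

    # mkdir creates the lock or fails in a single atomic step,
    # so two overlapping cron runs cannot both get past this line.
    mkdir "$lock_dir" 2>/dev/null || return 0

    # Skip directories whose names end in "lock", cap the work per run,
    # and let rmdir refuse anything that is no longer empty.
    find "$cache_dir" -mindepth 1 -type d -empty ! -name '*lock' 2>/dev/null |
        head -n "$batch" |
        xargs -r -d '\n' rmdir 2>/dev/null

    rmdir "$lock_dir"
}

# Run only when a cache path is given, e.g. from cron:
#   nice -n 19 /usr/local/etc/cleanpsdirs.sh /var/www/mod_pagespeed/cache
if [ "$#" -gt 0 ]; then
    clean_empty_dirs "$@"
fi
```

As with the original, a lock left behind by a killed run has to be removed by hand before cleanup resumes.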
Re: Cache cleanup does not remove empty directories Mark Breedlove 5/2/12 7:23 PM
Hey, Joshua and Matt.  It's been about a week now since I started
cleaning up the empty directories, and it's been working as I'd
hoped.  I do not see any more surges in i/o wait when I look at sar,
and 'df -i' reveals that I have hundreds of thousands fewer inodes on
the filesystem than I used to.

Thanks for your assistance,
Mark