On Thu, 14 Feb '13, at 05:57, Lane wrote:
> What if we don't have any of the add-ons (additional licensing)? Is
> there another way to do this other than du?
Seriously, get temporary *evaluation* licenses for
SmartQuotas or InsightIQ - see what you learn
and then look further...
Also seriously: for a purely homebrew approach
avoiding full crawls, be prepared for some "forensic"
action along this path:
*** Just omit any HUGE directories from (exact) crawls,
then deal with them separately. ***
find /ifs -type d -size +100k -prune -o -print
will list everything BUT the contents of the
large dirs (also skipping all their subdirs).
(You will need to do some more scripting to
extract and accumulate the reported files' sizes.)
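For example, a minimal sketch of that extra scripting (restricting the
output to regular files; the BSD-style stat shown assumes OneFS's
FreeBSD userland, GNU stat would use -c '%s' instead):

# Sum the apparent sizes of all regular files outside the pruned large dirs.
find /ifs -type d -size +100k -prune -o -type f -print0 \
  | xargs -0 stat -f '%z' \
  | awk '{ total += $1 } END { printf "%.1f GiB apparent\n", total / 1024^3 }'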
This should give you a pretty QUICK overview
at least of the "visible matter" on your system.
(You can experiment with the treshold size
of +100k in the find command.)
Compare the result to the output of a "df /ifs"
to estimate the amount of "dark matter"
that gets skipped by the above find.
(You need to make a rough(!) a priori estimate of
the difference between "apparent" file sizes and "consumed"
disk space, according to the chosen protection scheme(s) and
the assumed distribution of file sizes.)
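A purely illustrative example of such an estimate (all numbers assumed,
not taken from your cluster): under an N+2 FEC scheme with, say, 8 data
stripe units per protection group, a large file consumes roughly
(8+2)/8 = 1.25x its apparent size, while very small files are typically
mirrored and may consume 3x or more. If your sample says "mostly large
files", multiplying the accumulated apparent sizes by ~1.25 before
comparing to df is a reasonable first-order correction.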
Dealing with the "dark matter": I. Flat matter
The good news about directories with gazillions of
files is that you (usually) don't have gazillions of
users doing individual stuff ;-)
These files are typically generated in a highly automated manner
and normally follow predictable patterns which can be derived
from relatively small numbers of samples
(or from knowledge about the generating workflows).
That said, to analyse your "dark matter", check for
typical file sizes and file name lengths.
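One cheap way to get such samples without listing a huge dir completely
(a sketch; /ifs/bigdir is just a placeholder name): "ls -f" streams the
entries unsorted, so taking the first few thousand names is fast even
for a gazillion-file directory:

# Unsorted sample of entries from one huge directory (placeholder path;
# "." and ".." slip into the sample, which is negligible here).
ls -f /ifs/bigdir | head -n 1000 > /tmp/sample.txt
# Average file name length in the sample:
awk '{ sum += length($0) } END { print "avg name length:", sum / NR }' /tmp/sample.txt
# Spot-check the sizes of a handful of sampled entries
# (-d keeps this from descending into any sampled subdirs):
head -n 20 /tmp/sample.txt | while IFS= read -r f; do ls -ld "/ifs/bigdir/$f"; done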
Then, generate a list of the "dark matter" directories with
find /ifs/ -type d -size +100k -prune
It is safe (= fast!) to run an "ls -ld" or "stat ..."
on the resulting large dirs! The obtained "size" of
a directory indicates the number N of
(immediately) contained files by:
size = sum of file name lengths + N * 32
~= N * (avg file name length + 32)
That should give some first clues about
the overall distribution of the dark matter
= the troublesome dirs.
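To turn that into numbers, a hedged sketch (the average name length of
20 below is an assumption -- plug in whatever your sampling suggested):

# Estimate the file count of each large dir from its directory size.
find /ifs -type d -size +100k -prune | while IFS= read -r d; do
  size=$(ls -ld "$d" | awk '{ print $5 }')
  echo "$d: ~$(( size / (20 + 32) )) files (dir size: $size bytes)"
done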
Btw, "isi statistics heat ..." on the CLI
requires no extra license and shows you
where in the /ifs currently the largest
action takes place...
Dealing with the "dark matter": II. Sub-structured matter
Very often, those large dirs are
pretty "flat" - but you need to check whether
you are that lucky:
The number of links reported by the (fast!)
"ls -ld" on a dir is the number of subdirs + 2,
hence: 2 => no subdirs (lucky case), 3 => one subdir, etc.
If there do exist subdirs (which "find ... -prune"
does not descend into!) you need to leverage more
information on the "typical" directory layout to
include them in the estimate.
(Such as: these subdirs might usually have
common names like "tmp", or the numbers and sizes
of files are in a fixed relation to those in the parent dir, etc.
Did I mention that gazillions of files are likely
to be highly correlated in their properties?
And I promised you some forensic work.)
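The check itself stays cheap; a sketch along the same lines as above
(the link count is the second column of "ls -ld"):

# Number of subdirs of each large dir = link count minus 2,
# still without descending into any of them.
find /ifs -type d -size +100k -prune | while IFS= read -r d; do
  links=$(ls -ld "$d" | awk '{ print $2 }')
  echo "$d: $(( links - 2 )) subdir(s)"
done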
Estimate your dark matter and compare
with the global df /ifs .
Finally, you should be able to see what's going on,
at least at a reasonable scale, if not byte-wise exact --
and with a pretty limited amount of "crawling" in the /ifs.
Your users might find any new insights helpful, too,
and thus provide better-qualified feedback,
which in turn can be used to further improve
the accuracy of the forensic approach.
Have fun!
Peter
>
>
> On Monday, February 6, 2012 10:53:33 AM UTC-7, scott wrote:
> Hello
>
> I have an Isilon server with several very large directories under
> /ifs. Is there a better way than du -hs /ifs/bigdir to find out how
> big various points in the tree are? I have many TBs and running du
> can take all day.
>
> Thanks
>