Directory Size

prstat

Feb 6, 2012, 12:53:33 PM
to Isilon Technical User Group
Hello

I have an Isilon server with several very large directories under /
ifs. Is there a better way than du -hs /ifs/bigdir to find out how
big various points in the tree are? I have many TB's and running du
can take all day.

Thanks

Chris Pepper

Feb 6, 2012, 1:32:00 PM
to isilon-u...@googlegroups.com, prstat

If you have the SmartQuotas license, you can create advisory quotas for
all the interesting directories.

Note that OneFS has a bug: it will not allow you to move a directory
from one quota root under another -- you must use cp/rsync and rm instead --
which we find extremely annoying...
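
A rough sketch of that workaround, with made-up paths (and no
guarantee about ACLs or other metadata -- test on something small first):

# mv across a quota root fails, so copy, verify, then remove the source
rsync -a /ifs/quotaA/mydir/ /ifs/quotaB/mydir/ && rm -rf /ifs/quotaA/mydir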

Chris

prstat

Feb 6, 2012, 3:37:23 PM
to Isilon Technical User Group

>         If you have the SmartQuotas license, you can create advisory quotas for
> all the interesting directories.
>

We don't have the SmartQuotas license yet, but I imagine it will be
one of the first extras we tack on.


>         Note that OneFS has a bug: it will not allow you to move a directory
> from one quota root under another -- you must use cp/rsync and rm instead --
> which we find extremely annoying...

Thanks. I can see that being a substantial problem.

jerry

Feb 6, 2012, 3:41:36 PM
to Isilon Technical User Group
isi quota create --dir --path=/ifs/some/path --accounting

Or you can use the new fancy (more expensive, I think) InsightIQ stuff.
FWIW, InsightIQ is killer at a lot of this kind of analytic stuff.

Quotas are a great start though (and probably required for InsightIQ
to report on it anyway, now that I think about it).
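
If you want to cover a whole set of directories and read the numbers
back without any crawl, something along these lines should work (the
listing subcommand here is the old-style spelling, an assumption on my
part; later releases use "isi quota quotas list"):

# accounting-only quotas: track usage, enforce nothing
for d in /ifs/*/; do
    isi quota create --dir --path="${d%/}" --accounting
done
isi quota list    # later OneFS: isi quota quotas list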

Lane

Feb 13, 2013, 4:57:01 PM
to isilon-u...@googlegroups.com
What if we don't have any of the add-ons (additional licensing)? Is there another way to do this other than du?

Jerry Uanino

Feb 13, 2013, 5:14:05 PM
to isilon-u...@googlegroups.com
SmartQuotas are so nice....
This is going to be tough without that option.... maybe du --apparent-size would be faster?

Andrew Stack

Feb 13, 2013, 9:54:38 PM
to isilon-u...@googlegroups.com
You're going to be out of luck unless you use InsightIQ. A 3rd-party tool will do a crawl. du will also take time given the size you're referencing. InsightIQ works with and takes advantage of FSAnalyze, which leverages metadata for fast analysis. So skip into the boss's office and see if there's a few pennies to spare in the next budget cycle. My two cents...

Andrew Stack
Sent from my iPhone

LinuxRox

Feb 13, 2013, 9:59:10 PM
to isilon-u...@googlegroups.com
I am not sure fsanalyze is going to provide instant gratification; on our cluster of 7 x 108NL nodes that is 75% full, fsanalyze takes around 48 hours to complete.

Erik Weiman

Feb 13, 2013, 10:00:42 PM
to isilon-u...@googlegroups.com
You need to be licensed for InsightIQ to use FSAnalyze in any manner. 

--
Erik Weiman 
Sent from my iPhone 4

Peter Serocka

Feb 14, 2013, 7:10:07 AM
to isilon-u...@googlegroups.com
On Thu, 14 Feb 2013, at 05:57, Lane wrote:

> What if we don't have any of the add-ons (additional licensing)? Is
> there another way to do this other than du?

Seriously, get temporary *evaluation* licenses for
SmartQuotas or InsightIQ - see what you learn
and then look further...

Also seriously: for a purely homebrew approach
avoiding full crawls, be prepared for some "forensic"
action along this path:

*** Just omit any HUGE directories from (exact) crawls,
then deal with them separately. ***

find /ifs -type d -size +100k -prune -o -print

will list everything BUT the contents of the
large dirs (also skipping all their subdirs).

(You will need to do some more scripting to
extract and accumulate the reported files' sizes.)
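
A rough sketch of that accumulation, keeping the same -prune so the
huge dirs stay excluded. This assumes the BSD userland you get on the
cluster itself (stat -f %z); with GNU coreutils use stat -c %s instead:

# sum the apparent sizes of all files outside the huge dirs
find /ifs -type d -size +100k -prune -o -type f -print0 |
    xargs -0 stat -f %z |
    awk '{ s += $1 } END { printf "%.2f TB apparent\n", s / 1e12 }'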

This should give you a pretty QUICK overview
at least of the "visible matter" on your system.
(You can experiment with the threshold size
of +100k in the find command.)

Compare the result to the output of a "df /ifs"
to estimate the amount of "dark matter"
that gets skipped by the above find.

(You need to make a rough(!) a priori estimate of
the difference between "apparent" file sizes and "consumed"
disk space, according to the chosen protection scheme(s) and
the assumed distribution of file sizes.)
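
A toy comparison, with every number made up (say the crawl above summed
to 40 TB apparent, and we assume ~1.3x protection overhead):

# "dark matter" ~ real usage minus (apparent size * protection factor)
used_tb=$(df -k /ifs | awk 'NR==2 { print $3 / 2^30 }')
echo "dark matter ~ $(echo "$used_tb - 40 * 1.3" | bc) TB"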


Dealing with the "dark matter": I. Flat matter

The good news about directories with gazillions of
files is that you (usually) don't have gazillions of
users doing individual stuff ;-)

These files are typically generated in highly automated ways
and normally follow predictable patterns, which can be derived
from relatively small numbers of samples
(or from knowledge about the generating workflows).

That said, to analyse your "dark matter", check for
typical file sizes and file name lengths.
Then, generate a list of the "dark matter" directories with
find /ifs/ -type d -size +100k -prune

It is safe (= fast!) to apply an "ls -ld" or "stat ..."
on the resulting large dirs! The obtained "size" of
a directory indicates the number N of
(immediately) contained files by:
size = sum of file name lengths + N * 32
~= N * (avg file name length + 32)

That should give some first clues about
the overall distribution of the dark matter
= the troublesome dirs.
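
A sketch of that estimate; the average file name length of 24 below is
purely an assumed value -- sample a few of your own dirs first:

# rough file-count estimate per huge dir, from the dir entry size alone
find /ifs -type d -size +100k -prune | while read -r d; do
    sz=$(stat -f %z "$d")    # BSD stat; GNU: stat -c %s
    printf '%s: ~%d files\n' "$d" "$(( sz / (24 + 32) ))"
done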

Btw, "isi statistics heat ..." on the CLI
requires no extra license and shows you
where in /ifs the largest action is
currently taking place...
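
No options are needed for a first look (output columns and flags vary
by OneFS version):

isi statistics heat | head -25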


Dealing with the "dark matter": II. Sub-structured matter

Very often, those large dirs are
pretty "flat" - but you need to check wether
you are that lucky:

The number of links reported by the (fast!)
"ls -ld" on a dir is the number of subdirs + 2,
hence: 2 => no subdirs (lucky case), 3 => one subdir, etc.
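
The same kind of cheap loop as before reports that, again without
descending into anything:

# link count minus 2 = number of immediate subdirs
find /ifs -type d -size +100k -prune | while read -r d; do
    printf '%s: %d subdirs\n' "$d" "$(( $(stat -f %l "$d") - 2 ))"    # GNU: stat -c %h
done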

If there do exist subdirs (which "find ... -prune"
does not descend into!) you need to leverage more
information on the "typical" directory layout to
include them into the estimation.
(Such as: these subdirs might usually have
common names like "tmp", or the numbers and sizes
of files are in a fixed relation to those in the parent dir, etc.
Did I mention that gazillions of files are likely
to be highly correlated in their properties?
And I promised you some forensic work.)

Estimate your dark matter and compare
with the global df /ifs.
Finally, you should be able to see what's going on,
at least at a reasonable scale, if not byte-wise exact --
and with pretty limited amounts of "crawling" in /ifs.

Your users might find any new insights helpful, too,
and thus might provide a more qualified level of feedback,
which in turn might be leveraged for
further improving the accuracy of the forensic approach.

Have fun!

Peter

