How to estimate how much data SmartPools will move around?


Jean-Baptiste Denis

Jan 26, 2014, 3:34:49 PM
to isilon-u...@googlegroups.com
Hi,

I've already posted my question on the EMC forum, but no answer so far. I've
also tried a live chat with EMC support, without luck: the answer was
"use du".

I'm running 6.5.5.22. I was wondering if there is a way to evaluate how
much data SmartPools will move around before actually running it.

I've seen this document, https://support.emc.com/kb/16435, but running
something like "isi smartpools process --nop --stats --recurse
/ifs/mydata" only gives the number of files and directories that
each policy matches.

Any ideas? Is that possible in 7.x.x.x?

Keith Nargi

Jan 26, 2014, 4:42:58 PM
to isilon-u...@googlegroups.com
I think your best bet is to figure out what the policy will actually do and look at InsightIQ to see how much data will actually move. For example, if you wanted all MP3 files to be moved, you would look at FSA and see how many files there are and what the capacity is. You then create the file pool policy, start the SmartPools job, and let it run. The amount moved should be pretty darn close to the amount you saw in IIQ.



--
Keith 

Jean-Baptiste Denis

Jan 27, 2014, 2:56:35 AM
to isilon-u...@googlegroups.com
On 01/26/2014 10:42 PM, Keith Nargi wrote:
> I think your best bet is to figure out what the policy will actually do
> and look at InsightIQ to see how much data will actually move.

Yes, of course. But I don't want to do that =)

Peter Serocka

Jan 27, 2014, 3:09:50 AM
to Jean-Baptiste Denis, isilon-u...@googlegroups.com

On Jan 27, 2014, at 04:34, Jean-Baptiste Denis wrote:

>
> Any ideas? Is that possible in 7.x.x.x?



No real changes in 7.1 ...

At least isi smartpools process --nop --stats FILENAME
can tell you the policy rule that a given file matches.

The target pool for each policy rule should also be known
(it is somehow encoded in the isi_gconfig output, or you maintain
by hand a table corresponding to the policy-rule definitions).

That together gives you the pool where a file is /supposed/
to be stored according to current policies.

isi get -D FILENAME reveals on which pools a file /actually/ lives.

If these two pools are not the same, then
SmartPools will need to migrate this file
from one pool to the other.

Finally, get the file size with du/stat/ls;
repeat for all files in question, sum up, et voila!

Might not be feasible for millions of files, though ;-(
Perhaps sampling a random subset will still yield a
reasonable estimate in such a case.
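
Something like this rough, untested shell sketch could tie those steps
together; note that the two *_pool helpers below are just hypothetical
placeholders (as written they dump the raw isi output, so every file would
look like a mismatch) -- you have to fill in the extraction of the pool
name yourself, since the exact output format depends on the OneFS version:

#!/bin/sh
# Sketch only: estimate how much data SmartPools would move by sampling
# a random subset of files and comparing "should-be" vs "actual" pool.

ROOT=/ifs/mydata      # subtree to estimate
SAMPLE=1000           # number of files to sample

# Hypothetical helper: the pool a file is *supposed* to live in, from the
# matching rule (isi smartpools process plus isi_gconfig / your own table).
policy_pool() {
    isi smartpools process --nop --stats "$1"   # <- extract the rule's target pool here
}

# Hypothetical helper: the pool the file *actually* lives in.
actual_pool() {
    isi get -D "$1"                             # <- extract the current disk pool here
}

list=$(mktemp /tmp/sample.XXXXXX) || exit 1
# sort -R gives a pseudo-random sample; substitute your own sampler
# if your node's sort does not have it.
find "$ROOT" -type f | sort -R | head -n "$SAMPLE" > "$list"

bytes=0 moving=0 seen=0
while IFS= read -r f; do
    seen=$((seen + 1))
    [ "$(policy_pool "$f")" = "$(actual_pool "$f")" ] && continue
    moving=$((moving + 1))
    bytes=$((bytes + $(stat -f %z "$f")))       # BSD stat; logical size only
done < "$list"
rm -f "$list"

echo "$moving of $seen sampled files would move ($bytes bytes in the sample);"
echo "scale by (total file count / $seen) for an overall estimate."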

Happy hacking!

-- Peter

Peter Serocka
CAS-MPG Partner Institute for Computational Biology (PICB)
Shanghai Institutes for Biological Sciences (SIBS)
Chinese Academy of Sciences (CAS)
320 Yue Yang Rd, Shanghai 200031, China
pser...@picb.ac.cn





Peter Serocka

Jan 27, 2014, 3:11:50 AM
to Jean-Baptiste Denis, isilon-u...@googlegroups.com
Didn't see that reply before sending the other mail two minutes ago ;-)

-- Peter

Jean-Baptiste Denis

Jan 27, 2014, 4:07:23 AM
to isilon-u...@googlegroups.com
On 01/27/2014 09:09 AM, Peter Serocka wrote:

> No real changes in 7.1 ...

Thank you for this information.

> Finally, get the file size with du/stat/ls;
> repeat for all files in question, sum up, et voila!

In theory, that'll do the trick =)

> Might not be feasible for millions of files, though ;-(
> Perhaps sampling a random subset will still yield a
> reasonable estimate in such a case.

I'd like to use the logic within the isi smartpools command and not
(badly) reinvent the wheel. It's so frustrating not to have the total
volume in the --stats output :/

As Keith previously suggested, I'll try to leverage FSA and InsightIQ to
achieve what I want! I'm not sure I can map what I can express in a
SmartPools policy onto FSA/InsightIQ though.

Thank you for the answers,

Jean-Baptiste



Peter Serocka

Jan 27, 2014, 5:26:04 AM
to Jean-Baptiste Denis, isilon-u...@googlegroups.com

On Jan 27, 2014, at 17:07, Jean-Baptiste Denis wrote:

>
>
> As Keith previously suggested, I'll try to leverage FSA and InsightIQ to
> achieve what I want! I'm not sure I can map what I can express in a
> SmartPools policy onto FSA/InsightIQ though.

Sure -- as far as I can see, IIQ can break out reports by Disk Pool,
so one can distinguish matching stuff in the source pool from
matching stuff in the target pool.

[Meanwhile, in a different universe...]

Just noticed that we haven't yet fully exploited the present options:

isi smartpools apply --dont-restripe --recurse ...

This does NOT move any bytes around between pools (so it's fast),
but records the pending moves in the metadata. Then,

isi get -r -D ...

will report something like:

* Disk pools: policy x200_ssd(4) -> target nearline(2)

for files for which a real migration will happen during the actual run.

Now both passes (isi smartpools and isi get) can use their efficient
built-in recursion, and the results are easier to parse :)
Not so much reinventing the wheel, maybe just the axle.
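
For what it's worth, here is an untested sketch of how one might sum it up
afterwards; it simply runs isi get -D per file (slower than the recursive
form, but trivial to parse) and keys on the "policy ... -> target ..." line
shown above. Adjust the grep/sed if your OneFS version formats it differently:

#!/bin/sh
# Sketch only: after `isi smartpools apply --dont-restripe --recurse DIR`,
# add up the sizes of files whose "Disk pools" line shows a pending move.

DIR=/ifs/mydata
total=0 count=0
list=$(mktemp /tmp/pending.XXXXXX) || exit 1
find "$DIR" -type f > "$list"

while IFS= read -r f; do
    line=$(isi get -D "$f" | grep 'Disk pools:' | head -n 1)
    case "$line" in
        *'-> target'*) ;;           # a migration is pending for this file
        *) continue ;;              # nothing to move for this file
    esac
    src=$(echo "$line" | sed 's/.*policy \([^(]*\)(.*/\1/')
    dst=$(echo "$line" | sed 's/.*target \([^(]*\)(.*/\1/')
    [ "$src" = "$dst" ] && continue
    count=$((count + 1))
    total=$((total + $(stat -f %z "$f")))   # logical size; protection overhead not included
done < "$list"
rm -f "$list"

echo "$count files, $total bytes would be migrated under $DIR"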


Anyway, let us know what you find;
we are certainly going to need such predictions over here.

-- Peter

Jean-Baptiste Denis

Mar 6, 2014, 6:24:16 AM
to isilon-u...@googlegroups.com
> Anyway, let us know what you find; we are certainly going to need
> such predictions over here.

Sorry, I didn't have time to deal with this problem until now. Your
two-pass solution (isi smartpools apply --dont-restripe --recurse ..., then
isi get -r -D) seems nice. Did you give it a try?

For now I'm using http://sourceforge.net/apps/trac/robinhood over an NFSv3
non-root-squash export. It's clearly more tailored for Lustre filesystems
(using live information from the transaction log), but it can handle "generic"
filesystems quite well too. It is not as powerful as InsightIQ could be,
but it's working for me now, which is quite an advantage =)

It basically scans the filesystem (using an efficient multithreaded scanning
algorithm) and puts the information in a MySQL database. After that, you've got
CLI utilities and a web GUI to retrieve information from it.

Sample outputs from my data:

========
# rbh-report -i 2>/dev/null
type , count, volume, avg_size
symlink , 24873, 1.80 MB, 76
dir , 2313600, 3.94 GB, 1.79 KB
file , 109642670, 365.35 TB, 3.49 MB
fifo , 4, 0, 0
sock , 5, 0, 0

Total: 111981152 entries, 401715151109476 bytes (365.36 TB)

# rbh-report -a 2>/dev/null

Filesystem scan activity:

Current scan interval: 7.0d

Last filesystem scan:
status: done
start: 2014/03/04 00:10:08
end: 2014/03/06 06:01:59
duration: 2d 5h 51min 51s

Statistics:
entries scanned: 111982086
errors: 1
timeouts: 0
# threads: 8
average speed: 577.62 entries/sec

Storage usage has never been checked

No purge was performed on this filesystem
========

(I'm filtering out stderr due to warnings concerning my Filesets definitions; see below.)

So, two days to scan 365 TB and more than 100 million entries
(a HUGE number of small files; during the scan, I was oscillating between
6000 entries/s and 20 entries/s, and the filer is in production).

It results in a 40 GB MySQL database.

I've defined some policies based on last access time in the configuration file.

To give you an idea:

Filesets
{
    FileClass between_0m_and_1m
    {
        definition
        {
            last_access <= 30day
        }
    }

    FileClass between_1m_and_3m
    {
        definition
        {
            last_access > 30day
            and
            last_access <= 90day
        }
    }

    FileClass between_3m_and_6m
    {
        definition
        {
            last_access > 90day
            and
            last_access <= 180day
        }
    }

    FileClass more_than_6m
    {
        definition
        {
            last_access > 180day
        }
    }
}

FYI, Filesets need to be referenced in the purge_policies (even if you don't purge anything):

purge_policies {
    ignore_fileclass = between_0m_and_1m;
    ignore_fileclass = between_1m_and_3m;
    ignore_fileclass = between_3m_and_6m;
    ignore_fileclass = more_than_6m;
}

Which gives me this kind of output after 10 minutes of processing
(I'm clearly not a database expert; it's running on a VM with 8 GB of RAM,
on an NFS datastore hosted on the very production cluster I'm crawling... there
is some room for performance improvement here :D)

# rbh-report --class-info 2>/dev/null
[10 minutes later]
purge class , count, spc_used, volume, min_size, max_size, avg_size
between_0m_and_1m , 12421765, 80.48 TB, 63.41 TB, 0, 135.22 GB, 5.35 MB
between_1m_and_3m , 26631999, 68.34 TB, 50.29 TB, 0, 137.86 GB, 1.98 MB
between_3m_and_6m , 7976866, 100.27 TB, 79.80 TB, 0, 4.00 TB, 10.49 MB
more_than_6m , 62636922, 226.09 TB, 171.86 TB, 0, 1.60 TB, 2.88 MB

So now I've got some metrics (and some histograms/pie charts on the web GUI) to estimate how
much a new SmartPools tiering policy will move around, and some nice CLI tools to retrieve information
from the database (rbh-find, rbh-du), which is nice. You have to keep in mind obvious things like the
access time being relative to the moment the robinhood scan engine reaches a given file, but this still
gives you a fairly precise view of the state of your data.
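
For example, to turn that table into a single byte figure for a hypothetical
"tier off anything not accessed for 3+ months" rule, something like the
following (untested) awk snippet would do; it assumes the exact column layout
shown above, with the 4th comma-separated field being the volume:

rbh-report --class-info 2>/dev/null | awk -F',' '
    BEGIN { m["KB"]=2^10; m["MB"]=2^20; m["GB"]=2^30; m["TB"]=2^40 }
    $1 ~ /between_3m_and_6m|more_than_6m/ {
        gsub(/^ +/, "", $4); split($4, v, " ")      # e.g. "79.80 TB"
        bytes += v[1] * (v[2] in m ? m[v[2]] : 1)
    }
    END { printf "estimated volume to migrate: %.2f TB\n", bytes / 2^40 }
'

With the figures above, that comes out to roughly 250 TB.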

The full admin guide here should give you a good view of what is possible: http://sourceforge.net/apps/trac/robinhood/wiki/Doc

A nice, generic alternative to InsightIQ. Active project (last release a few weeks ago), active community.
At the very least, it is enough for the initial need I exposed in this thread. Of course InsightIQ is a much more
powerful solution, when your FSAnalyze job has succeeded ;)

Jean-Baptiste

Jason Davis

Mar 6, 2014, 11:23:08 AM
to isilon-u...@googlegroups.com
Nice find, sir. This might fit in nicely with our workflows, as we do HPC work on our Isilon clusters and getting users to clean up scratch is nearly impossible to do without breaking kneecaps.


Paul Jackson

Mar 6, 2014, 4:25:27 PM
to isilon-u...@googlegroups.com, jbd...@pasteur.fr
Jean-Baptiste,

Robinhood looks very interesting. I've been able to get it to run properly on a local filesystem, but I run into a "/mounted_fs/data1 is in a filesystem different from root. Entry ignored." error. Did you ever come across this when you set up your NFS mount for scanning?

Thanks,

Paul

Peter Serocka

Mar 7, 2014, 5:47:25 AM
to isilon-u...@googlegroups.com

On Mar 6, 2014, at 19:24, Jean-Baptiste Denis wrote:

>> Anyway, let us know what you find; we are certainly going to need
>> such predictions over here.
>
> Sorry, I didn't have time to deal with this problem until now. Your
> two-pass solution (isi smartpools apply --dont-restripe --recurse ..., then
> isi get -r -D) seems nice. Did you give it a try?

More or less a proof-of-concept and performance test.
While it exactly follows the SmartPools policies,
the scan rates are not too high, as one would expect.
Up to 1000 files/s with metadata on SSD
and up to 200 files/s on SATA.
So probably the best use will be with large
data or video files rather than with
home dirs or masses of tiny files from genome sequencing.

Robinhood is great stuff, and I already wondered
whether running it natively on Isilon would be
more efficient than over NFS, after all ;-)

If you want to take it to the next level,
have a look at http://www.irods.org .
Another cool effort, but definitely overkill for us
for the time being.

In general I feel that while it is
notoriously difficult for users to
plan their storage consumption,
any retroactive measures are also a pain.
Instead, users seem to be most responsive,
and receptive to advice, while
actually working on stuff.

Thus, using SmartQuotas and activity monitoring,
we aim to detect unusual storage patterns
in real time. Often our advice then leads
to optimized workflows, or even helps uncover
workflows that are broken... I should say that
our users are young researchers and (many) students,
who learn best this way that the
freedom they are granted comes with a great deal of responsibility.

Fits our environment quite well, and
we love the Isilon tools (quotas, isi statistics).
As usual, your mileage might vary :)

Have a nice weekend

Jean-Baptiste Denis

Mar 7, 2014, 10:41:45 AM
to Paul Jackson, isilon-u...@googlegroups.com
> Robinhood looks very interesting. I've been able to get it to run
> properly on a local filesystem, but I run into a "/mounted_fs/data1 is in
> a filesystem different from root. Entry ignored." error. Did you ever come
> across this when you set up your NFS mount for scanning?

Mmm, that does not ring a bell, sorry. Maybe you forgot to enable the "no root
squash" option in the export configuration? Or maybe you're trying one-shot
mode by specifying a directory that is not a subdirectory of the
"fs_path" set in the configuration file.

You should try the robinhood mailing list with some details:

- your configuration file and the command line you used
- whether you are running robinhood in daemon mode or in one-shot mode
- the exact error in the console output/log file

Since I'm still not comfortable with robinhood configuration
possibilities, I'm not using any purge policies and I'm using a
read-only export. I urge anybody to do the same while playing with
it :D

Jean-Baptiste

Jean-Baptiste Denis

Mar 7, 2014, 11:24:59 AM
to isilon-u...@googlegroups.com
On 03/07/2014 11:47 AM, Peter Serocka wrote:

> More or less a proof-of-concept and performance test.
> While it exactly follows the SmartPools policies,
> the scan rates are not too high, as one would expect.
> Up to 1000 files/s with metadata on SSD
> and up to 200 files/s on SATA.
> So probably the best use will be with large
> data or video files rather than with
> home dirs or masses of tiny files from genome sequencing.

Ok, thank you for this feedback.

> If you want to take it to the next level,
> have a look at http://www.irods.org .
> Another cool effort, but definitely overkill for us
> for the time being.

I bought the "iRODS Primer" book months ago. I've just played with it on my
workstation, and I really like the concept of a big vault of data,
implemented the way you want, with an associated queryable catalog and
some simple CLI tools. You request the data to be staged onto your POSIX
filesystem before working on it, which could be a very suitable
workflow. It is quite a paradigm shift for the users, but for me
it's clearly the way to go in the long term.

> Thus, using SmartQuotas and activity monitoring
> we aim at detecting unusual storage patterns
> in real time. And then often our advise leads
> to optimized workflows, or even helps finding
> workflows to be broken... I should say that
> our users are young researchers and (many) students,
> who through this way are best learning that the
> granted freedom comes with a high need for responsibility.

"Simple" tools and metrics, well used. I like it. Out of curiosity, how
are you detecting unusual storage patterns in real time ? Are you
graphing quotas usage evolution ?

Thank you again for your feedback.

Jean-Baptiste

Peter Serocka

Mar 7, 2014, 11:58:18 AM
to isilon-u...@googlegroups.com

On Sat, Mar 8, 2014, at 00:24, Jean-Baptiste Denis <jbd...@pasteur.fr> wrote:
>
> "Simple" tools and metrics, well used. I like it. Out of curiosity, how are you detecting unusual storage patterns in real time ? Are you graphing quotas usage evolution ?

Even simpler, just a “top” style display
showing the relative changes and short-time peak rates
from the past hours. (To be more exact, changes since 8am or 8pm resp.)
And with about 15 research groups we still “know”
what people are usually doing in the various fields.

Plus another “top” display based on isi statistics client,
one ROW per user or remote machine or both paired,
rows sorted by a computed “utilization” (Ops x TimeAvg),
and ops classes (read/write/meta read etc.) re-organized
as additional COLUMNS (rather than rows); it’s just the
Total Perspective Vortex for the cluster.
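
(Not the actual script, but the gist of that second display is something as
simple as the following, untested; the column positions are assumptions, so
check the header printed by isi statistics client on your OneFS version and
adjust the field numbers accordingly:)

isi statistics client | awk '
    NR <= 2 { next }                 # skip header lines (assumed: two of them)
    { util[$9] += $1 * $4 }          # assumed columns: $1=Ops, $4=TimeAvg, $9=UserName
    END { for (u in util) printf "%12.1f  %s\n", util[u], u }
' | sort -rn | head -20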

Also experimenting with denser tops for
heat and drive statistics. All stuff running as shell
scripts on the cluster — should definitely
change that soon…

Plus of course saved daily quota reports for extended
trending/reporting/billing, and crontab'ed full
isi client/heat/drive statistics every fifteen minutes
for solving the difficult cases. A poor man’s InsightIQ, so to say.

— P.




Jean-Baptiste Denis

Mar 7, 2014, 12:02:00 PM
to isilon-u...@googlegroups.com
> Even simpler, just a "top" style display
> showing the relative changes and short-time peak rates
> from the past hours. (To be more exact, changes since 8am or 8pm resp.)
> And with about 15 research groups we still "know"
> what people are usually doing in the various fields.
>
> ...

I get the spirit, my curiosity is satisfied =)

