Dedupe on Isilon

Kenneth Van Kley

Jan 18, 2017, 9:05:38 AM
to Isilon Technical User Group
Is anyone successfully using the Dedupe feature on their Isilon cluster?

We've been running dedupe on a 16-node cluster (60% full, X410s) for 36 days now on a small number of paths, and it's still chugging along...

Here's the Progress so far:

Progress: Iteration 1, updating index, scanned 5837762 files, 431868 directories, 14292036750 blocks, skipped 23203500 files, sampled 891316835 blocks, deduped 302604813 blocks, with 0 errors and 6837132 unsuccessful dedupe attempts
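
(That's the job engine's Progress field; if memory serves, I'm pulling it with something like "isi job jobs view" on the running Dedupe job, and "isi dedupe stats" gives the cluster-wide savings summary. Commands from memory, so your OneFS version may differ.)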


For what I'm saving, it hardly seems worth the cycles...
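
Doing the math, if a block is the standard 8 KB OneFS block: 302604813 deduped blocks is about 2.3 TiB reclaimed, against about 106 TiB scanned (14292036750 blocks), i.e. roughly 2% savings.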

Chris Pepper

Jan 18, 2017, 9:12:49 AM
to isilon-u...@googlegroups.com
We considered it, but the only place it would make a material difference was our 28-node cluster. There we could add a whole node, increasing both capacity and performance, for less than a Dedupe license, whereas dedupe would only add complexity and (slightly) more load.

Chris

Jerry Uanino

Jan 18, 2017, 9:34:38 AM
to isilon-u...@googlegroups.com
I believe the dedupe license comes in the higher-end "packages"; I think we have it included in some license tier we needed to purchase for another reason (I could be wrong).
We have had good luck with it on our archive cluster, mostly because the paths we are running it on lend themselves well to dedupe (uncompressed XML and log text).

You can run a dedupe assessment, which at least tells you whether dedupe is worth running at all (rough commands at the end of this post).
I'm getting 13T savings on 18T (which seems much higher than I would expect).
The runs look like they take as long as 4 days and as short as 2 days.
I'm only deduping this 18T; the cluster is 7 nodes at 130T each (NL400s).

You might want to target specific paths you know are more likely to benefit from dedupe.
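
From memory (exact syntax varies by OneFS version, and the /ifs paths below are just made-up examples), the assessment is its own job engine job and the target paths live in the dedupe settings:

    isi job jobs start DedupeAssessment
    isi dedupe reports list
    isi dedupe settings modify --paths /ifs/archive/xml,/ifs/archive/logs
    isi dedupe settings view

The first two estimate savings without the full license and read back the report; the last two point the Dedupe job at specific paths and confirm the setting.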

Andrew Stack

Jan 18, 2017, 11:55:19 AM
to isilon-u...@googlegroups.com
Hi Kenneth,

Most of the answers you are seeking are laid out in this DMC white paper, which does a decent job of describing the job and its various phases.


I encourage you to read the section 'Performance with SmartDedupe'.

My experience is that this is best suited to high-capacity clusters like the HD400 platform, and it is particularly enticing for SyncIQ destination (target) clusters. It is not (as of yet) a suitable solution for performance-dependent data sets.

As previously mentioned, it's a good idea to reach out to your service rep and get a temp license to do a scan and see what your potential savings are, because the cost is insane. On this theme of cost, we have hammered DMC (like Run DMC, for you old-schoolers) that this license cost is crap when you consider that other vendors **cough, NetApp** offer this for free. So far, crickets, but if there are enough haters then maybe some winds of change will stir within the empire...

Hope this helps!

-- 
Andrew Stack
Sr. Storage Administrator, Storage SSF Data Center Services
Pharma Informatics
Genentech/F. Hoffmann-La Roche Ltd.
Mobile - 650.867.5524

Jerry Uanino

Jan 18, 2017, 2:28:48 PM
to isilon-u...@googlegroups.com
Ah.... I forgot: my destination *is* a SyncIQ destination, which might explain why it works so well for me.