Confused and need some help -- Dedup rates

Matt Maxson

Oct 6, 2017, 4:59:35 AM
to dedupfilesystem-sdfs-user-discuss
I just installed OpenDedup and created a 10 TB volume.  I started copying some data to it (dumps of MySQL databases that have been gzipped).  I was thinking this type of data would deduplicate very well: there's a new export of the entire DB each day, so the entire previous day's data should be duplicated in the next day's file.  When I run df -h, I see odd-looking numbers:

My base path (I specified a --base-path when I created the volume) shows 41G used
My sdfs volume shows 39G used

and sdfscli --volume-info:

Files : 707
Volume Capacity : 10 TB
Volume Current Logical Size : 63.1 GB
Volume Max Percentage Full : 95.0%
Volume Duplicate Data Written : 24.27 GB
Unique Blocks Stored : 38.88 GB
Unique Blocks Stored after Compression : 38.98 GB
Cluster Block Copies : 2
Volume Virtual Dedup Rate (Unique Blocks Stored/Current Size) : 38.38%
Volume Actual Storage Savings (Compressed Unique Blocks Stored/Current Size) : 38.23%
Compression Rate : -0.24%
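
For what it's worth, the numbers seem internally consistent once you work out what the percentages mean. A quick sanity check in Python (my reading only; I'm assuming each reported rate is one minus the ratio named in its parentheses, since that's what reproduces the output):

    logical = 63.1      # Volume Current Logical Size (GB)
    duplicate = 24.27   # Volume Duplicate Data Written (GB)
    unique = 38.88      # Unique Blocks Stored (GB)
    compressed = 38.98  # Unique Blocks Stored after Compression (GB)

    # Duplicate + unique accounts for essentially all data written.
    print(duplicate + unique)        # 63.15, ~= the 63.1 GB logical size

    # Assumed readings of the reported percentages (1 - ratio):
    print(1 - unique / logical)      # 0.3838 -> "Virtual Dedup Rate" 38.38%
    print(1 - compressed / logical)  # 0.3822 -> "Actual Storage Savings" 38.23%
    print(1 - compressed / unique)   # -0.0026 -> negative "Compression Rate"
                                     #    (reported -0.24%, modulo rounding)

If that reading is right, dedup itself did save about 38%; it's only the compression stage on top of the unique blocks that's slightly negative.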


I'm under the impression that OpenDedup does the deduplication inline, as I copy the data in.  Is that not true?


So the compression stage is actually costing me a little disk space, and the dedup rate is nowhere near what I expected for daily dumps.  Is that right?  Am I mistaken that my database dumps should be highly dedupable?


I feel like I have to be missing something here.


Thanks in advance for the help.

Sam Silverberg

Oct 6, 2017, 10:39:36 AM
to dedupfilesystem-...@googlegroups.com
Are the dumps compressed? From the output, it looks like you are sending compressed data.

Matt Maxson

Oct 7, 2017, 5:46:31 AM
to dedupfilesystem-sdfs-user-discuss
>Are the dumps compressed?
"dumps of mysql databases that have been gzipped"

I copied a total of 400 GB testing this out.  Not all of it was gzipped MySQL dumps; most of it was the random user home-directory stuff everyone seems to have lying around.  I was expecting that at least **some** of the data would be recognized as duplicate at the block level.  Instead, I watched my compression rate actually drop: by the time I aborted the copy, I was down to -0.26% compression.
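
To illustrate what I was expecting: with uncompressed dumps, a small daily change should leave almost every fixed-size block identical between days. A toy sketch (the 4 KB block size and the synthetic dump are made up for illustration; SDFS's actual chunking settings may differ):

    # Two "daily" dumps of the same table: one value changed near the start.
    row_fmt = b"INSERT INTO t VALUES (%06d, 'payload payload payload');\n"
    day1 = b"".join(row_fmt % i for i in range(50000))
    day2 = day1.replace(b"000007", b"999999", 1)  # same-length, single edit

    CHUNK = 4096  # assumed fixed block size
    stored = {day1[i:i + CHUNK] for i in range(0, len(day1), CHUNK)}
    blocks2 = [day2[i:i + CHUNK] for i in range(0, len(day2), CHUNK)]
    shared = sum(b in stored for b in blocks2)
    print(f"uncompressed: {shared}/{len(blocks2)} of day 2's blocks already stored")
    # -> all but the one block containing the edit is a duplicate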

I must have really misunderstood the use case for deduplication, then.

Thanks for the help. 

Matt

Sam Silverberg

Oct 7, 2017, 11:56:31 AM
to dedupfilesystem-...@googlegroups.com
Hi Matt - Compressed data does not dedupe, so OpenDedup won't be able to add any additional savings for that data. Gzip output is effectively high-entropy, and even a small change early in the source shifts the compressed stream from that point on, so two days' dumps share almost no identical blocks once gzipped.
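
Running the same toy comparison as the sketch above, but on gzipped copies, shows this directly (again just a sketch, with an assumed 4 KB block size):

    import gzip

    # Same synthetic "daily" dumps: one value changed near the start.
    row_fmt = b"INSERT INTO t VALUES (%06d, 'payload payload payload');\n"
    day1 = b"".join(row_fmt % i for i in range(50000))
    day2 = day1.replace(b"000007", b"999999", 1)

    g1, g2 = gzip.compress(day1), gzip.compress(day2)

    CHUNK = 4096  # assumed fixed block size
    stored = {g1[i:i + CHUNK] for i in range(0, len(g1), CHUNK)}
    blocks2 = [g2[i:i + CHUNK] for i in range(0, len(g2), CHUNK)]
    shared = sum(b in stored for b in blocks2)
    print(f"gzipped: {shared}/{len(blocks2)} blocks shared")
    # -> typically 0: after the edit, the compressed output diverges
    #    everywhere, so no block is a byte-for-byte duplicate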