Clean-up old dedup data


Kuniyasu Suzaki

Jul 3, 2011, 11:35:32 PM
to dedupfilesystem-...@googlegroups.com

Hello,

Please tell me how to clean up old deduplication data.
I want to keep the dedup data size in sync with the current files.

On Linux, the dedup data and hash table are stored in the following files:
/opt/sdfs/volumes/***/chunkstore/chunks/chunk.chk
/opt/sdfs/volumes/***/chunkstore/hdb/hashstore-sdfs

Their sizes do not decrease even when files on SDFS are deleted.
I want to remove the dedup data for the deleted files.

The current files on SDFS can be confirmed in the following directory:
/opt/sdfs/volumes/***/files/

------
suzaki

Adrian Cichowski

Aug 10, 2011, 8:41:46 AM
to dedupfilesystem-sdfs-user-discuss
Hi,

I have exactly the same problem with reclaiming space used by old
deduplication data, and I am wondering if there is any solution to
this issue. How can I reclaim free space after removing files from a
deduplicated share?

Best Regards.
Adrian Cichowski

Adam Tauno Williams

Aug 17, 2011, 1:19:29 PM
to dedupfilesystem-...@googlegroups.com
On Wed, 2011-08-10 at 05:41 -0700, Adrian Cichowski wrote:
> Hi,
>
> I have exactly the same problem with reclaiming space used by old
> deduplication data, and I am wondering if there is any solution to
> this issue. How can I reclaim free space after removing files from a
> deduplicated share?

<http://www.opendedup.org/administrators-guide#dc>

Quoting
----------------
Data Chunk Removal:

SDFS uses a batch process to remove unused blocks of hashed data. This
process is used because the file system is decoupled from the back-end
storage (ChunkStore) where the actual data is held. As hashed data
becomes stale, it is removed from the chunk store. The process for
determining and removing stale chunks is as follows.

The SDFS file system informs the ChunkStore which chunks are currently
in use. This happens when chunks are first created and then every 2
hours, on the hour, after that. The DSE (chunk store) checks for data
that has not been claimed in the last 8 hours upon mount, and then
every 4 hours after that. Chunks that have not been claimed in the
last 10 hours (upon mount) or 6 hours (thereafter) are put into a pool
and overwritten as new data is written to the ChunkStore.
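
A worked example of that schedule (the times are illustrative,
assuming the default settings above): if a volume is mounted at 00:00,
the file system claims in-use chunks at mount and again at 02:00,
04:00, and so on. A chunk belonging to a file deleted at 00:30
receives its last claim at 00:00. Once it has gone more than 6 hours
without a claim (after 06:00), the next DSE scan (08:00) pools it for
overwrite.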

The chunkstore can be cleaned manually by running:

Linux :

setfattr -n user.cmd.cleanstore -v 5555:<minutes back> <mount-path>

Windows

sdfscli --cleanstore=<minutes>
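
For example, to manually evict chunks that have not been claimed in
the last 60 minutes on Linux (the mount path /media/pool0 and the
60-minute window here are illustrative values, not ones from the
guide):

setfattr -n user.cmd.cleanstore -v 5555:60 /media/pool0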

The size of chunks.chk will not diminish; rather, SDFS will
re-allocate space that was already written to but is now unclaimed.

As stated above, the volume claims chunks every two hours. This can be
configured to happen more or less frequently by editing the SDFS
configuration file and modifying the "claim-hash-schedule" attribute.
This should always occur more frequently than the "eviction-age"
attribute set for the DSE (the "chunk-store" tag).

The DSE claim schedule can be modified through the "chunk-gc-schedule"
attribute. Again, this should occur more frequently than the
"eviction-age" attribute set for the DSE (the "chunk-store" tag).

Finally, "eviction-age" is set in hours and defaults to 6. It can be
changed, but it should be greater than the intervals set by
"claim-hash-schedule" and "chunk-gc-schedule".

All of this is configurable and can be changed after a volume is written
to. Take a look at cron format for more details.
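
As a rough sketch, the relevant attributes in the volume's XML
configuration might look like the following. The cron expressions
(Quartz-style, seconds first) and which tags carry each attribute are
assumptions inferred from the guide text above, so verify against your
own configuration file before editing:

<!-- volume side: report in-use chunks to the DSE every 2 hours -->
claim-hash-schedule="0 0 0/2 * * ?"

<!-- on the "chunk-store" tag: run GC every 4 hours, evict chunks
     unclaimed for more than 6 hours -->
chunk-gc-schedule="0 0 0/4 * * ?"
eviction-age="6"

This preserves the ordering the guide recommends: claims run more
often than GC, and both run more often than the eviction window.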
