Errors from tar, unexpectedly huge store

John Goerzen

Jan 19, 2011, 11:18:29 PM
to dedupfilesystem-sdfs-user-discuss
Hi folks,

I set up a new SDFS for testing today, and was somewhat surprised at
the results.

I simply made a tar file of /usr/bin on my system, which is 516MB. I
then extracted it onto my SDFS filesystem.

Several things then troubled me.

First of all, tar emitted an error apparently related to symlinks,
saying "No such file or directory". Unpacking this tarfile worked
fine on the regular ext4 filesystem, as well as the S3QL deduping
filesystem I was testing.

Secondly -- and I should note here that I set the size limit for the
volume at 2GB -- memory use went up to 350MB for just this 516MB of data.

Third, the store behind the SDFS mounted filesystem was *larger* than
the original, by a significant margin: 791MB vs. 516MB.

I then extracted the tarfile to a different path on the SDFS mounted
filesystem to test deduping. This raised the store behind the SDFS
mounted filesystem to 823MB.

Fourth, running du -s over the unpacked directory produced multiple
"No such file or directory" errors. The files du was trying to
examine were part of the original tarball, but ls didn't see them in
the SDFS filesystem. I don't know why du did, but then apparently
couldn't stat() them.

Finally, a find . | wc -l over the unpacked data for a single tarfile
returned 3222 entries instead of the 3232 it should have returned.
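
One way to pin down which entries went missing, rather than only counting them, is to walk the archive's member list and check each path on the mounted filesystem. The following is a minimal sketch, not something taken from SDFS itself; the function name `missing_members` and both arguments are hypothetical.

```python
import os
import tarfile


def missing_members(tar_path, extract_root):
    """Return archive member names that are absent under extract_root."""
    missing = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = os.path.join(extract_root, member.name)
            # lexists() is true for a symlink even when its target is gone,
            # so dangling symlinks are not reported as lost entries.
            if not os.path.lexists(target):
                missing.append(member.name)
    return missing
```

Calling it with the tarfile and the directory it was unpacked into returns the names that tar stored but the filesystem no longer shows.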

Is all this expected? I am concerned about the integrity of my data
under SDFS given all the above.


-- John

Sam Silverberg

Jan 20, 2011, 12:07:11 PM

Thanks for the report. SDFS does not currently support hardlinks. I
will be adding support for them in the near future, but their absence
may account for some of the errors you are seeing. I will also be
releasing a new version of SDFS very shortly that fixes issues around
rapid file creation. I believe this issue could affect tar extraction,
since small files are created in rapid succession. Once a file is
written to SDFS, it's there and its integrity is solid.
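
To check whether hardlinks could explain some of the errors, it may help to count how many entries of each kind the archive holds. This is a rough sketch using Python's standard `tarfile` module; the function name `member_type_counts` is made up for illustration.

```python
import tarfile
from collections import Counter


def member_type_counts(tar_path):
    """Count archive members by kind (file, hardlink, symlink, ...)."""
    counts = Counter()
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if member.islnk():        # hard link to an earlier member
                counts["hardlink"] += 1
            elif member.issym():      # symbolic link
                counts["symlink"] += 1
            elif member.isdir():
                counts["directory"] += 1
            elif member.isreg():
                counts["file"] += 1
            else:
                counts["other"] += 1
    return counts
```

A non-zero "hardlink" count would mean that tar tries to create hard links during extraction, which is the operation described above as unsupported.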

In terms of storage growth, tars, when extracted, do not dedup against
the tar itself. This is because the files within the tar do not align
with the offsets of the SDFS filesystem.
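
The alignment point can be illustrated with a small experiment. The sketch below assumes dedup works on fixed-size blocks cut at offsets 0, 4096, 8192 and so on; the 4096-byte block size is an assumption for illustration, not SDFS's actual setting. It builds a tar in memory from random files and counts how many of the loose files' blocks also appear in the tar.

```python
import hashlib
import io
import os
import tarfile

CHUNK = 4096  # assumed block size for illustration, not SDFS's actual setting


def chunk_hashes(data, chunk=CHUNK):
    """Hash each fixed-size block, cut at offsets 0, chunk, 2*chunk, ..."""
    return {hashlib.sha256(data[i:i + chunk]).digest()
            for i in range(0, len(data), chunk)}


def tar_vs_extracted(file_sizes=(65536, 65536, 65536)):
    """Return (blocks in the loose files, blocks also found in the tar)."""
    files = {f"f{i}": os.urandom(size) for i, size in enumerate(file_sizes)}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            # Each member is preceded by a 512-byte header, which shifts
            # the file data off the 4096-byte block boundaries.
            tar.addfile(info, io.BytesIO(data))
    tar_blocks = chunk_hashes(buf.getvalue())
    file_blocks = set()
    for data in files.values():
        file_blocks |= chunk_hashes(data)
    return len(file_blocks), len(file_blocks & tar_blocks)
```

With the default sizes, every file's data starts at a tar offset that is not a multiple of 4096, so none of the loose files' blocks match a block of the tar even though the bytes are identical.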

Here are examples of what will not dedup well:
- A tar against the files extracted from it
- A tar with one file within it changed or removed
- Zipped tars

Here is an example of what will dedup well:
- Actual files, with slight changes, copied directly to the filesystem.
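
The difference between these cases can be sketched the same way, again assuming fixed-size blocks with a hypothetical 4096-byte size. An in-place change touches only the block it falls in, while removing or inserting bytes (as happens when a file inside a tar changes size) shifts everything after it.

```python
import hashlib

CHUNK = 4096  # assumed block size for illustration


def block_hashes(data, chunk=CHUNK):
    """Return the hash of each fixed-size block, in order."""
    return [hashlib.sha256(data[i:i + chunk]).digest()
            for i in range(0, len(data), chunk)]


def shared_blocks(original, modified, chunk=CHUNK):
    """Count blocks of `modified` whose hash already exists in `original`."""
    known = set(block_hashes(original, chunk))
    return sum(1 for h in block_hashes(modified, chunk) if h in known)
```

For a 16-block file of random data, overwriting one byte in place leaves 15 blocks shared with the original, whereas prepending a single byte leaves none.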

I will do more testing with tar extraction and let you know my results.
