How can I optimize disk performance with the folder locations in <volume>-volume-cfg.xml?


edbv...@gmail.com

Jun 10, 2014, 3:56:21 AM
to dedupfilesystem-...@googlegroups.com
I noticed the heads of the external hard disk where I located the sdfs volume moving and clicking so heavily, even when not copying to the volume, that I wonder how to minimize head movements.
In <volume>-volume-cfg.xml, I noticed:
<locations dedup-db-store="/path/to/sdfsvolume/ddb"  io-log="/path/to/sdfsvolume/ioperf.log"/>
<local-chunkstore ... hash-db-store="/path/to/sdfsvolume/chunkstore/hdb"  ... >
<volume ... path="/path/to/sdfsvolume/files"  ... >
I changed those paths to point to a different hdd than chunk-store="/path/to/sdfsvolume/chunkstore/chunks", so that the head movement from writing to the chunkstore would be minimized, and I moved those folders to their new locations.
My guess is that most writing goes to the chunk-store and hash-db-store, and that relocating <volume ... path="/path/to/sdfsvolume/files"  ... >, where the links to the files are probably stored, to a third hdd would optimize performance further.
Is my guess correct?
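To illustrate, the split described above might look like this in <volume>-volume-cfg.xml (the /mnt/... mount points are made up for this example; only the chunk store stays on the external disk):

```xml
<!-- Example only: databases and file metadata on a second (internal) disk,
     chunk store left on the external disk. -->
<locations dedup-db-store="/mnt/internal/sdfsvolume/ddb"  io-log="/mnt/internal/sdfsvolume/ioperf.log"/>
<local-chunkstore ... chunk-store="/mnt/external/sdfsvolume/chunkstore/chunks"
                  hash-db-store="/mnt/internal/sdfsvolume/chunkstore/hdb" ... >
<volume ... path="/mnt/internal/sdfsvolume/files"  ... >
```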
I also noticed extensive I/O activity on the relocated folders long after having written to the sdfs volume. What is that disk activity?
I notice that that I/O activity is not performed after an sdfs unmount and remount. Why is that?

Regards,
Eduard

Scott Middleton

Jun 11, 2014, 10:42:35 PM
to dedupfilesystem-...@googlegroups.com


On Tuesday, 10 June 2014 15:56:21 UTC+8, edbv...@gmail.com wrote:
<snip>
My guess is that most writing goes to the chunk-store and hash-db-store, and that relocating <volume ... path="/path/to/sdfsvolume/files"  ... >, where the links to the files are probably stored, to a third hdd would optimize performance further.
Is my guess correct?
 
Regards,
Eduard

Hi Eduard

I wondered something similar, so I am currently running a test on 1.7TB of data.
When I created an SDFS volume I did:
mkfs.sdfs --volume-name=test --volume-capacity=2TB --chunk-store-data-location=/SDFS_Chunk/ --chunk-store-hashdb-location=/SDFS_HDB/
mkdir /SDFS
mount.sdfs test /SDFS/ &

df -h
Filesystem                               Size  Used Avail Use% Mounted on
/dev/sda1                                455G  5.9G  426G   2% /
none                                     4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                     1.6G  8.0K  1.6G   1% /dev
tmpfs                                    326M  1.2M  325M   1% /run
none                                     5.0M     0  5.0M   0% /run/lock
none                                     1.6G   76K  1.6G   1% /run/shm
none                                     100M   48K  100M   1% /run/user
/dev/sdd2                                1.8T  1.7T   16G 100% /Source
/dev/sdb1                                1.8T  8.1G  1.7T   1% /SDFS_HDB
/dev/sdc1                                2.7T  4.0G  2.6T   1% /SDFS_Chunk
sdfs:/etc/sdfs/test-volume-cfg.xml:6442  2.1T   89G  2.0T   5% /SDFS

I'm still copying the data to see how it goes, but so far the majority is stored in --chunk-store-data-location.
It also looks, so far, as though around 12GB of storage on sdb1 and sdc1 translates to 89GB of deduped data. So that is a win! Copying is slow though, at 2.78MB/s, but it is not a high-spec machine.
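As a rough sanity check, the implied dedup ratio can be computed from the df figures reported above:

```shell
# 89G logical on /SDFS versus 8.1G (hash DB) + 4.0G (chunks) physical.
awk 'BEGIN { printf "dedup ratio: %.1f:1\n", 89 / (8.1 + 4.0) }'
```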

Scott 

 

edbv...@gmail.com

Jun 13, 2014, 9:38:19 AM
to dedupfilesystem-...@googlegroups.com
Scott,

I think there is a risk that your system root partition will run out of disk space: sdfs has already written to /opt/sdfs/<volume name>/ (the dirs ddb, files and keys), and that could grow to some 20GB.
Since you did not specify --base-path and --dedup-db-store, they are stored in the default locations, /opt/sdfs/<volume name> (at least the dirs "files" and "keys") and /opt/sdfs/<volume name>/ddb respectively, so probably on your system root partition.

In my du report, I read:
+ du -csh /mnt/x/sdfs1/ddb
1002M /mnt/x/sdfs1/ddb
1002M total
+ du -csh /mnt/x/sdfs1/chunkstore/hdb
2.0G  /mnt/x/sdfs1/chunkstore/hdb
2.0G  total
+ du -csh /mnt/x/sdfs1/files
553M  /mnt/x/sdfs1/files
553M  total
+ du -csh /mnt/x/sdfs1/keys
8.0K  /mnt/x/sdfs1/keys
8.0K  total
+ du -csh /mnt/s/sdfs1
4.9G  /mnt/s/sdfs1
4.9G  total
+ du -csh /media/sdfs1
7.1G  /media/sdfs1
7.1G  total

In my case, the size of (--)dedup-db-store and --base-path together is 1002M + 553M = 1555M.
For your already stored 89GB, that would be 89 × 1.555 / 7.1 ≈ 19.5GB.
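The estimate works out as follows (a sketch that assumes metadata grows linearly with logical data; the figures come from the du report above):

```shell
# ddb (1002M) + files (553M) of metadata for 7.1G of logical data,
# scaled linearly up to the 89G already stored.
awk 'BEGIN {
    meta_gb = (1002 + 553) / 1000          # ~1.555 GB of metadata so far
    printf "estimated metadata: %.1f GB\n", 89 * meta_gb / 7.1
}'
```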

On Thursday, June 12, 2014 4:42:35 AM UTC+2, Scott Middleton wrote:

<snip>

 
When I created an SDFS volume I did:
mkfs.sdfs --volume-name=test --volume-capacity=2TB --chunk-store-data-location=/SDFS_Chunk/ --chunk-store-hashdb-location=/SDFS_HDB/
mkdir /SDFS
mount.sdfs test /SDFS/ &

df -h
Filesystem                               Size  Used Avail Use% Mounted on
<snip>

edbv...@gmail.com

Jun 13, 2014, 9:55:56 AM
to dedupfilesystem-...@googlegroups.com
I forgot to mention, in my guess/question about further optimization: (--)dedup-db-store is also relocatable, besides "path=" in <volume>-volume-cfg.xml. The latter is the same as the mkfs.sdfs option --base-path and will at least be the location of the folders "files" and "keys".

edbv...@gmail.com

Jun 13, 2014, 10:16:17 AM
to dedupfilesystem-...@googlegroups.com
I will repost my question about the extensive I/O activity long after having written to the sdfs volume as a new topic.

Scott Middleton

Jun 13, 2014, 10:18:21 AM
to dedupfilesystem-...@googlegroups.com
So far with the very slow data copying:

Much of the example data is ShadowProtect and VirtualBox images.
The HDB and chunk data have not increased, but the sdfs mount has.

Filesystem                               Size  Used Avail Use% Mounted on
/dev/sda1                                455G  6.0G  426G   2% /
none                                     4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                     1.6G  8.0K  1.6G   1% /dev
tmpfs                                    326M  1.2M  325M   1% /run
none                                     5.0M     0  5.0M   0% /run/lock
none                                     1.6G   76K  1.6G   1% /run/shm
none                                     100M   52K  100M   1% /run/user
/dev/sdd2                                1.8T  1.7T   16G 100% /Source
/dev/sdb1                                1.8T  8.1G  1.7T   1% /SDFS_HDB
/dev/sdc1                                2.7T  4.0G  2.6T   1% /SDFS_Chunk
sdfs:/etc/sdfs/test-volume-cfg.xml:6442  2.1T  420G  1.6T  21% /SDFS


Scott Middleton
Managing Director
Linux Consultants Pty Ltd t/as AssureTek
Email - Sc...@assuretek.com.au
Phone - 1300 551 696
Mobile - 0400 212 724


On 13 June 2014 21:55, <edbv...@gmail.com> wrote:
I forgot to mention, in my guess/question about further optimization: (--)dedup-db-store is also relocatable, besides "path=" in <volume>-volume-cfg.xml. The latter is the same as the mkfs.sdfs option --base-path and will at least be the location of the folders "files" and "keys".


edbv...@gmail.com

Jun 13, 2014, 11:32:07 AM
to dedupfilesystem-...@googlegroups.com
Scott,
I retract my warning that your system root partition can run out of space, now that I noticed its free space in your df report. Be aware though of the sdfs metadata written there.
Did also you plan to test for determing the difference in writing speed:
1, sdfs writing datachuncks to one disk and hashes and metadata to another disk?
2, sdfs writing datachuncks, hashes and all metadata to the same disk,
If there will be no considerable difference, there is no point using an extra drive.
I wonder if instead of disk writing or head movement speed, the cpu speed could become the bottleneck, especially when encryption is enabled.
Difference could depend on data duplicity. When there is much duplicate data or many small files written in to the sdfs volume, writing of datachunks and hashes will be less, and writing of metadata in dirs ddb, files will be more.
Regards,
Eduard
On Friday, June 13, 2014 4:18:21 PM UTC+2, Scott Middleton wrote:
So far with the very slow data copying:
<snap>
Filesystem                               Size  Used Avail Use% Mounted on
/dev/sda1                                455G  6.0G  426G   2% /
<snap>

Sam Silverberg

Jun 13, 2014, 11:52:36 AM
to dedupfilesystem-...@googlegroups.com
Sorry for the delay, but here are some general tips on data locations. There are 3 types of data:

1. File metadata - this is the data that tells sdfs about the hashes within individual files. It usually represents about 1/100th of the logical data being stored. It should be stored on a reasonably fast disk.
2. Unique file chunks - this is the actual unique data being stored. Its size is directly related to the physical amount of data stored on disk. It should be stored on a reasonably fast disk, but can live on whatever meets the storage and retrieval speed you need. IO access is somewhat random, so RAID is good.
3. Hash DB - this is the hash database that stores the lookup table for whether and where unique file chunks exist in the system. It represents about 1/40th of the DSE maximum size. It should be stored on a fast disk such as an SSD, since most of the IO and bottlenecking will happen in this area. SDFS will attempt to hold the entire hash DB in memory but, if it cannot, will read from disk. IO to this file is random.

--base-path - determines the base location where all the data will be set up unless otherwise specified by --chunk-store-hashdb-location or --chunk-store-data-location. If you do specify the chunk store locations, only the file metadata will be stored in the base path.
--chunk-store-hashdb-location - determines the hash DB location.
--chunk-store-data-location - determines the unique file chunks location.
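Put together, a layout following these tips might be created like this (volume name and paths are examples, not taken from this thread):

```shell
# Example only: hash DB on an SSD, unique chunks on large storage,
# file metadata under --base-path on a reasonably fast disk.
mkfs.sdfs --volume-name=backup --volume-capacity=2TB \
          --base-path=/fastdisk/sdfs/backup \
          --chunk-store-hashdb-location=/ssd/sdfs/hdb \
          --chunk-store-data-location=/bigdisk/sdfs/chunks
```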



edbv...@gmail.com

Jun 16, 2014, 5:07:16 AM
to dedupfilesystem-...@googlegroups.com
Allow me to complete your list with --dedup-db-store, the dedup data database.

Now that I have copied some more GB of data to the sdfs volume, with the chunk store on my external hdd and the other dirs on my internal hdd, I can report that the external hdd has been writing with far less clicking noise. Random-access activity (and the wear that goes with it) has diminished, and moved to my faster internal hdd. That feels way better.

Regards,
Eduard

Chip Burke

Jun 9, 2016, 9:55:04 PM
to dedupfilesystem-sdfs-user-discuss
This thread is slightly old, but it took me a while to find this information.

--chunk-store-data-location  < This is for the unique chunks. This will make up the bulk of the data on disk.
--chunk-store-hashdb-location  < This is the hash DB, which is usually in RAM but is also persisted to disk, and swapped to when RAM is low. It should probably be on an SSD for this reason.
--dedup-db-store  < This is the file metadata, i.e. the mapping files that contain the file system namespace.

Do I have all of this correct?