Discovering SDFS

Fabrice PLATEL

Aug 17, 2020, 12:48:41 PM
to dedupfilesystem-sdfs-user-discuss

Hi everybody
I started evaluating SDFS after looking for a software storage virtualization solution, more precisely one that provides a hybrid NAS with local caching of data stored on S3-compatible cloud storage.
To make things easier, I downloaded and installed the Datish Virtual NAS appliance from the download page of the OpenDedup site.
Installing the appliance was pretty straightforward, and I could quickly configure a volume backed by an S3 bucket on my IBM Cloud free-tier account for testing.
To test it more thoroughly, I set up a VMware datastore on an NFS share exported from the volume, provisioned one new VM, and later created a clone of that VM just to see deduplication at work.
At first it looked perfect, but after a while I found that I/O sometimes seems to block for no apparent reason. Even when there is little or no activity, my VMs take several seconds to start, whereas they start instantly when stored on other datastores based on iSCSI storage from a physical NAS.
Then I tried to update the OS on the appliance, along with SDFS, because it was not at the latest version. That is when I realized that this product no longer seems to be supported: the appliance itself could not start at all because it was incompatible with the Java version shipped with the new SDFS. I worked around this by copying the JRE from the previous SDFS version, so I could start the appliance services (datish_viewer) again.
After that, the volumes I had defined under the original SDFS version on the appliance could not be mounted, because of errors when launching the Java virtual machine that performs the filesystem mount. This was apparently caused by an error in the script /sbin/mount.sdfs in the computation of the memory required for the Java heap size, so I hard-coded it to 512MB, and now the volumes mount correctly again.
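
For illustration only, the kind of workaround described above can be sketched in Python: if a computed heap size is unusable, fall back to a fixed value. This is not the logic of the real /sbin/mount.sdfs script; the function name `heap_flag`, its parameters, and the optional upper bound are hypothetical. Only the JVM option form `-Xmx<size>m` and the 512MB fallback come from standard Java usage and from the post.

```python
def heap_flag(computed_mb, fallback_mb=512, max_mb=None):
    """Return a JVM maximum-heap flag such as '-Xmx512m'.

    computed_mb: heap size in MB produced by some (hypothetical) calculation.
    fallback_mb: fixed size to use when the computed value is unusable.
    max_mb:      optional sanity limit; larger computed values are rejected.
    """
    # Reject missing, non-integer, zero, or negative results.
    if not isinstance(computed_mb, int) or computed_mb <= 0:
        return f"-Xmx{fallback_mb}m"
    # Reject values above the sanity limit, if one was given.
    if max_mb is not None and computed_mb > max_mb:
        return f"-Xmx{fallback_mb}m"
    return f"-Xmx{computed_mb}m"


if __name__ == "__main__":
    print(heap_flag(None))   # broken computation, falls back to 512MB
    print(heap_flag(2048))   # sane computation is kept
```

Hard-coding the value, as done in the post, is the simplest form of this: the fallback is always used.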

Now I am wondering whether the appliance configures optimal settings for the volumes: I found that the parameter "max-file-write-buffers" was set to "1", meaning 1MB, whereas the SDFS documentation says "This should be set to at least 512."
So I have set it to 512, hoping that it will improve write performance.
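
A check like the one described above can be automated. The following is a minimal sketch, assuming the volume configuration is an XML file in which "max-file-write-buffers" appears as an attribute on some element. The sample document, the element names in it, and the function name `find_low_write_buffers` are made up for illustration; the threshold of 512 is the value quoted from the documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified config; a real volume config has many more settings.
SAMPLE_CONFIG = """<subsystem-config>
  <io max-file-write-buffers="1" />
</subsystem-config>"""


def find_low_write_buffers(xml_text, minimum=512, attr="max-file-write-buffers"):
    """Return (element tag, value) pairs where the attribute is below `minimum`."""
    root = ET.fromstring(xml_text)
    findings = []
    # Scan every element so the check does not depend on where the attribute lives.
    for elem in root.iter():
        value = elem.get(attr)
        if value is not None and int(value) < minimum:
            findings.append((elem.tag, int(value)))
    return findings


if __name__ == "__main__":
    for tag, value in find_low_write_buffers(SAMPLE_CONFIG):
        print(f"<{tag}> has max-file-write-buffers={value}, below the recommended 512")
```

To check a real volume, read the config file's contents into a string and pass it to the function in place of `SAMPLE_CONFIG`.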

I would like to know the current status of the project: is it going to be maintained further? The latest SDFS package ships with a recent OpenJDK version; however, the changelog doesn't mention any dates, so it is difficult to tell how often the project is updated.
The Datish appliance doesn't seem to be supported anymore, as the link for requesting a licence key no longer works. The interface looks nice and simple, but it also contains lots of bugs, so if I want to use SDFS I should not rely on this graphical interface and should instead dig further into the command line and configuration files. Am I wrong?

Finally, I am not sure that this product is what I am looking for, because I need a solution with these features:
1) It should work as a cloud storage caching filesystem: I want to store files in the cloud as efficiently as possible, with frequently used (and recent) files kept in a local cache.
2) It should allow data storage on S3-compatible cloud storage providers: I could verify that SDFS correctly mounts storage from a bucket on IBM Cloud, so that is fine for me. I would also like to know whether anyone has ever used OVH cloud storage with SDFS. I am new to the cloud storage world, and I assume that all that matters is whether the provider supports the S3 standard, but there might be some small details I am not aware of.
3) It should be robust and highly available: I would like two frontend servers that can be used either with load balancing or at least as one main server with the other on standby. My concern is to make sure that if my main server fails, the shared filesystem is still available through a secondary server. So far I have not found this capability in SDFS. I found that it allows setting up a cluster for volume replication where each node has its own local storage, but what I want is for both nodes to access the same shared storage: both should see the same S3 bucket where the files are stored in the cloud, and both should have their own local cache, with proper mechanisms to ensure cache and storage consistency among the nodes.

Does anybody know of, or have, lab setup documentation for SDFS on at least two nodes with shared storage in the cloud?

I want to find a solution that is either free or reasonably priced, so I plan to evaluate other solutions like GPFS (now IBM Spectrum Scale) or OCFS2, or even software or hardware solutions from NAS vendors like QNAP or Synology, as they provide their own cloud storage gateway solutions. I would like to avoid solutions whose pricing depends on data traffic.

Thanks in advance for any advice.

Fabrice PLATEL

Aug 17, 2020, 12:51:39 PM
to dedupfilesystem-sdfs-user-discuss
I forgot one fundamental question for SDFS users:
Is it reliable? Is it performant?
For now, my lab with a VMware datastore is not encouraging regarding performance and reliability, as I have several times seen slowness, I/O timeouts, and VM crashes.

Sep 2, 2020, 10:30:27 AM
to dedupfilesystem-sdfs-user-discuss
This is only my experience! Your personal opinion may differ.

Is it reliable? For me, yes. BUT I use a modified mount script and a separate VM for SDFS: a Linux VM used only for running OpenDedup.
Is it performant? I use it only for backups or as a back end for Samba and NFS shared files, not for active VMs.