This may seem like a stupid question, but is it possible to disable deduplication when performing backup jobs?
The data I want to back up is over 900 GB in ~140,000 files, stored on a Windows Server 2012 machine. Most of that capacity is home videos and other content that I know will not benefit from deduplication. The data changes very little from day to day, but I want to back it up daily over the internet, probably over SSH unless there is a good reason not to.
I have tried several backup programs and they all have problems one way or another. In most cases my backup setup consists of a Linux virtual machine that mounts the SMB share and then backs up the data to a remote location. I performed the initial backups "locally" over the LAN before moving the hard drive to the remote location, so I am not backing up the entire ~900 GB over the internet.
I like the concept of zbackup because it natively supports encryption and remote SSH data stores, which saves me from having to use LUKS or TrueCrypt for encryption and simplifies the use of the remote datastore.
However, I don't want to allocate 2+ GB of RAM to this VM just so it can hold a deduplication hash table that I know will not be beneficial. Will zbackup start to cache the deduplication hash table to disk if it runs out of RAM? I don't care if the process takes a long time to complete, as long as it completes, fails gracefully, and can resume where it left off.
Also, is zbackup used by other people on datasets of similar size? Based on the comments on the webpage I think that might not be the case, but please correct me if I am wrong.
I have tried a number of other open source backup programs and they all fail and die for one reason or another. Here are a few examples:
rdiff-backup - runs very slowly, and if it has a problem with a single file in the 140k list, it wants to roll back the entire backup operation (which itself can take hours). It does not natively support encryption, so I have to use another underlying method. I like the fact that the backup is natively browseable as a filesystem, but I think it has problems creating so many hardlinks over a long period of time. I ran this for a long time but it is not reliable, and it looks like an abandoned project, which doesn't inspire confidence.
backuppc - does not have a concept of a remote data store; everything is "local". The only solution is to put the /var/lib/backuppc directory on a remote sshfs mount that is encrypted with another underlying method. However, I have had problems using this method (you can't create Unix sockets over sshfs, and backuppc depends on a socket for some reason).
duplicati - nice, friendly interface and runs natively on the Windows server, but dies after only 200 MB of data with no error logs or other indication of failure.
rsync - runs great in all scenarios, but of course has no history or rollback capability in the event of an accidental deletion.
Attic looks nice, but it doesn't have a nice Debian/Ubuntu package and I'm getting lazy, so I haven't tried it :) That is why I wanted to check out zbackup.
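
On the rsync point above: the usual way to get some history out of plain rsync is dated snapshot directories built with its --link-dest option, which hardlinks unchanged files against a previous snapshot. Below is a minimal sketch of how such a command could be assembled. The function name, the paths, and the dates are hypothetical, it only builds the argument list without running anything, and it does nothing about encryption, so it is not a replacement for what I am asking about.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


def snapshot_command(source: str, snapshot_root: str, today: date,
                     previous: Optional[str] = None) -> List[str]:
    """Build the rsync argv for one dated snapshot (nothing is executed here)."""
    target = PurePosixPath(snapshot_root) / today.isoformat()
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # Unchanged files are hardlinked against the previous snapshot
        # instead of being copied again.
        cmd.append("--link-dest=" + str(PurePosixPath(snapshot_root) / previous))
    # Trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", str(target) + "/"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: the SMB share mounted at /mnt/share,
    # snapshots kept under /backup/snapshots.
    print(" ".join(snapshot_command("/mnt/share", "/backup/snapshots",
                                    date(2015, 1, 2), previous="2015-01-01")))
```

Each snapshot directory then looks like a full copy, while unchanged files take no extra space. The destination filesystem has to support hardlinks for this to work.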