I am busy with an exciting setup: a nice approach to having a huge cloud drive. I think a lot of us will need this, since some cloud services have shut down their unlimited plans (Google Drive).
This time I want to do it right, which means:
1) encryption
2) compression
3) deduplication (to save my wallet)
4) sync on the fly
After many tryouts on both Windows and Linux, and failures with 'cloud drive' solutions such as tvtdrive, RaiDrive, ExpanDrive and Mountain Duck + Cryptomator (losing some data along the way due to bad design, computer crashes, lost caches, etc.), I considered using Mega.nz, since they offer a lot of cloud space with zero-knowledge encryption (is it really?). But I saw no way to check hashes, rclone has no 2FA for Mega yet because of an outdated Go library, I would have to trust Go (rclone) while I prefer Python of course, and even worse, it is based on symmetric encryption where you put your key into their client! And I saw no way to replicate or back up the storage. So I was done with that one as well.
So I finally found an excellent S3 cloud provider, Wasabi, since I hate paying egress, and that matters when you use S3QL! So no charges for uploads and downloads. And there are S3 providers in Europe as well, which is where I live.
So I ended up with S3QL + Wasabi, and it is the best thing I could have done so far!
S3QL and Nikolaus Rath's team are way too humble about this project, in my fair opinion. Believe me.
OK. Now on to my setup:
My setup is as follows. I use S3 cloud space at Wasabi. On top of that, I run S3QL, so I have a POSIX filesystem with encryption, compression and deduplication! I use a permanent and big cache on a 16 TB ZFS filesystem on FreeBSD. God, I like ZFS. That filesystem ROCKS; no other filesystem comes close to that piece of gold. So in this setup, ZFS protects me against bitrot to some extent, although it only holds the cache of my S3QL files anyway.
S3QL gives me a mountable space on Wasabi S3, so I have a POSIX filesystem. It compresses, deduplicates and encrypts. Blocks that are not in the cache are synced on the fly as needed. When it crashes, S3QL checks the cache against the data in S3... so I like that as well...
Once it is mounted, I provide a share on the network using NFS or SMB...
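For what it is worth, the NFS side of that is just a normal export of the S3QL mountpoint; mountpoint and network below are placeholders, and as far as I know a FUSE filesystem needs an explicit fsid to be exportable:

    # /etc/exports on the machine that runs mount.s3ql
    /mnt/s3ql  192.168.1.0/24(rw,sync,no_subtree_check,fsid=1)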
The big cache is used for backups with borgbackup and for accessing files that are needed often.... So I do something crazy here: I use a HUGE S3QL cache so that my bucket is cached and files do not need to be downloaded from the S3 bucket.
So in my setup, my source files live in the S3 cloud and the local storage is used as a cache for those files. Not the other way around, where a NAS stores the files and syncs them to S3.
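To make that concrete, this is roughly what my commands look like; the endpoint, bucket, paths and sizes are placeholders, so check them against the S3QL manual rather than copying them blindly:

    # one-time: create the filesystem in the Wasabi bucket (s3c:// is S3QL's
    # backend for S3-compatible services; credentials live in ~/.s3ql/authinfo2)
    mkfs.s3ql s3c://s3.wasabisys.com/my-bucket/s3ql

    # mount with the cache on the big local volume; --cachesize is in KiB,
    # so this asks for roughly 9 TB of cache, and --allow-other lets the
    # NFS/SMB daemons see the files
    mount.s3ql --cachedir /tank/s3ql-cache --cachesize 9000000000 \
               --allow-other s3c://s3.wasabisys.com/my-bucket/s3ql /mnt/s3ql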
Needless to say, I need to place a lot of trust in the S3QL filesystem for this, so for now I am testing with unimportant data I can find again on the internet.
So now I need some help. I have my cloud drive, and of course I can replicate and back up my bucket within S3, but I am still very cautious and I want a local backup as well. I want triple security.
For this my eye is on borgbackup, since it has deduplication and it is pretty fast. It has no S3 endpoints, but since S3QL mounts my data into my Linux tree, this is not an issue.
However, the manual speaks about inodes.
When creating a borg archive, the files cache can operate in several modes (man borg create):
https://borgbackup.readthedocs.io/en/stable/usage/create.html
Backup speed is increased by not reprocessing files that are already part of existing archives and weren't modified. The detection of unmodified files is done by comparing multiple file metadata values with previous values kept in the files cache.
This comparison can operate in different modes as given by --files-cache:
ctime,size,inode (default)
mtime,size,inode (default behaviour of borg versions older than 1.1.0rc4)
ctime,size (ignore the inode number)
mtime,size (ignore the inode number)
rechunk,ctime (all files are considered modified - rechunk, cache ctime)
rechunk,mtime (all files are considered modified - rechunk, cache mtime)
disabled (disable the files cache, all files considered modified - rechunk)
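To make it concrete, what I intend to run against the S3QL mountpoint is something like this (repo path, archive name and mountpoint are placeholders; the --files-cache behaviour is what my questions below are about):

    # one-time: create the local borg repository (local disk, not S3)
    borg init --encryption=repokey /backup/borg-repo

    # nightly: back up the S3QL mountpoint; with the default files cache
    # (ctime,size,inode) unchanged files are skipped purely on metadata
    borg create --stats /backup/borg-repo::s3ql-{now} /mnt/s3ql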
At the moment, as said, I use a very large cache for my S3QL files. My cache is (almost) as large as the stored data: a few TB. That is not very space efficient locally, but the files are of course fast to access. For that I use a huge 10 TB volume.
I am not sure whether that really needs to be RAID or ZFS to avoid data loss as well...
In fact, what happens if the cache gets corrupted? How does S3QL detect corrupted cached files? Suppose I put this on a single drive, not on RAID, and the cache gets corrupted, given that I use a large cache with a long timeframe so it does not time out:
Would my source files be corrupted? Does S3QL repair my corrupted cache?
Do I need to schedule a command to check my local cache for corruption?
Will my corrupted cache keep being served until I perform some steps?
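In case it helps to see what I mean by "schedule a command": I assume the relevant tool is fsck.s3ql, run while the filesystem is unmounted, roughly like this (please correct me if that does not actually verify the cached blocks):

    umount.s3ql /mnt/s3ql
    fsck.s3ql --cachedir /tank/s3ql-cache s3c://s3.wasabisys.com/my-bucket/s3ql
    # then remount with the usual mount.s3ql options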
So why such a huge cache? I use it because I do not want files to be downloaded from S3 every time I run a backup. The files need to be compared to see whether they have changed, but my files stored in S3QL rarely change.
Borgbackup states that if inodes are stable, you can leave the default check (ctime, size, inode). They say SMB is not stable...
So are inodes stable in S3QL? I see it is a FUSE filesystem, but what about inodes: once mounted, can they be considered stable for both cached and uncached files? Could it be that only the cached files have stable inodes?
What if I make S3QL use a much smaller (default-size) cache, since I rarely access and use the files...
How should I compare the uncached files with the borg repository in such a way that files which are not cached or not changed are not downloaded? Do I need to exclude the inode check?
What about cached files that are not changed?
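If the answer turns out to be that S3QL inodes are not stable, the workaround I am considering is to drop the inode from the comparison (same placeholder paths as above):

    # compare on ctime and size only, so an unstable inode number does not
    # force borg to re-read (and S3QL to re-download) unchanged files
    borg create --stats --files-cache=ctime,size /backup/borg-repo::s3ql-{now} /mnt/s3ql

but whether that really avoids downloads for uncached files depends on S3QL serving the stat() metadata from its local database, which is exactly what I am unsure about.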
So I need some guidance here.
Thanks.