Remote bundle storage


Bruno De Fraine

Mar 10, 2015, 6:41:07 AM
to zba...@googlegroups.com
Hello,

I think zbackup looks very interesting, but I was wondering about the remote storage of bundles (SFTP, Amazon S3, Google Cloud Storage, ...). The README talks about *mirroring* the backup repository on a remote server using rsync or gsutil, but I don't have enough space to keep local copies of the bundles. It seems that zbackup performs only simple operations on bundles: a bulk write of a bundle for the "backup" command, a bulk read of a bundle for the "restore" command (or does it access individual chunks within a bundle?), and a delete of a bundle for the "gc" command. So I think remote storage could work well?

I may consider contributing functionality for the remote storage of bundles, but what would be the preferred interface? I see 3 options:
1) Filesystem: the "bundles" directory can be a mount of a remote filesystem (nfs, sshfs, s3fs,...). That may already work today.
2) Commands: zbackup could invoke some configurable scripts/commands to do bundle operations, e.g.:
write-bundle <id> <file> (exit code indicates success)
read-bundle <id>  (writes to stdout, exit code indicates success)
delete-bundle <id> (exit code indicates success)
3) Native: support for specific remote storage could be built into zbackup directly, e.g. using libssh or libs3.

Option 1 does not seem a very good fit: zbackup seems to need a relatively simple interface for bundles, while a filesystem is a rather advanced interface. Option 3 would provide the best performance, but it will require the most development work, and will bring more configuration options and build requirements. Option 2 seems very flexible without much impact on zbackup, but then I would wonder why compression/encryption of bundles is not handled in this way too (to also make it configurable). On the other hand, encryption is also used for other parts of the repo besides bundles, so it needs to be linked anyway. So which option would best fit the design of zbackup?
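
To make option 2 concrete, here is a minimal sketch of what the three commands might look like as shell functions. Nothing here is part of zbackup: the function names, the BUNDLE_STORE variable, and the use of a plain local directory as a stand-in for the remote store are all assumptions for illustration. A real backend would presumably replace cp/cat/rm with the equivalent upload, download, and delete calls of the chosen service.

```shell
#!/bin/sh
# Hypothetical bundle-store helpers for option 2. NOT part of zbackup:
# BUNDLE_STORE and the function names are assumptions, and a local
# directory stands in for the remote store.
BUNDLE_STORE="${BUNDLE_STORE:-/tmp/bundle-store}"

# write-bundle <id> <file>: store a bundle; exit code indicates success.
# The copy goes to a temporary name first so that a partially written
# bundle is never visible under its final id.
write_bundle() {
    mkdir -p "$BUNDLE_STORE" &&
    cp "$2" "$BUNDLE_STORE/$1.tmp" &&
    mv "$BUNDLE_STORE/$1.tmp" "$BUNDLE_STORE/$1"
}

# read-bundle <id>: write the bundle to stdout; exit code indicates success.
read_bundle() {
    cat "$BUNDLE_STORE/$1"
}

# delete-bundle <id>: remove the bundle; exit code indicates success.
delete_bundle() {
    rm "$BUNDLE_STORE/$1"
}
```

Each function returns the exit status of its last command, so the caller can rely on the exit code alone, as the proposed protocol suggests.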

Another question: can you influence the size of bundles? For cloud storage, it may be more cost efficient to use larger bundles, even if this means a larger repository. The README mentions that chunks are up to 64K by default and bundles up to 2MB by default. (In my tests, bundles were about 1MB on average.) How can you configure other values?
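
As a rough illustration of why bundle size could matter where a provider charges per stored object or per request, the sketch below simply counts how many bundles a repository of a given size would need. The 100 GiB repository and the 16 MiB bundle size are made-up numbers, and the function is plain arithmetic, not a zbackup option.

```shell
#!/bin/sh
# bundle_count <repo_size_mib> <bundle_size_mib>
# Prints how many bundles (stored objects) a repository of the given
# size would need, rounding up. Pure arithmetic; not a zbackup command.
bundle_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

bundle_count 102400 1    # 100 GiB at roughly 1 MiB per bundle
bundle_count 102400 16   # 100 GiB at a hypothetical 16 MiB per bundle
```

With these assumed numbers the count drops from 102400 objects to 6400.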

Regards,
Bruno

Alex Sayers

Mar 10, 2015, 11:03:43 AM
to Bruno De Fraine, zba...@googlegroups.com

> The README talks about *mirroring* the backup repository on a remote
> server using rsync or gsutil, but I don't have enough space to keep
> local copies of the bundles.

Rsync allows you to transfer files instead of mirroring them, by using
the --remove-source-files flag. This is how I do it:

rsync --archive --ignore-existing --remove-source-files "$LOCAL_REPO"/bundles "$REMOTE_REPO"
rsync --archive --ignore-existing --exclude "bundles" "$LOCAL_REPO"/* "$REMOTE_REPO"

> 1) Filesystem: the "bundles" directory can be a mount of a remote
> filesystem (nfs, sshfs, s3fs,...). That may already work today.

This does work, but I think the rsync strategy is better in cases where
you don't always have a connection to the remote host (e.g. you have a
sketchy internet connection, the host is a NAS which you can only access
at home, etc.). Using a mount, bundles created while offline will just
sit on your local disk forever, rendering all subsequent bundles
potentially useless. Using rsync, the bundles remain local just as long
as you remain offline. As soon as you do a backup with a connection, the
backlog gets transferred.

Transferring the bundles before synching the rest of the repo helps to
maintain the consistency of the remote repo in case the connection dies
mid-way. The rsync approach also has the added benefit of another
immutability guarantee for the remote repo, thanks to the
--ignore-existing flag.
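
The ordering described above can be sketched as a small wrapper around the two rsync commands. This is only a sketch: the push_repo function name and the overridable RSYNC variable are assumptions added here (the latter so the function can be dry-run with a stub), while the rsync flags are the ones from the commands earlier in this message.

```shell
#!/bin/sh
# Sketch: move bundles first, then copy the rest of the repository.
# RSYNC is overridable only so the sketch can be dry-run with a stub;
# push_repo and its argument names are assumptions.
RSYNC="${RSYNC:-rsync}"

# push_repo <local_repo> <remote_repo>
push_repo() {
    local_repo=$1
    remote_repo=$2
    # Step 1: move bundles off the local disk. Stop on failure so the
    # remote never receives index files that refer to missing bundles.
    "$RSYNC" --archive --ignore-existing --remove-source-files \
        "$local_repo"/bundles "$remote_repo" || return 1
    # Step 2: copy everything else in the repository.
    "$RSYNC" --archive --ignore-existing --exclude "bundles" \
        "$local_repo"/* "$remote_repo"
}
```

It would be called as push_repo "$LOCAL_REPO" "$REMOTE_REPO" after a backup run. If the first transfer fails, the function returns non-zero without touching the rest of the remote repo.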

The downside to using rsync vs. an NFS mount is that it serialises
backup creation and transfer. I can live with this though.

> 2) Commands: zbackup could invoke some configurable scripts/commands to do
> bundle operations, e.g.:
> write-bundle <id> <file> (exit code indicates success)
> read-bundle <id> (writes to stdout, exit code indicates success)
> delete-bundle <id> (exit code indicates success)

Or write a wrapper around zbackup and invoke that?

> 3) Native: support for specific remote storage could be built into zbackup
> directly, e.g. using libssh or libs3.

I would be against this idea. For me, zbackup's single-mindedness is its
key selling point.

All the best,
Alex

Bruno De Fraine

Mar 10, 2015, 11:28:59 AM
to Alex Sayers, zba...@googlegroups.com
Hello Alex,

>> The README talks about *mirroring* the backup repository on a remote
>> server using rsync or gsutil, but I don't have enough space to keep
>> local copies of the bundles.
>
> Rsync allows you to transfer files instead of mirroring them, by using
> the --remove-source-files flag. This is how I do it:
>
> rsync --archive --ignore-existing --remove-source-files "$LOCAL_REPO"/bundles "$REMOTE_REPO"
> rsync --archive --ignore-existing --exclude "bundles" "$LOCAL_REPO"/* "$REMOTE_REPO"

I see, but now I wonder how you remove bundles that have become unused? Do you run the “gc” command on the remote repo? (If yes, that means you have to be able to run zbackup on the remote side; it cannot be “dumb” storage.)

While it is a nice property that previous bundles are not touched for subsequent backups (only the index is needed), the restore and gc operations will still touch the bundles, so I find it a little dangerous to simply snatch them from zbackup’s repo structure.

>> 2) Commands: zbackup could invoke some configurable scripts/commands to do
>> bundle operations, e.g.:
>> write-bundle <id> <file> (exit code indicates success)
>> read-bundle <id> (writes to stdout, exit code indicates success)
>> delete-bundle <id> (exit code indicates success)
>
> Or write a wrapper around zbackup and invoke that?

I don’t think that’s the same: I mean that zbackup would no longer directly access the bundles directory; instead it would access bundles indirectly, by invoking these commands, which can then take care of the bundle storage details such as local or remote storage. I’ve proposed the above commands as a very simple protocol, hoping that it is sufficient for how zbackup accesses the bundles for backup, restore and gc operations.

Regards,
Bruno


Alex Sayers

Mar 10, 2015, 7:14:35 PM
to Bruno De Fraine, zba...@googlegroups.com

> While it is a nice property that previous bundles are not touched for
> subsequent backups (only the index is needed), the restore and gc
> operations will still touch the bundles, so I find it a little
> dangerous to simply snatch them from zbackup’s repo structure.

I think "dangerous" is the wrong word. It means that the restore and GC
functionality is broken unless you first go and manually mount the
remote repo. This is fine: if you're storing your bundles in a remote
location, there will always be times when these commands simply don't
work - namely, when the remote repo is unreachable.

I expect for most people, usage of zbackup (as a backup tool) follows a
pattern like this:

- Frequent automated operations: creating backups, moving new bundles to
off-site storage, and copying the index etc. to off-site storage.
- Rare manual operations: restoring a backup when a drive fails, or
removing a backup which accidentally included something big and
unimportant.

For the automated stuff, I use rsync for the reasons I outlined earlier.
For GCs and restores I would mount the whole remote repo as an NFS
share. It's true that this is a bad strategy if you're planning to
delete old backups in an automated way - but I'm not sure I'd recommend
doing that.

> write-bundle <id> <file> (exit code indicates success)
> read-bundle <id> (writes to stdout, exit code indicates success)
> delete-bundle <id> (exit code indicates success)
>
> zbackup would no longer directly access the bundles directory; instead
> it would access bundles indirectly, by invoking these commands

Ah I see. I'm not sure I see the advantage over just mounting a share
though.

All the best,
Alex