zbackup is a globally-deduplicating backup tool, based on the ideas found in rsync. Feed a large .tar into it, and it will store duplicate regions of it only once, then compress and optionally encrypt the result. Feed another .tar file, and it will also re-use any data found in any previous backups. This way only new changes are stored, and as long as the files are not very different, the amount of storage required is very low. Any of the backup files stored previously can be read back in full at any time. The program is format-agnostic, so you can feed virtually any files to it (any types of archives, proprietary formats, even raw disk images -- but see Caveats).
This is achieved by sliding a window with a rolling hash over the input at byte granularity and checking whether the block in focus has been seen before. If the rolling hash matches, a full cryptographic hash is calculated as well to ensure the block really is the same; only then is the block deduplicated.
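To make this concrete, here is a minimal sketch of that weak-hash/strong-hash two-step in Python. It is not zbackup's actual code: the polynomial hash and its parameters are illustrative, the index is an in-memory dict, and the part that cuts unmatched data into new chunks and adds them to the index is left out.

```python
import hashlib

WINDOW = 64 * 1024                  # illustrative; matches the default chunk size
BASE, MOD = 257, (1 << 61) - 1      # parameters of a simple polynomial hash

def weak_hash(block):
    """Polynomial hash of a whole window, used to (re)seed the rolling state."""
    h = 0
    for b in block:
        h = (h * BASE + b) % MOD
    return h

def roll(h, out_byte, in_byte, pow_w):
    """Slide the window one byte forward: drop out_byte, take in in_byte."""
    return ((h - out_byte * pow_w) * BASE + in_byte) % MOD

def deduplicate(data, index):
    """index maps weak hash -> {sha256 digest: chunk id} for chunks already stored.
    Returns a mix of literal bytes and ('ref', chunk_id) entries.
    (Adding newly seen chunks to the index is omitted here.)"""
    out, pos, h = [], 0, None
    pow_w = pow(BASE, WINDOW - 1, MOD)
    while pos + WINDOW <= len(data):
        block = data[pos:pos + WINDOW]
        if h is None:
            h = weak_hash(block)
        candidates = index.get(h)
        if candidates:
            strong = hashlib.sha256(block).digest()   # confirm the weak match
            if strong in candidates:
                out.append(('ref', candidates[strong]))
                pos += WINDOW
                h = None                              # reseed after the jump
                continue
        out.append(data[pos:pos + 1])                 # no match: emit one literal byte
        if pos + WINDOW < len(data):
            h = roll(h, data[pos], data[pos + WINDOW], pow_w)
        pos += 1
    out.append(data[pos:])                            # tail shorter than a window
    return out
```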
If you have a 32-bit system and a lot of cores, consider lowering the number of compression threads by passing --threads 4 or --threads 2 if the program runs out of address space when backing up (see why below, item 2). There should be no problem on a 64-bit system.
Is it safe to use zbackup for production data? Being free software, the program comes with no warranty of any kind. That said, it's perfectly safe for production, and here's why. When performing a backup, the program never modifies or deletes any existing files -- only new ones are created. It specifically checks for that, and the code paths involved are short and easy to inspect. Furthermore, each backup is protected by its SHA256 sum, which is calculated before piping the data into the deduplication logic. The code path doing that is also short and easy to inspect. When a backup is being restored, its SHA256 is calculated again and compared against the stored one. The program would fail on a mismatch. Therefore, to ensure safety it is enough to restore each backup to /dev/null immediately after creating it. If it restores fine, it will restore fine ever after.
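The check itself is conceptually simple. Here is a small Python sketch of it, not zbackup's code; the function and argument names are made up for illustration:

```python
import hashlib

def sha256_of_stream(stream, blocksize=1 << 20):
    """SHA-256 of a file-like object, read in 1 MiB pieces."""
    h = hashlib.sha256()
    while True:
        piece = stream.read(blocksize)
        if not piece:
            return h.hexdigest()
        h.update(piece)

def verify_restore(restored_stream, stored_digest):
    """On restore, the digest is recomputed and compared with the one that
    was recorded before the data ever entered the deduplication logic."""
    if sha256_of_stream(restored_stream) != stored_digest:
        raise RuntimeError("SHA-256 mismatch: the restored data is corrupt")
```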
To add some statistics, the author of the program has been using an older version of zbackup internally for over a year. The SHA256 check has never failed. Again, even if it ever did, you would know immediately, so no work would be lost. Therefore you are welcome to try the program in production, and if you like it, stick with it.
The program does not have any facilities for sending your backup over the network. You can rsync the repo to another computer or use any kind of cloud storage capable of storing files. Since zbackup never modifies any existing files, the latter is especially easy -- just tell the upload tool you use not to upload any files which already exist on the remote side (e.g. with gsutil it's gsutil cp -R -n /my/backup gs://mybackup/).
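Because existing files never change, a sync only has to skip whatever already exists on the remote side. A local-filesystem sketch of that no-clobber behaviour in Python (directory names are placeholders; for a real remote you would rely on your upload tool's equivalent of the -n flag above):

```python
import os, shutil

def upload_new_files(src_repo, dst_repo):
    """Copy only files that don't already exist at the destination.
    Safe because zbackup never rewrites a file once it has been created."""
    for root, _dirs, files in os.walk(src_repo):
        rel = os.path.relpath(root, src_repo)
        target_dir = os.path.join(dst_repo, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            dst = os.path.join(target_dir, name)
            if not os.path.exists(dst):          # skip anything already "uploaded"
                shutil.copy2(os.path.join(root, name), dst)
```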
To aid with creating backups, there's a utility called tartool included with zbackup. The idea is the following: one sprinkles empty files called .backup and .no-backup across the entire filesystem. Directories where .backup files are placed are marked for backing up. Similarly, directories with .no-backup files are marked not to be backed up. Additionally, it is possible to place .backup-XYZ in the same directory where XYZ is, to mark XYZ for backing up, or place .no-backup-XYZ to mark it not to be backed up. Then tartool can be run with three arguments -- the root directory to start from (can be /), the output includes file, and the output excludes file. The tool traverses over the given directory noting the .backup* and .no-backup* files and creating include and exclude lists for the tar utility. The tar utility could then be run as tar c --files-from includes --exclude-from excludes to store all chosen data.
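For illustration, here is a rough Python sketch of the marker convention described above. It is not tartool itself; it only writes plain lists of paths in the form that tar's --files-from and --exclude-from options expect.

```python
import os

def scan_markers(root, includes_path, excludes_path):
    """Walk `root`, honouring the .backup / .no-backup marker convention,
    and write include/exclude lists for tar."""
    includes, excludes = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        names = set(filenames)
        if '.backup' in names:
            includes.append(dirpath)
        if '.no-backup' in names:
            excludes.append(dirpath)
            dirnames[:] = []                 # excluded: no need to descend further
            continue
        for name in filenames:
            if name.startswith('.backup-'):
                includes.append(os.path.join(dirpath, name[len('.backup-'):]))
            elif name.startswith('.no-backup-'):
                excludes.append(os.path.join(dirpath, name[len('.no-backup-'):]))
    with open(includes_path, 'w') as f:
        f.write('\n'.join(includes) + '\n')
    with open(excludes_path, 'w') as f:
        f.write('\n'.join(excludes) + '\n')
```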
This section tries to address the question of the maximum amount of data which can be held in a backup repository. What is meant here is the deduplicated data: the number of bytes in all source files ever fed into the repository doesn't matter, but the total size of the resulting repository does. Internally, all input data is split into small blocks called chunks (up to 64k each by default). Chunks are collected into bundles (up to 2MB each by default), and those bundles are then compressed and encrypted.
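As a back-of-the-envelope illustration of those defaults, the sketch below splits a stream into chunks of at most 64k and packs them into bundles of at most 2MB, which works out to roughly 32 chunks per full bundle. Real zbackup cuts chunks at rolling-hash boundaries rather than at fixed offsets, so this shows only the size relationship.

```python
CHUNK_MAX = 64 * 1024         # default maximum chunk size
BUNDLE_MAX = 2 * 1024 * 1024  # default maximum bundle size

def split_into_chunks(data: bytes):
    """Naive fixed-size chunking, just to show the size relationship."""
    return [data[i:i + CHUNK_MAX] for i in range(0, len(data), CHUNK_MAX)]

def pack_into_bundles(chunks):
    """Group chunks into bundles of at most BUNDLE_MAX bytes each;
    each bundle would then be compressed and (optionally) encrypted."""
    bundles, current, size = [], [], 0
    for chunk in chunks:
        if size + len(chunk) > BUNDLE_MAX and current:
            bundles.append(current)
            current, size = [], 0
        current.append(chunk)
        size += len(chunk)
    if current:
        bundles.append(current)
    return bundles
```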
You can mix LZMA and LZO in a repository. Each bundle file has a field that says how it was compressed, so zbackup will use the right method to decompress it. You could take an old zbackup repository with only LZMA bundles and start using LZO. However, please think twice before you do that, because old versions of zbackup won't be able to read those bundles.
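The per-bundle compression field boils down to a simple dispatch on decode. Here is a sketch of the pattern; it uses Python's stdlib lzma and zlib as stand-ins, since the real repository uses LZMA and LZO codecs, and the function name is made up.

```python
import lzma, zlib

# Stand-in codec table: the real bundle format records a compression-method
# field per bundle; zlib stands in for LZO here to keep the sketch stdlib-only.
DECOMPRESSORS = {
    'lzma': lzma.decompress,
    'zlib': zlib.decompress,
}

def decode_bundle(method: str, payload: bytes) -> bytes:
    try:
        return DECOMPRESSORS[method](payload)
    except KeyError:
        # An old reader that doesn't know the method has to bail out, which is
        # exactly the compatibility caveat described above.
        raise ValueError("unknown compression method: " + method)
```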
There's a lot to be improved in the program. It was released with the minimum amount of functionality to be useful. It is also stable. This should hopefully stimulate people to join the development and add all those other fancy features. Here's a list of ideas:
The author is reachable over email at i...@zbackup.org. Please be constructive and don't ask for help using the program, though. In most cases it's best to stick to the forum, unless you have something to discuss with the author in private.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Lars Wirzenius stunned everyone on 13 August 2017 with his decision to retire obnam, his very popular and useful backup program, as of the end of this year! His decision threw a lot of obnam users into a state of disarray and anxiety about what they could use for backup for the foreseeable future - myself included. But at least Lars told us with plenty of time to make decisions and change our backup strategy. So thank you Lars for all of your work on obnam, and thanks for giving us plenty of time to look at other solutions.
What this is showing us is the speed of development, and the ease with which its Debian maintainers can build it to work within the Debian infrastructure. So with all that I now know about 'zbackup' I'm excluding it from any future usage or testing.
So next up is 'borgbackup'. This backs up, uses deduplication, and compression - meaning that the database of backed-up files is squashed to fit more in the space available (very simplistically). But what it doesn't do is backups every n hours, meaning that it can only do one-a-day backups, which I find unacceptable. So just for future reference, this is my borg backup script called "borgup" -
If you really do want to do n backups then borg-cron-helper might help, its latest commit being on August 13, 2017. Using that program I was able to get n backups, but doing so broke the compression. So all in all I'm left feeling rather frustrated by borgbackup.
What this configuration script does is back up my home directory to my backup drive, into the directory called 'restic-back', reading the password from its relevant file and excluding my .cache files, which in the main aren't very useful.
Its next active line tells restic to forget all hourly backups except the last one once they total 4. Hour 4 then becomes day 1, day 2, etc., and then after 8 daily backups, 7 are forgotten and one becomes week 1. Etc., and so on, ad infinitum.
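Restic does this with its forget policy (the --keep-hourly / --keep-daily style options). The toy Python sketch below is not restic's code; it just illustrates the "keep the newest snapshot of each of the last N periods" idea, assuming snapshots are represented by their timestamps, with a made-up history of one snapshot every 6 hours.

```python
from datetime import datetime, timedelta

def keep_last_per_period(snapshots, period_fmt, n):
    """Keep the newest snapshot in each of the `n` most recent periods
    (e.g. '%Y-%m-%d %H' for hourly, '%Y-%m-%d' for daily)."""
    kept, seen = [], []
    for ts in sorted(snapshots, reverse=True):        # newest first
        period = ts.strftime(period_fmt)
        if period not in seen:
            seen.append(period)
            if len(seen) > n:
                break
            kept.append(ts)
    return kept

# Hypothetical history: one snapshot every 6 hours for 10 days.
snaps = [datetime(2017, 8, 1) + timedelta(hours=6 * i) for i in range(40)]
hourly = keep_last_per_period(snaps, "%Y-%m-%d %H", 4)   # last 4 hourly snapshots
daily = keep_last_per_period(snaps, "%Y-%m-%d", 8)       # last 8 daily snapshots
keep = set(hourly) | set(daily)
forget = sorted(set(snaps) - keep)                        # everything else goes
print(len(keep), "kept,", len(forget), "forgotten")
```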
They are both backing up /home, which is currently at 218.93 GB, so either I have a lot of cache files - which is perfectly possible, as borgbackup does generate quite a few as part of its backup - or else something else is happening. Now it is very likely that I have misunderstood something about either of these two backup solutions, or just got it plain wrong. So if I am in error please let me know.
Things that I do like - the borgbackup documentation is very good (worth 9 out of 10 on any scoreboard). It's clean, very easy to read and understand, and it gives a sample config script to help get you started. The restic documentation isn't quite as concise; it does show you how to set it up with Amazon S3, but it falls down on not having a sample config script to help get you started using restic.
Data is growing both in volume and importance. As time goes on, the amount of data that we need to store is growing, and the data itself is becoming more and more critical for organizations. It is becoming increasingly important to be able to back up and restore this information quickly and reliably. Using cloud-based systems spreads out the data over many servers and locations.
Where I work, data has grown from less than 1GB on a single server to more than 500GB spread out on more than 30 servers in multiple data centers. Catastrophes like the events at Distribute IT and Code Spaces demonstrate that ineffective backup practices can destroy a thriving business. Enterprise-level backup solutions typically cost a prohibitive amount, but the tools we need to create a backup solution exist within the Open Source community.
After switching between many different backup strategies, I have found what is close to an ideal backup solution for our particular use case. That involves regularly backing up many machines with huge numbers of files as well as very large files, and being able to restore any backup previously made.
Storing very large files: database backups can be very large but differ in small ways that are not block-aligned (imagine inserting one byte at the beginning of a file). Byte-level deduplication means we store only the changes between the versions, similar to doing a diff.
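To see why byte-level (content-defined) deduplication keeps working after a one-byte insertion while fixed, block-aligned deduplication does not, here is a small self-contained Python comparison. The chunker is a toy, not the algorithm any of the tools above actually use; cut points are chosen from a hash of the last few bytes, so boundaries follow content rather than absolute offsets.

```python
import hashlib, random

def fixed_chunks(data, size=1024):
    """Fixed-offset blocks: a one-byte insertion shifts every block."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data, window=16):
    """Toy content-defined chunking: cut wherever a hash of the last
    `window` bytes hits a fixed pattern, so boundaries move with the data."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        if hashlib.sha256(data[i - window:i]).digest()[0] == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

def shared_chunks(a, b):
    """How many distinct chunks the two chunk lists have in common."""
    return len({hashlib.sha256(c).digest() for c in a} &
               {hashlib.sha256(c).digest() for c in b})

random.seed(0)
original = bytes(random.randrange(256) for _ in range(64 * 1024))
shifted = b"\x00" + original    # insert one byte at the beginning

print(shared_chunks(fixed_chunks(original), fixed_chunks(shifted)))      # ~0 reused
print(shared_chunks(content_defined_chunks(original),
                    content_defined_chunks(shifted)))                    # nearly all reused
```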