
Re: I uninstalled OpenMediaVault (because totally overkill for me) and replaced it with borgbackup and rsync


John Hasler

Sep 1, 2023, 8:30:06 AM
Jason writes:
> Or how does your backup look like?

Just rsync.
--
John Hasler
jo...@sugarbit.com
Elmwood, WI USA

Default User

Sep 1, 2023, 3:30:05 PM
On Fri, 2023-09-01 at 07:25 -0500, John Hasler wrote:
> Jason writes:
> > Or how does your backup look like?
>
> Just rsync.


Sorry, I just couldn't resist chiming in here.

I have never used OpenMediaVault.

I HAVE used a number of other backup methodologies, including
Borgbackup, for which I had high hopes, but was highly disappointed.

In the end, I have settled on using rsnapshot to back up my
single-machine, single-user setup to external USB hard drive
A, which is then copied to external USB hard drive B using rsync. If
you can do rsync, you can do rsnapshot.

It's easy, especially when it comes to restoring, verifying, and
getting impromptu access to data, to pull out random files, or even to
just "check on" your data occasionally, to reassure yourself that it is
still there.

Yes, it does require considerable space (no data de-duplication), and
the rsync of the backup drives does take considerable time. But to me,
it is worth it, to avoid the methodological equivalent of "vendor lock-
in".

INB4: No, I don't do online backups. If people or organizations with
nose problems want my data they are going to have to make at least a
little effort to get it. And yes, I do know the 3-2-1 backup
philosophy, which does seem like a good idea for many (most?) users.

Also, I really like Clonezilla for occasional brute-force, scorched-
earth backups, such as when preparing for a complete reinstall or
release upgrade, or when switching to a new computer.

(Note: everything above applies to backing up my data: (/home/[user]
only). For backing up the actual system: (/ except for /home/[user]), I
use (and like) Timeshift. It has saved my donkey more than once!)

Well, that's what works for me. Feel free to disregard this. It may
not work for you. And, "if it breaks, you get to keep both pieces"!

Michel Verdier

Sep 1, 2023, 4:50:06 PM
On 2023-09-01, Default User wrote:

> Yes, it does require considerable space (no data de-duplication), and
> the rsync of the backup drives does take considerable time. But to me,
> it is worth it, to avoid the methodological equivalent of "vendor lock-
> in".

You must have a bad configuration: rsnapshot de-duplicates using hard
links, so you never have duplicated files. Keeping 52 weekly, 7 daily,
and 24 hourly snapshots I need only 130% of the original space. And it
takes minimal time, as it transfers only changes and can use ssh compression.
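Roughly, that retention corresponds to entries like these in /etc/rsnapshot.conf, and you can check the hard-link sharing yourself by comparing inode numbers across snapshots (paths here are only an example):

retain  hourly  24
retain  daily   7
retain  weekly  52

# identical inode numbers mean the unchanged file is stored only once
stat -c '%i %n' /backup/hourly.0/etc/hostname /backup/hourly.1/etc/hostname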

Linux-Fan

Sep 1, 2023, 5:20:06 PM
Default User writes:

> On Fri, 2023-09-01 at 07:25 -0500, John Hasler wrote:
> > Jason writes:
> > > Or how does your backup look like?

See https://lists.debian.org/debian-user/2019/11/msg00073.html
and https://lists.debian.org/debian-user/2019/11/msg00420.html

> > Just rsync.
>
>
> Sorry, I just couldn't resist chiming in here.
>
> I have never used OpenMediaVault.
>
> I HAVE used a number of other backup methodologies, including
> Borgbackup, for which I had high hopes, but was highly disappointed.

Would you care to share in what regard BorgBackup failed you?

I am currently using `bupstash` (not in Debian, unfortunately) and `jmbb`
(which I wrote for myself in 2013) in parallel and am considering switching
to `bupstash`, which provides just about all the features that I need.

Here are my notes on these programs:
* https://masysma.net/37/backup_tests_borg_bupstash_kopia.xhtml
* https://masysma.net/32/jmbb.xhtml

And also the Bupstash home page:
* https://bupstash.io/

IMHO borg is about the best backup program that you can get from the Debian
repositories (if you need any of the modern features that is). The only
issue I really had with it is that it was too slow for my use cases.

> In the end, I currently have settled upon using rsnapshot to back up my
> single-machine, single-user setup to external external usb hard drive
> A, which is then copied to external usb hard drive B, using rsync. If
> you can do rsync, you can do rsnapshot.  
>
> It's easy, especially when it comes to restoring, verifying, and
> impromptu access to data, to use random stuff, or even to just "check
> on" your data occasionally, to reassure yourself that it is still
> there.
>
> Yes, it does require considerable space (no data de-duplication), and
> the rsync of the backup drives does take considerable time. But to me,
> it is worth it, to avoid the methodological equivalent of "vendor lock-
> in".

Yes, the “vendor lock-in” is really a thing, especially when it comes to
restoring a backup and the fancy backup software just does not compile for
the platform, is not available for other reasons, or you are stuck on a
Windows laptop without admin permissions (worst-case scenario?).

I mitigated this with `jmbb` by providing a way to restore individual
files using third-party utilities, and I intend to mitigate this for
`bupstash` by writing my own restore program
(work in progress: https://masysma.net/32/maxbupst.xhtml)

> INB4: No, I don't do online backups. If people or organizations with
> nose problems want my data they are going to have to make at least a
> little effort to get it. And yes, I do know the 1-2-3 backup
> philosophy, which does seem like a good idea for many (most?) users.

The problem I have with offline backups is that it is an inconvenience to
carry around copies, and that this means they are always more out of date
than I want them to be. Hence I rely on encryption to store backups on
untrusted storage.
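(The principle, reduced to its simplest form, is something like the following; this is only an illustration, not what I actually run:)

# encrypt an archive symmetrically before it ever touches untrusted storage
tar -czf - /home/user | gpg --symmetric --cipher-algo AES256 \
    -o /mnt/untrusted/home-backup.tar.gz.gpg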

[...]

Short but comprehensive resource on the subject (includes some advertising /
I am not affiliated / maybe this has outlived the product it advertises for?):
http://www.taobackup.com/index.html

YMMV
Linux-Fan

öö

Linux-Fan

Sep 1, 2023, 5:30:06 PM
It highly depends on the type of data that is being backed up.

For my regular user files, I think a file-based deduplication works OK. But
for my VM images, hardlinks would only save space for those VMs which did
not run between the current and the preceding backup.

Btw.: I am personally not using any hard-link based approach, mostly due to
the missing encryption and integrity protection of data and metadata.

HTH
Linux-Fan

öö

Michel Verdier

Sep 1, 2023, 6:10:06 PM
On 2023-09-01, Linux-Fan wrote:

> For my regular user files, I think a file-based deduplication works OK. But
> for my VM images, hardlinks would only save space for those VMs which did not
> run between the current and the preceding backup.

rsnapshot de-duplicates files, not blocks. If you back up files from your VM
you can de-duplicate. If you want to back up whole VM images, of course you
need a tool working on physical blocks.

> Btw.: I am personally not using any hard-link based approach, mostly due to
> the missing encryption and integrity protection of data and metadata.

rsnapshot uses hard links on the backup filesystem. I encrypt this
filesystem. Integrity is the same concern for all backup systems; doing 2
backups in remote places covers this.

Michael Kjörling

Sep 2, 2023, 5:20:05 AM
On 2 Sep 2023 00:04 +0200, from mv...@free.fr (Michel Verdier):
> rsnapshot use hard links on the backup filesystem.

More accurately, rsnapshot (which is basically a frontend to rsync)
tells rsync to do that; IIRC by passing --link-dest pointing at the
previous backup target directory.
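In effect each run boils down to something like this (directory names are
illustrative; rsnapshot also adds its own options and handles the rotation):

rsync -a --delete --numeric-ids \
    --link-dest=/backup/daily.1/ \
    /home/ /backup/daily.0/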

And this is not an argument against rsnapshot/rsync; I use the
combination myself, plus a home-grown script to prune old backups
based on the amount of free space remaining on the backup disks rather
than a fixed backup count.

The one big downside of rsnapshot + rsync at least for me is that it
has no real concept of whether a backup run completed or was aborted;
the latter, for example, due to a system shutdown or out-of-memory
condition while it's running. That really shouldn't happen often or
even at all, but I've had it happen a few times over many years, and
it's a bit of a pain when it does happen because you pretty much have
to go in and delete the most recent backup and then renumber all the
previous ones to get back into a sane state on the backup target. Yes,
that can be added with another piece of a wrapper script, and I have
on occasion contemplated doing just that; but it happens sufficiently
rarely, and is noisy enough when it does happen, that it isn't really
worth the effort in my particular situation.

The biggest issue for me is ensuring that I am not dependent on
_anything_ on the backed-up system itself to start restoring that
system from a backup. In other words, enabling bare-metal restoration.
I figure that I can always download a Debian live ISO, put that on a
USB stick, set up an environment to access the (encrypted) backup
drive, set up partitions on new disks, and start copying; if I were
using backup software that uses some kind of custom format, that would
include keeping a copy of an installation package of that and whatever
else it needs for installing and running within a particular
distribution version, and making sure to specifically test that,
ideally without Internet access, so that I can get to the point of
starting to copy things back. (I figure that the boot loader is the
easy part to all this.)

--
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”

Stefan Monnier

Sep 2, 2023, 1:30:06 PM
> More accurately, rsnapshot (which is basically a frontend to rsync)
> tells rsync to do that; IIRC by passing --link-dest pointing at the
> previous backup target directory.

I've used a similar (tho hand-cooked) script running `rsync`.
I switched to Bup a few years ago and saw a significant reduction in the
size of my backups, partly due to the deduplication *between*
machines (I back up several Debian machines to the same backup
repository) and partly because the deduplication occurs even when I move
files around (most obvious when I move directories filled with large
files like videos or music).

Another advantage I've noticed with Bup is that my backups are done much
more quickly (because more of the work is done locally on the client
machines which have SSDs, there's a lot less network communication, and
much less work is done on the backup server, which is a low-power machine
with spinning rust).

The great advantage of `rsync/rsnapshot` is that your backup is really
"live": you can go `grep` and `find` through it with ease. With Bup you
can theoretically do that as well (via `bup fuse`) but the performance
is significantly lower :-(
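Something along these lines, if you want to try it (mount point and branch
name are placeholders):

mkdir -p /tmp/bup-mnt
bup fuse /tmp/bup-mnt            # expose the repository as a filesystem
grep -r needle /tmp/bup-mnt/mybranch/latest/
fusermount -u /tmp/bup-mnt       # unmount when done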


Stefan

Michel Verdier

Sep 2, 2023, 3:20:05 PM
On 2023-09-02, Stefan Monnier wrote:

> I switched to Bup a few years ago and saw a significant reduction in the
> size of my backups that is partly due to the deduplication *between*
> machines (I backup several Debian machines to the same backup
> repository) as well as because the deduplication occurs even when I move
> files around (most obvious when I move directories filled with large
> files like videos or music).

I set up deduplication between hosts with rsnapshot as you do. But it was
a small gain in my case, as the larger part was user data, logs and the
like, so always different between hosts. I gain only on system
files, mainly /etc, as I don't back up binaries and libs.
I almost never move large directories. But if needed it's easy to move them
in the rsnapshot directories as well.

David Christensen

Sep 2, 2023, 5:50:06 PM
I have a SOHO LAN:

* My primary workstation is Debian Xfce on a 60GB 2.5" SATA SSD with 1G
boot, 1G swap, and 12G root partitions. It has one user (myself) with
minimal home data (e-mail and CVS working directories). I back up boot
and root.

* I keep the vast majority of my data on a FreeBSD server with Samba and
the CVS repository (via SSH) on a ZFS stripe of two mirrors containing
two 3TB 3.5" SATA HDDs each (i.e. 6TB RAID10). I back up the Samba data.

* I run rsync(1) and homebrew shell/Perl scripts on the server to
back up the various LAN sources to a backup destination file system tree on
the server. I have enabled ZFS compression on the pool and enabled
deduplication on the backup tree.
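(For reference, enabling those properties looks roughly like this; the pool
and dataset names match my layout but are otherwise incidental:)

zfs set compression=lz4 p3
zfs set dedup=verify p3/backup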


I ran some statistics for the daily driver backups in March. The
results were 4.9 GB backup size, 258 backups, 1.2 TB apparent total
backup storage, and 29.0 GB actual total backup storage. So, a savings
of about 42:1:

https://www.mail-archive.com/debia...@lists.debian.org/msg789807.html


Today, I collected some statistics for the backups of my data on the
file server:

2023-09-02 14:10:30 toor@f3 ~
# du -hsx /jail/samba/var/local/samba/dpchrist
693G /jail/samba/var/local/samba/dpchrist

2023-09-02 14:11:09 toor@f3 ~
# ls /jail/samba/var/local/samba/dpchrist/.zfs/snapshot | wc -l
98

2023-09-02 14:13:50 toor@f3 ~
# du -hs /jail/samba/var/local/samba/dpchrist/.zfs/snapshot
67T /jail/samba/var/local/samba/dpchrist/.zfs/snapshot

2023-09-02 14:19:24 toor@f3 ~
# zfs get compression,compressratio,dedup,used,usedbydataset,usedbysnapshots p3/ds2/samba/dpchrist | sort
NAME PROPERTY VALUE SOURCE
p3/ds2/samba/dpchrist compression lz4 inherited from p3
p3/ds2/samba/dpchrist compressratio 1.02x -
p3/ds2/samba/dpchrist dedup off default
p3/ds2/samba/dpchrist used 777G -
p3/ds2/samba/dpchrist usedbydataset 693G -
p3/ds2/samba/dpchrist usedbysnapshots 84.2G -


So, 693 GB backup size, 98 backups, 67 TB apparent total backup storage,
and 777 GB actual total backup storage. So, a savings of about 88:1.


What statistics are other readers seeing for similar use-cases and their
backup solutions?


David

Linux-Fan

Sep 2, 2023, 6:10:06 PM
Michael Kjörling writes:

[...]

> The biggest issue for me is ensuring that I am not dependent on
> _anything_ on the backed-up system itself to start restoring that
> system from a backup. In other words, enabling bare-metal restoration.
> I figure that I can always download a Debian live ISO, put that on a
> USB stick, set up an environment to access the (encrypted) backup
> drive, set up partitions on new disks, and start copying; if I were
> using backup software that uses some kind of custom format, that would
> include keeping a copy of an installation package of that and whatever
> else it needs for installing and running within a particular
> distribution version, and making sure to specifically test that,
> ideally without Internet access, so that I can get to the point of
> starting to copy things back. (I figure that the boot loader is the
> easy part to all this.)

[...]

My personal way to approach this is as follows:

* I identify the material needed to restore.
It consists of

- the backup itself
- suitable Linux OS to run a restore process on
- the backup software
- the backup key
- a password to decrypt the backup key

* I create a live DVD (using `live-build`) containing
the Linux OS (including GUI, gparted and debian-installer!),
backup software (readily installed inside the live system),
backup key (as an encrypted file) but not the password nor
the backup itself.

Instead I decided to add:

- a copy of an SSH identity I can use to access a
read-only copy of the backup through my server and
- a copy of the encrypted password manager database
in case I forgot the backup password but not the
password manager password and also in case I would
be stuck with the Live DVD but not a copy of the
password such that I could use one of the password
manager passwords to access an online copy of the
backup.

* When I still used physical media in my backup strategy
these were external SSDs (not ideal in terms of data
retention, I know). I partitioned them and made them
able to boot the customized live system (through syslinux).

If you took such a drive and a PC of matching architecture
(say: amd64) then everything was in place to restore from
that drive (except for the password...). The resulting Debian
would probably be one release behind (because I rarely updated
the live image on the drive) but the data would be as up to
date as the contained backup. The assumption here was that one
would be permitted to boot a custom OS off the drive or have
access to a Linux that could read it because I formatted the
“data” part with ext4 which is not natively readable on
Windows.

In addition to that, each copy of my backups includes a copy of the backup
program executable (a JAR file and a statically compiled Rust program in my
case) and some Windows exe files that could be used to restore the backup on
Windows machines in event of being stuck with a copy of the backup “only”.

While this scheme is pretty strong in theory, I update and test it far too
rarely since it is not really easy to script the process, but at least I
tested the correct working of the backup restore after creation of the live
image by starting the restore from inside a VM.

HTH
Linux-Fan

öö

Michel Verdier

Sep 2, 2023, 6:30:05 PM
On 2023-09-02, David Christensen wrote:

> What statistics are other readers seeing for similar use-cases and their
> backup solutions?

I have 83 backups resulting in 130% of the original data, so a ratio of
63:1. But because of performance limitations I don't use compression on the
backup server. And also rsync deduplication is less resource-hungry than zfs.

David Christensen

Sep 2, 2023, 8:00:08 PM
On 9/2/23 15:26, Michel Verdier wrote:
> On 2023-09-02, David Christensen wrote:
>
>> What statistics are other readers seeing for similar use-cases and
>> their backup solutions?
>
> I have 83 backups resulting to 130% of data. So a ratio of 63:1.


Nice.


> But because of performance limitation I don't use compression on
> backup server.


What partitioning scheme, volume manager, file system, etc., do you use
on your backup server?


What is the performance limitation?


If you wanted compression on the backup server, how would you implement it?


> And also rsync deduplication is less consuming than zfs.


Please define your system and metrics.


David

Michel Verdier

Sep 3, 2023, 5:00:06 AM
On 2023-09-02, David Christensen wrote:

> > I have 83 backups resulting to 130% of data. So a ratio of 63:1.
>
> Nice.

It's just related to the size of periodically modified data, no glory
here. Each usage must be analyzed.

> What partitioning scheme, volume manager, file system, etc., do you use on
> your backup server?

xfs on logical (software) RAID10, no volume manager

> What is the performance limitation?

> If you wanted compression on the backup server, how would you implement it?

I have both a small CPU and little RAM on the backup server. It degrades
severely if I set up some sort of compression. I chose to encrypt
partitions instead, and get larger disks.
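Setting that up is roughly this (device and mount point are examples):

cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sdb1 backup
mkfs.xfs /dev/mapper/backup
mount /dev/mapper/backup /srv/backup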

>> And also rsync deduplication is less consuming than zfs.
>
> Please define your system and metrics.

I must admit I only did a benchmark a long time ago on another system. I
don't remember the metrics.

Michael Kjörling

Sep 3, 2023, 6:10:05 AM
On 2 Sep 2023 14:49 -0700, from dpch...@holgerdanske.com (David Christensen):
> So, 693 GB backup size, 98 backups, 67 TB apparent total backup storage, and
> 777 GB actual total backup storage. So, a savings of about 88:1.
>
> What statistics are other readers seeing for similar use-cases and their
> backup solutions?

8.07 TiB physically stored on one backup drive holding 174 backups;
11.4 TiB total logical (excluding effects of compression) data on the
source; 7.83 TiB hot current logical data on the source (excluding
things like ZFS snapshots and compression).

Which by your way of calculating seems to work out to an about 246:1
savings compared to simply keeping every single copy in full and
uncompressed. (Which would require almost 2 PB of storage.) But this
figure is a bit exaggerated since there are parts of the backups that
I prune after a while while keeping the rest of that backup, so let's
be very generous and call it maybe a 100:1 savings in practice.

Which is still pretty good for something that only does raw copying
with whole-file deduplication.

I have a wide mix of file sizes and content types; everything from
tiny Maildir message files through photos in the tens of megabytes
range to VM disk image files in the tens of gigabytes range, ranging
from highly compressible to essentially incompressible, and ranging
from files that practically never change after I initially store them
to ones that change all the time.

David

Sep 3, 2023, 2:20:06 PM
I have also been trying OpenMediaVault and it's overkill for me.

I have a Dell R320 fitted with 8 1TB SAS drives; the hardware raid is
turned off as OpenMediaVault uses software RAID.

If I turn the hardware raid on, can I use Debian as the operating
system?

Thank you for any help,

David.

Gareth Evans

Sep 3, 2023, 3:40:05 PM


On 3 Sep 2023, at 19:16, David <david....@ntlworld.com> wrote:
[...]
I have a Dell R320 fitted with 8 1T SAS drives, the hardware raid is
turned off as OpenMediaVault uses sorfware RAID.
If I turn the hardware raid on can I use Debian as the opperating
system?

Hi David,

In general, outside of certain relatively niche use cases, I believe software raid is to be preferred due to comparable performance and lack of future hardware compatibility/availability issues.

See for example (if with some contradictions)


Debian supports software raid, in the form of MDRAID.  I seem to recall this is usually combined with LVM.
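Roughly, the MDRAID + LVM route looks something like this (device names and sizes are placeholders, so adjust to your disks):

mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/sd[b-i]
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 500G -n data vg0
mkfs.ext4 /dev/vg0/data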

OpenZFS may be of interest, possibly root on ZFS too.




There doesn't seem to be documentation for root on zfs for bookworm yet.  I imagine the bullseye instructions, suitably adapted for repositories etc, might suffice, but I upgraded a root on ZFS installation from Buster to Bullseye to Bookworm (following the release notes in each case) so haven't tried to install from scratch in years.

You will need to turn hardware raid off in either case (MDRAID or ZFS)

If you have actual raid controller cards, that may not be possible iiuc - I'm sure someone (maybe even me) could advise if you provide details of raid hardware if relevant.

Best wishes,
Gareth

David Christensen

Sep 3, 2023, 5:30:07 PM
On 9/3/23 03:02, Michael Kjörling wrote:
> 8.07 TiB physically stored on one backup drive holding 174 backups;
> 11.4 TiB total logical (excluding effects of compression) data on the
> source; 7.83 TiB hot current logical data on the source (excluding
> things like ZFS snapshots and compression).
>
> Which by your way of calculating seems to work out to an about 246:1
> savings compared to simply keeping every single copy in full and
> uncompressed.


Without seeing a console session, I am unsure what you mean by
"physically stored", "total logical (excluding effects of compression)
data", and "hot current logical data ... (excluding things like ZFS
snapshots and compression)".


What partitioning scheme, volume manager, file system, compression,
etc., do you use on your backup server?


I had thought you were using rsnapsnot/ rsync --link-dest, but you also
mention ZFS snapshots. Please clarify.


David

Michael Kjörling

Sep 4, 2023, 4:00:13 AM
On 3 Sep 2023 14:20 -0700, from dpch...@holgerdanske.com (David Christensen):
>> 8.07 TiB physically stored on one backup drive holding 174 backups;
>> 11.4 TiB total logical (excluding effects of compression) data on the
>> source; 7.83 TiB hot current logical data on the source (excluding
>> things like ZFS snapshots and compression).
>
> Without seeing a console session, I am unsure what you mean by "physically
> stored", "total logical (excluding effects of compression) data", and "hot
> current logical data ... (excluding things like ZFS snapshots and
> compression)".

"Physically stored" is how much data, after compression and including
file system metadata, is actually written to disk and necessary for
all data to be accessible; it's the relevant metric for whether I need
to add disk space.

"Logical" is the sum of all apparent file sizes as visible to userland
utilities e.g. through stat(2).

Something like `dd if=/dev/zero of=$(mktemp) bs=1M count=1M` would
result in a large logical size but, because of compression, a very
small amount of physically stored data.

"Hot" is perhaps better referred to as the "current" data set; since
snapshots (and earlier backups) can include data which has since been
deleted, and is thus no longer current but still exists on disk.


> What partitioning scheme, volume manager, file system, compression, etc., do
> you use on your backup server?

ZFS within LUKS containers. If I recall correctly, the backup pool is
set to use zstd compression.


> I had thought you were using rsnapsnot/ rsync --link-dest, but you also
> mention ZFS snapshots. Please clarify.

Mostly ZFS with a rotating snapshot schedule on the source (the root
file system is ext4); copied using rsync --link-dest (through
rsnapshot) to a ZFS file system which doesn't use snapshots on the
backup target. Most of the ZFS file systems are set up to use
compression; there are a few where I know _a priori_ that the data is
in effect completely incompressible so there's no point in using CPU
to even try to compress that data, so those have compression turned
off.

(In ZFS, creating a file system is barely any more involved than
creating a directory, and all file systems come out of the same "pool"
which is a collection of >=1 storage devices set up with some
particular method of redundancy, possibly none. In more traditional
*nix parlance, a *nix file system is conceptually closer to a ZFS
pool.)
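For example (pool and dataset names are purely illustrative):

zfs create pool/backup/somehost                                # a new "file system", instantly
zfs create -o compression=off pool/backup/somehost/vm-images   # known-incompressible data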

Hopefully this is more clear.

David Christensen

Sep 4, 2023, 5:00:06 PM
On 9/4/23 00:53, Michael Kjörling wrote:
> On 3 Sep 2023 14:20 -0700, from dpch...@holgerdanske.com (David Christensen):
So for backup storage:

* We are both using ZFS with default compression.

* You are using 'rsync --link-dest' (via rsnapshot(1)) for deduplication
and I am using ZFS for deduplication.

Related:

* I am using zfs-auto-snapshot(8) for snapshots. Are you using
rsnapshot(1) for snapshots?


Here are the current backups for my current daily driver:

2023-09-04 13:26:15 toor@f3 ~
# zfs get -o property,value compression,compressratio,dedup,logicalreferenced,logicalused,refcompressratio,referenced,used,usedbydataset,usedbysnapshots p3/backup/taz.tracy.holgerdanske.com
PROPERTY VALUE
compression lz4
compressratio 2.14x
dedup verify
logicalreferenced 6.59G
logicalused 48.7G
refcompressratio 1.83x
referenced 3.89G
used 23.4G
usedbydataset 3.89G
usedbysnapshots 19.5G

2023-09-04 13:26:36 toor@f3 ~
# ls -1 /var/local/backup/taz.tracy.holgerdanske.com/.zfs/snapshot | wc -l
186

2023-09-04 13:27:15 toor@f3 ~
# du -hs /var/local/backup/taz.tracy.holgerdanske.com/
/var/local/backup/taz.tracy.holgerdanske.com/.zfs
3.9G /var/local/backup/taz.tracy.holgerdanske.com/
722G /var/local/backup/taz.tracy.holgerdanske.com/.zfs

2023-09-04 13:28:02 toor@f3 ~
# crontab -l
9 3 * * * /usr/local/sbin/zfs-auto-snapshot -k d 40
21 3 1 * * /usr/local/sbin/zfs-auto-snapshot -k m 99
27 3 1 1 * /usr/local/sbin/zfs-auto-snapshot -k y 99


Observations:

* du(1) of the backup file system matches ZFS properties 'referenced'
and 'usedbydataset'.

* I am unable to correlate du(1) of the snapshots to any ZFS properties
-- du(1) reports much more storage than ZFS 'usedbysnapshots', even when
scaled by 'compressratio'.


David

Michael Kjörling

Sep 5, 2023, 10:40:05 AM
On 4 Sep 2023 13:57 -0700, from dpch...@holgerdanske.com (David Christensen):
> * I am using zfs-auto-snapsnot(8) for snapsnots. Are you using rsnapsnot(1)
> for snapshots?

No. I'm using ZFS snapshots on the source, but not for backup
purposes. (I have contemplated doing that, but it would increase
complexity a fair bit.) The backup target is not snapshotted at the
block storage or file system level; however, rsync --link-dest uses
hardlinks to deduplicate whole files.


> * du(1) of the backup file system matches ZFS properties 'referenced' and
> 'usedbydataset'.

This would be expected, depending on exact specifics (what data du
traverses over and what your ZFS dataset layout is). To more closely
match the _apparent_ size of the files, you'd look at e.g.
logicalreferenced or logicalused.


> * I am unable to correlate du(1) of the snapshots to any ZFS properties --
> du(1) reports much more storage than ZFS 'usedbysnapshots', even when scaled
> by 'compressratio'.

This would also be expected, as ZFS snapshots are copy-on-write and
thus in effect only bookkeep a delta, whereas du counts the apparent
size of all files accessible under a path and ZFS snapshots allow
access to all files within the file system as they appeared at the
moment the snapshot was created. There are nuances and caveats
involved but, as a first approximation, immediately after taking a ZFS
snapshot the size of the snapshot is zero (plus a small amount of
metadata overhead for the snapshot itself) regardless of the size of
the underlying dataset, and the apparent size of the snapshot grows as
changes are made to the underlying dataset which cause some data to be
referenced only by the snapshot.

In general, ZFS disk space usage accounting for snapshots is really
rather non-intuitive, but it does make more sense when you consider
that ZFS is a copy-on-write file system and that snapshots largely
boil down to an atomic point-in-time marker for dataset state.

(In ZFS, a dataset can be either a file system optionally exposed at a
directory mountpoint or a volume exposed as a block device.)

Default User

Sep 5, 2023, 8:10:06 PM
Okay, first:

I said that my "system" files (everything except /home/[user]) were
backed up using Timeshift. That is correct. It is done by Timeshift
automatically, once a day, as well as weekly, monthly, and yearly.

But I was wrong about rsnapshot. I said that it was set up to back up
/home/[user] only. That is not correct. I now realize that I have it
set up to back up all of /, except for:
exclude /dev/*
exclude /proc/*
exclude /sys/*
exclude /tmp/*
exclude /run/*
exclude /mnt/*
exclude /media/*
exclude /lost+found
exclude /home/lost+found
exclude /var/lost+found

Now sudo du -sh / says that / seems to be using about 30 GB. But sudo
du -sh /media/user/rsnapshot_backups_of_host says that the backup
directory, /media/user/rsnapshot_backups_of_host on backup drive A, is
using a whopping 88 GB for 24 hourly, 7 daily, and 3 weekly snapshots!

I am thinking, that CAN'T be right. 
Maybe each hard link is being counted as a full, actual file, when
adding up the space allegedly used.

So, how can I determine how much space is really being used for the
backups?
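Maybe something like this would show it, assuming du counts each inode only
once within a single invocation, and that rsnapshot's own report works
since cmd_du is enabled in my config (I have not tried the latter yet):

sudo du -shc /media/user/rsnapshot_backups_of_host/*/
sudo rsnapshot du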




-----------------------------------------------------------------------


FWIW, here is my /etc/rsnapshot.conf file:
#################################################
# rsnapshot.conf - rsnapshot configuration file #
#################################################
# #
# PLEASE BE AWARE OF THE FOLLOWING RULE: #
# #
# This file requires tabs between elements #
# #
#################################################

#######################
# CONFIG FILE VERSION #
#######################

config_version 1.2

###########################
# SNAPSHOT ROOT DIRECTORY #
###########################

# All snapshots will be stored under this root directory.
#
snapshot_root /media/user/MSD00001/rsnapshot_backups_of_host/

# If no_create_root is enabled, rsnapshot will not automatically create the
# snapshot_root directory. This is particularly useful if you are backing
# up to removable media, such as a FireWire or USB drive.
#
no_create_root 1

#################################
# EXTERNAL PROGRAM DEPENDENCIES #
#################################

# LINUX USERS: Be sure to uncomment "cmd_cp". This gives you extra features.
# EVERYONE ELSE: Leave "cmd_cp" commented out for compatibility.
#
# See the README file or the man page for more details.
#
cmd_cp /bin/cp

# uncomment this to use the rm program instead of the built-in perl routine.
#
cmd_rm /bin/rm

# rsync must be enabled for anything to work. This is the only command that
# must be enabled.
#
cmd_rsync /usr/bin/rsync

# Uncomment this to enable remote ssh backups over rsync.
#
#cmd_ssh /usr/bin/ssh

# Comment this out to disable syslog support.
#
cmd_logger /usr/bin/logger

# Uncomment this to specify the path to "du" for disk usage checks.
# If you have an older version of "du", you may also want to check the
# "du_args" parameter below.
#
cmd_du /usr/bin/du

# Uncomment this to specify the path to rsnapshot-diff.
#
cmd_rsnapshot_diff /usr/bin/rsnapshot-diff

# Specify the path to a script (and any optional arguments) to run right
# before rsnapshot syncs files
#
#cmd_preexec /path/to/preexec/script

# Specify the path to a script (and any optional arguments) to run right
# after rsnapshot syncs files
#
#cmd_postexec /path/to/postexec/script

# Paths to lvcreate, lvremove, mount and umount commands, for use with
# Linux LVMs.
#
#linux_lvm_cmd_lvcreate /sbin/lvcreate
#linux_lvm_cmd_lvremove /sbin/lvremove
#linux_lvm_cmd_mount /bin/mount
#linux_lvm_cmd_umount /bin/umount

#########################################
# BACKUP LEVELS / INTERVALS #
# Must be unique and in ascending order #
# e.g. alpha, beta, gamma, etc. #
#########################################

# retain hourly 24
retain daily 7
retain weekly 4
retain monthly 12
retain yearly 100

############################################
# GLOBAL OPTIONS #
# All are optional, with sensible defaults #
############################################

# Verbose level, 1 through 5.
# 1 Quiet Print fatal errors only
# 2 Default Print errors and warnings only
# 3 Verbose Show equivalent shell commands being executed
# 4 Extra Verbose Show extra verbose information
# 5 Debug mode Everything
#
verbose 5

# Same as "verbose" above, but controls the amount of data sent to the
# logfile, if one is being used. The default is 3.
# If you want the rsync output, you have to set it to 4
#
loglevel 3

# If you enable this, data will be written to the file you specify. The
# amount of data written is controlled by the "loglevel" parameter.
#
logfile /var/log/rsnapshot.log

# If enabled, rsnapshot will write a lockfile to prevent two instances
# from running simultaneously (and messing up the snapshot_root).
# If you enable this, make sure the lockfile directory is not world
# writable. Otherwise anyone can prevent the program from running.
#
lockfile /var/run/rsnapshot.pid

# By default, rsnapshot check lockfile, check if PID is running
# and if not, consider lockfile as stale, then start
# Enabling this stop rsnapshot if PID in lockfile is not running
#
#stop_on_stale_lockfile 0

# Default rsync args. All rsync commands have at least these options set.
#
#rsync_short_args -a
#rsync_long_args --delete --numeric-ids --relative --delete-excluded

# ssh has no args passed by default, but you can specify some here.
#
#ssh_args -p 22

# Default arguments for the "du" program (for disk space reporting).
# The GNU version of "du" is preferred. See the man page for more details.
# If your version of "du" doesn't support the -h flag, try -k flag instead.
#
#du_args -csh

# If this is enabled, rsync won't span filesystem partitions within a
# backup point. This essentially passes the -x option to rsync.
# The default is 0 (off).
#
#one_fs 0

# The include and exclude parameters, if enabled, simply get passed directly
# to rsync. If you have multiple include/exclude patterns, put each one on a
# separate line. Please look up the --include and --exclude options in the
# rsync man page for more details on how to specify file name patterns.
#
#include ???
#include ???
#exclude ???
#exclude ???
exclude /dev/*
exclude /proc/*
exclude /sys/*
exclude /tmp/*
exclude /run/*
exclude /mnt/*
exclude /media/*
exclude /lost+found
exclude /home/lost+found
exclude /var/lost+found
#
# The include_file and exclude_file parameters, if enabled, simply get
# passed directly to rsync. Please look up the --include-from and
# --exclude-from options in the rsync man page for more details.
#
#include_file /path/to/include/file
#exclude_file /path/to/exclude/file

# If your version of rsync supports --link-dest, consider enabling this.
# This is the best way to support special files (FIFOs, etc) cross-platform.
# The default is 0 (off).
#
link_dest 1

# When sync_first is enabled, it changes the default behaviour of rsnapshot.
# Normally, when rsnapshot is called with its lowest interval
# (i.e.: "rsnapshot alpha"), it will sync files AND rotate the lowest
# intervals. With sync_first enabled, "rsnapshot sync" handles the file sync,
# and all interval calls simply rotate files. See the man page for more
# details. The default is 0 (off).
#
#sync_first 0

# If enabled, rsnapshot will move the oldest directory for each interval
# to [interval_name].delete, then it will remove the lockfile and delete
# that directory just before it exits. The default is 0 (off).
#
#use_lazy_deletes 0

# Number of rsync re-tries. If you experience any network problems or
# network card issues that tend to cause ssh to fail with errors like
# "Corrupted MAC on input", for example, set this to a non-zero value
# to have the rsync operation re-tried.
#
#rsync_numtries 0

# LVM parameters. Used to backup with creating lvm snapshot before backup
# and removing it after. This should ensure consistency of data in some special
# cases
#
# LVM snapshot(s) size (lvcreate --size option).
#
#linux_lvm_snapshotsize 100M

# Name to be used when creating the LVM logical volume snapshot(s).
#
#linux_lvm_snapshotname rsnapshot

# Path to the LVM Volume Groups.
#
#linux_lvm_vgpath /dev

# Mount point to use to temporarily mount the snapshot(s).
#
#linux_lvm_mountpath /path/to/mount/lvm/snapshot/during/backup

###############################
### BACKUP POINTS / SCRIPTS ###
###############################

# LOCALHOST
backup / localhost/
#backup /home/ localhost/
#backup /etc/ localhost/
#backup /usr/ localhost/
#backup /usr/local/ localhost/
#backup /var/ localhost/
#backup /var/log/rsnapshot localhost/
#backup /etc/passwd localhost/
#backup /home/foo/My Documents/ localhost/
#backup /foo/bar/ localhost/ one_fs=1,rsync_short_args=-urltvpog
#backup_script /usr/local/bin/backup_pgsql.sh localhost/postgres/
# You must set linux_lvm_* parameters below before using lvm snapshots
#backup lvm://vg0/xen-home/ lvm-vg0/xen-home/

# EXAMPLE.COM
#backup_exec /bin/date "+ backup of example.com started at %c"
#backup ro...@example.com:/home/ example.com/ +rsync_long_args=--bwlimit=16,exclude=core
#backup ro...@example.com:/etc/ example.com/ exclude=mtab,exclude=core
#backup_exec ssh ro...@example.com "mysqldump -A > /var/db/dump/mysql.sql"
#backup ro...@example.com:/var/db/dump/ example.com/
#backup_exec /bin/date "+ backup of example.com ended at %c"

# CVS.SOURCEFORGE.NET
#backup_script /usr/local/bin/backup_rsnapshot_cvsroot.sh rsnapshot.cvs.sourceforge.net/

# RSYNC.SAMBA.ORG
#backup rsync://rsync.samba.org/rsyncftp/ rsync.samba.org/rsyncftp/


----------------------------------------------------------------------


And here is the default /etc/rsnapshot.conf that comes with rsnapshot:
#################################################
# rsnapshot.conf - rsnapshot configuration file #
#################################################
# #
# PLEASE BE AWARE OF THE FOLLOWING RULE: #
# #
# This file requires tabs between elements #
# #
#################################################

#######################
# CONFIG FILE VERSION #
#######################

config_version 1.2

###########################
# SNAPSHOT ROOT DIRECTORY #
###########################

# All snapshots will be stored under this root directory.
#
snapshot_root /var/cache/rsnapshot/

# If no_create_root is enabled, rsnapshot will not automatically create the
# snapshot_root directory. This is particularly useful if you are backing
# up to removable media, such as a FireWire or USB drive.
#
#no_create_root 1

#################################
# EXTERNAL PROGRAM DEPENDENCIES #
#################################

# LINUX USERS: Be sure to uncomment "cmd_cp". This gives you extra features.
# EVERYONE ELSE: Leave "cmd_cp" commented out for compatibility.
#
# See the README file or the man page for more details.
#
cmd_cp /bin/cp

# uncomment this to use the rm program instead of the built-in perl routine.
#
cmd_rm /bin/rm

# rsync must be enabled for anything to work. This is the only command that
# must be enabled.
#
cmd_rsync /usr/bin/rsync

# Uncomment this to enable remote ssh backups over rsync.
#
#cmd_ssh /usr/bin/ssh

# Comment this out to disable syslog support.
#
cmd_logger /usr/bin/logger

# Uncomment this to specify the path to "du" for disk usage checks.
# If you have an older version of "du", you may also want to check the
# "du_args" parameter below.
#
#cmd_du /usr/bin/du

# Uncomment this to specify the path to rsnapshot-diff.
#
#cmd_rsnapshot_diff /usr/bin/rsnapshot-diff

# Specify the path to a script (and any optional arguments) to run right
# before rsnapshot syncs files
#
#cmd_preexec /path/to/preexec/script

# Specify the path to a script (and any optional arguments) to run right
# after rsnapshot syncs files
#
#cmd_postexec /path/to/postexec/script

# Paths to lvcreate, lvremove, mount and umount commands, for use with
# Linux LVMs.
#
#linux_lvm_cmd_lvcreate /sbin/lvcreate
#linux_lvm_cmd_lvremove /sbin/lvremove
#linux_lvm_cmd_mount /bin/mount
#linux_lvm_cmd_umount /bin/umount

#########################################
# BACKUP LEVELS / INTERVALS #
# Must be unique and in ascending order #
# e.g. alpha, beta, gamma, etc. #
#########################################

retain alpha 6
retain beta 7
retain gamma 4
#retain delta 3

############################################
# GLOBAL OPTIONS #
# All are optional, with sensible defaults #
############################################

# Verbose level, 1 through 5.
# 1 Quiet Print fatal errors only
# 2 Default Print errors and warnings only
# 3 Verbose Show equivalent shell commands being executed
# 4 Extra Verbose Show extra verbose information
# 5 Debug mode Everything
#
verbose 2

# Same as "verbose" above, but controls the amount of data sent to the
# logfile, if one is being used. The default is 3.
# If you want the rsync output, you have to set it to 4
#
loglevel 3

# If you enable this, data will be written to the file you specify. The
# amount of data written is controlled by the "loglevel" parameter.
#
#logfile /var/log/rsnapshot.log

# If enabled, rsnapshot will write a lockfile to prevent two instances
# from running simultaneously (and messing up the snapshot_root).
# If you enable this, make sure the lockfile directory is not world
# writable. Otherwise anyone can prevent the program from running.
#
lockfile /var/run/rsnapshot.pid

# By default, rsnapshot check lockfile, check if PID is running
# and if not, consider lockfile as stale, then start
# Enabling this stop rsnapshot if PID in lockfile is not running
#
#stop_on_stale_lockfile 0

# Default rsync args. All rsync commands have at least these options set.
#
#rsync_short_args -a
#rsync_long_args --delete --numeric-ids --relative --delete-excluded

# ssh has no args passed by default, but you can specify some here.
#
#ssh_args -p 22

# Default arguments for the "du" program (for disk space reporting).
# The GNU version of "du" is preferred. See the man page for more details.
# If your version of "du" doesn't support the -h flag, try -k flag instead.
#
#du_args -csh

# If this is enabled, rsync won't span filesystem partitions within a
# backup point. This essentially passes the -x option to rsync.
# The default is 0 (off).
#
#one_fs 0

# The include and exclude parameters, if enabled, simply get passed directly
# to rsync. If you have multiple include/exclude patterns, put each one on a
# separate line. Please look up the --include and --exclude options in the
# rsync man page for more details on how to specify file name patterns.
#
#include ???
#include ???
#exclude ???
#exclude ???

# The include_file and exclude_file parameters, if enabled, simply get
# passed directly to rsync. Please look up the --include-from and
# --exclude-from options in the rsync man page for more details.
#
#include_file /path/to/include/file
#exclude_file /path/to/exclude/file

# If your version of rsync supports --link-dest, consider enabling this.
# This is the best way to support special files (FIFOs, etc) cross-platform.
# The default is 0 (off).
#
#link_dest 0

# When sync_first is enabled, it changes the default behaviour of rsnapshot.
# Normally, when rsnapshot is called with its lowest interval
# (i.e.: "rsnapshot alpha"), it will sync files AND rotate the lowest
# intervals. With sync_first enabled, "rsnapshot sync" handles the file sync,
# and all interval calls simply rotate files. See the man page for more
# details. The default is 0 (off).
#
#sync_first 0

# If enabled, rsnapshot will move the oldest directory for each interval
# to [interval_name].delete, then it will remove the lockfile and delete
# that directory just before it exits. The default is 0 (off).
#
#use_lazy_deletes 0

# Number of rsync re-tries. If you experience any network problems or
# network card issues that tend to cause ssh to fail with errors like
# "Corrupted MAC on input", for example, set this to a non-zero value
# to have the rsync operation re-tried.
#
#rsync_numtries 0

# LVM parameters. Used to backup with creating lvm snapshot before backup
# and removing it after. This should ensure consistency of data in some special
# cases
#
# LVM snapshot(s) size (lvcreate --size option).
#
#linux_lvm_snapshotsize 100M

# Name to be used when creating the LVM logical volume snapshot(s).
#
#linux_lvm_snapshotname rsnapshot

# Path to the LVM Volume Groups.
#
#linux_lvm_vgpath /dev

# Mount point to use to temporarily mount the snapshot(s).
#
#linux_lvm_mountpath /path/to/mount/lvm/snapshot/during/backup

###############################
### BACKUP POINTS / SCRIPTS ###
###############################

# LOCALHOST
backup /home/ localhost/
backup /etc/ localhost/
backup /usr/local/ localhost/
#backup /var/log/rsnapshot localhost/
#backup /etc/passwd localhost/
#backup /home/foo/My Documents/ localhost/
#backup /foo/bar/ localhost/ one_fs=1,rsync_short_args=-urltvpog
#backup_script /usr/local/bin/backup_pgsql.sh localhost/postgres/
# You must set linux_lvm_* parameters below before using lvm snapshots
#backup lvm://vg0/xen-home/ lvm-vg0/xen-home/

# EXAMPLE.COM
#backup_exec /bin/date "+ backup of example.com started at %c"
#backup ro...@example.com:/home/ example.com/ +rsync_long_args=--bwlimit=16,exclude=core
#backup ro...@example.com:/etc/ example.com/ exclude=mtab,exclude=core
#backup_exec ssh ro...@example.com "mysqldump -A > /var/db/dump/mysql.sql"
#backup ro...@example.com:/var/db/dump/ example.com/
#backup_exec /bin/date "+ backup of example.com ended at %c"

# CVS.SOURCEFORGE.NET
#backup_script /usr/local/bin/backup_rsnapshot_cvsroot.sh rsnapshot.cvs.sourceforge.net/

# RSYNC.SAMBA.ORG
#backup rsync://rsync.samba.org/rsyncftp/ rsync.samba.org/rsyncftp/


-------------------------------------------------------------------


[BTW, the rsnapshot backups don't seem to take too much time, but doing
rsync of external usb backup drive A to external usb backup drive B
does take over 90 minutes each time. And that's once a day, every day!
Most of that time is apparently not for data transfer, but for rsync
building the indexes it needs each time.]

Here is the command I use to rsync backup drive A
(/media/default/MSD00001) to backup drive B (/media/default/MSD00002):

time sudo rsync -aAXHxvv --delete-after --numeric-ids \
    --info=progress2,stats2,name2 \
    --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
    /media/default/MSD00001/ /media/default/MSD00002/

David Christensen

Sep 5, 2023, 8:20:07 PM
On 9/5/23 07:34, Michael Kjörling wrote:
> On 4 Sep 2023 13:57 -0700, from dpch...@holgerdanske.com (David Christensen):
>> * I am using zfs-auto-snapshot(8) for snapsnots. Are you using rsnapshot(1)
>> for snapshots?
>
> No. I'm using ZFS snapshots on the source, but not for backup
> purposes. (I have contemplated doing that, but it would increase
> complexity a fair bit.) The backup target is not snapshotted at the
> block storage or file system level; however, rsync --link-dest uses
> hardlinks to deduplicate whole files.


+1 for complexity of ZFS backups via snapshots and replication.


My question was incongruous, as "snapshot" has different meanings for
ZFS and rsnapshot(1):

* https://docs.oracle.com/cd/E18752_01/html/819-5461/ftyue.html

snapshot

A read-only copy of a file system or volume at a given point in
time.

* https://rsnapshot.org/rsnapshot/docs/docbook/rest.html

Using rsnapshot, it is possible to take snapshots of your
filesystems at different points in time.


As I understand your network topology and backup strategy, it appears
that you are using rsnapshot(1) for snapshots (in the rsnapshot(1) sense
of the term).


>> * du(1) of the backup file system matches ZFS properties 'referenced' and
>> 'usedbydataset'.
>
> This would be expected, depending on exact specifics (what data du
> traverses over and what your ZFS dataset layout is). To more closely
> match the the _apparent_ size of the files, you'd look at e.g.
> logicalreferenced or logicalused.
>
>> * I am unable to correlate du(1) of the snapshots to any ZFS properties --
>> du(1) reports much more storage than ZFS 'usedbysnapshots', even when scaled
>> by 'compressratio'.
>
> This would also be expected, as ZFS snapshots are copy-on-write and
> thus in effect only bookkeep a delta, whereas du counts the apparent
> size of all files accessible under a path and ZFS snapshots allow
> access to all files within the file system as they appeared at the
> moment the snapshot was created. There are nuances and caveats
> involved but, as a first approximation, immediately after taking a ZFS
> snapshot the size of the snapshot is zero (plus a small amount of
> metadata overhead for the snapshot itself) regardless of the size of
> the underlying dataset, and the apparent size of the snapshot grows as
> changes are made to the underlying dataset which cause some data to be
> referenced only by the snapshot.
>
> In general, ZFS disk space usage accounting for snapshots is really
> rather non-intuitive, but it does make more sense when you consider
> that ZFS is a copy-on-write file system and that snapshots largely
> boil down to an atomic point-in-time marker for dataset state.


Okay. My server contains one backup ZFS file system for each host on my
network. So, the 'logicalreferenced', 'logicalused', and
'usedbysnapshots' properties I posted for one host's backup file system
are affected by the ZFS pool aggregate COW, compression, and/or
deduplication features.


> (In ZFS, a dataset can be either a file system optionally exposed at a
> directory mountpoint or a volume exposed as a block device.)


I try to use ZFS vocabulary per the current Oracle WWW documentation
(but have found discrepancies). I wonder if ZFS-on-Linux and/or OpenZFS
have diverged (e.g. 'man zfs' on Debian, etc.):

https://docs.oracle.com/cd/E18752_01/html/819-5461/ftyue.html

"A generic name for the following ZFS components: clones, file
systems, snapshots, and volumes."


David

Default User

Sep 5, 2023, 8:40:06 PM
***********************************************************************


I just wanted to clarify:

Each time backup drive A is rsync'd to backup drive B, much more than
/media/user/MSD00001/rsnapshot_backups_of_host is being rsync'd. All
of /media/user/MSD00001 is being rsync'd, which is somewhere around
900 GB. Maybe that is why each rsync takes over 90 minutes!

Default User

Sep 5, 2023, 9:40:06 PM
1) I did read http://www.taobackup.com/index.html a while back. It is
an ad, but funny and worth the read nevertheless.

2) I used borg for a while. Block-level de-duplication saved a ton of
space. But . . .

Per the borg documentation, I kept two separate (theoretically)
identical backup sets, as repository 1 and repository 2, on backup
drive A. Both repositories were daily backed up to backup drive B.

Somehow, repository 1 on drive A got "messed up". I don't remember the
details, and never determined why it happened.

I had a copy of repository 1 on backup drive B, and two copies of
repository 2 on backup drive B, so, no problem. I will just copy
repository 1 on backup drive B to backup drive A. Right?

Wrong. I could not figure out how to make that work, perhaps in part
because of the way borg manages repositories by ID numbers, not by
repository "names".

So, I posted the problem to the borg mailing list. And my jaw hit the
floor. I was astounded that no one there was able to say that it is
even possible to do such a repository copy/restore, let alone how to do
so!

The best advice was to just create a new repository and start a new
backup series, starting now and going forward. For example, let's say
the original series started 2022-01-01. And the repository A "problem"
started 2023-01-01. If I just created a new repository A, starting with
a new series beginning with 2023-01-01, then I only have one copy of
the data from 2022, in repository B. If repository B ever fails, I
have no data at all from 2022!

This was of course just unacceptable to me.

And I started thinking:

The borg website bluntly says that they will keep changing borg,
without regard for backward compatibility (systemd, anyone?). I have
heard that NASA has unusable data from the early days of the space
program. The data is fine, but it can't be used because the hardware
and/or software to use it no longer exists.

And, what is F/LOSS today can become closed and proprietary tomorrow,
and thus unintentionally, or even deliberately, you and your data are
trapped . . . (Audacity? CentOS?).

So even though borg seems to be the flavor of the month, I decided no
thanks. I think I'll just "Keep It Simple".

Now if borg (or whatever) works for you, fine. Use it. This is just my
explanation of why I looked elsewhere. YMMV.

David Christensen

Sep 6, 2023, 1:40:06 AM
On 9/5/23 17:39, Default User wrote:
> On Tue, 2023-09-05 at 20:01 -0400, Default User wrote:

>> Now sudo du -sh / says that / seems to be using about 30 Gb. But sudo
>> du -sh /media/user/rsnapshot_backups_of_host, says that the backup
>> directory, /media/user/rsnapshot_backups_of_host on backup drive A,
>> is
>> using a whopping 88 Gb for 24 hourly, 7 daily, and 3 weekly!


That is better than (24+7+3) * 30 GB = 1020 GB.


88 GB - 30 GB = 58 GB of churn over 24 hours, 7 days, and/or 3 weeks may
be reasonable for your workload. Are you doing multimedia content
creation? Databases? Disk imaging? Anything else big?


>> I am thinking, that CAN'T be right.
>> Maybe each hard link is being counted as a full, actual file, when
>> adding up the space allegedly used.
>>
>> So, how can I determine how much space is really being used for the
>> backups?


AIUI 'rsync --link-dest' hard links files on the destination only when
both the file data and the file metadata are identical. If either
changes, 'rsync --link-dest' considers the files to be different and
does a transfer/copy.


/var/log/* is a canonical degenerate example of file-level deduplication.
My Debian daily driver /var/log is 83 MB. 34 copies of that is 1.8 GB.


The challenge is finding big files with slightly different content, big
files with identical content but different metadata, and/or large
numbers of files with either or both differences.


I would start by using jdupes(1) to find identical backup files on the
backup drive. Then use stat(1) or ls(1) on each group of files to find
different metadata. You may want to put the commands into scripts as
you figure them out.
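A rough sketch of that workflow (paths are illustrative):

# list duplicate file content across the snapshot directories;
# files that are already hard-linked together are not reported by default
jdupes -r -S /media/default/MSD00001/rsnapshot_backups_of_host

# for a suspect group, compare inode, link count, size, mtime, and owner
stat -c '%i %h %s %y %U:%G %n' daily.0/path/to/file daily.1/path/to/file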


To find files with mismatched content, I would use jdupes(1) with the
--partial-only option, then jdupes(1), stat(1), and/or ls(1) to check
data and metadata as above.


>> [BTW, the rsnapshot backups don't seem to take too much time, but
>> doing
>> rsync of external usb backup drive A to external usb backup drive B
>> does take over 90 minutes each time. And that's once a day, every
>> day!
>> Most of that time is apparently not for data transfer, but for rsync
>> building the indexes it needs each time.]


COW file systems such as ZFS provide a time vs. space caching trade-off.


>> Here is the command I use to rsync backup drive A
>> (/media/default/MSD00001) to backup drive B
>> (/media/default/MSD00002):
>>
>> time sudo rsync -aAXHxvv --delete-after --numeric-ids \
>>     --info=progress2,stats2,name2 \
>>     --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} \
>>     /media/default/MSD00001/ /media/default/MSD00002/


I do not use the --numeric-ids option. I use matching username/UID and
groupname/GID on all of my Debian and FreeBSD hosts. I want user/group
name translation on my Windows/Cygwin and macOS hosts.


Your -v, -v, and --info options are going to generate a lot of output.
I typically use the --progress and --stats options, and request more
only when trouble-shooting.


I do not use the --exclude option. If and when a system crashes, I want
everything; including files that an intruder may have placed in
locations that are commonly not backed up. This means that when using
the -x option, I must make sure to backup all file systems.


> I just wanted to clarify:
>
> Each time backup drive A is rsync'd to backup drive B, much more than
> /media/user/MSD00001/rsnapshot_backups_of_host is being rsync'd. All
> of /media/user/MSD00001 is being rsync'd, which is somewhere around
> 900Gb. Maybe that is why each rsync takes over 90 minutes!


Please run the above rsync(1) command, without -v -v --info, and with
--stats. Post your console session.


Use nmon(1) to watch the backup drives when doing the transfer. Tell us
what you see.


David

Linux-Fan

Sep 9, 2023, 1:00:06 PM
Default User writes:

> On Fri, 2023-09-01 at 23:15 +0200, Linux-Fan wrote:
> > Default User writes:

[...]

> > > I HAVE used a number of other backup methodologies, including
> > > Borgbackup, for which I had high hopes, but was highly
> > > disappointed.
> >
> > Would you care to share in what regards BorgBackup failed you?

[...]

> 2) I used borg for a while. Block-level de-duplication saved a ton of
> space. But . . .
>
> Per the borg documentation, I kept two separate (theoretically)
> identical backup sets, as repository 1 and repository 2, on backup
> drive A. Both repositories were daily backed up to backup drive B.
>
> Somehow, repository 1 on drive A got "messed up". I don't remember the
> details, and never determined why it happened.
>
> I had a copy of repository 1 on backup drive B, and two copies of
> repository 2 on backup drive B, so, no problem. I will just copy
> repository 1 on backup drive B to backup drive A. Right?
>
> Wrong. I could not figure out how to make that work, perhaps in part
> because of the way borg manages repositories by ID numbers, not by
> repository "names".

[...]

> And, what is F/LOSS today can become closed and proprietary tomorrow,
> and thus unintentionally, or even deliberately, you and your data are
> trapped . . . (Audacity? CentOS?).
>
> So even though borg seems to be the flavor of the month, I decided no
> thanks. I think I'll just "Keep It Simple".
>
> Now if borg (or whatever) works for you, fine. Use it. This is just my
> explanation of why I looked elsewhere. YMMV.

Thank you very much for taking the time to explain it in such detail.

I can understand that corruption / messed-up repositories are really one of
the red flags for backup tools and hence a good reason to avoid such tools,
so I can fully understand your decision. That there was no way to recover
despite following the tool's best practices (docs) does not improve things...

Just a question for my understanding: You mentioned having multiple
repositories. If I had the situation with two different repositories and one
corrupted, my first idea (if the backup program does not offer any internal
functions for these purposes, which you confirmed using the mailing list?)
would be to copy the “good” repository at the file level (i.e. with rsync /
tar / whatever) and then afterwards update the copy to fix up any metadata
that may be wrong. Did you try this naive approach during your recovery
attempt?

I think that currently I am not affected by such issues because I only keep
the most recent state of the backup and do not have any history in my
backups (beyond the “archive” which I keep separately, using my own
program). Hence for me, indeed, the solution of re-creating the repository in
the event of corruption is viable.

But as the backup programs advertise the possibility of keeping multiple
states of the backup in one repository, it is indeed essential that one can
“move around” such a repository on the file system while being able to
continue adding to it even after switching to a different/new location. I
have never thought about testing such a use case for any of the tools that I
tried, but I can see that it is actually quite an essential feature, which
makes it even stranger that it would not be available with Borg?

TIA
Linux-Fan

öö