
[gentoo-user] Suggestions for backup scheme?


Grant Edwards

Jan 30, 2024, 1:20:05 PM
I need to set up some sort of automated backup on a couple Gentoo
machines (typical desktop software development and home use). One of
them used rsnapshot in the past but the crontab entries that drove
that have vanished :/ (presumably during a reinstall or upgrade --
IIRC, it took a fair bit of trial and error to get the crontab entries
figured out).

I believe rsnapshot ran nightly and kept daily snapshots for a week,
weekly snapshots for a month, and monthly snapshots for a couple
years.

Are there other backup solutions that people would like to suggest I
look at to replace rsnapshot? I was happy enough with rsnapshot (when
it was running), but perhaps there's something else I should consider?

--
Grant

Thelma

Jan 30, 2024, 1:50:04 PM
I backup, periodically:
- crontab (user, root)
- etc
- hylafax

daily:
- data

It all depends on what you want to back up and how large your data is.
For backups, standard "rsync" over the network does the job OK.

I customized this rsync-backup script:
https://serverfault.com/questions/271527/explain-this-rsync-script-for-me-linux-backups

# This script does personal backups to a rsync backup server. You will end up
# with a 7 day rotating incremental backup. The incrementals will go
# into subdirectories named after the day of the week, and the current
# full backup goes into a directory called "current"
# tri...@linuxcare.com
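
The gist of that script, as a minimal sketch (host and paths are made
up, and this is not the script verbatim):

  #!/bin/sh
  # Keep the current full copy in "current"; anything rsync replaces
  # or deletes is moved into an incremental directory named after the
  # day of the week, giving a 7-day rotating history on the server.
  BACKUPDIR=$(date +%A)
  rsync -a --delete --backup \
        --backup-dir=/backup/$BACKUPDIR \
        /home/ rsyncserver:/backup/current/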

Michael

Jan 30, 2024, 2:00:04 PM
You have probably seen the backup packages suggested in this wiki page?

https://wiki.gentoo.org/wiki/Backup

and what's available in the tree:

https://packages.gentoo.org/categories/app-backup

You may also want to consider integral filesystem solutions like btrfs and
zfs, depending on your needs and how often your data change, as well as
related scripts; e.g.:

https://github.com/masc3d/btrfs-sxbackup




Rich Freeman

Jan 30, 2024, 2:20:06 PM
On Tue, Jan 30, 2024 at 1:15 PM Grant Edwards <grant.b...@gmail.com> wrote:
>
> Are there other backup solutions that people would like to suggest I
> look at to replace rsnapshot? I was happy enough with rsnapshot (when
> it was running), but perhaps there's something else I should consider?
>

I'd echo the other advice. It really depends on your goals.

I think the key selling point for rsnapshot is that it can generate a
set of clones of the filesystem contents that are directly readable.
That isn't as efficient as it can be, but it is very simple to work
with, and it is done about as well as can be done with this sort of
approach. Restoration basically requires no special tooling this way,
so that is great if you want to restore from a generic rescue disk and
not have to try to remember what commands to use.

Send-based tools for filesystems like btrfs/zfs are SUPER-efficient
in execution time/resources as they are filesystem-aware and don't
need to stat everything on a filesystem to identify exactly what
changed in an incremental backup. However, you're usually limited to
restoring to another filesystem of the same type and have to use those
tools. There are some scripts out there that automate the process of
managing all of this (you need to maintain snapshots/etc to allow the
incremental backups). There are a bunch of other tools for backing up
specific applications/filesystems/etc as well. (Every database has
one, which you should definitely use, and there are tools like volsync
for k8s and so on.)

Restic seems to be the most popular tool to back up to a small set of
files on disk/cloud. I use duplicity for historical reasons;
restic does the same job and probably supports more back-ends. These
tools are very useful for cloud backups as they're very efficient
about separating data/indexes and keeping local copies of the latter
so you aren't paying to read back your archive data every time you do
a new incremental backup, and they're very IO-efficient.

Bacula is probably the best solution for tape backups of large numbers
of systems, but it is really crufty and unwieldy. I would NOT use
this to backup one host, and especially not to back up the host
running bacula. Bootstrapping it is a pain. It is very much designed
around a tape paradigm.

If you have windows hosts you want to backup then be sure to find a
solution that supports volume shadow copy - there aren't many. Bacula
is one of them which is the main reason I even use it at this point.
If you don't have that feature then you won't back up the registry,
and you can imagine how bad losing that is on a windows machine. If
you're just backing up documents though then anything will work, as
long as files aren't open, because windows is extra-fussy about that.

--
Rich

Grant Edwards

Jan 30, 2024, 2:30:05 PM
On 2024-01-30, Thelma <the...@sys-concept.com> wrote:

> I backup, periodically:
> - crontab (user, root)
> - etc
> - hylafax
>
> daily:
> - data
>
> It all depends on what you want to back up and how large your data is.
> For backups, standard "rsync" over the network does the job OK.

rsnapshot is a perl app that automates/organizes rsync backups, so
it's doing pretty much the same thing as the script below (though it's
a little more sophisticated).
> [...]

Grant Edwards

Jan 30, 2024, 2:40:03 PM
On 2024-01-30, Michael <confa...@kintzios.com> wrote:
> On Tuesday, 30 January 2024 18:15:09 GMT Grant Edwards wrote:
>> I need to set up some sort of automated backup on a couple Gentoo
>> machines (typical desktop software development and home use). One of
>> them used rsnapshot in the past but the crontab entries that drove
>> that have vanished :/

The crontabs had not disappeared; I was looking on the wrong
computer. Too many terminal windows open...

> You have probably seen the backup packages suggested in this wiki page?

Yep. Rsnapshot is one of them, and it's what I chose quite a few
years ago.
I'll look through the package database and visit some of the home
pages.

> You may also want to consider integral filesystem solutions like btrfs and
> zfs, depending on your needs and how often your data change, as well as
> related scripts; e.g.:

I don't think I'm ready to switch from ext4.

Grant Edwards

Jan 30, 2024, 2:50:03 PM
On 2024-01-30, Rich Freeman <ri...@gentoo.org> wrote:
> On Tue, Jan 30, 2024 at 1:15 PM Grant Edwards <grant.b...@gmail.com> wrote:
>>
>> Are there other backup solutions that people would like to suggest I
>> look at to replace rsnapshot? I was happy enough with rsnapshot (when
>> it was running), but perhaps there's something else I should consider?
>
> I'd echo the other advice. It really depends on your goals.

FWIW, I'm backing up only home directories and config stuff
(e.g. /etc). I don't back up the OS itself or anything installed in
/opt by installers or ebuilds.

> I think the key selling point for rsnapshot is that it can generate a
> set of clones of the filesystem contents that are directly readable.

Yes, that's the main advantage of rsnapshot. You can browse through
backups without any special tools.

> That isn't as efficient as it can be, but it is very simple to work
> with, and it is done about as well as can be done with this sort of
> approach. Restoration basically requires no special tooling this
> way, so that is great if you want to restore from a generic rescue
> disk and not have to try to remember what commands to use.

Yep, rsnapshot can take several hours to run every night. But, being
able to look through backups with nothing more than "cd", "ls", and
"cat" sure is nice.

> send-based tools for filesystems like brtrfs/zfs are SUPER-efficient
> in execution time/resources as they are filesystem-aware and don't
> need to stat everything on a filesystem to identify exactly what
> changed in an incremental backup. However, you're usually limited to
> restoring to another filesystem of the same type and have to use those
> tools.

For now, I need something for ext4, but backup-ability is definitely
a reason to consider switching filesystem types.

> Restic seems to be the most popular tool to backup to a small set of
> files on disk/cloud. I use duplicity for historical reasons, and
> restic does the same and probably supports more back-ends. These
> tools are very useful for cloud backups as they're very efficient
> about separating data/indexes and keeping local copies of the latter
> so you aren't paying to read back your archive data every time you do
> a new incremental backup, and they're very IO-efficient.

I generally back up to a USB 3 external hard drive, so IO efficiency
isn't as much of a concern as when backing up to the cloud. There are
things I periodically back up to the cloud, but that's generally a manual process
involving little more than "scp".

> Bacula is probably the best solution for tape backups of large numbers
> of systems, but it is really crufty and unwieldy. I would NOT use
> this to backup one host, and especially not to back up the host
> running bacula. Bootstrapping it is a pain. It is very much designed
> around a tape paradigm.

Thanks, I'll cross that one off the list. :)

> If you have windows hosts you want to backup

Thankfully, I don't.

Wol

Jan 30, 2024, 3:10:05 PM
On 30/01/2024 19:19, Rich Freeman wrote:
> I'd echo the other advice. It really depends on your goals.

If you just want a simple backup, I'd use something like rsync onto lvm
or btrfs or something. I've got a little script that sticks today's date
onto the snapshot name (used to snapshot / before I emerge :-), so if
you run something like that after each backup you know your snapshot is
as it was that day.

Rsync has an "overwrite in place" option (--inplace), so it will only
overwrite the parts of the file that have changed, so if you use lvm or
btrfs to snapshot
the filesystem as part of the backup, you can then just mount that
snapshot to get a complete filesystem image. Full backups for the price
of incremental.
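
A hypothetical sketch of that scheme (volume group, LV and mount point
names are made up):

  # update the backup copy in place, so only changed blocks get written
  rsync -a --inplace --delete /home/ /mnt/backup/home/
  # then freeze today's state as an LVM snapshot of the backup LV
  lvcreate --snapshot --size 2G \
           --name backup-$(date +%F) /dev/vg_backup/lv_backup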

Cheers,
Wol

Rich Freeman

Jan 30, 2024, 3:20:03 PM
On Tue, Jan 30, 2024 at 3:08 PM Wol <antl...@youngman.org.uk> wrote:
>
> On 30/01/2024 19:19, Rich Freeman wrote:
> > I'd echo the other advice. It really depends on your goals.
>
> If you just want a simple backup, I'd use something like rsync onto lvm
> or btrfs or something. I've got a little script that sticks today's date
> onto the snapshot name

So, you've basically described what rsnapshot does, minus half the
features. You should consider looking at it. It is basically an
rsync wrapper and will automatically rotate multiple snapshots, and
when it makes them, they're all hard-linked such that they're as close
to copy-on-write copies as possible. The result is that all those
snapshots don't take up much space, unless your files are constantly
changing.

--
Rich

Grant Edwards

Jan 30, 2024, 3:40:04 PM
On 2024-01-30, Rich Freeman <ri...@gentoo.org> wrote:
> On Tue, Jan 30, 2024 at 3:08 PM Wol <antl...@youngman.org.uk> wrote:
>>
>> On 30/01/2024 19:19, Rich Freeman wrote:
>> > I'd echo the other advice. It really depends on your goals.
>>
>> If you just want a simple backup, I'd use something like rsync onto lvm
>> or btrfs or something. I've got a little script that sticks today's date
>> onto the snapshot name
>
> So, you've basically described what rsnapshot does, minus half the
> features. You should consider looking at it.

If you do, read carefully the documentation on intervals and
automation.

It took me an embarrassing number of tries to get the intervals and
crontab entries to mesh so it worked the way I wanted. It's not really
that difficult (and it's pretty well documented), but I managed to
combine a misreading of how often and in what order the rsync wrapper
was supposed to run with my chronic inability to grok crontab
specifications. Hilarity ensued.
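
For the record, the shape of the crontab is roughly this (assuming
retain levels named daily/weekly/monthly in rsnapshot.conf; times are
illustrative). The gotcha that bit me: the larger interval has to run
*before* the smaller one so the rotation happens in the right order:

  # min hour dom mon dow  command
  10  23   1   *   *      /usr/bin/rsnapshot monthly
  20  23   *   *   1      /usr/bin/rsnapshot weekly
  30  23   *   *   *      /usr/bin/rsnapshot daily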

gento...@krasauskas.dev

Jan 31, 2024, 3:20:06 AM
On Tue, 2024-01-30 at 20:38 +0000, Grant Edwards wrote:
>
> It took me an embarrassing number of tries to get the intervals and
> crontab entries to mesh so it worked the way I wanted. It's not really
> that difficult (and it's pretty well documented), but I managed to
> combine a misreading of how often and in what order the rsync wrapper
> was supposed to run with my chronic inability to grok crontab
> specifications. Hilarity ensued.
>

I just wanted to share my 2¢. https://crontab.guru has made my life a
lot easier when it comes to setting up crontab.

John Covici

Jan 31, 2024, 6:50:06 AM
I know you said you wanted to stay with ext4, but going to zfs reduced
my backup time on my entire system from several hours to just a few
minutes because taking a snapshot is so quick and copying to another
pool is also very quick.

--
Your life is like a penny. You're going to lose it. The question is:
How do
you spend it?

John Covici wb2una
cov...@ccs.covici.com

Rich Freeman

Jan 31, 2024, 8:10:07 AM
On Wed, Jan 31, 2024 at 6:45 AM John Covici <cov...@ccs.covici.com> wrote:
>
> I know you said you wanted to stay with ext4, but going to zfs reduced
> my backup time on my entire system from several hours to just a few
> minutes because taking a snapshot is so quick and copying to another
> pool is also very quick.
>

Honestly, at this point I would not run any storage I cared about on
anything but zfs. There are just so many benefits.

I'd consider btrfs, but I'd have to dig into whether the reliability
issues have been solved. I was using that for a while, but I found
that even features that were touted as reliable had problems from time
to time. That was years ago, however. On paper I think it is the
better option, but I just need to confirm whether I can trust it.

In any case, these COW filesystems, much like git, store data in a way
that makes it very efficient to diff two snapshots and back up only
the data that has changed. They are far superior to rsync in this
regard, providing all the benefits of rsync --checksum but without
even having to stat all of the inodes let alone read file contents.

--
Rich

Grant Edwards

Jan 31, 2024, 10:40:06 AM
Yep, I just found that site recently myself, and it's a big help.

--
Grant

Grant Edwards

Jan 31, 2024, 11:00:05 AM
On 2024-01-31, Rich Freeman <ri...@gentoo.org> wrote:
> On Wed, Jan 31, 2024 at 6:45 AM John Covici <cov...@ccs.covici.com> wrote:
>>
>> I know you said you wanted to stay with ext4, but going to zfs reduced
>> my backup time on my entire system from several hours to just a few
>> minutes because taking a snapshot is so quick and copying to another
>> pool is also very quick.
>>
>
> Honestly, at this point I would not run any storage I cared about on
> anything but zfs. There are just so many benefits.

I'll definitely put zfs on my list of things to play with. I've been
a little reluctant in the past because it wasn't natively supported. I
don't use an initrd (or modules in general). So, using a filesystem
that isn't supported in-tree sounded like too much work.

--
Grant

Thelma

Jan 31, 2024, 12:50:05 PM
If the zfs filesystem is superior to ext4 (and it seems it is),
why hasn't it been adopted more widely in Linux?

Rich Freeman

Jan 31, 2024, 1:00:06 PM
On Wed, Jan 31, 2024 at 12:40 PM Thelma <the...@sys-concept.com> wrote:
>
> If the zfs filesystem is superior to ext4 (and it seems it is),
> why hasn't it been adopted more widely in Linux?
>

The main barrier is that its license isn't GPL-compatible. It is
FOSS, but the license was basically designed to keep it from being
incorporated into the mainline kernel.

The odd thing is that right now Oracle controls both ZFS and btrfs,
with the latter doing mostly the same thing and being GPL-compatible,
but it hasn't tended to be as stable. So we're in a really long
transitional period to btrfs becoming as reliable.

ZFS also cannot be shrunk as easily. I think that is something that
has been improved more recently, but I'm not certain of the state of
it. Also, bootloaders like grub aren't 100% compatible with all of
its later features, and it isn't even clear in the docs which ones are
and aren't supported. So it doesn't hurt to keep /boot off of zfs.

I'm sure ext4 also performs better. It has to be faster to just
overwrite a block in place than to remap the extents around the
change, even if the latter is safer. I'd expect zfs to outperform
ext4 with full data journaling though, which would have a comparable
level of safety, assuming it isn't on RAID. I don't think there are
any RAID implementations that do full write journaling to protect
against the write hole problem, but those would obviously underperform
zfs as well.

--
Rich

Grant Edwards

Jan 31, 2024, 1:10:05 PM
On 2024-01-31, Thelma <the...@sys-concept.com> wrote:
> On 1/31/24 08:50, Grant Edwards wrote:
>> On 2024-01-31, Rich Freeman <ri...@gentoo.org> wrote:
>>
>>> Honestly, at this point I would not run any storage I cared about on
>>> anything but zfs. There are just so many benefits.
>>
>> I'll definitely put zfs on my list of things to play with. I've been
>> a little reluctant in the past because it wasn't natively supported. I
>> don't use an initrd (or modules in general). So, using a filesystem
>> that isn't supported in-tree sounded like too much work.
>
> If the zfs filesystem is superior to ext4 (and it seems it is),
> why hasn't it been adopted more widely in Linux?

My understanding is that the license is incompatible with the Linux
kernel's GPL license, so it can't be "built-in" the way that ext4 is.

--
Grant

Wols Lists

Jan 31, 2024, 1:50:05 PM
On 31/01/2024 17:56, Rich Freeman wrote:
> I don't think there are
> any RAID implementations that do full write journaling to protect
> against the write hole problem, but those would obviously underperform
> zfs as well.

This feature has been added to mdraid, iirc.

Cheers,
Wol

Rich Freeman

Jan 31, 2024, 4:40:06 PM
Oh, it looks like it has. Kind of annoying that it only works for a
separate device. I guess you could create a separate partition on all
of the devices, create a mirror across all of those, and then use that
as the journal for the real raid. It would be nice if it had an
option for an internal journal.
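
For anyone curious, the knob is mdadm's --write-journal (device names
here are made up; it needs a reasonably recent kernel and mdadm):

  # create a raid5 array whose writes are journaled on a separate,
  # ideally fast, device to close the write hole
  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        --write-journal /dev/nvme0n1p2 \
        /dev/sda1 /dev/sdb1 /dev/sdc1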

I'd expect the performance of btrfs/zfs to be better of course because
it just does the write to known-unused blocks, so interrupted writes
don't cause any issues. Depending on how far it gets when interrupted
the write will be ignored (it is just data written into unused space),
or will get completed (the root of the metadata was updated and now
points to a consistent new state).

--
Rich

Michael

Feb 1, 2024, 5:20:04 AM
I have been running BTRFS on a system for just over 10 years now, on SSD and
spinning SATA drives. I recall the odd glitch (file corruption, but no data
lost) some 7 or 8 years ago. Since then I've had no problems. A second
system on BTRFS has been running since last summer and had some hard reboots
(I need a bigger UPS) but still no problem with it. Subjectively and
anecdotally therefore, I'd consider BTRFS for Linux as long as long as the use
case requires it - ease of snapshots, flexible space requirements,
compression, etc. Known problems like RAID 5, 6 and perhaps quotas(?) on
BTRFS are for experimentation only. On FreeBSD I use ZFS.

For a personal filesystem where data does not change frequently I'd probably
choose ext4.

[Slightly O/T]: On a new laptop installation on SSD I'm thinking of trying
F2FS for the OS and ext4 for /home. I've only tried F2FS on USB sticks so
far. Would you have some feedback to share on F2FS?

Grant Edwards

Feb 2, 2024, 6:40:06 PM
On 2024-01-31, Rich Freeman <ri...@gentoo.org> wrote:

> Honestly, at this point I would not run any storage I cared about on
> anything but zfs. There are just so many benefits.
>
> [...]
>
> In any case, these COW filesystems, much like git, store data in a
> way that makes it very efficient to diff two snapshots and back up
> only the data that has changed. [...]

In order to take advantage of this, I assume that the backup
destination and source both have to be ZFS? Do backup source and
destination need to be in the same filesystem? Or volume? Or Pool?
(I'm not clear on how those differ exactly.) Or can the backup
destination be "unrelated" to the backup source? The primary source of
failure in my world is definitely hardware failure of the disk drive,
so my backup destination is always a separate physical (usually
external) disk drive.

If you'll forgive the analogy, we'll say that the functionality of
rsync (as used by rsnapshot) is built in to ZFS. Is there an
application that does with ZFS snapshots what the rsnapshot
application itself does with rsync?

I googled for ZFS backup applications, but didn't find anything that
seemed to be widespread and "supported" the way that rsnapshot is.

--
Grant

Mark Knecht

Feb 2, 2024, 7:00:05 PM


On Fri, Feb 2, 2024 at 4:39 PM Grant Edwards <grant.b...@gmail.com> wrote:
<SNIP>

>
> I googled for ZFS backup applications, but didn't find anything that
> seemed to be widespread and "supported" the way that rsnapshot is.

I'm not exactly sure I'm following your thoughts above, but have you
investigated True-NAS? It is OpenZFS-based and does support snapshots.

HTH,
Mark

Michael

Feb 3, 2024, 8:10:05 AM
On Friday, 2 February 2024 23:39:18 GMT Grant Edwards wrote:
> On 2024-01-31, Rich Freeman <ri...@gentoo.org> wrote:
> > Honestly, at this point I would not run any storage I cared about on
> > anything but zfs. There are just so many benefits.
> >
> > [...]
> >
> > In any case, these COW filesystems, much like git, store data in a
> > way that makes it very efficient to diff two snapshots and back up
> > only the data that has changed. [...]
>
> In order to take advantage of this, I assume that the backup
> destination and source both have to be ZFS? Do backup source and
> destination need to be in the same filesystem? Or volume? Or Pool?
> (I'm not clear on how those differ exactly.) Or can the backup
> destination be "unrelated" to the backup source? The primary source of
> failure in my world is definitely hardware failure of the disk drive,
> so my backup destination is always a separate physical (usually
> external) disk drive.

TBH using ext4/xfs/f2fs/etc. on the host plus an incremental backup method on
any other fs of choice on external storage is IMHO a better method for a
laptop. Unless your data is changing continuously and you need incremental
backups every 5 minutes, what you use is well suited to your use case.


>> If you'll forgive the analogy, we'll say that the functionality of
>> rsync (as used by rsnapshot) is built in to ZFS.

Broadly and rather loosely yes, by virtue of the COW and snapshot fs
architecture and the btrfs/zfs send-receive commands.


> Is there an
> application that does with ZFS snapshots what the rsnapshot
> application itself does with rsync?

COW filesystems do not need a 3rd party application. They come with their own
commands which can be called manually, or scripted for convenience and
automation. Various people have created their own scripts and applications,
e.g.

https://unix.stackexchange.com/questions/696513/best-strategy-to-backup-btrfs-root-filesystem


> I googled for ZFS backup applications, but didn't find anything that
> seemed to be widespread and "supported" the way that rsnapshot is.
>
> --
> Grant

There must be quite a few scripts out there, but I can't say what support they
may receive. Random search revealed:

https://www.zfsnap.org/

https://github.com/shirkdog/zfsbackup

https://gbyte.dev/blog/simple-zfs-snapshotting-replicating-backup-rotating-convenience-bash-script

Grant Edwards

Feb 3, 2024, 11:10:06 AM
On 2024-02-02, Mark Knecht <markk...@gmail.com> wrote:
> On Fri, Feb 2, 2024 at 4:39 PM Grant Edwards <grant.b...@gmail.com>
> wrote:
><SNIP>
>>
>> I googled for ZFS backup applications, but didn't find anything that
>> seemed to be widespread and "supported" the way that rsnapshot is.
>
> I'm not exactly sure I'm following your thoughts above but

rsnapshot is an application that uses rsync to do
hourly/daily/weekly/monthly (user-configurable) backups of selected
directory trees. It's done using rsync to create snapshots. They are
in-effect "incremental" backups, because the snapshots themselves are
effectively "copy-on-write" via clever use of hard-links by rsync. A
year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
snapshots for a total of 23 snapshots. If nothing has changed during
the year, those 23 snapshots take up the same amount of space as a
single snapshot.
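
The underlying rsync trick, roughly (paths are illustrative; rsnapshot
itself also handles rotating the directories):

  # unchanged files in the new snapshot become hard links into the
  # previous snapshot; only changed files are stored as new copies
  rsync -a --delete \
        --link-dest=/backup/daily.1/ \
        /home/ /backup/daily.0/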

My understanding of ZFS is that it has built-in snapshot functionality
that provides something similar to what is done by rsync by its use of
hard-links. In my current setup, there's an application called
rsnapshot that manages/controls the snapshots by invoking rsync in
various ways. My question was about the existence of a similar
application that can be used with ZFS's built-in snapshot support to
provide a similar backup scheme.

> have you investigated True-NAS? It is Open ZFS based and
> does support snapshots.

I'm aware of True-NAS, but I'm not looking for NAS. I was asking
about methods to back up one local disk to another local disk.

--
Grant

Grant Edwards

Feb 3, 2024, 11:20:05 AM
On 2024-02-03, Michael <confa...@kintzios.com> wrote:

>> If you'll forgive the analogy, we'll say that the functionality of
>> rsync (as used by rsnapshot) is built in to ZFS.
>
> Broadly and rather loosely yes, by virtue of the COW and snapshot fs
> architecture and the btrfs/zfs send-receive commands.
>
>> Is there an application that does with ZFS snapshots what the
>> rsnapshot application itself does with rsync?
>
> COW filesystems do not need a 3rd party application.

Really? I can edit a configuration file and then ZFS will provide me
with daily/weekly/monthly/yearly snapshots of (say) /home, /etc, and
/usr/local on an external hard drive?

> They come with their own commands which can be called manually, or
> scripted for convenience and automation.

Yes, I know that. AFAICT, they provide commands that do pretty much
what rsync does in my current backup scheme. It's the automation
provided by rsnapshot that I'm asking about.

> Various people have created their own scripts and applications, e.g.
>
> https://unix.stackexchange.com/questions/696513/best-strategy-to-backup-btrfs-root-filesystem
>
>> I googled for ZFS backup applications, but didn't find anything that
>> seemed to be widespread and "supported" the way that rsnapshot is.
>
> There must be quite a few scripts out there, but can't say what support they
> may receive. Random search revealed:
>
> https://www.zfsnap.org/
>
> https://github.com/shirkdog/zfsbackup
>
> https://gbyte.dev/blog/simple-zfs-snapshotting-replicating-backup-rotating-convenience-bash-script

Yes, there seem to be a lot of bare-bones homebrewed scripts like
those. That is the sort of thing I was looking for, but they all seem a
bit incomplete and unsupported compared to rsnapshot. I can install
rsnapshot with a simple "emerge rsnapshot", edit the config file, set
up the crontab entries, and Bob's your uncle: rsnapshot bugfixes and
updates get installed by the usual Gentoo update process, and backups
"just happen".

--
Grant

Wol

Feb 3, 2024, 12:10:05 PM
On 03/02/2024 16:02, Grant Edwards wrote:
> rsnapshot is an application that uses rsync to do
> hourly/daily/weekly/monthly (user-configurable) backups of selected
> directory trees. It's done using rsync to create snapshots. They are
> in-effect "incremental" backups, because the snapshots themselves are
> effectively "copy-on-write" via clever use of hard-links by rsync. A
> year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
> snapshots for a total of 23 snapshots. If nothing has changed during
> the year, those 23 snapshots take up the same amount of space as a
> single snapshot.

So as I understand it, it looks like you first do a "cp with hardlinks"
creating a complete new directory structure, but all the files are
hardlinks so you're not using THAT MUCH space for your new image?

Then rsync copies and replaces any files that have been modified?

So each snapshot is using the space required by the directory structure,
plus the space required by any changed files.
>
> My understanding of ZFS is that it has built-in snapshot functionality
> that provides something similar to what is done by rsync by its use of
> hard-links. In my current setup, there's an application called
> rsnapshot that manages/controls the snapshots by invoking rsync in
> various ways. My question was about the existence of a similar
> application that can be used with ZFS's built-in snapshot support to
> provide a similar backup scheme.

ZFS is a "copy on write" filesystem, I believe. So any changed blocks
are rewritten to new blocks, the live snapshot is updated, and if the
original block is required by older snapshots it is retained, else it is
freed. And then you can push a snapshot to another disk. I think ZFS
also has "dedupe on write", so if you're regularly pushing snapshots
you're not wasting space with duplicate data.

And that is why I like "ext over lvm copying with rsync" as my strategy
(not that I actually do it). You have lvm on your backup disk. When you
do a backup you do "rsync with overwrite in place", which means rsync
only writes blocks which have changed. You then take an lvm snapshot
which uses almost no space whatsoever.

So to compare "lvm plus overwrite in place" to "rsnapshot", my strategy
uses the space for an lvm header and a copy of all blocks that have changed.

Your strategy takes a copy of the entire directory structure, plus a
complete copy of every file that has changed. That's a LOT more.

If my hard disk changes by let's say 0.1% a day, and I take daily
snapshots, that's three years before I need to start deleting backups
assuming I'm actually using half my disk (and with terabyte disks, both
the amount of change, and the amount of disk used, is likely to be a lot
less than those figures).

Cheers,
Wol

Rich Freeman

Feb 3, 2024, 12:40:06 PM
On Fri, Feb 2, 2024 at 6:39 PM Grant Edwards <grant.b...@gmail.com> wrote:
>
> On 2024-01-31, Rich Freeman <ri...@gentoo.org> wrote:
>
> > In any case, these COW filesystems, much like git, store data in a
> > way that makes it very efficient to diff two snapshots and back up
> > only the data that has changed. [...]
>
> In order to take advantage of this, I assume that the backup
> destination and source both have to be ZFS?

So, the data needs to be RESTORED to ZFS for this to work. However,
the zfs send command serializes the data and so you can just store it
in files. Those files can only be read back into zfs.

It is probably a bit more typical to just pipe the send command into
zfs receive (often over ssh) so that you're just directly mirroring
the filesystem, and not storing the intermediate data.
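
Concretely, a replication run looks something like this (pool and
dataset names are made up):

  # one-time full replication of a snapshot
  zfs snapshot tank/home@2024-02-03
  zfs send tank/home@2024-02-03 | ssh backuphost zfs receive backup/home

  # later runs send only the delta between two snapshots
  # (the receive side may need -F to roll back stray local changes)
  zfs snapshot tank/home@2024-02-04
  zfs send -i tank/home@2024-02-03 tank/home@2024-02-04 | \
      ssh backuphost zfs receive backup/home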

> Do backup source and
> destination need to be in the same filesystem? Or volume? Or Pool?

No on all of these, but they can be.

> If you'll forgive the analogy, we'll say that the functionality of
> rsync (as used by rsnapshot) is built in to ZFS. Is there an
> application that does with ZFS snapshots what the rsnapshot
> application itself does with rsync?

There are a few wrappers around zfs send. I'm using
sys-fs/zfs-auto-snapshot and what looks like a much older version of:
https://github.com/psy0rz/zfs_autobackup

>
> I googled for ZFS backup applications, but didn't find anything that
> seemed to be widespread and "supported" the way that rsnapshot is.

They're less popular since many just DIY them, but honestly I think
the wrapper is a nicer solution. It will rotate backups, make sure
that snapshots aren't needed before deleting them, and so on. In
order to do an incremental backup the source/destination systems need
to have matching snapshots to base them on, so that is important if
backups are sporadic. If you're just saving all the send streams then
knowing which ones are obsolete is also important, unless you want to
keep every point in time.

--
Rich

Michael

Feb 3, 2024, 1:20:05 PM
This article offers some comparison tests between ZFS, Btrfs and
mdadm+dm-integrity. Although the setup and scenarios are not directly
comparable with the OP's use case, they provide some insight on more
typical implementations where these fs excel.

https://unixsheikh.com/articles/battle-testing-zfs-btrfs-and-mdadm-dm.html

Grant Edwards

Feb 4, 2024, 1:30:05 AM
On 2024-02-03, Wol <antl...@youngman.org.uk> wrote:
> On 03/02/2024 16:02, Grant Edwards wrote:
>> rsnapshot is an application that uses rsync to do
>> hourly/daily/weekly/monthly (user-configurable) backups of selected
>> directory trees. It's done using rsync to create snapshots. They are
>> in-effect "incremental" backups, because the snapshots themselves are
>> effectively "copy-on-write" via clever use of hard-links by rsync. A
>> year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
>> snapshots for a total of 23 snapshots. If nothing has changed during
>> the year, those 23 snapshots take up the same amount of space as a
>> single snapshot.
>
> So as I understand it, it looks like you first do a "cp with hardlinks"
> creating a complete new directory structure, but all the files are
> hardlinks so you're not using THAT MUCH space for your new image?

No, the first snapshot is a complete copy of all files. The snapshots
are on a different disk, in a different filesystem, and they're just
plain directory trees that you can browse with normal filesystem
tools. It's not possible to hard-link between the "live" filesystem
and the backup snapshots. The hard-links are to inodes "shared"
between different snapshot directory trees. The first snapshot copies
everything to the backup drive (using rsync).

The next snapshot creates a second directory tree with all unchanged
files hard-linked to the files that were copied as part of the first
snapshot. Any changed files are just plain copied into the second snapshot
directory tree.

The third snapshot does the same thing (starting with the second
snapshot directory tree).

Rinse and repeat.

Old snapshot trees are simply removed a la "rm -rf" when they're no
longer wanted.

> So each snapshot is using the space required by the directory
> structure, plus the space required by any changed files.

Sort of. The backup filesystem has to contain one copy of every file
so that there's something to hard-link to. The backup is completely
stand-alone, so it doesn't make sense to talk about all of the
snapshots containing only deltas. When you get to the "oldest"
snapshot, there's nothing to delta "from".

> [...]
>
> And that is why I like "ext over lvm copying with rsync" as my
> strategy (not that I actually do it). You have lvm on your backup
> disk. When you do a backup you do "rsync with overwrite in place",
> which means rsync only writes blocks which have changed. You then
> take an lvm snapshot which uses almost no space whatsoever.
>
> So to compare "lvm plus overwrite in place" to "rsnapshot", my
> strategy uses the space for an lvm header and a copy of all blocks
> that have changed.
>
> Your strategy takes a copy of the entire directory structure, plus a
> complete copy of every file that has changed. That's a LOT more.

I don't understand, are you saying that somehow your backup doesn't
contain a copy of every file?

--
Grant

Wols Lists

Feb 4, 2024, 5:00:06 AM
On 04/02/2024 06:24, Grant Edwards wrote:
> On 2024-02-03, Wol <antl...@youngman.org.uk> wrote:
>> On 03/02/2024 16:02, Grant Edwards wrote:
>>> rsnapshot is an application that uses rsync to do
>>> hourly/daily/weekly/monthly (user-configurable) backups of selected
>>> directory trees. It's done using rsync to create snapshots. They are
>>> in-effect "incremental" backups, because the snapshots themselves are
>>> effectively "copy-on-write" via clever use of hard-links by rsync. A
>>> year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
>>> snapshots for a total of 23 snapshots. If nothing has changed during
>>> the year, those 23 snapshots take up the same amount of space as a
>>> single snapshot.
>>
>> So as I understand it, it looks like you first do a "cp with hardlinks"
>> creating a complete new directory structure, but all the files are
>> hardlinks so you're not using THAT MUCH space for your new image?
>
> No, the first snapshot is a complete copy of all files. The snapshots
> are on a different disk, in a different filesystem, and they're just
> plain directory trees that you can browse with normal filesystem
> tools. It's not possible to hard-link between the "live" filesystem
> and the backup snapshots. The hard-links are to inodes "shared"
> between different snapshot directory trees. The first snapshot copies
> everything to the backup drive (using rsync).

Yes I get that. You create a new partition and copy all your files into it.

I create a new pv (physical volume), lv (logical volume), and copy all
my files into it.
>
> The next snapshot creates a second directory tree with all unchanged
> files hard-linked to the files that were copied as part of the first
> snapshot. Any changed files just-plain-copied into the second snapshot
> directory tree.

You create a complete new directory structure, which uses at least one
block per directory. You can't hard link directories.

I create an LVM snapshot. Dunno how much that is - a couple of blocks?

You copy all the files that have changed, leaving the old copy in the
old tree and the new copy in the new tree - for a 10MB file that's
changed, you use 10MB.

I use rsync's "Overwrite in place" mode, so if I change 10 bytes at the
end of that 10MB file I use ONE block to overwrite it (unless sod
strikes). The old block is left in the old volume, the new block is left
in the new volume.
>
> The third snapshot does the same thing (starting with the second
> snapshot directory tree).

So you end up with multiple directory trees (which could be large in
themselves), and multiple copies of files that have changed. Which could
be huge files.

I end up with ONE copy of my current data, and a whole bunch of dated
mount points, each of which is a full copy as of that date, but only
actually uses enough space to store a diff of the volume - if I change
that 10MB file every backup, but only change let's say 10KB over three
4KB disk blocks, I've only used four blocks - 16KB - per backup!
>
> Rinse and repeat.
>
> Old snapshots trees are simply removed a-la 'rm -rf" when they're no
> longer wanted.
>
>> So each snapshot is using the space required by the directory
>> structure, plus the space required by any changed files.
>
> Sort of. The backup filesystem has to contain one copy of every file
> so that there's something to hard-link to. The backup is completely
> stand-alone, so it doesn't make sense to talk about all of the
> snapshots containing only deltas. When you get to the "oldest"
> snapshot, there's nothing to delta "from".

I get that - it's a different hard drive.
>
>> [...]
>>
>> And that is why I like "ext over lvm copying with rsync" as my
>> strategy (not that I actually do it). You have lvm on your backup
>> disk. When you do a backup you do "rsync with overwrite in place",
>> which means rsync only writes blocks which have changed. You then
>> take an lvm snapshot which uses almost no space whatsoever.
>>
>> So to compare "lvm plus overwrite in place" to "rsnapshot", my
>> strategy uses the space for an lvm header and a copy of all blocks
>> that have changed.
>>
>> Your strategy takes a copy of the entire directory structure, plus a
>> complete copy of every file that has changed. That's a LOT more.
>
> I don't understand, are you saying that somehow your backup doesn't
> contain a copy of every file?
>
YES! Let's make it clear though, we're talking about EVERY VERSION of
every backed up file.

And you need to get your head round the fact I'm not - actually -
backing up my filesystem. I'm actually snapshotting my disk volume, my
disk partition if you like.

Your strategy contains a copy of every file in your original backup, a
full copy of the file structure for every snapshot, and a full copy of
every version of every file that's been changed.

My version contains a complete copy of the current backup and (thanks to
the magic of lvm) a block level diff of every snapshot, which appears to
the system as a complete backup, despite taking up much less space than
your typical incremental backup.

To change analogies completely - think git. My lvm snapshot is like a
git commit. Git only stores the current HEAD, and retrieves previous
commits by applying diffs. If I "check out a backup" (ie mount a backup
volume), lvm applies a diff to the live filesystem.

Cheers,
Wol

Paul Ezvan

Feb 4, 2024, 6:00:06 AM
Le 30/01/2024 à 19:15, Grant Edwards a écrit :
> I need to set up some sort of automated backup on a couple Gentoo
> machines (typical desktop software development and home use). One of
> them used rsnapshot in the past but the crontab entries that drove
> that have vanished :/ (presumably during a reinstall or upgrade --
> IIRC, it took a fair bit of trial and error to get the crontab entries
> figured out).
>
> I believe rsnapshot ran nightly and kept daily snapshots for a week,
> weekly snapshots for a month, and monthly snapshots for a couple
> years.
>
> Are there other backup solutions that people would like to suggest I
> look at to replace rsnapshot? I was happy enough with rsnapshot (when
> it was running), but perhaps there's something else I should consider?
>
> --
> Grant

I use restic [1] with the S3 backend. It manages snapshots and supports
several backends. It is way faster than the previous backup solution
I've used (dejadup).

It came in handy several times when I had to restore a specific file from a
point in time.

1: https://packages.gentoo.org/packages/app-backup/restic
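
Basic usage looks something like this (bucket name and paths are made
up; credentials come from the usual AWS_* environment variables plus
RESTIC_PASSWORD):

  restic -r s3:s3.amazonaws.com/my-backup-bucket init
  restic -r s3:s3.amazonaws.com/my-backup-bucket backup /home /etc
  restic -r s3:s3.amazonaws.com/my-backup-bucket snapshots
  restic -r s3:s3.amazonaws.com/my-backup-bucket restore latest --target /tmp/restore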

Grant Edwards

Feb 4, 2024, 10:50:05 AM
On 2024-02-04, Wols Lists <antl...@youngman.org.uk> wrote:
> On 04/02/2024 06:24, Grant Edwards wrote:
>
>> I don't understand, are you saying that somehow your backup doesn't
>> contain a copy of every file?
>>
> YES! Let's make it clear though, we're talking about EVERY VERSION of
> every backed up file.

> And you need to get your head round the fact I'm not - actually -
> backing up my filesystem. I'm actually snapshoting my disk volume, my
> disk partition if you like.

OK I see. That's a bit different than what I'm doing. I'm backing up
a specific set of directory trees from a couple different
filesystems. There are large portions of the "source" filesystems that
I have no need to back up. And within those directory trees that do
get backed up there are also some excluded subtrees.

> Your strategy contains a copy of every file in your original backup, a
> full copy of the file structure for every snapshot, and a full copy of
> every version of every file that's been changed.

Right.

> My version contains a complete copy of the current backup and
> (thanks to the magic of lvm) a block level diff of every snapshot,
> which appears to the system as a complete backup, despite taking up
> much less space than your typical incremental backup.

If I were backing up entire filesystems, I can see how that would
definitely be true.

> To change analogies completely - think git. My lvm snapshot is like
> a git commit. Git only stores the current HEAD, and retrieves
> previous commits by applying diffs. If I "check out a backup" (ie
> mount a backup volume), lvm applies a diff to the live filesystem.

Got it, thanks.

Wols Lists

Feb 5, 2024, 3:30:06 AM
On 04/02/2024 15:48, Grant Edwards wrote:
> OK I see. That's a bit different than what I'm doing. I'm backing up
> a specific set of directory trees from a couple different
> filesystems. There are large portions of the "source" filesystems that
> I have no need to back up. And within those directory trees that do
> get backed up there are also some excluded subtrees.

But my scheme still works here. The filesystem I'm snapshotting is the
backup. As such, it only contains the stuff I want backed up, copied
across using rsync.

There's nothing stopping me running several rsyncs from the live system,
from several different partitions, to the backup partition.

Cheers,
Wol

J. Roeleveld

Feb 5, 2024, 7:50:06 AM

On Wednesday, January 31, 2024 2:01:32 PM CET Rich Freeman wrote:
> On Wed, Jan 31, 2024 at 6:45 AM John Covici <cov...@ccs.covici.com> wrote:
> > I know you said you wanted to stay with ext4, but going to zfs reduced
> > my backup time on my entire system from several hours to just a few
> > minutes because taking a snapshot is so quick and copying to another
> > pool is also very quick.
>
> Honestly, at this point I would not run any storage I cared about on
> anything but zfs. There are just so many benefits.
>
> I'd consider btrfs, but I'd have to dig into whether the reliability
> issues have been solved. I was using that for a while, but I found
> that even features that were touted as reliable had problems from time
> to time. That was years ago, however. On paper I think it is the
> better option, but I just need to confirm whether I can trust it.

I actually looked into the state of btrfs last week and it's still far
from usable and not even close to what ZFS offers.

For a good read:

https://arstechnica.com/gadgets/2021/09/examining-btrfs-linuxs-perpetually-half-finished-filesystem/

In short:
- raid5/6/... are still broken.
- A missing drive prevents boot unless you tell it to accept a missing drive.
- Replacing a broken drive requires a lot of steps to make it sane again.

--
Joost

J. Roeleveld

Feb 5, 2024, 8:00:06 AM
On Wednesday, January 31, 2024 6:56:47 PM CET Rich Freeman wrote:
> On Wed, Jan 31, 2024 at 12:40 PM Thelma <the...@sys-concept.com> wrote:
> > If zfs file system is superior to ext4 and it seems to it is.
> > Why hasn't it been adopted more widely in Linux?
>
> The main barrier is that its license isn't GPL-compatible. It is
> FOSS, but the license was basically designed to keep it from being
> incorporated into the mainline kernel.

Which isn't as much of an issue as it sounds. You can still add it into the
initramfs and can easily load the module.
And the code still works with the functions the kernel devs pushed behind the
GPL-wall if you simply remove that wall from your own kernel.
(Which is advisable as it will improve performance)

> The odd thing is that right now Oracle controls both ZFS and btrfs,
> with the latter doing mostly the same thing and being GPL-compatible,
> but it hasn't tended to be as stable. So we're in a really long
> transitional period to btrfs becoming as reliable.

After all this time, I have given up on waiting for btrfs. As mentioned in my
other reply, it's still nowhere near reliable.

> ZFS also cannot be shrunk as easily. I think that is something that
> has been improved more recently, but I'm not certain of the state of
> it. Also, bootloaders like grub aren't 100% compatible with all of
> its later features, and it isn't even clear in the docs which ones are
> and aren't supported. So it doesn't hurt to keep /boot off of zfs.

To make this easier, there is a compatibility option when creating a new zpool.
It's also listed in the zfs-kmod ebuild:
- zpool create -o compatibility=grub2 ...
- Refer to /usr/share/zfs/compatibility.d/grub2 for the list of features.

--
Joost

Rich Freeman

Feb 5, 2024, 8:40:05 AM
First, thanks for the Ars link in the other email. I'll give that a read.

On Mon, Feb 5, 2024 at 7:55 AM J. Roeleveld <jo...@antarean.org> wrote:
>
> On Wednesday, January 31, 2024 6:56:47 PM CET Rich Freeman wrote:
> > The main barrier is that its license isn't GPL-compatible. It is
> > FOSS, but the license was basically designed to keep it from being
> > incorporated into the mainline kernel.
>
> Which isn't as much of an issue as it sounds. You can still add it into the
> initramfs and can easily load the module.
> And the code still works with the functions the kernel devs pushed behind the
> GPL-wall if you simply remove that wall from your own kernel.
> (Which is advisable as it will improve performance)

So, that's great for random individuals, but companies are going to be
hesitant to do that, especially for anything they redistribute. This
is part of why it isn't mainstream.

A big part of the reason that Linux is mainstream is that it doesn't
have any legal/license encumbrances. If you have 100 instances of
something and want to have 200 instances, you just turn a dial or add
hardware. There isn't anybody you need to get permission from or pay.

Personally I think the idea that the GPL prevents linking, or that you
can have GPL-only APIs is legally ridiculous. Then again, I thought
some of the court rulings in the Oracle vs Google Android lawsuits
were ridiculous as well around API copyrighting. Dynamic linking is
just adding a look up table to your program. It would be like suing
me because I advised you to call somebody, arguing that by telling you
to call somebody I was violating the copyright of the phone book.
Linking is just an instruction to the loader to go find some symbol
and substitute its address at this location.

However, MANY people would disagree with what I just said, and some
might even sue a company that published a large piece of software that
failed to comply with the mainstream interpretation of the GPL. A
court might agree with them and award damages. I think they and the
court would be wrong in that case, but the police don't care what I
think, and they do care what the court thinks.

The result is that the murky legal situation makes ZFS unattractive.
If I were publishing some large commercial software package, I'd
personally be hesitant to embrace ZFS on Linux in it for that reason,
even though I use it all the time personally.

>
> > The odd thing is that right now Oracle controls both ZFS and btrfs,
> > with the latter doing mostly the same thing and being GPL-compatible,
> > but it hasn't tended to be as stable. So we're in a really long
> > transitional period to btrfs becoming as reliable.
>
> After all this time, I have given up on waiting for btrfs. As mentioned in my
> other reply, it's still nowhere near reliable.

Clearly Oracle likes this state of affairs. Either that, or they are
encumbered in some way from just GPLing the ZFS code. Since they on
paper own the code for both projects it seems crazy to me that this
situation persists.

>
> To make this easier, there is a compatibility option when creating a new zpool.
> It's also listed in the zfs-kmod ebuild:
> - zpool create -o compatibility=grub2 ...
> - Refer to /usr/share/zfs/compatibility.d/grub2 for the list of features.

Oh, that is VERY helpful. I've found random many-years-old webpages
with the appropriate instructions, but something that is part of the
maintained project is much more useful.

Granted, I think the bottom line is that boot probably shouldn't be on
the same filesystem as large volumes of data, as these feature
restrictions are going to be cumbersome. I'm guessing you can't
shrink vdevs, for example.

--
Rich

J. Roeleveld

Feb 6, 2024, 8:20:05 AM
On Monday, February 5, 2024 2:35:12 PM CET Rich Freeman wrote:
> First, thanks for the Ars link in the other email. I'll give that a read.

You're welcome. I found that when I was looking for the latest state of btrfs.
I was actually hoping that the biggest issues had been resolved by now.

> On Mon, Feb 5, 2024 at 7:55 AM J. Roeleveld <jo...@antarean.org> wrote:
> > On Wednesday, January 31, 2024 6:56:47 PM CET Rich Freeman wrote:
> > > The main barrier is that its license isn't GPL-compatible. It is
> > > FOSS, but the license was basically designed to keep it from being
> > > incorporated into the mainline kernel.
> >
> > Which isn't as much of an issue as it sounds. You can still add it into
> > the initramfs and can easily load the module.
> > And the code still works with the functions the kernel devs pushed behind
> > the GPL-wall if you simply remove that wall from your own kernel.
> > (Which is advisable as it will improve performance)
>
> So, that's great for random individuals, but companies are going to be
> hesitant to do that, especially for anything they redistribute. This
> is part of why it isn't mainstream.

Not for Linux. *BSD has no such issues and that is why the mainstream SAN/NAS
distributions are based on *BSD. (replace '*' with your preferred flavour)

> A big part of the reason that Linux is mainstream is that it doesn't
> have any legal/license encumbrances. If you have 100 instances of
> something and want to have 200 instances, you just turn a dial or add
> hardware. There isn't anybody you need to get permission from or pay.
>
<snipped>

> The result is that the murky legal situation makes ZFS unattractive.
> If I were publishing some large commercial software package, I'd
> personally be hesitant to embrace ZFS on Linux in it for that reason,
> even though I use it all the time personally.

Proxmox has ZFS native and afaik, it is using Linux?

> > > The odd thing is that right now Oracle controls both ZFS and btrfs,
> > > with the latter doing mostly the same thing and being GPL-compatible,
> > > but it hasn't tended to be as stable. So we're in a really long
> > > transitional period to btrfs becoming as reliable.
> >
> > After all this time, I have given up on waiting for btrfs. As mentioned in
> > my other reply, it's still nowhere near reliable.
>
> Clearly Oracle likes this state of affairs. Either that, or they are
> encumbered in some way from just GPLing the ZFS code. Since they on
> paper own the code for both projects it seems crazy to me that this
> situation persists.

GPL is not necessarily the best license for releasing code. I've got some
private projects that I could publish. But before I do that, I'd have to
decide on a License. I would prefer something other than GPL.

> > To make this easier, there is a compatibility option when creating a new
> > zpool. It's also listed in the zfs-kmod ebuild:
> > - zpool create -o compatibility=grub2 ...
> > - Refer to /usr/share/zfs/compatibility.d/grub2 for the list of features.
>
> Oh, that is VERY helpful. I've found random many-years-old webpages
> with the appropriate instructions, but something that is part of the
> maintained project is much more useful.
>
> Granted, I think the bottom line is that boot probably shouldn't be on
> the same filesystem as large volumes of data, as these feature
> restrictions are going to be cumbersome. I'm guessing you can't
> shrink vdevs, for example.

I actually have the kernel and initramfs on an EFI boot partition and that is
enough to get the zpool mounted for use.

There is also "ZFSBootMenu" which, afaik, doesn't need this:

https://docs.zfsbootmenu.org/en/latest/index.html

--
Joost

Grant Edwards

Feb 6, 2024, 10:40:05 AM
Ah! Got it. That's one of the things I've been trying to figure out
this entire thread: do I need to switch home and root to ZFS to take
advantage of its snapshot support for backups? In the case you're
describing the "source" filesystem(s) can be anything. It's only the
_backup_ filesystem that needs to be ZFS (or similar).

If (like rsnapshot/rsync's hard-link scheme) ZFS snapshots are normal
directory trees that can be "browsed" with normal filesystem tools,
that would be ideal. [I'll do some googling...]

--
Grant

Grant Edwards

Feb 6, 2024, 10:50:05 AM
On 2024-02-05, J. Roeleveld <jo...@antarean.org> wrote:
> On Wednesday, January 31, 2024 6:56:47 PM CET Rich Freeman wrote:
>> On Wed, Jan 31, 2024 at 12:40 PM Thelma <the...@sys-concept.com> wrote:
>> > If zfs file system is superior to ext4 and it seems to it is.
>> > Why hasn't it been adopted more widely in Linux?
>>
>> The main barrier is that its license isn't GPL-compatible. It is
>> FOSS, but the license was basically designed to keep it from being
>> incorporated into the mainline kernel.
>
> Which isn't as much of an issue as it sounds. You can still add it
> into the initramfs and can easily load the module.

What if you don't use an initrd?

I presume that boot/root on ext4 and home on ZFS would not require an
initrd?

J. Roeleveld

unread,
Feb 6, 2024, 11:20:06 AMFeb 6
to
Yes, that wouldn't require an initrd. But why would you limit this?
ZFS works best when given the FULL drive.

For my server, I use "bliss-initramfs" to generate the initramfs and have not
had any issues with this since I started using ZFS.

The ease of generating snapshots also makes it really easy to roll back
an update if anything goes wrong. If your root partition isn't on ZFS,
you can't easily roll back.
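
A minimal sketch of such a rollback (pool, dataset and snapshot names are
examples, not my actual layout):

zfs snapshot rpool/ROOT/gentoo@pre-update   # taken just before the update
emerge -uDN @world                          # the update that might go wrong
zfs rollback rpool/ROOT/gentoo@pre-update   # discard everything since the snapshot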

--
Joost

J. Roeleveld

unread,
Feb 6, 2024, 11:30:05 AMFeb 6
to
On Tuesday, February 6, 2024 4:35:34 PM CET Grant Edwards wrote:
> On 2024-02-05, Wols Lists <antl...@youngman.org.uk> wrote:
> > On 04/02/2024 15:48, Grant Edwards wrote:
> >> OK I see. That's a bit different than what I'm doing. I'm backing up
> >> a specific set of directory trees from a couple different
> >> filesystems. There are large portions of the "source" filesystems that
> >> I have no need to back up. And within those directory trees that do
> >> get backed up there are also some excluded subtrees.
> >
> > But my scheme still works here. The filesystem I'm snapshotting is the
> > backup. As such, it only contains the stuff I want backed up, copied
> > across using rsync.
> >
> > There's nothing stopping me running several rsyncs from the live system,
> > from several different partitions, to the backup partition.
>
> Ah! Got it. That's one of the things I've been trying to figure out
> this entire thread, do I need to switch home and root to ZFS to take
> advantage of its snapshot support for backups? In the case you're
> describing the "source" filesystem(s) can be anything. It's only the
> _backup_ filesystem that needs to be ZFS (or similar).

If you want to use snapshots, the filesystem will need to support them
(either LVM or ZFS). If you only want to create snapshots on the backup
server, I actually don't see much benefit over plain rsync.

> If (like rsnapshot/rsync's hard-link scheme) ZFS snapshots are normal
> directory trees that can be "browsed" with normal filesystem tools,
> that would be ideal. [I'll do some googling...]

ZFS snapshots can be accessed using normal tools and can even be exposed over
NFS mounts, making it super easy to find the files again.

They are normally not visible, though; you need to access them explicitly
under "/filesystem/path/.zfs/snapshot"

--
Joost

Grant Edwards

unread,
Feb 6, 2024, 12:30:04 PMFeb 6
to
On 2024-02-06, J. Roeleveld <jo...@antarean.org> wrote:

> If you want to use snapshots, the filesystem will need to support it. (either
> LVM or ZFS). If you only want to create snapshots on the backupserver, I
> actually don't see much benefit over using rsync.

Upthread I've been told that ZFS snapshots

1. Require far less disk space than rsync's snapshots.
2. Are far faster.
3. Are atomic.
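
From what I've read so far, taking one looks as simple as this (pool and
dataset names are hypothetical):

zfs snapshot tank/home@2024-02-06
zfs list -t snapshot -o name,used,refer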

>> If (like rsnapshot/rsync's hard-link scheme) ZFS snapshots are normal
>> directory trees that can be "browsed" with normal filesystem tools,
>> that would be ideal. [I'll do some googling...]
>
> ZFS snapshots can be accessed using normal tools and can even be exposed over
> NFS mounts making it super easy to find the files again.
>
> They are normally not visible though, you need to access them specifically
> using "/filesystem/path/.zfs/snapshot"

Great, that's exactly what I would hope for. I'm reading up on ZFS,
and from what I've gleaned so far, it seems like ZFS source and ZFS
backup certainly would be ideal.

It's almost like the ZFS filesystem designers had thought about "how
to back up" from the start. Something that all of the old-school
filesystem designers clearly hadn't. :)

Grant Edwards

unread,
Feb 6, 2024, 12:30:04 PMFeb 6
to
On 2024-02-06, J. Roeleveld <jo...@antarean.org> wrote:
> On Tuesday, February 6, 2024 4:38:11 PM CET Grant Edwards wrote:
>> On 2024-02-05, J. Roeleveld <jo...@antarean.org> wrote:
>> > On Wednesday, January 31, 2024 6:56:47 PM CET Rich Freeman wrote:
>> >> On Wed, Jan 31, 2024 at 12:40 PM Thelma <the...@sys-concept.com> wrote:
>> >> > If the zfs file system is superior to ext4, and it seems it is,
>> >> > why hasn't it been adopted more widely in Linux?
>> >>
>> >> The main barrier is that its license isn't GPL-compatible. It is
>> >> FOSS, but the license was basically designed to keep it from being
>> >> incorporated into the mainline kernel.
>> >
>> > Which isn't as much of an issue as it sounds. You can still add it
>> > into the initramfs and can easily load the module.
>>
>> What if you don't use an initrd?
>>
>> I presume that boot/root on ext4 and home on ZFS would not require an
>> initrd?
>
> Yes, that wouldn't require an initrd. But why would you limit this?

Because I really, really dislike having to use an initrd. That's
probably just an irrational 30 year old prejudice, but over the
decades I've found life to be far simpler and more pleasant without
initrds. Maybe things have improved over the years, but way back when
I did use distros that required initrds, they seemed to be a constant,
nagging source of headaches.

> ZFS works best when given the FULL drive.

Where do you put swap?

> For my server, I use "bliss-initramfs" to generate the initramfs and
> have not had any issues with this since I started using ZFS.
>
> Especially the ease of generating snapshots also make it really easy
> to roll back an update if anything went wrong. If your
> root-partition isn't on ZFS, you can't easily roll back.

True. However, I've never adopted the practice of backing up my root
fs (except for a few specific directories like /etc), and haven't ever
really run into situations where I wished I had. It's all stuff that
can easily be reinstalled.

--
Grant

Wols Lists

unread,
Feb 6, 2024, 3:30:06 PMFeb 6
to
On 06/02/2024 13:12, J. Roeleveld wrote:
>> Clearly Oracle likes this state of affairs. Either that, or they are
>> encumbered in some way from just GPLing the ZFS code. Since they on
>> paper own the code for both projects it seems crazy to me that this
>> situation persists.

> GPL is not necessarily the best license for releasing code. I've got some
> private projects that I could publish. But before I do that, I'd have to
> decide on a License. I would prefer something other than GPL.

Okay. What do you want to achieve? Let's just lump licences into two
categories to start with and ask the question "Who do you want to free?"

If that sounds weird, it's because both Copyleft and Permissive claim to
be free, but have completely different target audiences. Once you've
answered that question, it'll make choosing a licence so much easier.

GPL gives freedom to the END USER. It's intended to protect the users of
your program from being held to ransom.

Permissive gives freedom to the DEVELOPER. It's intended to let other
programmers take advantage of your code and use it.

Once you've decided what sort of licence you want, it'll be easier to
pick the specific licence.

Cheers,
Wol

Wols Lists

unread,
Feb 6, 2024, 3:50:05 PMFeb 6
to
On 06/02/2024 15:35, Grant Edwards wrote:
> If (like rsnapshot/rsync's hard-link scheme) ZFS snapshots are normal
> directory trees that can be "browsed" with normal filesystem tools,
> that would be ideal. [I'll do some googling...]

Bear in mind I'm talking lvm snapshots, not ZFS ...

And you can configure snapshots to grow as required.

I know it's nothing really to do with backups, but if you read the raid
wiki page https://raid.wiki.kernel.org/index.php/System2020 - that's how
I set up my system, with a smattering of lvm. Just look at the sections on
pvcreate, vgcreate and lvcreate; they'll tell you how to create the lvm. Then
you just format your lvcreate'd volumes, and you can mount them,
snapshot them, whatever them.
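
As a rough sketch of those steps (device and volume names are only examples):

pvcreate /dev/md0                  # the raid array becomes a physical volume
vgcreate vg0 /dev/md0              # volume group on top of it
lvcreate -L 500G -n backup vg0     # a logical volume for the backup copies
mkfs.ext4 /dev/vg0/backup          # format and mount as usual
lvcreate -s -L 20G -n backup-snap /dev/vg0/backup   # classic snapshot of it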

So you can either have one backup partition per source partition, or one
backup partition and copy all your sources into just that one.

Your choice :-)

Cheers,
Wol

Wols Lists

unread,
Feb 6, 2024, 6:20:06 PMFeb 6
to
On 06/02/2024 16:19, J. Roeleveld wrote:
>> Ah! Got it. That's one of the things I've been trying to figure out
>> this entire thread, do I need to switch home and root to ZFS to take
>> advantage of its snapshot support for backups? In the case you're
>> describing the "source" filesystem(s) can be anything. It's only the
>> _backup_ filesystem that needs to be ZFS (or similar).

> If you want to use snapshots, the filesystem will need to support it. (either
> LVM or ZFS). If you only want to create snapshots on the backupserver, I
> actually don't see much benefit over using rsync.

Because snapshotting uses so much less space?

So much so that, for normal usage, I probably have no need to delete any
snapshots, for YEARS?

Okay, space is not an expensive commodity, and you don't want too many
snapshots, simply because digging through all those snapshots would be a
nightmare; but personally I wouldn't use a crude rsync, because I
prefer to be frugal in my use of resources.

Cheers,
Wol

J. Roeleveld

unread,
Feb 7, 2024, 6:10:06 AMFeb 7
to
On Tuesday, February 6, 2024 6:29:09 PM CET Grant Edwards wrote:
> On 2024-02-06, J. Roeleveld <jo...@antarean.org> wrote:
> > If you want to use snapshots, the filesystem will need to support it.
> > (either LVM or ZFS). If you only want to create snapshots on the
> > backupserver, I actually don't see much benefit over using rsync.
>
> Upthread I've been told that ZFS snapshots
>
> 1. Require far less disk space than rsync's snapshots.
> 2. Are far faster.
> 3. Are atomic.

True, but the speed is reduced by relying on rsync to copy data from your PC
to the backup server.

> > They are normally not visible though, you need to access them specifically
> > using "/filesystem/path/.zfs/snapshot"
>
> Great, that's exactly what I would hope for. I'm reading up on ZFS,
> and from what I've gleaned so far, it seems lake ZFS source and ZFS
> backup certainly would be ideal.
>
> It's almost like the ZFS filesystem designers had thought about "how
> to backup" from the start. Something that all of the old-school
> filesystem designers clearly hadn't. :)

I think it's also mainly there to keep a backup server on standby for
a quick switchover.

--
Joost

J. Roeleveld

unread,
Feb 7, 2024, 6:20:05 AMFeb 7
to
On Wednesday, February 7, 2024 12:17:03 AM CET Wols Lists wrote:
> On 06/02/2024 16:19, J. Roeleveld wrote:
> >> Ah! Got it. That's one of the things I've been trying to figure out
> >> this entire thread, do I need to switch home and root to ZFS to take
> >> advantage of its snapshot support for backups? In the case you're
> >> describing the "source" filesystem(s) can be anything. It's only the
> >> _backup_ filesystem that needs to be ZFS (or similar).
> >
> > If you want to use snapshots, the filesystem will need to support it.
> > (either LVM or ZFS). If you only want to create snapshots on the
> > backupserver, I actually don't see much benefit over using rsync.
>
> Because snapshotting uses so much less space?
>
> So much so that, for normal usage, I probably have no need to delete any
> snapshots, for YEARS?

My comment was based on using rsync to copy from the source to the backup
filesystem.

> Okay, space is not an expensive commodity, and you don't want too many
> snapshots, simply because digging through all those snapshots would be a
> nightmare, but personally I wouldn't use a crude rsync simply because I
> prefer to be frugal in my use of resources.

What is "too many"?
I currently have about 1800 snapshots on my server. Do have a tool that
ensures it doesn't get out of hand and will remove several over time.

--
Joost

J. Roeleveld

unread,
Feb 7, 2024, 6:20:05 AMFeb 7
to
On Tuesday, February 6, 2024 9:27:35 PM CET Wols Lists wrote:
> On 06/02/2024 13:12, J. Roeleveld wrote:
> >> Clearly Oracle likes this state of affairs. Either that, or they are
> >> encumbered in some way from just GPLing the ZFS code. Since they on
> >> paper own the code for both projects it seems crazy to me that this
> >> situation persists.
> >
> > GPL is not necessarily the best license for releasing code. I've got some
> > private projects that I could publish. But before I do that, I'd have to
> > decide on a License. I would prefer something other than GPL.
>
> Okay. What do you want to achieve. Let's just lump licences into two
> categories to start with and ask the question "Who do you want to free?"

I want my code to be usable by anyone, but don't want anyone to fork it and
start making money off of it without giving me a fair share.

> If that sounds weird, it's because both Copyleft and Permissive claim to
> be free, but have completely different target audiences. Once you've
> answered that question, it'll make choosing a licence so much easier.
>
> GPL gives freedom to the END USER. It's intended to protect the users of
> your program from being held to ransom.

That's not how the kernel devs handle the GPL. They use it to remove choice
from the end user (me) to use what I want (ZFS).
And it's that which I don't like about the GPL.

> Permissive gives freedom to the DEVELOPER. It's intended to let other
> programmers take advantage of your code and use it.
>
> Once you've decided what sort of licence you want, it'll be easier to
> decide what licence you want.

See above

--
Joost

J. Roeleveld

unread,
Feb 7, 2024, 6:30:06 AMFeb 7
to
On Tuesday, February 6, 2024 6:22:34 PM CET Grant Edwards wrote:
> On 2024-02-06, J. Roeleveld <jo...@antarean.org> wrote:
> > On Tuesday, February 6, 2024 4:38:11 PM CET Grant Edwards wrote:

> >> I presume that boot/root on ext4 and home on ZFS would not require an
> >> initrd?
> >
> > Yes, that wouldn't require an initrd. But why would you limit this?
>
> Because I really, really dislike having to use an initrd. That's
> probably just an irrational 30 year old prejudice, but over the
> decades I've found life to be far simpler and more pleasant without
> initrds. Maybe things have improved over the years, but way back when
> I did use distros that required initrds, they seemed to be a constant,
> nagging source of headaches.

In the past, initrds were a nightmare. Even the current tools (dracut,
genkernel) are a pain and force the user to do it their way.
The only initramfs generator I use is the "bliss-initramfs" one and that is
because it actually works and doesn't get in the way.
And I don't build a new kernel for the server.

For my desktops and laptops, I embed the initramfs into the kernel using a
very simple set of files (a script with the commands and a config detailing
which files to include).
The total size of both files is about 8K; most of it was grabbed from a howto
page about 10 years ago and has stayed unchanged since then. (I added a little
script to update the config when library versions change, but that is it.)
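
Roughly, the setup looks like this (paths are examples; the list format is
the kernel's gen_init_cpio syntax):

# in the kernel .config:
CONFIG_INITRAMFS_SOURCE="/usr/src/initramfs.list"

# /usr/src/initramfs.list -- which files to embed:
dir /dev 0755 0 0
nod /dev/console 0600 0 0 c 5 1
file /init /usr/src/initramfs/init 0755 0 0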

> > ZFS works best when given the FULL drive.
>
> Where do you put swap?

My swap is a ZFS volume. I find that using the recommended method of
configuring it is safe, and I have not seen any kind of lockup due to swap.
I did have some due to a bug in the HBA driver after some deranged dev decided
to change sensible defaults, though. But those froze the system before it even
got to enabling swap.
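
The recommended method amounts to roughly this (pool name is an example):

zfs create -V 8G -b $(getconf PAGESIZE) -o compression=zle \
    -o logbias=throughput -o sync=always \
    -o primarycache=metadata -o secondarycache=none \
    -o com.sun:auto-snapshot=false rpool/swap
mkswap -f /dev/zvol/rpool/swap
swapon /dev/zvol/rpool/swap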

> > For my server, I use "bliss-initramfs" to generate the initramfs and
> > have not had any issues with this since I started using ZFS.
> >
> > Especially the ease of generating snapshots also make it really easy
> > to roll back an update if anything went wrong. If your
> > root-partition isn't on ZFS, you can't easily roll back.
>
> True. However, I've never adopted the practice of backing up my root
> fs (except for a few specific directories like /etc), and haven't ever
> really run into situations where I wished I had. It's all stuff that
> can easily be reinstalled.

I did start backing up the full system, as restoring from backup (especially
rolling back a snapshot, but the same is true when grabbing the backup from
tape) is a lot faster than reinstalling all the software and making sure the
config (which these days isn't just in /etc anymore) is still the same.

--
Joost

Wols Lists

unread,
Feb 7, 2024, 5:00:05 PMFeb 7
to
On 07/02/2024 11:11, J. Roeleveld wrote:
> On Tuesday, February 6, 2024 9:27:35 PM CET Wols Lists wrote:
>> On 06/02/2024 13:12, J. Roeleveld wrote:
>>>> Clearly Oracle likes this state of affairs. Either that, or they are
>>>> encumbered in some way from just GPLing the ZFS code. Since they on
>>>> paper own the code for both projects it seems crazy to me that this
>>>> situation persists.
>>>
>>> GPL is not necessarily the best license for releasing code. I've got some
>>> private projects that I could publish. But before I do that, I'd have to
>>> decide on a License. I would prefer something other than GPL.
>>
>> Okay. What do you want to achieve. Let's just lump licences into two
>> categories to start with and ask the question "Who do you want to free?"
>
> I want my code to be usable by anyone, but don't want anyone to fork it and
> start making money off of it without giving me a fair share.

Okay, that instantly says you want a copyleft licence. So you're stuck
with a GPL-style licence, and if they want to include it in a commercial
closed source product, they need to come back to you and dual licence it.

Personally, I'd go the MPL2 route, but that's my choice. It might not
suit you. But to achieve what you want, you need a copyleft, GPL-style
licence.
>
>> If that sounds weird, it's because both Copyleft and Permissive claim to
>> be free, but have completely different target audiences. Once you've
>> answered that question, it'll make choosing a licence so much easier.
>>
>> GPL gives freedom to the END USER. It's intended to protect the users of
>> your program from being held to ransom.
>
> That's not how the kernel devs handle the GPL. They use it to remove choice
> from the end user (me) to use what I want (ZFS).
> And it's that which I don't like about the GPL.
>
No. That's Oracle's fault. The kernel devs can't include ZFS in linux,
because Oracle (or rather Sun, at the time, I believe) deliberately
*designed* the ZFS licence to be incompatible with the GPL.

After all, there's nothing stopping *you* from combining Linux and ZFS,
it's just that somebody else can't do that for you, and then give you
the resulting binary.

At the end of the day, if someone wants to be an arsehole, there's not a
lot you can do to stop them, and with ZFS that honour apparently goes to
Sun.

Cheers,
Wol

Wols Lists

unread,
Feb 7, 2024, 5:00:06 PMFeb 7
to
On 07/02/2024 11:07, J. Roeleveld wrote:
>> Because snapshotting uses so much less space?
>>
>> So much so that, for normal usage, I probably have no need to delete any
>> snapshots, for YEARS?
> My comment was based on using rsync to copy from the source to the backup
> filesystem.

Well, that's EXACTLY what I'm doing too. NO DIFFERENCE. Actually, there
is a minor difference - because I'm using lvm, I'm also using rsync's
"overwrite in place" switch. In other words, it compares source and
destination *in place*, and if any block has changed, it overwrites the
change, rather than creating a complete new copy.

Because lvm is COW, that means I have two copies of the file, in two
different snapshots, but inasmuch as the files are identical, there's
only one copy of the identical bits.
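
The switch in question is --inplace; a simplified example (paths are
illustrative):

rsync -a --inplace --delete /home/ /mnt/backup/home/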
>
>> Okay, space is not an expensive commodity, and you don't want too many
>> snapshots, simply because digging through all those snapshots would be a
>> nightmare, but personally I wouldn't use a crude rsync simply because I
>> prefer to be frugal in my use of resources.

> What is "too many"?
> I currently have about 1800 snapshots on my server. Do have a tool that
> ensures it doesn't get out of hand and will remove several over time.
>
"Too many" is whatever you define it to be. I'm likely to hang on to my
/home snapshots for yonks. My / snapshots, on the other hand, I delete
anything more than a couple of months old.

If I can store several years of /home snapshots without running out of
space, why shouldn't I? The problem is, if I *am* running out of space, I'm
going to have to delete a *lot* of snapshots to make much difference...

Cheers,
Wol

Frank Steinmetzger

unread,
Feb 7, 2024, 5:40:05 PMFeb 7
to
Am Tue, Jan 30, 2024 at 06:15:09PM -0000 schrieb Grant Edwards:
> I need to set up some sort of automated backup on a couple Gentoo
> machines (typical desktop software development and home use). One of
> them used rsnapshot in the past but the crontab entries that drove
> that have vanished :/ (presumably during a reinstall or upgrade --
> IIRC, it took a fair bit of trial and error to get the crontab entries
> figured out).
>
> I believe rsnapshot ran nightly and kept daily snapshots for a week,
> weekly snapshots for a month, and monthly snapshots for a couple
> years.
>
> Are there other backup solutions that people would like to suggest I
> look at to replace rsnapshot? I was happy enough with rsnapshot (when
> it was running), but perhaps there's something else I should consider?

In my early backup times I, too, used rsnapshot to back up my ~ and rsync
for my big media files. But that only included my PC. My laptop was wholly
un-backed-up. I only synchronised much of my home and my audio collection
between the two with unison. At some point my external 3 TB drive became
free and then I started using borg to finally do proper backups.

Borg is very similar to restic; I actually used the two in parallel for a
while to compare them, but stayed with borg. One pain point was that I
couldn’t switch off restic’s own password protection. Since all my backup
disks are LUKSed anyway, I don’t need that.

Since borg works block-based, it does deduplication without extra cost and
it is suitable for big image files which don’t change much. I do full
filesystem backups of /, ~ and my media partition of my main PC and my
laptop. I have one repository for each of those three filesystems, and each
repo receives the data from both machines, so they are deduped. Since both
machines run Arch, their roots are binary identical. The same goes for my
unison-synced homes.

Borg has retention logic built-in. You can say I want to keep the latest
archive of each of the last 6 days/weeks/months/years, and it even goes down
to seconds. And of course you can combine those rules. The only thing is
they don’t overlap, meaning if you want to keep the last 14 days and the
last four weeks, those weekly retentions start after the last daily
snapshots.
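
As an illustration, a prune rule combining such retentions (the counts here
are only examples):

# borg prune --keep-daily 14 --keep-weekly 4 --keep-monthly 6 data/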

In summary, advantages:
+ fast dedup, built-in compression (different algos and levels configurable)
+ big data files allow for quick mirroring of repositories.
I simply rsync my primary backup disk to two other external HDDs.
+ Incremental backups are quite fast because borg uses a cache to detect
changed files quickly.
Disadvantages:
- you need borg to mount or extract the backups
- it is not as fast as native disk access, especially during restore and
when getting a total file listing due to lots of random I/O on the HDD.
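
Browsing goes through a FUSE mount, for example (archive name as in the
listing below):

# borg mount data::tp_2024-02-04 /mnt/borg
# ls /mnt/borg
# borg umount /mnt/borg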


As an example, I currently have 63 snapshots in my data partition repository:

# borg list data/
tp_2021-06-07 Mon, 2021-06-07 16:27:44 [5f9ebd9f24353c340691b2a71f5228985a41699d2e23473ae4e9e795669c8440]
kern_2021-06-07 Mon, 2021-06-07 23:58:56 [19c76211a9c35432e6a66ac1892ee19a08368af28d2d621f509af3d45f203d43]
[... 55 more lines ...]
kern_2024-01-14 Sun, 2024-01-14 20:53:23 [499ce7629e64cffb7ec6ec9ffbf0c595e4ede3d93f131a9a4b424b165647f645]
tp_2024-01-14 Sun, 2024-01-14 20:57:42 [ea2baef3e4bb49c5aec7cf8536f7b00b55fb27ecae3a80ef9f5a5686a1da30d5]
kern_2024-01-21 Sun, 2024-01-21 23:42:46 [71aa2ce6cf4021712f949af068498bfda7797b5d1c5ddc0f0ce8862b89e48961]
tp_2024-01-21 Sun, 2024-01-21 23:48:24 [45e35ed9206078667fa62d0e4a1ac213e77f52415f196101d14ee21e79fc393d]
kern_2024-02-04 Sun, 2024-02-04 23:16:43 [e1b015117143fad6b89cea66329faa888cffc990644e157b1d25846220c62448]
tp_2024-02-04 Sun, 2024-02-04 23:23:15 [e9b167ceec1ab9a80cbdb1acf4ff31cd3935fc23e81674cad1b8694d98547aeb]

The last “tp” (Thinkpad) snapshot contains 1 TB, “kern” (my PC) 809 GB.
And here you see how much space this actually takes on disk:

# borg info data/
[ ... ]
Original size Compressed size Deduplicated size
All archives: 56.16 TB 54.69 TB 1.35 TB

Obviously, compression doesn’t do much for media files. But it is very
effective in the repository for the root partitions:

# borg info arch-root/
[ ... ]
Original size Compressed size Deduplicated size
All archives: 1.38 TB 577.58 GB 79.41 GB

--
Grüße | Greetings | Salut | Qapla’
Please do not share anything from, with or about me on any social network.

“She understands. She doesn’t comprehend.” – River Tam, Firefly

William Kenworthy

unread,
Feb 8, 2024, 12:30:05 AMFeb 8
to
I would also like to add my +1 to borgbackup ... I long ago lost the
ability to use snapshots and full-size backups due to the sheer amount
of data involved.  Currently I use borg to back up multiple hosts to
individual repos on a dedicated machine (low-power ARM-based, 6TB
drive).  I also back up from the top level of the directory all those
repos are stored in to another ARM system (2TB drive), again using borg.
As each 1st-level backup only adds/changes a few chunks for each
iteration, the second level only takes minutes to run, as against
30 minutes or so for some of the individual hosts.  The second level adds
redundancy if I lose the 1st-level backups, and the second can be
recreated at any time from the 1st level.  This is working for ~15 hosts
and VMs of various types involving hundreds of terabytes of original data.
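
Schematically it is something like this (hostnames and paths are made up):

# first level, run on each host:
borg create backupbox:/srv/borg/host1::{now} /etc /home
# second level, run on the backup machine over the directory
# holding all the first-level repos:
borg create nas2:/srv/borg-mirror/all::{now} /srv/borg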

Downside for VMs is that even a slight change to the image requires the
whole image to be read and checksummed to identify the changes to be
stored.  For images hundreds of gigabytes in size (on my
hardware/network) it's actually quicker to mount and back up the internal
files (camera images in my case) than the VM image.

It is more complex than simpler schemes, but I regularly restore from
both the first- and second-level backups for disaster
recovery/testing/rollbacks etc.  There is a management package
(borgmatic) but I have not tried it, as I use my own scripts.

BillK

J. Roeleveld

unread,
Feb 8, 2024, 1:40:06 AMFeb 8
to
On Wednesday, February 7, 2024 10:50:07 PM CET Wols Lists wrote:
> On 07/02/2024 11:07, J. Roeleveld wrote:
> >> Because snapshotting uses so much less space?
> >>
> >> So much so that, for normal usage, I probably have no need to delete any
> >> snapshots, for YEARS?
> >
> > My comment was based on using rsync to copy from the source to the backup
> > filesystem.
>
> Well, that's EXACTLY what I'm doing too. NO DIFFERENCE. Actually, there
> is a minor difference - because I'm using lvm, I'm also using rsync's
> "overwrite in place" switch. In other words, it compares source and
> destination *in*place*, and if any block has changed, it overwrites the
> change, rather than creating a complete new copy.

I must have missed that in the man-page last time I used rsync. Will have to
recheck and update my notes just in case I need to use rsync again in the
future.

> Because lvm is COW, that means I have two copies of the file, in two
> different snapshots, but inasmuch as the files are identical, there's
> only one copy of the identical bits.
>
> >> Okay, space is not an expensive commodity, and you don't want too many
> >> snapshots, simply because digging through all those snapshots would be a
> >> nightmare, but personally I wouldn't use a crude rsync simply because I
> >> prefer to be frugal in my use of resources.
> >
> > What is "too many"?
> > I currently have about 1800 snapshots on my server. Do have a tool that
> > ensures it doesn't get out of hand and will remove several over time.
>
> "Too many" is whatever you define it to be. I'm likely to hang on to my
> /home snapshots for yonks. My / snapshots, on the other hand, I delete
> anything more than a couple of months old.
>
> If I can store several years of /home snapshots without running out of
> space, why shouldn't I? The problem, if I *am* running out of space, I'm
> going to have to delete a *lot* of snapshots to make much difference...

One of the things I didn't like about LVM was that it would have trouble
dealing with a lot (100+, due to a bug in my script at the time) of snapshots.
And having to manually (or using a script) increase the size given to these
snapshots when a lot of changes are occurring.

ZFS doesn't have this "max amount of changes", but will happily fill up the
entire pool keeping all versions available.
But it was easier to add zpool monitoring for this on ZFS than it was to add
snapshot monitoring to LVM.

I wonder, how do you deal with snapshots getting "full" on your system?

--
Joost

J. Roeleveld

unread,
Feb 8, 2024, 1:40:06 AMFeb 8
to
On Wednesday, February 7, 2024 10:59:38 PM CET Wols Lists wrote:
> On 07/02/2024 11:11, J. Roeleveld wrote:
> > On Tuesday, February 6, 2024 9:27:35 PM CET Wols Lists wrote:
> >> On 06/02/2024 13:12, J. Roeleveld wrote:
> >>>> Clearly Oracle likes this state of affairs. Either that, or they are
> >>>> encumbered in some way from just GPLing the ZFS code. Since they on
> >>>> paper own the code for both projects it seems crazy to me that this
> >>>> situation persists.
> >>>
> >>> GPL is not necessarily the best license for releasing code. I've got
> >>> some
> >>> private projects that I could publish. But before I do that, I'd have to
> >>> decide on a License. I would prefer something other than GPL.
> >>
> >> Okay. What do you want to achieve. Let's just lump licences into two
> >> categories to start with and ask the question "Who do you want to free?"
> >
> > I want my code to be usable by anyone, but don't want anyone to fork it
> > and
> > start making money off of it without giving me a fair share.
>
> Okay, that instantly says you want a copyleft licence. So you're stuck
> with a GPL-style licence, and if they want to include it in a commercial
> closed source product, they need to come back to you and dual licence it.
>
> Personally, I'd go the MPL2 route, but that's my choice. It might not
> suit you. But to achieve what you want, you need a copyleft, GPL-style
> licence.

I'll have a look at that one.

> >> If that sounds weird, it's because both Copyleft and Permissive claim to
> >> be free, but have completely different target audiences. Once you've
> >> answered that question, it'll make choosing a licence so much easier.
> >>
> >> GPL gives freedom to the END USER. It's intended to protect the users of
> >> your program from being held to ransom.
> >
> > That's not how the kernel devs handle the GPL. They use it to remove
> > choice
> > from the end user (me) to use what I want (ZFS).
> > And it's that which I don't like about the GPL.
>
> No. That's Oracle's fault. The kernel devs can't include ZFS in linux,
> because Oracle (or rather Sun, at the time, I believe) deliberately
> *designed* the ZFS licence to be incompatible with the GPL.

Maybe it can't be included fully in the kernel, but there is nothing
preventing it from being packaged with a Linux distribution.
It's just the hostility from Linus Torvalds and Greg Kroah-Hartman against ZFS
that causes the issues.

See the following post for a clear description (much better written than I
can):
https://eerielinux.wordpress.com/2019/01/28/zfs-and-gpl-terror-how-much-freedom-is-there-in-linux/

Especially the lkml thread linked from there:
https://lore.kernel.org/lkml/2019011018...@kroah.com/

> After all, there's nothing stopping *you* from combining Linux and ZFS,
> it's just that somebody else can't do that for you, and then give you
> the resulting binary.

Linux (kernel) and ZFS can't be merged. Fine.
But, Linux (the OS, as in, kernel + userspace) and ZFS can be merged legally.

> At the end of the day, if someone wants to be an arsehole, there's not a
> lot you can do to stop them, and with ZFS that honour apparently goes to
> Sun.

See what I put above.

--
Joost

Wols Lists

unread,
Feb 8, 2024, 12:40:06 PMFeb 8
to
On 08/02/2024 06:32, J. Roeleveld wrote:
>> Personally, I'd go the MPL2 route, but that's my choice. It might not
>> suit you. But to achieve what you want, you need a copyleft, GPL-style
>> licence.

> I'll have a look at that one.

Basically, each individual source file is copyleft, but not the work as
a whole. So if anybody copies/modifies YOUR work, they have to
distribute your work with their binary, but this requirement does not
extend to everyone else's work.
>
> Maybe not included fully into the kernel, but there is nothing preventing it
> to be packaged with a Linux distribution.
> It's just the hostility from Linus Torvalds and Greg Kroah-Hartman against ZFS
> causing the issues.
>
> See the following post for a clear description (much better written than I
> can):
> https://eerielinux.wordpress.com/2019/01/28/zfs-and-gpl-terror-how-much-freedom-is-there-in-linux/
>
> Especially the lkml thread linked from there:
> https://lore.kernel.org/lkml/2019011018...@kroah.com/
>
>> After all, there's nothing stopping *you* from combining Linux and ZFS,
>> it's just that somebody else can't do that for you, and then give you
>> the resulting binary.

> Linux (kernel) and ZFS can't be merged. Fine.

But they can.

> But, Linux (the OS, as in, kernel + userspace) and ZFS can be merged legally.
>
Likewise here, they can.

The problem is, the BINARY can NOT be distributed. And the problem is
the ZFS licence, not Linux.

What Linus, and the kernel devs, and that crowd *think* is irrelevant.
What matters is what SUSE, and Red Hat, and Canonical et al think. And
if they're not prepared to take the risk of distributing the kernel with
ZFS built in, because they think it's a legal minefield, then that's
THEIR decision.

That problem doesn't apply to gentoo, because it distributes the linux
kernel and ZFS separately, and combines them ON THE USER'S MACHINE. But
the big distros are not prepared to take the risk of combining linux and
ZFS, and distributing the resulting *derived* *work*.

Cheers,
Wol

Wols Lists

unread,
Feb 8, 2024, 12:50:05 PMFeb 8
to
On 08/02/2024 06:38, J. Roeleveld wrote:
> ZFS doesn't have this "max amount of changes", but will happily fill up the
> entire pool keeping all versions available.
> But it was easier to add zpool monitoring for this on ZFS then it was to add
> snapshot monitoring to LVM.
>
> I wonder, how do you deal with snapshots getting "full" on your system?

As far as I'm concerned, snapshots are read-only once they're created.
But there is a "grow the snapshot as required" option.
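
That is the snapshot autoextend mechanism in lvm.conf (the values here are
just examples):

# /etc/lvm/lvm.conf, activation section:
snapshot_autoextend_threshold = 70   # extend a snapshot once it is 70% full
snapshot_autoextend_percent = 20     # grow it by 20% of its size each time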

I don't understand it exactly, but what I think happens is when I create
the snapshot it allocates, let's say, 1GB. As I write to the master
copy, it fills up that 1GB with CoW blocks, and the original blocks are
handed over to the backup snapshot. And when that backup snapshot is
full of blocks that have been "overwritten" (or in reality replaced),
lvm just adds another 1GB or whatever I told it to.

So when I delete a snapshot, it just goes through those few blocks,
decrements their use count (if they've been used in multiple snapshots),
and if the use count goes to zero they're handed back to the "empty" pool.

All I have to do is make sure that the sum of my snapshots does not fill
the lv (logical volume). Which in my case is a raid-5.

Cheers,
Wol

J. Roeleveld

unread,
Feb 9, 2024, 8:00:05 AMFeb 9
to
On Thursday, February 8, 2024 6:36:56 PM CET Wols Lists wrote:
> On 08/02/2024 06:32, J. Roeleveld wrote:
> >> After all, there's nothing stopping *you* from combining Linux and ZFS,
> >> it's just that somebody else can't do that for you, and then give you
> >> the resulting binary.
> >
> > Linux (kernel) and ZFS can't be merged. Fine.
>
> But they can.

Not if you want to release it

> > But, Linux (the OS, as in, kernel + userspace) and ZFS can be merged
> > legally.
> Likewise here, they can.
>
> The problem is, the BINARY can NOT be distributed. And the problem is
> the ZFS licence, not Linux.

You can distribute the binary of both, just not embedded into a single binary.

> What Linus, and the kernel devs, and that crowd *think* is irrelevant.

It is relevant, as they are actively working on removing API calls that
filesystems like ZFS actually need and hiding them behind a GPL wall.

> What matters is what SUSE, and Red Hat, and Canonical et al think. And
> if they're not prepared to take the risk of distributing the kernel with
> ZFS built in, because they think it's a legal minefield, then that's
> THEIR decision.

I'm not talking about distributing ZFS embedded into the kernel. It's
perfectly fine to distribute a distribution with ZFS as a kernel module. The
issue is caused by the linux kernel devs blocking access to (previously
existing and open) API calls and limiting them to GPL only.

> That problem doesn't apply to gentoo, because it distributes the linux
> kernel and ZFS separately, and combines them ON THE USER'S MACHINE. But
> the big distros are not prepared to take the risk of combining linux and
> ZFS, and distributing the resulting *derived* *work*.

I would class Ubuntu as a big distribution and proxmox is also used a lot.
Both have ZFS support.

--
Joost

J. Roeleveld

unread,
Feb 9, 2024, 8:10:05 AMFeb 9
to
On Thursday, February 8, 2024 6:44:50 PM CET Wols Lists wrote:
> On 08/02/2024 06:38, J. Roeleveld wrote:
> > ZFS doesn't have this "max amount of changes", but will happily fill up
> > the
> > entire pool keeping all versions available.
> > But it was easier to add zpool monitoring for this on ZFS then it was to
> > add snapshot monitoring to LVM.
> >
> > I wonder, how do you deal with snapshots getting "full" on your system?
>
> As far as I'm, concerned, snapshots are read-only once they're created.
> But there is a "grow the snapshot as required" option.
>
> I don't understand it exactly, but what I think happens is when I create
> the snapshot it allocates, let's say, 1GB. As I write to the master
> copy, it fills up that 1GB with CoW blocks, and the original blocks are
> handed over to the backup snapshot. And when that backup snapshot is
> full of blocks that have been "overwritten" (or in reality replaced),
> lvm just adds another 1GB or whatever I told it to.

That works with a single snapshot.
But, when I last used LVM like this, I would have multiple snapshots. When I
changed something on the LV, the original data would be copied to the snapshot.
If I had 2 snapshots for that LV, both would grow at the same time.

Or is that changed in recent versions?

> So when I delete a snapshot, it just goes through those few blocks,
> decrements their use count (if they've been used in multiple snapshots),
> and if the use count goes to zero they're handed back to the "empty" pool.

I know this is how ZFS snapshots work, but I am not convinced LVM snapshots
work the same way.

> All I have to do is make sure that the sum of my snapshots does not fill
> the lv (logical volume). Which in my case is a raid-5.

I assume you mean PV (Physical Volume)?

I actually ditched the whole idea of raid-5 when drives got bigger than 1TB. I
currently use Raid-6 (or specifically RaidZ2, which is the ZFS "equivalent")

--
Joost

Wols Lists

unread,
Feb 9, 2024, 10:50:05 AMFeb 9
to
On 09/02/2024 12:57, J. Roeleveld wrote:
>> I don't understand it exactly, but what I think happens is when I create
>> the snapshot it allocates, let's say, 1GB. As I write to the master
>> copy, it fills up that 1GB with CoW blocks, and the original blocks are
>> handed over to the backup snapshot. And when that backup snapshot is
>> full of blocks that have been "overwritten" (or in reality replaced),
>> lvm just adds another 1GB or whatever I told it to.

> That works with a single snapshot.
> But, when I last used LVM like this, I would have multiple snapshots. When I
> change something on the LV, the original data would be copied to the snapshot.
> If I would have 2 snapshots for that LV, both would grow at the same time.
>
> Or is that changed in recent versions?

Has what changed? As I understand it, the whole point of LVM is that
everything is COW. So any individual block can belong to multiple snapshots.

When you write a block, the original block is not changed. A new block
is linked in to the current snapshot to replace the original. The
original block remains linked in to any other snapshots.

So disk usage basically grows by the number of blocks you write. Taking
a snapshot will use just a couple of blocks, no matter how large your LV is.
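
At least with thin-pool snapshots that is exactly the behaviour; classic
(non-thin) snapshots each keep their own copy-on-write area instead. A
minimal lvmthin sketch (volume names are examples):

lvcreate -L 100G --thinpool pool0 vg0      # a thin pool inside the VG
lvcreate -V 50G --thin -n data vg0/pool0   # a thin volume in that pool
lvcreate -s -n data-snap vg0/data          # thin snapshot, shares blocks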
>
>> So when I delete a snapshot, it just goes through those few blocks,
>> decrements their use count (if they've been used in multiple snapshots),
>> and if the use count goes to zero they're handed back to the "empty" pool.
> I know this is how ZFS snapshots work. But am not convinced LVM snapshots work
> the same way.
>
>> All I have to do is make sure that the sum of my snapshots does not fill
>> the lv (logical volume). Which in my case is a raid-5.
> I assume you mean PV (Physical Volume)?

Quite possibly. VG, PV, LV. I know which one I need (by reading the
docs); I don't particularly remember which is which off the top of my head.
>
> I actually ditched the whole idea of raid-5 when drives got bigger than 1TB. I
> currently use Raid-6 (or specifically RaidZ2, which is the ZFS "equivalent")
>
Well, I run my raid over dm-integrity so, allegedly, I can't suffer disk
corruption. My only fear is a disk loss, which raid-5 will happily
recover from. And I'm not worried about a double failure - yes it could
happen, but ...

Given that my brother's ex-employer was quite happily running a raid-6
with maybe petabytes of data, over a double disk failure (until an
employee went into the data centre and said "what are those red
lights"), I don't think my 20TB of raid-5 is much :-)

Cheers,
Wol

Peter Humphrey

unread,
Feb 9, 2024, 12:20:04 PMFeb 9
to
On Friday, 9 February 2024 15:48:45 GMT Wols Lists wrote:

> ... And I'm not worried about a double failure - yes it could happen,
> but ...
>
> Given that my brother's ex-employer was quite happily running a raid-6
> with maybe petabytes of data, over a double disk failure (until an
> employee went into the data centre and said "what are those red
> lights"), I don't think my 20TB of raid-5 is much :-)

[OT - anecdote]

I used to work in power generation and transmission (CEGB, for those with long
memories), in which every system was required to be fault tolerant - one fault
at a time. As Wol says, that's fine until your one fault has appeared and not
been noticed. Then another fault appears - and the reactor shuts down!
Carpeting comes next...

Oh, frabjous day!

[/OT]

--
Regards,
Peter.