Using CephFS rstats for speeding up incremental backups

Burkhard Linke

Nov 2, 2022, 10:30:32 AM
to bareos-devel
Hi,

we are using Bareos as the backup solution for CephFS-based volumes (i.e. subdirectories of a certain mountpoint). Backup jobs are scheduled nightly. Each volume can contain several million files, so traversing them takes a lot of time and puts extra load on the CephFS metadata service.

CephFS supports recursive statistics ("rstats") for subdirectories, including the number of files and their aggregated size. Example:

root@XXX:/vol# getfattr -n ceph.dir.rbytes backup
# file: backup
ceph.dir.rbytes="7476825741052"

More important than size and number of files is the 'rctime' attribute, which contains the timestamp of the last change in the whole subdirectory tree:

root@XXX:/vol# getfattr -n ceph.dir.rctime backup
# file: backup
ceph.dir.rctime="1667351791.667516362"

If I were able to compare this to the time of the last backup run (either full/differential/incremental), I might be able to skip over whole volumes and speed up the nightly backup significantly.

Questions:

1. Is it possible to run a script within a job and skip the job depending on exit code?
2. Is it possible to pass the timestamp of the last job run to such a script?
3. Can a skip method be implemented in the FD itself? Check whether a directory is using cephfs, check whether recursive stats are available, skip if rctime is older than last job run?
4. Use an FD plugin instead of extending the FD core code?

Best regards,
Burkhard Linke

Andreas Rogge

Nov 3, 2022, 12:26:03 PM
to bareos...@googlegroups.com
Hi Burkhard,

On 02.11.22 at 15:30, Burkhard Linke wrote:
> 1. Is it possible to run a script within a job and skip the job
> depending on exit code?
Running scripts is currently only possible with RunScript. Using that,
you could fail the job, but that's probably not what you want.

> 2. Is it possible to pass the timestamp of the last job run to such a
> script?
When you run a job, it will have a so-called "since-time", which is the
time when the previous job started (for a differential job, it is the
start time of the full it is based on).
When you configure a RunScript, you can use "%s" as a placeholder that
will be replaced by the since-time.
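
For example, a minimal check script along those lines could look like
the untested sketch below. The script name and argument order are made
up, and I am assuming "%s" expands to a Unix timestamp -- adjust the
parsing if it arrives as a formatted date instead. You would wire it up
with something like Command = "/usr/local/bin/rctime_check.py %s
/vol/backup" in the RunScript block:

#!/usr/bin/env python3
# Hypothetical RunScript helper: exit 0 if the CephFS subtree changed
# since the last job run, 1 otherwise.
import os
import sys

since_time = float(sys.argv[1])  # since-time substituted for "%s"
volume = sys.argv[2]             # CephFS directory to check

# ceph.dir.rctime is reported as "<seconds>.<nanoseconds>"
raw = os.getxattr(volume, "ceph.dir.rctime").decode().strip("\x00")

sys.exit(0 if float(raw) > since_time else 1)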

> 3. Can a skip method be implemented in the FD itself? Check whether a
> directory is using cephfs, check whether recursive stats are available,
> skip if rctime is older than last job run?
That would be doable, but the xattr code is pretty far away from the
code that traverses the directory tree, so it is probably not as
straightforward as it might seem at first.

> 4. Use an FD plugin instead of extending the FD core code?
That is probably the way to go - at least to get a working proof of
concept running.
We are currently working on a change that will improve plugin
performance a lot (see PR #1297 [1]). With that change, a plugin can
pass open file descriptors to the core, allowing it to achieve
near-native backup performance.

In a plugin you also have much more freedom to adapt to the needs of
the system you're backing up. In the case of CephFS, you could easily
take a snapshot and then rewrite the paths to remove the
/.snap/your-snapshot/ component, so a restore will just write to the
right location.
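
Just as a rough illustration (the helper name is made up), that path
rewrite could be as simple as:

def strip_snapshot(path, snapshot):
    # Map a path inside a CephFS snapshot back to its live location, e.g.
    # strip_snapshot("/vol/backup/.snap/nightly/etc/passwd", "nightly")
    #   -> "/vol/backup/etc/passwd"
    return path.replace("/.snap/" + snapshot, "", 1)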

Having said all that, I wonder how you want to proceed.
We can probably provide some limited guidance to get you started
implementing a plugin yourself, or I can get you in touch with sales to
discuss funded development for this.

Best Regards,
Andreas

[1] https://github.com/bareos/bareos/pull/1297/

--
Andreas Rogge andrea...@bareos.com
Bareos GmbH & Co. KG Phone: +49 221-630693-86
http://www.bareos.com

Registered office: Köln | Amtsgericht Köln: HRA 29646
General partner: Bareos Verwaltungs-GmbH
Managing directors: S. Dühr, M. Außendorf, J. Steffens, Philipp Storz

Burkhard Linke

Nov 4, 2022, 6:06:44 AM
to bareos-devel
Hi,

I just had a closer look at a plugin implementation (the libcloud plugin for processing S3 buckets) and at some of the plugin interface and core code.

The idea was to simply skip over all directories if the rctime-based check indicates that the directory has not changed since the last backup, so a depth-first traversal would only hit changed directories.
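
As a standalone illustration (plain Python, outside any plugin API),
this is the pruned traversal I had in mind:

import os

def changed_paths(root, since_time):
    # Descend only into directories whose recursive change time is newer
    # than the given since-time; unchanged subtrees are never visited.
    # Files in visited directories still get the usual per-file check.
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            rctime = float(os.getxattr(entry.path, "ceph.dir.rctime").decode())
            if rctime > since_time:
                yield entry.path
                yield from changed_paths(entry.path, since_time)
            # else: whole subtree unchanged -> pruned
        else:
            yield entry.path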

But I do not see a way to skip the entire directory content without at least traversing it. Incremental jobs indicate the right state for unchanged files and directories (FT_NOCHG/FT_DIRNOCHG), but if no information is given at all for former entries, Bareos assumes that the file/directory was deleted. So if a plugin skips a directory completely, its content will be marked as deleted and pruned according to the retention setup.

If this is correct, it is not possible to skip the traversal of unchanged subdirectories (either in plugins or in the FD core itself).

Best regards,
Burkhard

Andreas Rogge

Nov 10, 2022, 8:51:46 AM
to bareos...@googlegroups.com
Hi,

On 04.11.22 at 11:06, Burkhard Linke wrote:
> I just had a closer look at a plugin implementation (libcloud for
> processing S3 buckets) and some of the plugin interface and core code.
That's probably way too complicated; for what you want, the LocalFileset
plugin is a better starting point.

> But I do not see a way to skip the entire directory content without
> at least traversing it. Incremental jobs indicate the right state for
> unchanged files and directories (FT_NOCHG/FT_DIRNOCHG), but if no
> information is given at all for former entries, Bareos assumes that
> the file/directory was deleted. So if a plugin skips a directory
> completely, its content will be marked as deleted and pruned according
> to the retention setup.

> If this is correct, it is not possible to skip the traversal of
> unchanged subdirectories (either in plugins or in the FD core itself).

That's the case if you have accurate mode enabled. For testing your
approach, it should be sufficient to test with accurate disabled.
For a real working plugin, you would need a way to traverse the
accurate file list in the plugin and issue an FT_NOCHG or FT_DIRNOCHG
for all files in directories you skipped.
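
Very roughly, and purely as a sketch in plain Python (there is no such
helper in today's plugin API -- that is exactly the missing piece), it
would amount to something like:

# Stand-ins for the real file-type codes mentioned above.
FT_NOCHG, FT_DIRNOCHG = "FT_NOCHG", "FT_DIRNOCHG"

def mark_subtree_unchanged(accurate_entries):
    # accurate_entries: (path, is_dir) pairs for a skipped subtree, taken
    # from the accurate file list; report each entry as "no change"
    # instead of reading any file data, so nothing gets marked as deleted.
    for path, is_dir in accurate_entries:
        yield path, (FT_DIRNOCHG if is_dir else FT_NOCHG)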
Maybe we can even provide some way to do that recursively - maybe by
adding something like FT_SUBTREENOCHG to signal that the whole subtree
is guaranteed to have no changes.
I'm not sure how much work that actually is, but it is definitely doable.

Best Regards,
Andreas