Continuous spooling and despooling


Christian Svensson

Jul 8, 2021, 5:33:54 AM
to bareos-users
Good day,

When benchmarking my backups with and without spooling, jobs seem to take longer with spooling than without, even though resource utilization has improved.
My thinking is that this is because Bareos first spools data, then despools it, then spools again, and so on - causing 100% network usage and 100% tape usage to alternate.
My tape bandwidth and network bandwidth are about the same, around 1 Gbit/s.

Instead, I would like to propose a change where the spooling process creates two spool files (each bound to 50% of Maximum Spool Size), Spool-A and Spool-B.
When Spool-A is filled up, Bareos-SD starts to despool that file while continuing to spool data to Spool-B.
If Spool-B fills up while Spool-A is still despooling, spooling is paused until the despooling of Spool-A is done. At that point Spool-B is despooled and Spool-A is used to spool new data.

This should make sure that the slowest part is always being used 100% during the backup job, which should be a significant improvement if you are running a single large job.
If you are running multiple jobs in parallel then this solution will do little to help you as you will spool and despool onto different spool files and possibly multiple drives.
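A minimal sketch of the proposed double-buffer scheme (in Python for brevity; the class, the in-memory buffers, and all names here are illustrative assumptions, not Bareos's actual spool-file handling in spool.cc):

```python
import threading

# Illustrative sketch of the proposal: two buffers, each bound to 50% of
# Maximum Spool Size; a background thread despools one buffer while the
# writer keeps spooling into the other. Hypothetical names throughout.
class DoubleBufferSpooler:
    def __init__(self, max_spool_size, despool_fn):
        self.half = max_spool_size // 2   # each spool buffer gets 50%
        self.despool_fn = despool_fn      # writes a full buffer to "tape"
        self.active = bytearray()         # buffer currently being spooled to
        self.lock = threading.Lock()
        self.done = threading.Condition(self.lock)
        self.despooling = False           # is the other buffer draining?

    def spool(self, chunk):
        with self.lock:
            self.active.extend(chunk)
            if len(self.active) >= self.half:
                # If the previous buffer is still despooling, pause
                # spooling until it has drained (the proposed behavior).
                while self.despooling:
                    self.done.wait()
                full, self.active = self.active, bytearray()
                self.despooling = True
                threading.Thread(target=self._despool, args=(full,)).start()

    def _despool(self, buf):
        self.despool_fn(bytes(buf))       # runs concurrently with spool()
        with self.lock:
            self.despooling = False
            self.done.notify_all()

    def finish(self):
        """Drain any in-flight despool, then flush the partial buffer.

        Assumes no concurrent spool() calls at this point."""
        with self.lock:
            while self.despooling:
                self.done.wait()
        if self.active:
            self.despool_fn(bytes(self.active))
            self.active = bytearray()
```

Because a swap waits for the previous despool to complete, buffers reach the despool function in order, so the data written out stays sequential even though spooling and despooling overlap.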

Thoughts?

I am considering writing a proof of concept by modifying the spool.cc parts and, among other things, moving the intermediate call to DespoolData (https://github.com/bareos/bareos/blob/7cd54133cd9a4f206259b3612f3ad6ab7add9743/core/src/stored/spool.cc#L547-L550) into a background thread. Since I haven't touched the Bareos code base before, I would be grateful for any tips or hints on whether this is an acceptable approach for potential upstream inclusion later on.

Regards,

James Youngman

Jul 8, 2021, 7:31:08 AM
to Christian Svensson, bareos-users
On Thu, Jul 8, 2021 at 10:33 AM 'Christian Svensson' via bareos-users <bareos...@googlegroups.com> wrote:

> Instead, I would like to propose a change where the spooling process creates two spool files (each bound to 50% of Maximum Spool Size), Spool-A and Spool-B.
> When Spool-A is filled up, Bareos-SD starts to despool that file while continuing to spool data to Spool-B.
> If Spool-B fills up while Spool-A is running, the spooling is paused until the spooling of Spool-A is done. At that moment Spool-B is despooled and Spool-A is used to spool data.

I like this idea. My configuration, for example, has a single machine which is much larger than all the others put together. Throughput (since I turned spooling on, on 2021-06-04) is lower for the longest jobs, and that's where I really want to save time. On the other hand, the clients with throughput small enough to need spooling have backups that don't run for long, so there would not be much shoe-shining.

bareos=> SELECT CASE WHEN starttime < '2021-06-04' THEN 'OFF' WHEN starttime > '2021-06-05' THEN 'ON' ELSE 'UNKNOWN' END AS spooling,
  ROUND(AVG(JobFiles),0) AS AverageFiles,
  ROUND(AVG(JobBytes/1024.0/1024.0/1024.0),1) AS AverageGB,
  AVG(endtime-starttime) AS avg_elapsed, AVG(JobBytes/1024.0/1024.0/EXTRACT(EPOCH FROM (endtime-starttime))) AS avg_throughput_mbps,
  Name AS job
FROM Job
WHERE level = 'F'
AND endtime != starttime
GROUP by Name, spooling
ORDER BY job, spooling, avg_throughput_mbps DESC;
 spooling | averagefiles | averagegb |   avg_elapsed    | avg_throughput_mbps |               job              
----------+--------------+-----------+------------------+---------------------+---------------------------------
 ON       |        95392 |       6.1 | 00:12:27.5       |    6.35006567352883 | backup-Big-in-Japan-fd-all
 UNKNOWN  |       145402 |      10.7 | 00:56:15.666667  |    9.85466560430505 | backup-Big-in-Japan-fd-all
 OFF      |           56 |       0.4 | 00:02:41         |     2.3997043586406 | BackupCatalog
 ON       |           84 |       2.1 | 00:02:02.578947  |    22.6342882288465 | BackupCatalog
 UNKNOWN  |           84 |       0.7 | 00:02:22         |    4.74277415074093 | BackupCatalog
 OFF      |       843949 |      30.0 | 00:07:28         |    68.6448513461011 | backup-horizon-fd-all
 ON       |       781091 |      29.4 | 00:08:42.666667  |    61.3886728985652 | backup-horizon-fd-all
 OFF      |       876132 |   13276.2 | 1 day 07:00:21   |    121.794402469763 | backup-jupiter-fd-all
 ON       |      2654049 |   10568.3 | 1 day 16:30:13   |    72.0647217388582 | backup-jupiter-fd-all
 OFF      |      2616869 |    1405.3 | 07:37:43         |    52.3996788929579 | backup-jupiter-fd-home
 ON       |      2874971 |    1416.3 | 08:12:06         |    49.1188398851964 | backup-jupiter-fd-home
 ON       |       304803 |    1630.5 | 07:37:05         |    60.8783728477318 | backup-jupiter-fd-nas
 ON       |       339038 |    9564.7 | 1 day 13:12:14.5 |     73.137894083823 | backup-jupiter-fd-nas-video
 OFF      |       116521 |       7.8 | 00:08:35         |    15.4905055259038 | backup-substrate-fd-all
 ON       |       109400 |       7.7 | 00:10:21.666667  |    13.2956005905249 | backup-substrate-fd-all
 ON       |        71047 |       3.0 | 00:02:08         |    23.6317118629813 | backup-terminator-fd-all
 OFF      |        75407 |       3.2 | 00:00:56         |    58.7631317887987 | backup-terminator-fd-everything
 ON       |        75419 |       3.2 | 00:02:17         |    24.1265843106012 | backup-terminator-fd-everything
 ON       |            0 |       0.0 | 00:04:44.333333  |                   0 | RestoreFiles
 UNKNOWN  |            1 |       0.2 | 00:06:14         |   0.366163643718768 | RestoreFiles
(20 rows)


This analysis would be better if I could figure out the write throughput while despooling, but I don't know how to extract that from the Bareos catalog (though as another thread pointed out, you can get this from tapestats or in the future, Prometheus).
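For the tapestats route, one possible approach is to sample the per-drive byte counters exposed by the Linux tape statistics sysfs interface (available since kernel 4.2) and compute the delta yourself. A sketch, where the device name "st0" is an assumption for your drive:

```python
# Sketch: sample despool (tape write) throughput from the Linux tape
# statistics sysfs interface (kernel >= 4.2). "st0" is an assumed
# device name; adjust for your drive.
STATS_PATH = "/sys/class/scsi_tape/{dev}/stats/write_byte_cnt"

def read_written_bytes(dev="st0"):
    """Total bytes written to the drive, per the kernel's counter."""
    with open(STATS_PATH.format(dev=dev)) as f:
        return int(f.read())

def write_rate(prev_bytes, cur_bytes, interval_s):
    """Average write throughput in bytes/second between two samples."""
    return (cur_bytes - prev_bytes) / interval_s
```

Sampling this counter every few seconds during a despool would give the tape-side throughput that the catalog does not record.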

James.

Christian Svensson

Jul 8, 2021, 11:59:45 AM
to James Youngman, bareos-users
Hi James,

On Thu, Jul 8, 2021 at 1:31 PM James Youngman <ja...@youngman.org> wrote:
> I like this idea.   My configuration for example has a single machine which is much larger than all the others put together.

Very nice. Always good to get feedback from somebody else.
Your configuration is indeed similar to mine, and I guess is a quite common scenario.

> This analysis would be better if I could figure out the write throughput while despooling, but I don't know how to extract that from the Bareos catalog.

Not sure about the catalog, but I at least have these log lines on my jobs:
2021-07-08 09:14:50 bareos-sd JobId 405: Spooling data again ...
2021-07-08 09:16:53 bareos-sd JobId 405: Committing spooled data to Volume "P0033SL4". Despooling 9,181,225,019 bytes ...
2021-07-08 09:18:12 bareos-sd JobId 405: Despooling elapsed time = 00:01:19, Transfer rate = 116.2 M Bytes/second

You might be able to parse those log lines and get what you are looking for.
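As a starting point, a small parser for the "Despooling elapsed time" line (a sketch; the message format is taken from the log excerpt above and may differ between Bareos versions):

```python
import re

# Parse the bareos-sd "Despooling elapsed time" message quoted above.
# The exact wording and units may vary between Bareos versions.
DESPOOL_RE = re.compile(
    r"JobId (?P<jobid>\d+): Despooling elapsed time = (?P<elapsed>[\d:]+), "
    r"Transfer rate = (?P<rate>[\d.]+) (?P<unit>[KMG]) Bytes/second"
)

def parse_despool_line(line):
    """Return jobid, elapsed time, and rate in bytes/s, or None."""
    m = DESPOOL_RE.search(line)
    if not m:
        return None
    scale = {"K": 1e3, "M": 1e6, "G": 1e9}[m.group("unit")]
    return {
        "jobid": int(m.group("jobid")),
        "elapsed": m.group("elapsed"),
        "rate_bytes_per_s": float(m.group("rate")) * scale,
    }
```

Running this over the SD's job log and joining on JobId would give per-job despool throughput to set beside the catalog numbers.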

In my case, using Prometheus metrics and Grafana, this is how the runs look for me.

You can quite clearly see that the network and tape bandwidths alternate.

Regards,

Christian Svensson

Jul 11, 2021, 9:45:27 AM
to James Youngman, bareos-users
Good day,

I have now posted a (hopefully) working PR here: https://github.com/bareos/bareos/pull/886
I would appreciate feedback on it to know if it is worth pursuing.

I understand running development software for backups can be scary, but I would appreciate it if people could test the patch out and see if it helps to speed up their backups.

Regards,

rivim...@gmail.com

Aug 5, 2021, 11:04:55 AM
to bareos-users
There is a bug report in the bareos tracker for this functionality:
