BugCast BBC#8396: rerun on migration

p...@citylink.dinoex.sub.org

Apr 23, 2020, 8:12:53 PM

to bareos-users

Hello all,

and welcome to the BugCast of today!

Today we will look at the consequences of a documentation bug. If the documentation says that you must do something which you cannot do because the software does not offer options to do it, then the respective thing cannot function. But then this is not a bug, because it works as documented: the documentation clearly states that to get the thing working you must do something which is not possible to do. So we call that a documentation bug.

We did already contemplate this in BBC#5162 - so here now is what appears to be the sequel. And, this one definitely is a bug.

The switch Job:RerunFailedLevels makes certain that if a Job does not end successfully, the next invocation of the Job will automatically escalate to the previously failed level (Diff or Full). As discussed in BBC#5162, and as clearly stated in the documentation, a job that is *still running* will be considered as a *failed* job. So, if your incremental jobs run more frequently than a full backup takes to complete, then all of your backups will become full backups.
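To make that escalation loop concrete, here is a small toy model. It is not Bareos code: `SimJob` and `SimulateLevels` are made-up names, and the model assumes that one job starts every `period` hours, that the very first run is a scheduled Full, that every Full takes `full_duration` hours, and that incrementals finish instantly.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one catalog entry; not the real Bareos Job table.
struct SimJob {
  int start;   // start time in hours
  int end;     // end time in hours
  char level;  // 'F' = Full, 'I' = Incremental
};

// Returns one character per simulated run, in start order.
std::string SimulateLevels(int period, int full_duration, int count) {
  std::vector<SimJob> jobs;
  std::string levels;
  for (int i = 0; i < count; ++i) {
    const int now = i * period;
    // Assumption: the first run is a scheduled Full.
    bool escalate = (i == 0);
    // A Full that is still running at `now` is treated like a failed Full,
    // so the new run is escalated to Full as well.
    for (const SimJob& job : jobs) {
      if (job.level == 'F' && job.end > now) {
        escalate = true;
      }
    }
    const char level = escalate ? 'F' : 'I';
    // Assumption: incrementals take no time at all.
    const int duration = escalate ? full_duration : 0;
    jobs.push_back({now, now + duration, level});
    levels.push_back(level);
  }
  return levels;
}
```

With `SimulateLevels(1, 3, 5)` (hourly runs, a Full that takes three hours) every run comes out as Full, because there is always a Full still in flight. With `SimulateLevels(4, 3, 5)` only the first run is a Full.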

Obviously this is complete crap, but - works-as-designed.

Now what is new in this BugCast is: RerunFailedLevels will not only consider previously started backups for the same Client for rerunning, it will also trigger on entirely different jobs with a completely different purpose. If, for instance, you happen to migrate or copy your jobs from disk to tape or archive at some later time, then, during that copy, your regular client backups will suddenly be escalated to Full! Without any sensible reason, but that's the way that crap works! A migrate (or copy) happens to run a 'virtual' instance of the original backup job alongside the migrate job - and that virtual item happens to be in the job table of the catalog, with Level=Full and Status=Running - and that's enough for RerunFailedLevels to trigger another full backup!

So, as that whole crappy behaviour annoyed me already when I tried with I think Rel. 2.2.8, I now decided to say good riddance to either problem (the one that works-as-designed with the documentation-bug, and the real bug), once and for all:

--- core/src/cats/sql_find.cc.orig     2020-04-15 23:12:31.997100000 +0000
+++ core/src/cats/sql_find.cc 2020-04-15 23:15:52.216751000 +0000
@@ -288,7 +288,7 @@
 
   /* Differential is since last Full backup */
   Mmsg(cmd,
-       "SELECT Level FROM Job WHERE JobStatus NOT IN ('T','W') AND "
+       "SELECT Level FROM Job WHERE JobStatus IN ('E','e','f','A','I') AND "
        "Type='%c' AND Level IN ('%c','%c') AND Name='%s' AND ClientId=%s "
        "AND FileSetId=%s AND StartTime>'%s' "
        "ORDER BY StartTime DESC LIMIT 1",
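For those who prefer to see the difference spelled out, here is a minimal sketch of just the two JobStatus conditions from the patch, written as plain C++ predicates. The function names are made up for illustration, and the other columns of the query (Type, Level, Name, ClientId, FileSetId, StartTime) are left out. As far as I understand the status codes, 'R' is a running job and 'C' is one that is created but not yet running; take that as an assumption.

```cpp
#include <string>

// Old condition: JobStatus NOT IN ('T','W')
bool OldFilterMatches(char job_status) {
  return std::string("TW").find(job_status) == std::string::npos;
}

// Patched condition: JobStatus IN ('E','e','f','A','I')
bool NewFilterMatches(char job_status) {
  return std::string("EefAI").find(job_status) != std::string::npos;
}

// Returns the statuses that the old condition counted as failed
// but the patched condition no longer does.
std::string NoLongerFailed(const std::string& statuses) {
  std::string result;
  for (char status : statuses) {
    if (OldFilterMatches(status) && !NewFilterMatches(status)) {
      result.push_back(status);
    }
  }
  return result;
}
```

Feeding it `"CRTWEefAI"` yields `"CR"`: a job that is merely created or still running (including the 'virtual' one from a migrate or copy) no longer counts as failed, while the explicitly listed error, cancel and incomplete statuses still do.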

Have fun, and stay tuned for the next BugCast!

----------

Footnotes:

* Definition of 'Bug', according to the uWSGI people: "Remember, if you cannot use uWSGI in some scenario, it is a uWSGI bug."
[https://uwsgi-docs.readthedocs.io/en/latest/FAQ.html]