Plugin returncode bRC_Skip


Peter

May 5, 2024, 4:45:39 PM
to bareos...@googlegroups.com
This seems to be a common pattern in plugins:

    def start_backup_file(self, savepkt):
        if not self.paths_to_backup:
            return bareosfd.bRC_Skip

Question: shouldn't a plugin return bRC_Stop at this point?

I observed an endless loop (with the postgresql plugin) where
the plugin would return bRC_Skip, and the caller would then
just call it again, forever.

It seems that in release 23 it is possible to return bRC_Stop at that
point (which makes things a bit easier overall, anyway).
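
For illustration, a minimal sketch of the change being asked about. The return codes below are placeholder strings so the snippet runs outside bareos-fd (in a real plugin they come from the bareosfd module), the class name is hypothetical, and a plain dict stands in for savepkt. Whether bRC_Stop is accepted at this point in every release is an assumption based on the observation above.

```python
# Placeholders for the bareosfd return codes so the sketch runs on its own;
# these are NOT the real values.
bRC_OK, bRC_Stop, bRC_Skip = "bRC_OK", "bRC_Stop", "bRC_Skip"


class SketchPlugin:  # hypothetical class, not part of Bareos
    def __init__(self, paths):
        self.paths_to_backup = list(paths)

    def start_backup_file(self, savepkt):
        if not self.paths_to_backup:
            # Nothing left to offer: report "no more files" rather than
            # "skip this file", which invites another call.
            return bRC_Stop
        # A dict stands in for the real savepkt object here.
        savepkt["fname"] = self.paths_to_backup.pop(0)
        return bRC_OK
```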

cheerio,
PMc

Bruno Friedmann (bruno-at-bareos)

May 13, 2024, 7:43:23 AM
to bareos-devel
Hi Peter, we have the following internal task, which emerged during the PostgreSQL plugin refresh.

Maybe you could elaborate on what you observed?

Plugins: fix usage of bRC_Skip versus bRC_Stop:

First step: create a systemtest that triggers the problem that showed up.

As a Bareos user and Bareos developer, I would like to see a coherent usage of bRC_Skip versus bRC_Stop.

In 23, bRC_Skip is defined as: skip the proposed file and continue with the next.

In certain cases, if a plugin enters start_backup_file() with an empty list, this creates an infinite loop (this was the case with the Qumulo plugin). So start_backup_file() should return bRC_Stop (continue with the next step -> end_backup_file()).
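
A toy model of the loop described above may help when writing the systemtest. The driver below is hypothetical and far simpler than the real file daemon: it only assumes that bRC_Skip (like bRC_OK) leads to another call of start_backup_file() and that bRC_Stop ends the file loop. The return codes are placeholder strings, and max_calls stands in for "forever".

```python
bRC_OK, bRC_Stop, bRC_Skip = "bRC_OK", "bRC_Stop", "bRC_Skip"  # placeholders


def drive(start_backup_file, max_calls=1000):
    """Call start_backup_file until it reports bRC_Stop.

    Returns the number of calls made, or None if max_calls was reached
    (the stand-in for an endless loop).
    """
    for calls in range(1, max_calls + 1):
        if start_backup_file() == bRC_Stop:
            return calls
        # bRC_OK or bRC_Skip: ask the plugin for the next file again.
    return None


def make_plugin(paths, rc_when_empty):
    """Build a toy start_backup_file that returns rc_when_empty once drained."""
    remaining = list(paths)

    def start_backup_file():
        if not remaining:
            return rc_when_empty
        remaining.pop(0)
        return bRC_OK

    return start_backup_file


print(drive(make_plugin([], bRC_Skip)))  # None: never ends on its own
print(drive(make_plugin([], bRC_Stop)))  # 1: ends on the first call
```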

From a first quick review:

gfapi-fd.cc (around line 1424) may be affected in a similar way (we may want to extend the systemtest to test an empty incremental, if that is not yet the case).

core/src/plugins/filed/python/pyfiles/BareosFdPluginBaseclass.py has start_backup_file returning bRC_Skip?
Same remark for core/src/plugins/filed/python/pyfiles/BareosFdPluginLocalFilesBaseclass.py

In contrib/fd-plugins

  • bareos_mysql_dump/BareosFdMySQLclass.py:159:
  • bareos_tasks/BareosFdTaskClass.py:293:
  • openvz7/BareosFdPluginVz7CtFs.py:381

systemtests:

We may want to have one test run with an empty incremental and such a plugin, to be sure the case is covered.

postgresql & libcloud are ok.


Peter

May 29, 2024, 5:06:50 AM
to Bruno Friedmann (bruno-at-bareos), bareos-devel
On Mon, May 13, 2024 at 04:43:23AM -0700, Bruno Friedmann (bruno-at-bareos) wrote:
! Hi Peter, we have internally the following task which emerge during
! PostgreSQL plugin refreshment.
!
! Maybe you could develop your perception?

Hi Bruno,

here is the story:

I found that the fd-postgresql plugin can create encapsulated database
backups in a single Bareos job, in order to restore the database to
its current state. But it cannot support full point-in-time
recoverability (restore to any previous time).

Long ago I scripted a redolog (aka 'WAL') backup, suitable for
point-in-time recovery. No plugin is needed for that, just some
scripts in the fileset, before and after, and proper locking. The
accompanying database filetree backup was a really simple task up to
postgres-12. Obviously, to restore from this scheme one has to fetch
the appropriate filetree AND the whole required chain of redologs from
Bareos.

With postgres-13 and its kinda weird "low-level" backup scheme,
a plugin is necessary.
So I tried to write one, and it worked, but I didn't like it. I
didn't know how to talk to postgres from python (usually I use ruby),
and I didn't really grasp the ideas of the calling logic for a plugin,
i.e. in which call one should do what and where/how to return errors
(coding something that runs is easy, handling all possible failure
causes is the hard job).
That is when I first ran into bRC_Skip and the other return codes,
and honestly I didn't really understand them.

I think now in Bareos-23 it has become better, because now I somehow
seem to understand it. ;)

Seeing the new fd-postgresql plugin now, I decided to change to that
for the filetree backup, as it can talk directly to the databases
(only it seems pg8000 doesn't support Kerberos/GSSAPI - at least I
didn't find it).

I do not need the redolog handling (as I have my own for that), but
the plugin insists that "wal_archive_dir" is configured.
Now this is what my 'archive_command' basically does:

cp "${Src}" "$SpoolDir/${Dest}" \
&& fsync "$SpoolDir/${Dest}" \
&& cmp "${Src}" "$SpoolDir/${Dest}" 2>/dev/null \
&& mv "$SpoolDir/${Dest}" "$SpoolDir/${Dest}.ok" \
&& fsync "$SpoolDir/${Dest}.ok"

After successful copy and sync, the redologs get renamed with a suffix
".ok", as I don't want my scripts to grab some halfway written log
produced by a crash or whatever distortion.
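
The suffix gives any consumer of the spool directory a simple rule: only files ending in ".ok" are complete. A minimal Python sketch of a collector that honours that rule follows; the function name is hypothetical and the code is not taken from the scripts described here.

```python
import os


def completed_segments(spool_dir):
    """Return the sorted paths of regular files in spool_dir ending in '.ok'."""
    return sorted(
        os.path.join(spool_dir, name)
        for name in os.listdir(spool_dir)
        if name.endswith(".ok")
        and os.path.isfile(os.path.join(spool_dir, name))
    )
```

A file that is still being copied or compared has no ".ok" suffix yet, so it is simply not returned.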

So, the fd-postgresql plugin collected these redologs from the
spooldir as they are, and then it asked postgres for the name of the
most current log - and postgres doesn't know about this ".ok" suffix.

So the plugin tried to fetch the file without the suffix - which does
exist for a short time. And so sometimes it worked, sometimes it
didn't, and sometimes the job went into an endless loop.

I investigated, and so far this is what I assume is happening: The
plugin creates an internal list of the files to collect
("self.paths_to_backup" in the fd-postgresql plugin).
The order of that list appears to be unspecified.
But, if the *last* file in the list does not exist (anymore) at the
moment when it is collected, then we get into the endless loop.
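
If that assumption holds, one way to avoid the loop would be to drop vanished paths while picking the next file and to report bRC_Stop once the list runs dry. This is a sketch only: the return codes are placeholder strings, the helper is hypothetical, and the actual plugin is structured differently.

```python
import os

bRC_OK, bRC_Stop = "bRC_OK", "bRC_Stop"  # placeholders, not the real values


def next_existing_path(paths_to_backup):
    """Pop entries from the list until one exists on disk.

    Returns (bRC_OK, path) for the first existing path found, or
    (bRC_Stop, None) once the list is exhausted. Mutates the list.
    """
    while paths_to_backup:
        candidate = paths_to_backup.pop()
        if os.path.lexists(candidate):
            return bRC_OK, candidate
        # The file vanished between listing and backup: drop it, try the next.
    return bRC_Stop, None
```

A file can still disappear between this check and the actual open, so the open itself needs its own error handling; the helper only makes sure an exhausted list ends the loop.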

So much for how to create a possible testcase.

Now, what did I do:
I added an option to disable the entire redolog handling in the
plugin (because I don't need it). Then, as the "incremental" level
would now be without any function, I added another option to
run that as a normal incremental filetree backup (which may or may
not be useful, depending on the usage patterns of the database).
And, as I was already at it, I also added a "path_prefix" option,
because postgres may live in some chroot/container/jail, and the
paths as seen from postgres may not be the same as seen from
bareos-fd (and there's no fun in configuring and managing another
bareos-fd in each and every such jail).
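
To make the "path_prefix" idea concrete, here is a minimal sketch of the mapping it implies. The exact semantics in the attached patch are not shown here, so this assumes the prefix is simply prepended to the absolute path reported by postgres; the function name and the example paths are hypothetical.

```python
import os


def apply_path_prefix(path_prefix, pg_path):
    """Map an absolute path as seen by postgres to the path bareos-fd sees."""
    if not path_prefix:
        return pg_path
    # Strip the leading "/" so os.path.join does not discard the prefix.
    return os.path.join(path_prefix, pg_path.lstrip("/"))


print(apply_path_prefix("/jails/db", "/var/db/postgres/data"))
# /jails/db/var/db/postgres/data
```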

So that's the story. (attaching my changes)
Cheerio,
Peter 'PMc' Much



patch-xk-postgresql-imgbck