bareos-fd-postgresql

Philippe

Jun 24, 2024, 9:08:20 AM6/24/24
to bareos-users
Hi all,

until now I was using pg_dumpall to back up my PostgreSQL databases, but
I'd like to move to the bareos-fd-postgresql plugin.

I configured wal archiving:

> archive_mode = on
> archive_command = 'install -D %p /var/lib/pgsql/wal_archive/%f'
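As a quick sanity check, the archive_command can be replayed by hand outside PostgreSQL, since it is just an `install -D` invocation; the sketch below uses throwaway temp directories instead of the real cluster paths:

```shell
# Simulate what PostgreSQL does when it runs the archive_command:
# %p expands to the path of the WAL segment, %f to its file name.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'fake wal contents' > "$src/000000010000000000000001"
# Same shape as: archive_command = 'install -D %p /var/lib/pgsql/wal_archive/%f'
install -D "$src/000000010000000000000001" "$dst/wal_archive/000000010000000000000001"
# -D creates the wal_archive directory if it does not exist yet
ls "$dst/wal_archive"
```

Running this as the postgres user also confirms that the archive directory is writable with the right permissions.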

and the plugin:

> FileSet {
> Name = postgres
> Description = "Fileset for postgres"
> Include {
> Options {
> Signature = XXH128
> Compression = LZ4HC
> }
> Plugin = "python3"
> ":module_name=bareos-fd-postgresql"
> ":db_user=postgres"
> ":db_host=/run/postgresql"
> ":wal_archive_dir=/var/lib/pgsql/wal_archive"
> }
> }

Since I have no dedicated backup user/role I wanted to use the
postgres user together with "peer" authentication, as I don't want to
use the same password on every host nor have a dedicated fileset for
each PostgreSQL host I want to back up.

Is there a smart way of maybe bareos-fd dropping its privileges to the
user 'postgres' to be able to connect to the dbms via peer authentication?
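For context, peer authentication maps the connecting OS user onto a database role, so bareos-fd running as root would by default only match a 'root' database role. One option worth noting is a user name map, which lets the root OS user connect as the postgres database role; a sketch of the two config fragments (the map name "bareos" is made up for illustration):

```
# pg_hba.conf: peer auth on the local socket, with a user name map
local   all   postgres   peer map=bareos

# pg_ident.conf: allow OS user root to connect as database user postgres
# MAPNAME  SYSTEM-USERNAME  PG-USERNAME
bareos     root             postgres
```

With such a map in place, the fileset could keep ':db_user=postgres' and no password would be needed.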

Another possibility would be not to set ':db_user=postgres' and to add a
'root' role to the PostgreSQL server:

> create role root login;
> grant all on schema public to root;
> grant all on all tables in schema public to root;
> grant select, update on all sequences in schema public to root;
> grant execute on all functions in schema public to root;
> alter role root with superuser;
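Since the last statement makes the role SUPERUSER anyway, which bypasses all permission checks, the individual GRANTs above are redundant; the same setup in one statement (role name 'root' as above) would be:

```sql
-- SUPERUSER bypasses ordinary permission checks, so no extra GRANTs
-- are needed once it is set.
CREATE ROLE root WITH LOGIN SUPERUSER;
```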

This works, but are these the minimum permissions required to perform a
backup?

What's best practice?

Thanks & kind regards,

Philippe

Philippe

Jun 25, 2024, 3:07:30 AM6/25/24
to bareos...@googlegroups.com
Hi,

looking at the code (and testing it), apparently a role with SUPERUSER
is enough for the backup.

Kind regards,

Philippe

Philippe

Jul 18, 2024, 9:09:52 AM7/18/24
to bareos...@googlegroups.com
Hi all,

I'm using the bareos-fd-postgresql plugin to back up the director's database.

The config is:

--%snip%--
> Job {
> Name = backup-mydirector-postgres
> Client = mydirector
> JobDefs = postgres
> Storage = File-mystorage
> Maximum Concurrent Jobs = 1
> }
>
> JobDefs {
> Name = postgres
> JobDefs = DefaultJob
> FileSet = postgres
> }
>
> FileSet {
> Name = postgres
> Description = "Fileset for postgres"
> Include {
> Options {
> Signature = XXH128
> Compression = LZ4HC
> }
> Plugin = "python3"
> ":module_name=bareos-fd-postgresql"
> ":db_host=/run/postgresql"
> ":wal_archive_dir=/var/lib/pgsql/wal_archive"
> ":switch_wal_timeout=180"
> }
> }
--%snip%--

The dbms is configured as follows:

--%snip%--
> max_wal_size = 1GB
> min_wal_size = 80MB
> archive_mode = on
> archive_command = 'install -D %p /var/lib/pgsql/wal_archive/%f'
> restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
> archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_archive %r'
--%snip%--

There is no replication slave.


From time to time I get the following error:

> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got last_backup_stop_time 1721215228 from restore object of job 44528
> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got last_lsn 17/85000000 from restore object of job 44528
> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Got pg major version 13 from restore object of job 44528
> 18-Jul 11:20 mydirector JobId 44591: Using Device "File-mystorage" to write.
> 18-Jul 11:20 mydirector JobId 44591: Extended attribute support is enabled
> 18-Jul 11:20 mydirector JobId 44591: ACL support is enabled
> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: python: 3.9.18 (main, May 16 2024, 00:00:00)
> [GCC 11.4.1 20231218 (Red Hat 11.4.1-3.0.1)] | pg8000: 1.31.2
> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Connected to PostgreSQL version 130014
> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Current LSN 17/87538B18, last LSN: 17/85000000
> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: A difference was found, between current_lsn 17/87538B18 and last LSN: 17/85000000
> 18-Jul 11:20 mydirector JobId 44591: python3-fd-mod: Current LSN 17/880001A8, last LSN: 17/85000000
> 18-Jul 11:23 mydirector JobId 44591: Fatal error: python3-fd-mod: Timeout waiting 180 sec. for wal file 000000010000001700000088 to be archived
> 18-Jul 11:23 mydirector JobId 44591: Fatal error: filed/fd_plugins.cc:673 PluginSave: Command plugin "python3:module_name=bareos-fd-postgresql:db_host=/run/postgresql:wal_archive_dir=/var/lib/pgsql/wal_archive:switch_wal_timeout=180" requested, but job is already cancelled.
> 18-Jul 11:23 mydirector JobId 44591: python3-fd-mod: Database connection closed.
> 18-Jul 11:20 mystorage JobId 44591: Connected File Daemon at 192.168.1.5:9102, encryption: TLS_AES_256_GCM_SHA384 TLSv1.3
> 18-Jul 11:23 mydirector JobId 44591: Fatal error: Director's comm line to SD dropped

As you can see, I already increased switch_wal_timeout from the default
of 60s to 180s, but this error still shows up.
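As an aside, the gap between the two LSNs in the log can be translated into bytes of WAL written since the last backup; a small sketch (the helper function is hypothetical, not part of the plugin):

```shell
# Convert an LSN like 17/85000000 to an absolute byte position:
# the part before the slash is the high 32 bits, the part after
# is the low 32 bits, both in hex.
lsn_to_bytes() {
  local high=${1%%/*} low=${1##*/}
  echo $(( 0x$high * 4294967296 + 0x$low ))
}
current=$(lsn_to_bytes 17/87538B18)
last=$(lsn_to_bytes 17/85000000)
diff_bytes=$(( current - last ))
echo "$diff_bytes"   # 39029528 bytes, roughly 37 MB of WAL
```

So the backup here had to wait on roughly 37 MB of WAL, well under the 80MB min_wal_size.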

The database is stored on an NVMe drive, with no performance bottlenecks
(RAM, CPU).

Does anyone have an idea of how to get this fixed?

Bruno Friedmann (bruno-at-bareos)

Jul 18, 2024, 10:35:04 AM7/18/24
to bareos-users
Hi Philippe,

May I just suggest revisiting the reason behind this sentence:
> I'm using the bareos-fd-postgresql plugin to back up the director's database

You will not be able to use that backup without a working Bareos instance and the whole plugin environment.  I would really advise also keeping a native dump of the catalog, so you can extract it from a volume and/or restore it with almost nothing, to make disaster recovery as efficient as possible. Having a read-only slave is a must.

Beside that, for the plugin: the failure you are seeing means the new WAL file is not present for some reason (not yet flushed, or not yet placed into that directory). Your minimal WAL size is 80MB.
I won't exclude that there may be an issue with the code in certain cases :-)

If it is failing often enough, you may want to run that job with setdebug level 150.

Is this still happening if you comment out the following pg conf line?
> archive_cleanup_command = 'pg_archivecleanup /var/lib/pgsql/wal_archive %r' 
This setting is documented more for use by a standby cluster.
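For reference, the debug level can be raised from bconsole (client name taken from the job config earlier in the thread):

```
setdebug client=mydirector level=150 trace=1
```

With trace=1 the output goes to a trace file on the client rather than the console.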

Philippe

Jul 18, 2024, 12:19:16 PM7/18/24
to bareos...@googlegroups.com
Hi Bruno,

thanks for your remarks.

I have an additional daily cron job that does a pg_dumpall to another
server; having a bareos-fd-postgresql backup of the Bareos database is
more of an "it's there, why not use it?" approach for playing around with
the features Bareos offers. The configuration and related setup come from
an Ansible playbook.
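For illustration, such a dump job might look like the following (hypothetical schedule and path; the real job ships the dump to another server):

```
# /etc/cron.d/pg_dumpall -- daily logical dump as the postgres user
0 2 * * *  postgres  pg_dumpall | gzip > /var/backups/pg_dumpall.sql.gz
```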

The archive_cleanup_command is never executed on the master, that's
right. It's configured only because I wanted it already set in case I
set up replication in the future. I'll remove the line and see what
happens.

The error comes up quite seldom; there are stretches of four to ten days
in a row where everything works fine. Increasing the debug level is a
good idea.

Kind regards,

Philippe