Hello all,
[OK, after typing all this I remembered that there is a debug log setting for barman, so I turned it on, and then it was obvious what was happening. See below.]
We are running a 3-node repmgr cluster (PG 16), and I recently set up barman (3.11.1) on one of the designated slaves.
My plan was to have barman connect to localhost on the slave, to put less stress on the master (important data is committed synchronously anyway). Backups reside on a strongly redundant network filer and are synced offsite by a different job.
After some confusion about the WAL archive (I have that already set up for repmgr), barman check dbserver flashes all green.
I can start a compressed backup, but it remains in the "WAITING_FOR_WALS" state for hours. OK, the DB server is not seeing much traffic yet, but I would still like it to finish within a reasonable amount of time.
[Funnily enough, the warning about the WAITING_FOR_WALS status was still there right after the backup finished, but when I checked a minute later with show-backup it finally showed "DONE". I also had to use the barman user for the primary connection, not streaming_barman, because the replication role was fine for getting the system ID, but not for switching the WALs (see the grant sketch after the config below).]
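If it really is just the idle server keeping the backup in WAITING_FOR_WALS, forcing a WAL switch should hand barman the segment that closes the backup. A sketch (assuming the barman server name matches the [cluster] section below):

barman switch-wal --force --archive cluster

--force issues a CHECKPOINT before switching, and --archive makes barman wait until the new segment has actually arrived in the archive.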
Based on the docs, I thought that setting up a primary_conninfo would help, but I hit a snag there: barman tells me that the system ID is not the same on the master and the slave (it is the same, though, if I request the info via psql as the barman user from both servers, and if I switch streaming_conninfo to the master, barman diagnose shows the same ID).
Primary and standby have same system ID: FAILED (primary_conninfo and conninfo should point to primary and standby servers which share the same system identifier)
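For reference, one way to compare the IDs directly on both nodes, as an ordinary role and with no special privileges:

SELECT system_identifier FROM pg_control_system();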
-> this was in the debug log:
barman.postgres DEBUG: Error calling pg_is_in_recovery() function: connection to server at "dbmaster" (10.0.0.1), port 5432 failed: FATAL: no pg_hba.conf entry for
host "10.0.0.2", user "streaming_barman", database "postgres", SSL encryption
How can I further diagnose what's not working with the primary_conninfo? Is that even going to solve my "WAITING_FOR_WALS" problem, and is my setup halfway OK at all? If possible, I really want to avoid using a separate server for barman and pulling the backup from the master DB.
[cluster]
description = "PostgreSQL Cluster"
conninfo = host=localhost user=barman dbname=postgres
streaming_conninfo = host=localhost user=streaming_barman dbname=postgres
primary_conninfo = host=dbmaster user=barman dbname=postgres
slot_name = barman
backup_method = postgres
streaming_archiver = on
retention_policy = RECOVERY WINDOW OF 7 DAYS
backup_directory = /filer/barman/cluster
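Related to the note above about streaming_barman not being able to switch WALs: pg_switch_wal() needs more than the REPLICATION attribute, but EXECUTE on it can be granted to an ordinary role. A sketch of grants for a non-superuser barman role on the primary (the exact set barman wants depends on the backup method; this is what I assume is enough on PG 16):

GRANT EXECUTE ON FUNCTION pg_switch_wal() TO barman;  -- lets barman close the current segment
GRANT pg_checkpoint TO barman;                        -- needed for switch-wal --force
GRANT pg_read_all_settings TO barman;                 -- barman reads server settings for its checks
GRANT pg_read_all_stats TO barman;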