barman recover questions (--get-wal, restore to last known state)

B S

Jan 31, 2022, 3:54:03 AM
to Barman, Backup and Recovery Manager for PostgreSQL
barman -v
2.18 Barman by EnterpriseDB (www.enterprisedb.com)

I'm running Barman and Postgres 14 in separate containers managed by rootless podman, with the geo-redundancy feature enabled, so my configuration is more complicated than the ones in the online articles I've found. Since I have no experience with database backups, I have been confused by some Barman features.
My chosen setup is 'Scenario 1: Backup via streaming protocol' (streaming-only) from the Barman documentation.
After starting the containers, Barman successfully streams backups and WAL files, and I'm not having any issues at that point.
So:

1. When I use the 'recover' command with the '--get-wal' option, Barman writes the Postgres option 'restore_command' to the 'postgresql.auto.conf' file with a default value. However, I need a specific command with an ssh connection. I couldn't find a way to get this behavior in the documentation, so I have to edit 'postgresql.auto.conf' manually after each recovery. Is this a reasonable approach?
I also specified the 'restore_command' option in the main 'postgresql.conf' file, but it doesn't seem to have any effect. I understand that 'postgresql.auto.conf' takes precedence, but in which case does Postgres use the 'restore_command' option from 'postgresql.conf'?

2. After a backup completes with Barman's 'backup' command and more records are added to the database, Barman keeps accumulating the new WAL files containing the changes made after the backup.
Suppose the database container is destroyed after a backup was taken and a few WAL files were added on top of that backup. I know that Barman's 'get-wal' command can also retrieve '*.partial' files. How can I use Barman to restore the database from the latest backup using all WAL files since that backup?

3. Am I correct in understanding that geographical redundancy with Barman's 'primary_ssh_command' can only get closed WAL files, not '*.partial' files?

Michael Wallace

Jan 31, 2022, 5:43:23 AM
to pgba...@googlegroups.com
1. Yes, there is currently no way of specifying a custom SSH command, so you would need to edit postgresql.auto.conf yourself. PostgreSQL would use the restore_command in postgresql.conf if there was no restore_command set in postgresql.auto.conf - this is possible in scenarios where `barman recover` is not being used, such as manually coordinating a recovery.

2. Take a look at the point-in-time recovery feature: https://docs.pgbarman.org/release/2.18/#point-in-time-recovery - if you use one of the recovery target options along with the --get-wal option, then Barman will automatically add the --partial option to the restore command.

3. Correct - the passive Barman server will only sync WALs that have been archived by the primary Barman server and this does not include `*.partial` WAL files.
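
For point 1, the manual edit of postgresql.auto.conf after each recovery could be scripted. The following is only a minimal sketch, not something Barman provides: the helper name `set_restore_command`, the host `backup-host`, the ssh user, and the sample file contents are all assumptions made for illustration. It works on the text of the file, drops any existing restore_command line, and appends the one you want.

```python
import re


def set_restore_command(conf_text: str, command: str) -> str:
    """Return conf_text with any restore_command setting replaced by `command`."""
    # PostgreSQL configuration strings are single-quoted; an embedded single
    # quote is written by doubling it.
    escaped = command.replace("'", "''")
    new_line = f"restore_command = '{escaped}'"
    # Keep every line except existing, uncommented restore_command settings.
    kept = [
        line
        for line in conf_text.splitlines()
        if not re.match(r"\s*restore_command\b", line)
    ]
    kept.append(new_line)
    return "\n".join(kept) + "\n"


# Illustrative input only: the real file written by Barman may look different.
original = (
    "# Do not edit this file manually!\n"
    "restore_command = 'barman get-wal gl %f > %p'\n"
    "shared_buffers = '256MB'\n"
)

# 'backup-host' and the 'barman' ssh user are placeholders for your own setup.
print(set_restore_command(original, "ssh barman@backup-host barman get-wal gl %f > %p"))
```

To apply it, read postgresql.auto.conf from the recovered data directory, pass the text through the helper, and write the result back before starting PostgreSQL.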

--
--
You received this message because you are subscribed to the "Barman for PostgreSQL" group.
To post to this group, send email to pgba...@googlegroups.com
To unsubscribe from this group, send email to
pgbarman+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/pgbarman?hl=en?hl=en-GB

---
You received this message because you are subscribed to the Google Groups "Barman, Backup and Recovery Manager for PostgreSQL" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pgbarman+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/pgbarman/6c0841c1-0492-48a4-8a92-ff1ae77c4921n%40googlegroups.com.

B S

Jan 31, 2022, 1:24:39 PM
to Barman, Backup and Recovery Manager for PostgreSQL
Michael, Thank you very much.
This saved me a lot of time.
I experimented with different options in the 'recover' command.

Does it look like specifying the '--target-tli' parameter is the only way to restore the database to the last state known to Barman (taking into account all WAL files accumulated after the end of the backup itself)?

I read all the documentation about the '--target-immediate' parameter, but I could not understand what it actually means and how it differs from restoring without parameters to the last backup point. I would greatly appreciate more explanation and examples.

On Monday, January 31, 2022 at 13:43:23 UTC+3, michael...@enterprisedb.com wrote:

Michael Wallace

Feb 1, 2022, 6:24:08 AM
to pgba...@googlegroups.com
Actually, my answer was not quite correct - if the PITR targets are not used at all, then no recovery_target or recovery_target_timeline option will be set in postgresql.auto.conf and PostgreSQL will use the default values [1]. For recovery_target_timeline this is `latest`, so as long as `--get-wal` is used in the `barman recover` command, PostgreSQL should recover all the way up to the last transaction in the partial WAL.

The `--target-immediate` option will cause `recovery_target = 'immediate'` to be set in postgresql.auto.conf which will stop the recovery as soon as the backup has reached a consistent state and avoid replaying any subsequent WALs.
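
The two recovery variants described above differ only in the options passed to `barman recover`. As a rough sketch, the helper below assembles both command lines without running them. The function name `build_recover_command` is hypothetical, the destination path is a placeholder, and a recovery onto another host would need further options that are left out here.

```python
import shlex


def build_recover_command(server, backup_id, destination,
                          get_wal=False, target_immediate=False):
    """Assemble (but do not run) a `barman recover` command line."""
    argv = ["barman", "recover"]
    if get_wal:
        # Let PostgreSQL fetch WALs from Barman during recovery.
        argv.append("--get-wal")
    if target_immediate:
        # Stop as soon as the backup reaches a consistent state.
        argv.append("--target-immediate")
    argv += [server, backup_id, destination]
    return argv


# Placeholder destination; 'gl' is the server name used in this thread.
dest = "/var/lib/postgresql/data"

# Recover to the last state Barman knows about (no recovery target, WALs fetched).
latest_state = build_recover_command("gl", "latest", dest, get_wal=True)

# Recover only up to the consistent point of the backup itself.
backup_only = build_recover_command("gl", "latest", dest, target_immediate=True)

print(shlex.join(latest_state))
print(shlex.join(backup_only))
```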


B S

Feb 2, 2022, 5:50:25 AM
to Barman, Backup and Recovery Manager for PostgreSQL
It's absolutely clear now. Many thanks.

After playing with Barman's backups and recoveries for a while, I get the following errors when running a new 'backup' command:


In terminal:

ERROR: Impossible to start the backup. Check the log for more details, or run 'barman check gl'


In barman.log file:

2022-02-02 12:26:01,703 [5549] barman.wal_archiver INFO: Found 1 xlog segments from streaming for gl. Archive all segments in one run.
2022-02-02 12:26:01,703 [5549] barman.wal_archiver INFO: Archiving segment 1 of 1 from streaming: gl/00000003.history
2022-02-02 12:26:01,707 [5550] barman.server INFO: Starting receive-wal for server gl
2022-02-02 12:26:01,824 [5550] barman.wal_archiver INFO: Activating WAL archiving through streaming protocol
2022-02-02 12:26:01,897 [5550] barman.command_wrappers INFO: gl: pg_receivewal: starting log streaming at 0/19000000 (timeline 3)
2022-02-02 12:26:01,910 [5550] barman.command_wrappers INFO: gl: pg_receivewal: error: could not send replication command "START_REPLICATION": ERROR:  requested timeline 3 is not in this >
2022-02-02 12:26:01,911 [5550] barman.command_wrappers INFO: gl: pg_receivewal: error: disconnected
2022-02-02 12:26:01,913 [5550] barman.server ERROR: ArchiverFailure:pg_receivewal terminated with error code: 1
2022-02-02 12:26:17,435 [5553] barman.server ERROR: Check 'replication slot' failed for server 'gl'
2022-02-02 12:26:17,442 [5553] barman.server ERROR: Check 'receive-wal running' failed for server 'gl'
2022-02-02 12:26:17,444 [5553] barman.server ERROR: Impossible to start the backup. Check the log for more details, or run 'barman check gl'


$ Barman's 'check' command:

Server gl:
        PostgreSQL: OK
        superuser or standard user with backup privileges: OK
        PostgreSQL streaming: OK
        wal_level: OK
        replication slot: FAILED (slot 'barman' not initialised: is 'receive-wal' running?)
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        backup minimum size: OK (137.4 MiB)
        wal maximum age: OK (no last_wal_maximum_age provided)
        wal size: OK (48.0 MiB)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 4 backups, expected at least 3)
        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        systemid coherence: OK
        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: FAILED (See the Barman log file for more details)
        archiver errors: OK


$ Barman's 'status' command:

Server gl:
        Description: gl Database (Streaming-Only)
        Active: True
        Disabled: False
        PostgreSQL version: 14.1
        Cluster state: in production
        pgespresso extension: Not available
        Current data size: 164.4 MiB
        PostgreSQL Data directory: /var/lib/postgresql/data
        Current WAL segment: 00000002000000000000001A
        Passive node: False
        Retention policies: enforced (mode: auto, retention: RECOVERY WINDOW OF 4 WEEKS, WAL retention: MAIN)
        No. of available backups: 4
        First available backup: 20220131T195320
        Last available backup: 20220201T142738
        Minimum redundancy requirements: satisfied (4/3)
               

$ Barman's 'receive-wal' command:

gl: pg_receivewal: starting log streaming at 0/19000000 (timeline 3)
gl: pg_receivewal: error: could not send replication command "START_REPLICATION": ERROR:  requested timeline 3 is not in this server's history
gl: pg_receivewal: error: disconnected
ERROR: ArchiverFailure:pg_receivewal terminated with error code: 1


$ Barman's 'receive-wal --create-slot':

ERROR: Replication slot 'barman' already exists

$ Barman's 'receive-wal --reset':

ERROR: The receive-wal position is ahead of PostgreSQL current WAL lsn (000000030000000000000019.partial > 00000002000000000000001B)


It looks like the database is on Timeline 2 after the recovery, but Barman's state is already pointing to Timeline 3.
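
The timeline mismatch can be read directly from the two file names in the `--reset` error. WAL segment names are 24 hexadecimal digits: 8 for the timeline, 8 for the log number, and 8 for the segment. A small sketch that decodes them (the names `WalName` and `parse_wal_name` are made up for this example):

```python
import re
from typing import NamedTuple


class WalName(NamedTuple):
    timeline: int
    log: int
    segment: int


# 3 groups of 8 hex digits, optionally followed by the .partial suffix.
_WAL_RE = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$")


def parse_wal_name(name: str) -> WalName:
    """Split a WAL segment file name into timeline, log and segment numbers."""
    match = _WAL_RE.match(name)
    if match is None:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return WalName(*(int(part, 16) for part in match.groups()[:3]))


# The two names reported by 'barman receive-wal --reset' above.
streamed = parse_wal_name("000000030000000000000019.partial")
current = parse_wal_name("00000002000000000000001B")

print("streamed timeline:", streamed.timeline)
print("server timeline:", current.timeline)
if streamed.timeline > current.timeline:
    print("The streaming directory holds a WAL from a newer timeline than the server.")
```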

Please explain how I should act in such situations. I'm trying to figure it out now because once the database is filled with real data, I won't have the opportunity or time to experiment.

And one more question that worries me a lot. Let's say I want Barman to completely reset its state. Is it enough to just delete the server directory in the Barman directory, or does it store related data somewhere else? Will there be any problems later?

On Tuesday, February 1, 2022 at 14:24:08 UTC+3, michael...@enterprisedb.com wrote:

Luca Ferrari

Feb 2, 2022, 6:33:14 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Wed, Feb 2, 2022 at 11:50 AM B S <bors...@gmail.com> wrote:
> gl: pg_receivewal: starting log streaming at 0/19000000 (timeline 3)
> gl: pg_receivewal: error: could not send replication command "START_REPLICATION": ERROR: requested timeline 3 is not in this server's history

Sounds to me like you recovered your server to a previous timeline and
then promoted it.

Luca

B S

Feb 2, 2022, 6:48:18 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Unfortunately, I don't understand what 'then promoted it' means. It would be good to understand how to resolve this situation and avoid its occurrence in the future.

On Wednesday, February 2, 2022 at 14:33:14 UTC+3, fluc...@gmail.com wrote:

B S

Feb 2, 2022, 2:06:29 PM
to Barman, Backup and Recovery Manager for PostgreSQL
After extensive searching, I found advice from a person in a similar situation:
- "I've discovered that empting the streaming folder barman continues with the streaming replication without delete all the wal history."

I cleared the 'streaming' directory and ran the 'barman cron' command. Everything worked and Barman received the last partial WAL file.

I think I'm beginning to understand a little of what's going on. But I am worried about the side effects of such actions. What is the correct procedure to avoid this situation and to reset Barman? I understand that by deleting all files from the 'streaming' directory, I have lost some data.

On Wednesday, February 2, 2022 at 14:48:18 UTC+3, B S wrote:

Michael Wallace

Feb 3, 2022, 5:21:12 AM
to pgba...@googlegroups.com
Without knowing the exact sequence of backups and restores it's hard to say for sure, but the 000000030000000000000019.partial file would have been written by pg_receivewal while streaming WALs from the PostgreSQL server when it was on timeline 3. If the server was then restored from a backup on an older timeline but the WALs were not replayed beyond those necessary to make recovery consistent (this could happen if --target-immediate was used, or if --get-wal was not used), then pg_receivewal would notice the .partial file was from a higher timeline and produce the error you were seeing. After clearing the streaming directory, pg_receivewal is happy to start streaming from the PostgreSQL server's current timeline; however, any transactions recorded in the .partial file will have been lost.

Whether deleting the file in the streaming directory was the right thing to do depends on how you arrived at this scenario. If you had intentionally used --target-immediate because you had decided to abandon timeline 3 (perhaps a recovery had gone wrong, creating a need to revert to an earlier timeline), then it might be OK to lose the transactions recorded in the .partial file (as a general rule I'd recommend just moving the .partial files somewhere outside of Barman's streaming directory until you're sure you don't need them). If you had recovered to the earlier timeline by omitting --get-wal but had intended to recover all the way up to timeline 3, then deleting the .partial file would almost certainly not be the right thing to do - a better approach would be to retry the recovery with --get-wal so that PostgreSQL can retrieve that .partial file on timeline 3 and replay the transactions. Starting `barman receive-wal` would then continue streaming the 000000030000000000000019 WAL segment from the server and all would be well.
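
The recommendation to move .partial files aside rather than delete them could look roughly like this. It is only a sketch: the function name `set_aside_partial_files` is invented for the example, and both directory paths are supplied by the caller because the location of the streaming directory depends on your Barman configuration.

```python
import shutil
from pathlib import Path


def set_aside_partial_files(streaming_dir, holding_dir):
    """Move every *.partial file out of streaming_dir and return the new paths."""
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    moved = []
    for partial in sorted(Path(streaming_dir).glob("*.partial")):
        target = holding / partial.name
        if target.exists():
            # Refuse to overwrite a file that was set aside earlier.
            raise FileExistsError(f"{target} already exists")
        shutil.move(str(partial), str(target))
        moved.append(target)
    return moved
```

A call such as `set_aside_partial_files("/var/lib/barman/gl/streaming", "/var/lib/barman/gl-partial-hold")` assumes a default-style layout that may not match your installation; check the paths, and stop `barman receive-wal` for that server first, before running anything like it.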

Regarding the question about completely resetting Barman state, all the Barman state for that server is in the server directory aside from any configuration information in the barman config file.

B S

Feb 5, 2022, 6:55:07 AM
to Barman, Backup and Recovery Manager for PostgreSQL
It was a very detailed and helpful explanation. Thank you.
I played around with various backup and restore scenarios. And it looks like the veil of secrecy is being lifted.

If I use the 'barman recover --target-immediate' command and then, to make streaming work again, execute the 'barman receive-wal --reset' command, is that fine?

On Thursday, February 3, 2022 at 13:21:12 UTC+3, michael...@enterprisedb.com wrote:

Michael Wallace

Feb 7, 2022, 4:44:04 PM
to pgba...@googlegroups.com
I don't think `barman receive-wal --reset` will help after a recovery with `--target-immediate` because the `--reset` option includes logic which will raise an error if the most recently streamed WAL is ahead of the PostgreSQL current WAL - I would expect to see an error such as: `ERROR: The receive-wal position is ahead of PostgreSQL current WAL lsn (0000000400000002000000B9.partial > None)`. I think you'd still need to manually clear out the `streaming` directory after that recovery.
