Barman backup status "changed from WAITING_FOR_WALS to FAILED"


Marco Cassiano

Oct 22, 2019, 3:06:20 AM
to Barman, Backup and Recovery Manager for PostgreSQL

Hi,


I have a dedicated Barman Server that backs up several Postgres databases on different servers.
Often, but not always, the backup of one specific Postgres database fails with "status changed from WAITING_FOR_WALS to FAILED".

Barman version is 2.7, Postgres Version is 11.5.


Thanks in advance to anyone who might give a suggestion

Marco

Luca Ferrari

Oct 22, 2019, 3:16:22 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Tue, Oct 22, 2019 at 9:06 AM Marco Cassiano <cassi...@mmfg.it> wrote:
> I have a dedicated Barman Server that backs up several Postgres databases on different servers.
> Often, but not always, only the backup of a specific Postgres Database fails with "status changed from WAITING_FOR_WALS to FAILED".

You should find some more details about the error in the logs.
I've never experienced that, but WAITING_FOR_WALS happens when barman
has not yet received/archived a WAL, while FAILED can happen when a
WAL is missing or is not in sync with the time the backup started.
Any hint from the WALs?
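(To dig for those hints, filtering barman's log for warning/error lines about the affected server is a reasonable first step. A minimal sketch; the log excerpt and server name below are invented for illustration, not taken from a real log:)

```python
import re

def barman_errors(log_text, server):
    """Return the WARNING/ERROR log lines that mention a given server."""
    level = re.compile(r"\b(WARNING|ERROR)\b")
    return [line for line in log_text.splitlines()
            if server in line and level.search(line)]

# Invented excerpt following barman's log line format:
sample = (
    "2019-10-22 09:05:23,684 [22658] barman.wal_archiver DEBUG: "
    "Archived 1 out of 1 xlog segments from streaming for pg64prod\n"
    "2019-10-22 09:05:25,001 [22658] barman.server ERROR: "
    "pg64prod: WAL segment not yet archived\n"
)
for line in barman_errors(sample, "pg64prod"):
    print(line)  # only the ERROR line survives the filter
```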

Luca

Marco Cassiano

Oct 22, 2019, 3:28:01 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Thank you Luca,

here is what I see in the log:

2019-10-22 09:05:23,478 [8500] barman.command_wrappers INFO: pg64infolog-prod: pg_receivewal: finished segment at 18/D8000000 (timeline 1)
2019-10-22 09:05:23,644 [22658] barman.command_wrappers DEBUG: Command return code: 0
2019-10-22 09:05:23,644 [22658] barman.command_wrappers DEBUG: Command stdout:
2019-10-22 09:05:23,645 [22658] barman.command_wrappers DEBUG: Command stderr:
2019-10-22 09:05:23,684 [22658] barman.wal_archiver DEBUG: Archived 1 out of 1 xlog segments from streaming for pg64prod
2019-10-22 09:05:23,727 [22264] barman.command_wrappers DEBUG: Command return code: 0
2019-10-22 09:05:23,727 [22264] barman.command_wrappers DEBUG: Command stdout:
2019-10-22 09:05:23,727 [22264] barman.command_wrappers DEBUG: Command stderr: pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
WARNING:  skipping special file "./postgresql.conf"
WARNING:  skipping special file "./pg_hba.conf"
WARNING:  skipping special file "./pg_ident.conf"
pg_basebackup: base backup completed

2019-10-22 09:05:23,727 [22264] barman.infofile DEBUG: Config file 'postgresql.conf' already in PGDATA
2019-10-22 09:05:23,727 [22264] barman.infofile DEBUG: Config file 'pg_hba.conf' already in PGDATA
2019-10-22 09:05:23,728 [22264] barman.infofile DEBUG: Config file 'pg_ident.conf' already in PGDATA
2019-10-22 09:05:23,729 [22264] barman.backup_executor INFO: Copy done (time: 7 minutes, 14 seconds)
2019-10-22 09:05:23,754 [22264] barman.backup_executor INFO: Finalising the backup.
2019-10-22 09:05:23,823 [22264] barman.backup DEBUG: calculating backup size
2019-10-22 09:05:24,066 [8500] barman.command_wrappers INFO: pg64infolog-prod: pg_receivewal: finished segment at 18/D9000000 (timeline 1)
2019-10-22 09:05:24,150 [22264] barman.backup INFO: Backup size: 37.2 GiB
2019-10-22 09:05:24,151 [22264] barman.backup INFO: Backup end at LSN: 18/D8020948 (0000000100000018000000D8, 00020948)
2019-10-22 09:05:24,151 [22264] barman.backup INFO: Backup completed (start time: 2019-10-22 08:58:09.635281, elapsed time: 7 minutes, 14 seconds)
2019-10-22 09:05:24,171 [22264] barman.server DEBUG: Starting archive-wal for server pg64infolog-prod
2019-10-22 09:05:24,172 [22264] barman.wal_archiver INFO: Found 2 xlog segments from file archival for pg64infolog-prod. Archive all segments in one run.
2019-10-22 09:05:24,173 [22264] barman.wal_archiver INFO: Archiving segment 1 of 2 from file archival: pg64infolog-prod/0000000100000018000000D7
2019-10-22 09:05:24,173 [22264] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/incoming/0000000100000018000000D7\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/0000000100000018000000D7.tmp\''
2019-10-22 09:05:24,574 [22264] barman.command_wrappers DEBUG: Command return code: 0
2019-10-22 09:05:24,575 [22264] barman.command_wrappers DEBUG: Command stdout:
2019-10-22 09:05:24,575 [22264] barman.command_wrappers DEBUG: Command stderr:
2019-10-22 09:05:24,616 [22264] barman.wal_archiver INFO: Archiving segment 2 of 2 from file archival: pg64infolog-prod/0000000100000018000000D7.00000028.backup
2019-10-22 09:05:24,617 [22264] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/incoming/0000000100000018000000D7.00000028.backup\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/0000000100000018000000D7.00000028.backup.tmp\''
2019-10-22 09:05:24,622 [22264] barman.command_wrappers DEBUG: Command return code: 0
2019-10-22 09:05:24,623 [22264] barman.command_wrappers DEBUG: Command stdout:
2019-10-22 09:05:24,623 [22264] barman.command_wrappers DEBUG: Command stderr:
2019-10-22 09:05:24,647 [22264] barman.wal_archiver DEBUG: Archived 2 out of 2 xlog segments from file archival for pg64infolog-prod
2019-10-22 09:05:24,648 [22264] barman.wal_archiver INFO: Found 1 xlog segments from streaming for pg64infolog-prod. Archive all segments in one run.
2019-10-22 09:05:24,648 [22264] barman.wal_archiver INFO: Archiving segment 1 of 1 from streaming: pg64infolog-prod/0000000100000018000000D7
2019-10-22 09:05:24,648 [22264] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c -d > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/0000000100000018000000D7\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/0000000100000018000000D7.uncompressed\''
2019-10-22 09:05:24,760 [22264] barman.command_wrappers DEBUG: Command return code: 0
2019-10-22 09:05:24,760 [22264] barman.command_wrappers DEBUG: Command stdout:
2019-10-22 09:05:24,760 [22264] barman.command_wrappers DEBUG: Command stderr:
2019-10-22 09:05:24,912 [22264] barman.wal_archiver DEBUG: Archived 1 out of 1 xlog segments from streaming for pg64infolog-prod
2019-10-22 09:05:24,913 [22264] barman.server DEBUG: Checking backup 20191022T085809 of server pg64infolog-prod
2019-10-22 09:05:24,942 [22264] barman.server DEBUG: Check finished: the status of backup 20191022T085809 of server pg64infolog-prod changed from WAITING_FOR_WALS to FAILED

Luca Ferrari

Oct 22, 2019, 4:16:09 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Tue, Oct 22, 2019 at 9:28 AM Marco Cassiano <cassi...@mmfg.it> wrote:
> 2019-10-22 09:05:24,913 [22264] barman.server DEBUG: Checking backup 20191022T085809 of server pg64infolog-prod
> 2019-10-22 09:05:24,942 [22264] barman.server DEBUG: Check finished: the status of backup 20191022T085809 of server pg64infolog-prod changed from WAITING_FOR_WALS to FAILED

Uhm... I hoped there was something more, but that is not very useful.
I suspect `check-backup 20191022T085809` will not print any more
interesting detail about the failure, because it seems to me barman
already knows the reason it failed.
I think a WAL segment is missing, and this seems to be confirmed
here <https://github.com/2ndquadrant-it/barman/issues/244>.

Sorry, I don't know how to solve that.

Marco Cassiano

Oct 22, 2019, 4:29:49 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Luca,

thanks a lot anyway.

Marco

On Tuesday, October 22, 2019 at 10:16:09 AM UTC+2, Luca Ferrari wrote:

Gabriele Bartolini

Oct 22, 2019, 4:35:24 AM
to pgba...@googlegroups.com
Hi Marco,

as a first step, please update Barman to the latest 2.9 version, as all outstanding bugs have been fixed there and all versions of Barman are backward compatible (trunk development).

As Luca pointed out, it means that at least one WAL file from the start to the end of the backup is missing (remember you are using concurrent backup with streaming replication, not exclusive backup, which forces the WAL file to be shipped as part of the stop backup procedure).
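(To make the "missing WAL between start and end" condition concrete, here is a small sketch of how the required segment names can be enumerated with standard 16 MB WAL segments. The begin/end names come from the logs in this thread; the helper functions are illustrative, not Barman's actual API:)

```python
def wal_to_segno(name):
    """Split a 24-hex-char WAL file name into (timeline, absolute segment number)."""
    tli = int(name[:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return tli, log * 0x100 + seg  # 0x100 segments of 16 MB per 4 GB "log" file

def segno_to_wal(tli, segno):
    """Inverse of wal_to_segno: rebuild the 24-char file name."""
    return "%08X%08X%08X" % (tli, segno // 0x100, segno % 0x100)

def required_wals(begin, end):
    """Every segment name from the backup's begin WAL to its end WAL."""
    tli, first = wal_to_segno(begin)
    _, last = wal_to_segno(end)
    return [segno_to_wal(tli, s) for s in range(first, last + 1)]

# Begin/end WALs of the backup from this thread's log:
print(required_wals("0000000100000018000000D7", "0000000100000018000000D8"))
```

If any name in that range is absent from the archive, the backup stays in WAITING_FOR_WALS and is eventually marked FAILED.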

Cheers,
Gabriele

--
--
You received this message because you are subscribed to the "Barman for PostgreSQL" group.
To post to this group, send email to pgba...@googlegroups.com
To unsubscribe from this group, send email to
pgbarman+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/pgbarman?hl=en?hl=en-GB

---
You received this message because you are subscribed to the Google Groups "Barman, Backup and Recovery Manager for PostgreSQL" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pgbarman+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/pgbarman/942da75f-6466-4ea1-8acb-161fc56d979d%40googlegroups.com.


--
Gabriele Bartolini - Head of Global Support
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.com | www.2ndQuadrant.com

Luca Ferrari

Oct 22, 2019, 4:47:40 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Tue, Oct 22, 2019 at 10:35 AM Gabriele Bartolini
<gabriele....@2ndquadrant.com> wrote:
> As Luca pointed out, it means that at least one WAL file from the start to the end of the backup is missing (remember you are using concurrent backup with streaming replication, not exclusive backup which forces the WAL file to be shipped as part of the stop backup procedure).

"By default, backup_options is transparently set to exclusive_backup
for back compatibility reasons. Users of PostgreSQL 9.6 should set
backup_options to concurrent_backup."

What if he sets exclusive_backup, just to check whether the WAL is
shipped at the end of the backup?

Luca

Gabriele Bartolini

Oct 22, 2019, 5:08:48 AM
to pgba...@googlegroups.com
Hi Luca,

There is no way to use exclusive backup with streaming replication. Per se, streaming replication enables concurrent backups. Exclusive backups can only use the rsync copy method.

Cheers,
Gabriele

Luca Ferrari

Oct 22, 2019, 5:10:41 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Tue, Oct 22, 2019 at 11:08 AM Gabriele Bartolini
<gabriele....@2ndquadrant.com> wrote:
> There is no way to use exclusive backup with streaming replication. Per se, streaming replication enables concurrent backups. Exclusive backups can only use rsync copy method.

Yeah, shame on me: I realized I'd written a stupid thing as soon as I
clicked the send button.

Marco Cassiano

Oct 22, 2019, 5:22:03 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Thank you Gabriele and Luca,

I updated barman to version 2.9.

The first run worked, which gave me false hope....
The second one ended with "Check finished: the status of backup 20191021T001001 of server pg64infolog-prod changed from WAITING_FOR_WALS to FAILED"
See the log at the bottom of the message.

I also include the relevant excerpt from barman.conf:

[pg64infolog-prod]

; Human readable description
description =  "DB postgres INFOLOG PROD"

conninfo = host=...... user=...... dbname=postgres

streaming_conninfo = host=....... user=.........

; Backup settings (via pg_basebackup)
backup_method = postgres
streaming_backup_name = barman_streaming_backup

; WAL streaming settings (via pg_receivexlog)
streaming_archiver = on
;slot_name = barman
;streaming_archiver_name = barman_receive_wal
;streaming_archiver_batch_size = 50

; WAL file: I also keep reception via WAL file active
archiver = on

; PATH setting for this server
#path_prefix = "/usr/local/pgsql-10.7/bin"
path_prefix = "/usr/local/pgsql-11/bin" 


2019-10-21 00:16:16,151 [30043] barman.command_wrappers INFO: pg64infolog-prod: pg_receivewal: finished segment at 18/B000000 (timeline 1)
2019-10-21 00:16:16,644 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:16,645 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:16,645 [23433] barman.command_wrappers DEBUG: Command stderr: pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
WARNING:  skipping special file "./postgresql.conf"
WARNING:  skipping special file "./pg_hba.conf"
WARNING:  skipping special file "./pg_ident.conf"
pg_basebackup: base backup completed

2019-10-21 00:16:16,645 [23433] barman.infofile DEBUG: Config file 'postgresql.conf' already in PGDATA
2019-10-21 00:16:16,645 [23433] barman.infofile DEBUG: Config file 'pg_hba.conf' already in PGDATA
2019-10-21 00:16:16,645 [23433] barman.infofile DEBUG: Config file 'pg_ident.conf' already in PGDATA
2019-10-21 00:16:16,646 [23433] barman.backup_executor INFO: Copy done (time: 6 minutes, 15 seconds)
2019-10-21 00:16:16,659 [23433] barman.backup_executor INFO: Finalising the backup.
2019-10-21 00:16:16,743 [23433] barman.backup DEBUG: calculating backup size
2019-10-21 00:16:16,952 [30043] barman.command_wrappers INFO: pg64infolog-prod: pg_receivewal: finished segment at 18/C000000 (timeline 1)
2019-10-21 00:16:17,030 [23433] barman.backup INFO: Backup size: 37.2 GiB
2019-10-21 00:16:17,030 [23433] barman.backup INFO: Backup end at LSN: 18/B000060 (00000001000000180000000B, 00000060)
2019-10-21 00:16:17,031 [23433] barman.backup INFO: Backup completed (start time: 2019-10-21 00:10:01.494941, elapsed time: 6 minutes, 15 seconds)
2019-10-21 00:16:17,043 [23433] barman.server DEBUG: Starting archive-wal for server pg64infolog-prod
2019-10-21 00:16:17,074 [23433] barman.wal_archiver INFO: Found 4 xlog segments from file archival for pg64infolog-prod. Archive all segments in one run.
2019-10-21 00:16:17,074 [23433] barman.wal_archiver INFO: Archiving segment 1 of 4 from file archival: pg64infolog-prod/000000010000001800000008
2019-10-21 00:16:17,074 [23433] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/incoming/000000010000001800000008\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/000000010000001800000008.tmp\''
2019-10-21 00:16:17,292 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:17,292 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:17,292 [23433] barman.command_wrappers DEBUG: Command stderr:
2019-10-21 00:16:17,301 [23433] barman.wal_archiver INFO: Archiving segment 2 of 4 from file archival: pg64infolog-prod/000000010000001800000009
2019-10-21 00:16:17,302 [23433] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/incoming/000000010000001800000009\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/000000010000001800000009.tmp\''
2019-10-21 00:16:17,511 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:17,512 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:17,512 [23433] barman.command_wrappers DEBUG: Command stderr:
2019-10-21 00:16:17,522 [23433] barman.wal_archiver INFO: Archiving segment 3 of 4 from file archival: pg64infolog-prod/00000001000000180000000A
2019-10-21 00:16:17,522 [23433] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/incoming/00000001000000180000000A\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/00000001000000180000000A.tmp\''
2019-10-21 00:16:17,647 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:17,648 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:17,648 [23433] barman.command_wrappers DEBUG: Command stderr:
2019-10-21 00:16:17,659 [23433] barman.wal_archiver INFO: Archiving segment 4 of 4 from file archival: pg64infolog-prod/00000001000000180000000A.00000028.backup
2019-10-21 00:16:17,660 [23433] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/incoming/00000001000000180000000A.00000028.backup\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/00000001000000180000000A.00000028.backup.tmp\''
2019-10-21 00:16:17,666 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:17,666 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:17,666 [23433] barman.command_wrappers DEBUG: Command stderr:
2019-10-21 00:16:17,674 [23433] barman.wal_archiver DEBUG: Archived 4 out of 4 xlog segments from file archival for pg64infolog-prod
2019-10-21 00:16:17,675 [23433] barman.wal_archiver INFO: Found 3 xlog segments from streaming for pg64infolog-prod. Archive all segments in one run.
2019-10-21 00:16:17,675 [23433] barman.wal_archiver INFO: Archiving segment 1 of 3 from streaming: pg64infolog-prod/000000010000001800000008
2019-10-21 00:16:17,676 [23433] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c -d > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/000000010000001800000008\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/000000010000001800000008.uncompressed\''
2019-10-21 00:16:17,763 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:17,764 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:17,764 [23433] barman.command_wrappers DEBUG: Command stderr:
2019-10-21 00:16:17,916 [23433] barman.wal_archiver INFO: Archiving segment 2 of 3 from streaming: pg64infolog-prod/000000010000001800000009
2019-10-21 00:16:17,916 [23433] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c -d > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/000000010000001800000009\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/000000010000001800000009.uncompressed\''
2019-10-21 00:16:18,003 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:18,003 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:18,004 [23433] barman.command_wrappers DEBUG: Command stderr:
2019-10-21 00:16:18,157 [23433] barman.wal_archiver INFO: Archiving segment 3 of 3 from streaming: pg64infolog-prod/00000001000000180000000A
2019-10-21 00:16:18,157 [23433] barman.command_wrappers DEBUG: Command: 'barman_command(){ gzip -c -d > "$2" < "$1";}; barman_command \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/00000001000000180000000A\' \'/barmanbackup/pg64infolog-prod/wals/0000000100000018/00000001000000180000000A.uncompressed\''
2019-10-21 00:16:18,245 [23433] barman.command_wrappers DEBUG: Command return code: 0
2019-10-21 00:16:18,245 [23433] barman.command_wrappers DEBUG: Command stdout:
2019-10-21 00:16:18,245 [23433] barman.command_wrappers DEBUG: Command stderr:
2019-10-21 00:16:18,288 [23433] barman.wal_archiver DEBUG: Archived 3 out of 3 xlog segments from streaming for pg64infolog-prod
2019-10-21 00:16:18,289 [23433] barman.server DEBUG: Checking backup 20191021T001001 of server pg64infolog-prod
2019-10-21 00:16:18,312 [23433] barman.server DEBUG: Check finished: the status of backup 20191021T001001 of server pg64infolog-prod changed from WAITING_FOR_WALS to FAILED

Gunnar "Nick" Bluth

Oct 22, 2019, 6:10:19 AM
to pgba...@googlegroups.com
On 22.10.19 at 11:22, Marco Cassiano wrote:
> Thank you Gabriele and Luca,
>
> I updated barman to 2.9 version.
>
> First run worked for my illusion....
> The second one ended with "Check finished: the status of backup
> 20191021T001001 of server pg64infolog-prod changed from WAITING_FOR_WALS
> to FAILED"
> See the log at the bottom of the message.

I've stumbled into this recently, after setting up a streaming-only
BARMAN for a more or less idle CMS DB.

Please see
https://github.com/2ndquadrant-it/barman/issues/244
where I noted my findings so far (it boils down to BARMAN leaving the
most recent WAL segment in the "streaming" folder).

Best regards,
--
Gunnar "Nick" Bluth
Geschäftsführer

Pro Open GmbH
Eimermacherweg 106
D-48159 Münster

HRB 18122, AG MS

Mobil +49 172 8853339
Email: gunnar...@pro-open.de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam"


Luca Ferrari

Oct 22, 2019, 6:25:18 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Tue, Oct 22, 2019 at 12:10 PM Gunnar "Nick" Bluth
<gunnar...@pro-open.de> wrote:
> Please see
> https://github.com/2ndquadrant-it/barman/issues/244

Yeah, already linked in a previous reply and on the issue itself. I
suspect it has not been assigned due to pgconf.eu.

> where I noted my findings so far (it boils down to BARMAN leaving the
> most recent WAL segment in the "streaming" folder).

I'm just wondering: what if a barman script performed the archiving of
the incoming WAL segment? I suspect the internal list of missing WALs
would still reference it, and so the backup would still appear broken,
but a following `check-backup` could notice it is complete.

What if barman is a synchronous replica?

Gunnar "Nick" Bluth

Oct 22, 2019, 6:45:14 AM
to pgba...@googlegroups.com
On 22.10.19 at 12:10, Gunnar "Nick" Bluth wrote:
> On 22.10.19 at 11:22, Marco Cassiano wrote:
>> Thank you Gabriele and Luca,
>>
>> I updated barman to 2.9 version.
>>
>> First run worked for my illusion....
>> The second one ended with "Check finished: the status of backup
>> 20191021T001001 of server pg64infolog-prod changed from WAITING_FOR_WALS
>> to FAILED"
>> See the log at the bottom of the message.
>
> I've stumbled into this recently, after setting up a streaming only
> BARMAN for a more or less idle CMS DB.
>
> Please see
> https://github.com/2ndquadrant-it/barman/issues/244
> where I noted my findings so far (it boils down to BARMAN leaving the
> most recent WAL segment in the "streaming" folder).

Just realised Luca had backlinked _from_ there already... d'oh.

Anyway, the logs don't really help in pinpointing this particular
issue, so I added a

    output.error("Missing WAL: %s !" % (wal,))

in backup.py +1097 to track it down.

Then I looked at the "streaming" directory during and at the end of a
backup and saw the exact missing file lurking there, while my backup
was changed to "FAILED".

The - allegedly - simple solution (trigger a second switch_wal at the
end if all we are doing is streaming) doesn't work, at least not where
I tried it (backup_executor.py line 1489ff):

    # Ask PostgreSQL to switch to another WAL file. This is needed
    # to archive the transaction log file containing the backup
    # end position, which is required to recover from the backup.
    try:
        postgres.switch_wal()
        if self.executor.server.config.archiver is False and \
                self.executor.server.config.streaming_archiver is True:
            postgres.switch_wal()

So, I guess this special condition has to be covered in backup.py +1096.

Gunnar "Nick" Bluth

Oct 22, 2019, 6:47:18 AM
to pgba...@googlegroups.com
On 22.10.19 at 12:24, Luca Ferrari wrote:
If the server is idle at the end of the backup (i.e., didn't produce
enough WAL to generate one more segment), that will still not fix it.
The culprit is the "leave the last segment in streaming/ for the
walreceiver".

Luca Ferrari

Oct 23, 2019, 6:15:19 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Tue, Oct 22, 2019 at 12:47 PM Gunnar "Nick" Bluth
<gunnar...@pro-open.de> wrote:
> If the server is idle at the end of the backup (i.e., didn't produce
> enough WAL to generate one more segment), that will still not fix it.
> The culprit is the "leave the last segment in streaming/ for the
> walreceiver".

I cannot even reproduce that thing.
I tested it against a backup that takes roughly 5 minutes to complete,
while inserting a tuple every 2 seconds:

    % i=0; while [ $i -lt 50 ]; do
          sleep 2
          psql -U luca -c "insert into foo(i,t) values($i,'test');" testdb
          i=$(( i + 1 ))
          echo "done $i"
      done

I ensured that the inserts kept going after pg_basebackup completed,
and stopped them after barman reported "Finalising the backup.", so as
to offload the server before WAL archiving ends. However, I cannot
reproduce the problem, even with archive_timeout toggled to off.
I used a slot for the backup, which should guarantee getting all the
WALs, and that's the OP's configuration, too.
This was Barman 2.8 against PostgreSQL 11.5.

Luca

Gunnar "Nick" Bluth

Oct 23, 2019, 9:47:08 AM
to pgba...@googlegroups.com
On 23.10.19 at 12:14, Luca Ferrari wrote:
Now imagine the Drupal DB of a webserver that receives _no_ edits or
even page displays... no WAL, in other words. Which is (sadly) a pretty
exact description of my website ;-)

So then, there will be no newer WAL segment than the one from the final
switch_wal(), which doesn't get pushed/sorted into the archive and thus
is not found by check_backup().

Luca Ferrari

Oct 23, 2019, 10:02:32 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Wed, Oct 23, 2019 at 3:47 PM Gunnar "Nick" Bluth
<gunnar...@pro-open.de> wrote:
> So then, there will be no newer WAL segment than the one from the final
> switch_wal(), which doesn't get pushed/sorted into the archive and thus
> is not found by check_backup().

That's also what I tested: I generated a little WAL activity, stopped
it, and then ran a backup; therefore, there was no WAL activity during
the backup.
Summary: with little activity before the backup and none during it, or
with little activity during the backup (not enough to trigger a
complete WAL switch), I was unable to reproduce the issue.

I'm not saying this is not an issue, I'm saying I don't know how to
reproduce it.

If I had to make an attempt, I would postpone the check-backup action
with a sleep, in order to see whether the problem is tied to some sort
of network latency.

Maybe some barman developers can comment on this.

Luca

Gabriele Bartolini

Oct 24, 2019, 4:31:22 AM
to pgba...@googlegroups.com
Hi guys,

By looking at the diagnose output shared in GitHub issue 244, we suspect that the WAL archive is not aligned with the origin Postgres. Let me explain.

The backup's first WAL file is 000000010000001600000024, while the end WAL is 000000010000001600000025. Looking at the status, PostgreSQL's pg_stat_archiver says that "last_archived_wal" is 000000010000001600000026.

However, "last_archived_wal_per_timeline" reports that the archive contains 00000001000000420000002B, which is much further ahead in the sequence than 000000010000001600000025. Therefore, the check is not skipped, as Barman expects that the end WAL file has already been archived (the archive contains a WAL file that is ahead of it).

This means that:

1. A WAL file was inadvertently shipped from another Postgres server into the same incoming directory, or
2. PostgreSQL was upgraded in place or reinstalled, reusing the same location in Barman.

Does this resonate with you?

From the code standpoint, this is not a bug. Unfortunately, WAL file validation is not implemented in Barman yet.
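(The comparison described above can be sketched as follows. Equal-length hex WAL names on the same timeline sort in LSN order as plain strings, so a stray segment "from the future" convinces the check that the backup's end WAL should already be archived. The names come from the GitHub issue; the logic is a paraphrase for illustration, not Barman's actual code:)

```python
end_wal = "000000010000001600000025"  # end WAL of the failing backup
stray = "00000001000000420000002B"    # newest name found in the archive

# Equal-length hex names compare correctly as plain strings:
archive_is_ahead = stray > end_wal
print(archive_is_ahead)  # True: the archive looks "ahead" of the backup

# Barman therefore expects end_wal to be present already; since it
# never arrived, the backup flips from WAITING_FOR_WALS to FAILED.
```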

Cheers,
Gabriele



Marco Cassiano

Oct 24, 2019, 9:35:26 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Thank you Gabriele,

you got it!

I did the following:

- Deleted the backup directory
- barman rebuild-xlogdb pg64infolog-prod
- barman switch-wal pg64infolog-prod
- barman receive-wal pg64infolog-prod --reset

and now it works!!

This was the last PostgreSQL server I migrated to 11.5 (the only one I restored with pg_restore from a dump rather than pg_upgrade); I probably missed some step when submitting it to pgbarman, even though I thought I had done the same things I did for the other servers.

Now I recognize as a symptom something I didn't mention before, because I thought it was caused by a barman misbehaviour (sorry for having doubted pgbarman!).

What I usually see with barman list-backup is

pg64infolog-prod 20191024T143446 - Thu Oct 24 14:42:12 2019 - Size: 37.3 GiB - WAL Size: 0 B
pg64infolog-prod 20191024T141531 - Thu Oct 24 14:22:32 2019 - Size: 37.3 GiB - WAL Size: 6.6 MiB
pg64infolog-prod 20191024T140619 - Thu Oct 24 14:13:08 2019 - Size: 37.3 GiB - WAL Size: 2.3 MiB


What I saw before cleaning all up was

pg64infolog-prod 20191024T143446 - Thu Oct 24 14:42:12 2019 - Size: 37.3 GiB - WAL Size: 900 MB
pg64infolog-prod 20191024T141531 - Thu Oct 24 14:22:32 2019 - Size: 37.3 GiB - WAL Size: 0 B
pg64infolog-prod 20191024T140619 - Thu Oct 24 14:13:08 2019 - Size: 37.3 GiB - WAL Size: 0 B

That is a lot of MB in WAL size for the last backup alone (on the rare occasions when a backup succeeded).

Thanks a lot again!!!


Luca Ferrari

Oct 24, 2019, 11:35:26 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Thu, Oct 24, 2019 at 10:31 AM Gabriele Bartolini
<gabriele....@2ndquadrant.com> wrote:
> The backup first WAL file is 000000010000001600000024, while the end WAL is 000000010000001600000025. By looking at the status, PostgreSQL's pg_stat_archiver says that "last_archived_wal" is 000000010000001600000026.
>
> However, "last_archived_wal_per_timeline" reports that the archive contains 00000001000000420000002B, which is much further ahead in the sequence than 000000010000001600000025. Therefore, the following check is not skipped, as Barman expects that the end WAL file has already been archived (we have in the archive a WAL file that is ahead of that):

Interesting. I've tried to reproduce it, without any success, so I
could have missed some part of the explanation:
1) I deliberately created a 16 MB file in the incoming directory with
a name pointing to an LSN in the future
2) ran a backup as usual

The diagnose output is:

    "current_xlog": "0000000700000245000000E1",
    "last_archived_wal": "0000000700000245000000E0",
    ...
    "wals": {
        "last_archived_wal_per_timeline": {
            "00000007": {
                "compression": "gzip",
                "name": "0000000700000446000000FF",
                "size": 15923,
                "time": 1571930557.187798
            }
        }
    }


and the backup did run fine:

% barman list-backup miguel
miguel 20191024T172522 - Thu Oct 24 17:27:42 2019 - Size: 425.4 MiB - WAL Size: 63.3 KiB
% barman check-backup miguel 20191024T172522
% echo $?
0

What am I missing?

Luca