receive-wal


Alf Normann Klausen

Jan 8, 2017, 8:18:03 PM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,

When I try to start receive-wal, I get this error on both my master and my replica.
PostgreSQL is version 9.6 and Barman is version 2.1.

[barman@barman:0 ~]$ barman receive-wal datavarehus
Starting receive-wal for server datavarehus
datavarehus: pg_receivexlog: starting log streaming at 5AB/85000000 (timeline 1)
datavarehus: pg_receivexlog: unexpected termination of replication stream: ERROR:  requested WAL segment 00000001000005AB00000085 has already been removed
datavarehus: pg_receivexlog: disconnected
ERROR: pg_receivexlog terminated with error code: 1
[barman@barman:0 ~]$

Previously I could fix this by running barman receive-wal datavarehus --reset, but now that does not help.

The output of "barman check" for my server looks like this:

[barman@barman:0 ~]$ barman check datavarehus
Server datavarehus:
    PostgreSQL: OK
    superuser: OK
    PostgreSQL streaming: OK
    wal_level: OK
    directories: OK
    retention policy settings: OK
    backup maximum age: OK (no last_backup_maximum_age provided)
    compression settings: OK
    failed backups: OK (there are 0 failed backups)
    minimum redundancy requirements: OK (have 3 backups, expected at least 1)
    ssh: OK (PostgreSQL server)
    not in recovery: OK
    archive_mode: OK
    archive_command: OK
    continuous archiving: OK
    pg_receivexlog: OK
    pg_receivexlog compatible: OK
    receive-wal running: FAILED (See the Barman log file for more details)
    archiver errors: FAILED (duplicates: 19)
[barman@barman:0 ~]$

PS: I am able to run a backup when I try without streaming.


Kind regards,
Alf

Alf Normann Klausen

Jan 9, 2017, 7:56:30 AM
to Barman, Backup and Recovery Manager for PostgreSQL
In the log files on both the master and the replica db server, I find the following errors:

< 2017-01-09 13:50:01.515 CET > LOG:  connection received: host=192.168.4.52 port=60336
< 2017-01-09 13:50:01.516 CET > LOG:  replication connection authorized: user=postgres
< 2017-01-09 13:50:01.517 CET > ERROR:  requested WAL segment 00000001000005AB00000085 has already been removed

It looks like the Barman server is trying to fetch a WAL segment from the PostgreSQL database that has already been archived and removed from the pg_xlog directory.
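The start position that pg_receivexlog reports (5AB/85000000) and the segment named in the error (00000001000005AB00000085) are the same place in the WAL stream. A minimal sketch of that mapping, assuming the default 16 MB WAL segment size and timeline 1 as in the logs; lsn_to_segment is a hypothetical helper, not part of Barman or PostgreSQL:

```sh
#!/bin/sh
# Minimal sketch: map a PostgreSQL LSN such as 5AB/85000000 to the name of
# the WAL segment file containing it. Assumes the default 16 MB segment
# size. lsn_to_segment is a hypothetical helper, not a Barman command.
lsn_to_segment() {
    local timeline=$1 lsn=$2
    local hi lo seg_size
    hi=$((0x${lsn%/*}))             # upper 32 bits of the LSN
    lo=$((0x${lsn#*/}))             # lower 32 bits of the LSN
    seg_size=$((16 * 1024 * 1024))
    # Segment file name: timeline, upper half, segment number (8 hex digits each).
    printf '%08X%08X%08X\n' "$timeline" "$hi" $((lo / seg_size))
}

lsn_to_segment 1 5AB/85000000   # 00000001000005AB00000085
lsn_to_segment 1 5F7/59000000   # 00000001000005F700000059
```

So the error simply says that the primary no longer has the segment file in which the requested start position falls.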

How is it possible that the Barman server cannot "keep up" with the database it is trying to back up?
I have tried "barman receive-wal datavarehus --reset" followed by "barman receive-wal datavarehus".
I have also tried "barman rebuild-xlogdb datavarehus" and then restarting receive-wal, but I still get the same error.

How can I "completely" reset the Barman server's WAL state, so that everything starts "from scratch"? Do I have to delete all old backups?

Hoping for some tips!  :-)
Thanks a lot!

Kind regards from,
Alf

Alf Normann Klausen

Jan 9, 2017, 8:37:10 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi, for further information, here is some output from my barman.log file for the master database:
2017-01-09 14:33:02,343 [41717] barman.command_wrappers INFO: datavarehus: pg_receivexlog: disconnected
2017-01-09 14:33:03,318 [41716] barman.wal_archiver INFO: No xlog segments found from streaming for datavarehus.
2017-01-09 14:34:01,545 [41743] barman.wal_archiver INFO: Found 1 xlog segments from file archival for datavarehus. Archive all segments in one run.
2017-01-09 14:34:01,545 [41743] barman.wal_archiver INFO: Archiving segment 1 of 1 from file archival: datavarehus/00000001000006720000003A
2017-01-09 14:34:01,547 [41744] barman.server INFO: Starting receive-wal for server datavarehus
2017-01-09 14:34:01,590 [41744] barman.command_wrappers INFO: datavarehus: pg_receivexlog: starting log streaming at 5AB/85000000 (timeline 1)
2017-01-09 14:34:01,591 [41744] barman.command_wrappers INFO: datavarehus: pg_receivexlog: unexpected termination of replication stream: ERROR:  requested WAL segment 00000001000005AB00000085 has already been removed
2017-01-09 14:34:01,592 [41744] barman.command_wrappers INFO: datavarehus: pg_receivexlog: disconnected
2017-01-09 14:34:02,594 [41743] barman.wal_archiver INFO: No xlog segments found from streaming for datavarehus.
[barman@barman:0 ~]$


...and for the replica:
2017-01-09 14:35:01,784 [41774] barman.server INFO: Starting receive-wal for server datavarehus3
2017-01-09 14:35:01,796 [41773] barman.wal_archiver INFO: No xlog segments found from streaming for datavarehus3.
2017-01-09 14:35:01,827 [41774] barman.command_wrappers INFO: datavarehus3: pg_receivexlog: starting log streaming at 5F7/59000000 (timeline 1)
2017-01-09 14:35:01,828 [41774] barman.command_wrappers INFO: datavarehus3: pg_receivexlog: unexpected termination of replication stream: ERROR:  requested WAL segment 00000001000005F700000059 has already been removed
2017-01-09 14:35:01,829 [41774] barman.command_wrappers INFO: datavarehus3: pg_receivexlog: disconnected
2017-01-09 14:36:02,049 [41792] barman.wal_archiver INFO: No xlog segments found from streaming for datavarehus3.
2017-01-09 14:36:02,053 [41793] barman.server INFO: Starting receive-wal for server datavarehus3
2017-01-09 14:36:02,095 [41793] barman.command_wrappers INFO: datavarehus3: pg_receivexlog: starting log streaming at 5F7/59000000 (timeline 1)
2017-01-09 14:36:02,096 [41793] barman.command_wrappers INFO: datavarehus3: pg_receivexlog: unexpected termination of replication stream: ERROR:  requested WAL segment 00000001000005F700000059 has already been removed
2017-01-09 14:36:02,097 [41793] barman.command_wrappers INFO: datavarehus3: pg_receivexlog: disconnected

Kind regards,

Alf



On Monday, 9 January 2017 02:18:03 UTC+1, Alf Normann Klausen wrote:

Gabriele Bartolini

Jan 9, 2017, 11:28:55 AM
to pgba...@googlegroups.com
Barman behaves like any other standby server. That is why, as outlined in the documentation, you need a replication slot if you want to make sure that all WAL files are streamed.

Otherwise, all you have to do is run receive-wal --reset and restart streaming from the current WAL.
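The two options can be combined into a short script. A minimal sketch, assuming the server's section in barman.conf already sets slot_name (the pg_receivexlog command line later in this thread shows --slot=barman) and streaming_archiver = on; setup_slot_and_restart is a hypothetical name, and setting BARMAN=echo prints the commands instead of running them:

```sh
#!/bin/sh
# Minimal sketch, assuming slot_name and streaming_archiver = on are already
# configured for the server in barman.conf. setup_slot_and_restart is a
# hypothetical helper. BARMAN=echo turns it into a dry run.
: "${BARMAN:=barman}"

setup_slot_and_restart() {
    local server=$1
    # A physical replication slot makes the primary keep WAL segments until
    # receive-wal has streamed them (create it once per server).
    $BARMAN receive-wal "$server" --create-slot || return
    # A new slot does not bring back segments that are already gone, so
    # align receive-wal with the server's current WAL position once.
    $BARMAN receive-wal "$server" --reset || return
    # barman cron starts the streaming archiver again in the background.
    $BARMAN cron
}
```

As the rest of this thread shows, the reset alone may not be enough if the server's Barman directory is in an inconsistent state.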

--
 Gabriele Bartolini - 2ndQuadrant Italia - Director
 PostgreSQL Training, Services and Support
 gabriele....@2ndQuadrant.it | www.2ndQuadrant.it

--
--
You received this message because you are subscribed to the "Barman for PostgreSQL" group.
To post to this group, send email to pgba...@googlegroups.com
To unsubscribe from this group, send email to
pgbarman+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/pgbarman?hl=en?hl=en-GB

---
You received this message because you are subscribed to the Google Groups "Barman, Backup and Recovery Manager for PostgreSQL" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pgbarman+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alf Normann Klausen

Jan 10, 2017, 6:21:29 AM
to Barman, Backup and Recovery Manager for PostgreSQL

Hi Gabriele,

Thanks for this information. I am trying my best to understand all aspects of this.

On my standby I use (like most others, I assume) WAL streaming to keep the standby server up to date. I have also set up archiving using “archive_mode = on” and “archive_command = 'rsync -a %p bar...@192.168.4.52:/mnt/nfs/barman/datavarehus3/incoming/%f'”, and I use barman get-wal to “play forward” if the standby server has been down: “restore_command = 'ssh bar...@192.168.4.52 barman get-wal datavarehus %f > %p'”

This works well. Yesterday, when the standby was down for 6.5 hours, the barman get-wal restore_command caught it up in about 63 minutes.

When I run barman receive-wal to reset the WAL status, drop the slot, create it again and then start streaming, I get this output:


[barman@barman:1 ~]$ barman receive-wal datavarehus3 --reset
Starting receive-wal for server datavarehus3
Resetting receive-wal directory status

[barman@barman:1 ~]$ barman receive-wal datavarehus3 --drop-slot
Dropping physical replication slot 'barman' on server 'datavarehus3'
Replication slot 'barman' dropped

[barman@barman:1 ~]$ barman receive-wal datavarehus3 --create-slot
Creating physical replication slot 'barman' on server 'datavarehus3'
Replication slot 'barman' created

[barman@barman:1 ~]$ barman receive-wal datavarehus3
Starting receive-wal for server datavarehus3
datavarehus3:pg_receivexlog: starting log streaming at 5F7/59000000 (timeline 1)
datavarehus3:pg_receivexlog: unexpected termination of replication stream: ERROR:  requested WAL segment 00000001000005F700000059 has already been removed
datavarehus3:pg_receivexlog: disconnected
ERROR: pg_receivexlog terminated with error code: 1

[barman@barman:1 ~]$ barman replication-status datavarehus3
Status of streaming clients for server 'datavarehus3':
  Current xlog location on master: 68D/6D65FFF8
  No streaming clients attached
[barman@barman:1 ~]$
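Comparing the server's current location (68D/6D65FFF8) with the position receive-wal keeps asking for (5F7/59000000) shows how far behind it is. A minimal sketch, assuming 16 MB segments; lsn_to_bytes and wal_gap are hypothetical helpers:

```sh
#!/bin/sh
# Minimal sketch: distance between two LSNs in bytes and in 16 MB WAL
# segments. lsn_to_bytes and wal_gap are hypothetical helpers.
lsn_to_bytes() {
    # An LSN "X/Y" is the 64-bit position (X << 32) + Y.
    echo $(( (0x${1%/*} << 32) + 0x${1#*/} ))
}

wal_gap() {
    local current start seg_size
    current=$(lsn_to_bytes "$1")
    start=$(lsn_to_bytes "$2")
    seg_size=$((16 * 1024 * 1024))
    echo "bytes=$((current - start)) segments=$((current / seg_size - start / seg_size))"
}

# Server position vs. where receive-wal tried to start for datavarehus3.
wal_gap 68D/6D65FFF8 5F7/59000000
```

For these two positions it prints bytes=644587323384 segments=38420, roughly 600 GiB of WAL. A primary without a replication slot does not normally hold that much WAL back for a disconnected client, which is consistent with the segment having already been removed.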



Is this correct, or am I missing something? Can I use "barman get-wal" on the Barman server to make Barman find the archived WAL files?


Isn’t archiving compatible with receive-wal?


Kind regards,

Alf




On Monday, 9 January 2017 17:28:55 UTC+1, Gabriele Bartolini wrote:

Alf Normann Klausen

Jan 10, 2017, 6:55:41 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,

I found the following workaround:

[barman@barman: ~]$ mv /mnt/nfs/barman/datavarehus3/ /mnt/nfs/barman/datavarehus3.20160110
[barman@barman: ~]$ mkdir /mnt/nfs/barman/datavarehus3/

[barman@barman: ~]$ barman receive-wal datavarehus3 --stop
Stopped process receive-wal(308)

[barman@barman: ~]$ barman receive-wal datavarehus3 --reset
Starting receive-wal for server datavarehus3
Resetting receive-wal directory status
[barman@barman: ~]$ barman cron
Skipping temporarily disabled server 'datavarehus4'
Starting WAL archiving for server datavarehus
Starting streaming archiver for server datavarehus
Starting WAL archiving for server datavarehus3
Starting streaming archiver for server datavarehus3
[barman@barman: ~]$ pw # pw is alias for ps -fU barman |grep receive-wal
barman    785     1  2 12:52 ?        00:00:00 /usr/bin/python /bin/barman -c /etc/barman.conf -q receive-wal datavarehus3
[barman@barman: ~]$ pr # pr is alias for ps -fU barman |grep receivexlog
barman    788   785  1 12:52 ?        00:00:00 /usr/pgsql-9.6/bin/pg_receivexlog --dbname=dbname=replication host=192.168.4.127 replication=true user=postgres application_name=barman_receive_wal --verbose --no-loop --no-password --directory=/mnt/nfs/barman/datavarehus3/streaming --slot=barman --synchronous
[barman@barman: ~]$

And now my pg slave server datavarehus3 is streaming again!

[barman@barman: ~]$ barman replication-status datavarehus3

Status of streaming clients for server 'datavarehus3':
  Current xlog location on master: 68D/99F7FEC8
  Number of streaming clients: 1

  1. Async WAL streamer
     Application name: barman_receive_wal
     Sync stage      : 1/3 1-safe
     Communication   : TCP/IP
     IP Address      : 192.168.4.52 / Port: 39220 / Host: -
     User name       : postgres
     Current state   : streaming (async)
     Replication slot: barman
     WAL sender PID  : 100506
     Started at      : 2017-01-10 12:52:52.443487+01:00
EXCEPTION: float() argument must be a string or a number
See log file for more details.
[barman@barman: ~]$
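The workaround can also be written as one script. A minimal sketch, assuming barman_home is /mnt/nfs/barman (taken from the paths in this thread); reset_server_dir is a hypothetical name, BARMAN=echo prints the barman commands instead of running them, and receive-wal is stopped before the move (a small reordering of the steps above) so that no pg_receivexlog process writes into the directory while it is renamed. The server directory also holds the base backups and archived WALs, so moving it aside hides all existing backups for that server from Barman; they are kept, not deleted.

```sh
#!/bin/sh
# Minimal sketch of the workaround, assuming barman_home is /mnt/nfs/barman.
# reset_server_dir is a hypothetical helper; BARMAN=echo makes it print the
# barman commands instead of running them. Moving the server directory aside
# hides its existing backups from Barman (the data itself is kept).
: "${BARMAN:=barman}"

reset_server_dir() {
    local server=$1 home=${2:-/mnt/nfs/barman}
    local stamp
    stamp=$(date +%Y%m%d)
    # Stop the running receive-wal/pg_receivexlog first.
    $BARMAN receive-wal "$server" --stop
    # Keep the old directory under a dated name instead of deleting it.
    mv "$home/$server" "$home/$server.$stamp" || return
    mkdir "$home/$server" || return
    $BARMAN receive-wal "$server" --reset
    # barman cron restarts WAL archiving and the streaming archiver.
    $BARMAN cron
}
```

Afterwards Barman has no backups for that server, so a fresh barman backup is needed before it can restore anything.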


Kind regards,
Alf


On Monday, 9 January 2017 02:18:03 UTC+1, Alf Normann Klausen wrote:

Alf Normann Klausen

Jan 10, 2017, 9:40:36 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,

The same workaround fixed my master database too. There must have been some inconsistency in the backup directory.

[barman@barman:1 barman]$ barman replication-status datavarehus
Status of streaming clients for server 'datavarehus':
  Current xlog location on master: 68E/BC940000
  Number of streaming clients: 2

  1. Async WAL streamer
     Application name: barman_receive_wal
     Sync stage      : 2/3 WAL Sent (min)
     Communication   : TCP/IP
     IP Address      : 192.168.4.52 / Port: 52360 / Host: -
     User name       : postgres
     Current state   : streaming (async)
     Replication slot: barman
     WAL sender PID  : 185610
     Started at      : 2017-01-10 13:43:37.869717+01:00
     Sent location   : 68E/BC940000 (diff: 0 B)
     Write location  : 68E/BC918000 (diff: -160.0 KiB)
     Flush location  : 68E/BC000000 (diff: -9.2 MiB)

  2. Async standby
     Application name: walreceiver
     Sync stage      : 4/5 2-safe
     Communication   : TCP/IP
     IP Address      : 192.168.4.127 / Port: 49072 / Host: -
     User name       : replicador
     Current state   : streaming (async)
     WAL sender PID  : 169933
     Started at      : 2017-01-09 15:14:03.399024+01:00
     Standby's xmin  : 1750644
     Sent location   : 68E/BC940000 (diff: 0 B)
     Write location  : 68E/BC940000 (diff: 0 B)
     Flush location  : 68E/BC940000 (diff: 0 B)
     Replay location : 68E/BC93DF98 (diff: -8.1 KiB)
[barman@barman:1 barman]$
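The guess about "some inconsistency in the backup directory" fits how pg_receivexlog chooses where to start. Barman's command line for it (shown earlier in this thread) passes --directory but no start position, so pg_receivexlog looks at the newest WAL file already in that directory: a .partial file is streamed again from its beginning, a complete file means "continue with the next segment", and only when the directory holds no WAL files does it ask the server for its current position. The sketch below is a simplified approximation of that logic, not PostgreSQL's code; it assumes 16 MB segments and a single timeline and skips the file-size checks the real tool does. streaming_start is a hypothetical helper.

```sh
#!/bin/sh
# Simplified approximation (not pg_receivexlog's source) of how the start
# position follows from the newest WAL file in the streaming directory.
# Assumes 16 MB segments and one timeline. streaming_start is hypothetical.
streaming_start() {
    local dir=$1 newest name log seg
    # WAL names are 24 upper-case hex digits, optionally ending in .partial.
    # On one timeline, byte-wise sorting of the names equals WAL order.
    newest=$(ls -1 "$dir" 2>/dev/null |
        grep -E '^[0-9A-F]{24}(\.partial)?$' | LC_ALL=C sort | tail -n 1)
    if [ -z "$newest" ]; then
        echo "server-current"   # nothing on disk: use the server's position
        return
    fi
    name=${newest%.partial}
    log=$((0x$(printf '%s' "$name" | cut -c9-16)))
    seg=$((0x$(printf '%s' "$name" | cut -c17-24)))
    if [ "$newest" = "$name" ]; then
        # A complete segment was already received: continue with the next one.
        seg=$((seg + 1))
        if [ "$seg" -gt 255 ]; then seg=0; log=$((log + 1)); fi
    fi
    # A .partial segment is streamed again from its start.
    printf '%X/%X\n' "$log" $((seg << 24))
}
```

Under that reading, a leftover file for segment 00000001000005AB00000085 in the streaming directory would keep sending pg_receivexlog back to 5AB/85000000, while moving the whole directory aside left nothing to resume from, so streaming began at the server's current position. This is a plausible explanation, not a confirmed diagnosis.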

Thanks again; Barman is a really nice product!


Kind regards,
Alf


On Monday, 9 January 2017 02:18:03 UTC+1, Alf Normann Klausen wrote:

Siddhartha Gurijala

May 16, 2020, 1:09:50 PM
to Barman, Backup and Recovery Manager for PostgreSQL
Thank you so much! I had a similar issue, and this fixed it: stopping and resetting receive-wal.