configure new barman server - streaming backups are stuck


Mariel Cherkassky

Jul 15, 2018, 4:51:48 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,
I'm trying to configure a barman server that will manage backup and restore for the cluster at our company. I'm trying to understand a few things and I have several questions. I would be happy to get some answers:

1) I decided to use the streaming option for backups: streaming for the WALs, and pg_basebackup for the full physical backup. However, the documentation asks me to configure archive_command as well. Why? Is there a reason to use both archive_command and streaming (pg_receivewal)? And if I disable archive_command and use only streaming, do I need to run barman receive-wal my-pg-server for each server to automate receiving the WALs?

2) When I run barman receive-wal ptktl-psgsqldb1, I get the following output each time a WAL is switched:
ptktl-psgsqldb1: pg_receivewal: starting log streaming at 1F1/F7000000 (timeline 3)
ptktl-psgsqldb1: pg_receivewal: finished segment at 1F1/F8000000 (timeline 3)


However, I don't see those WALs in the incoming directory. Why is that? Is it just a notification of the switch, or does the message mean that the WAL was received?

3) When I run barman check I get the following error:
empty incoming directory: FAILED ('/PostgreSQL/barman//ptktl-psgsqldb1/incoming' must be empty when archiver=off)
This happens even though I set streaming_archiver to on. Moreover, I don't see any errors in the errors directory.

4) When I tried to take a full backup with barman backup ptktl-psgsqldb1, the backup seemed to be stuck:
Starting backup using postgres method for server ptktl-psgsqldb1 in /PostgreSQL/barman//ptktl-psgsqldb1/base/20180715T114042
Backup start at LSN: 1F1/FB040E28 (00000003000001F1000000FB, 00040E28)
Starting backup copy via pg_basebackup for 20180715T114042

The database is empty (a new one), but the backup still doesn't finish.
Any idea why?

I think that 4 and 2 are connected. Any ideas, guys?

Thanks , Mariel.

Achilleas Mantzios

Jul 15, 2018, 5:24:56 AM
to pgba...@googlegroups.com
Hi again,
To answer a few of them:
1) Yes, there is a reason: redundancy. If streaming goes down (e.g. due to a stupid firewall rule that some admin configured), barman can continue with no problems. The same is true vice versa.
2) Two things here:
a) You don't run barman receive-wal by hand. Instead, based on your configuration, barman cron will do this for you. This is described in the manual.
b) You'll see the WAL files received via streaming in ptktl-psgsqldb1/streaming, not in incoming.
3) Delete any remaining files in /incoming.
4) Depending on the load on the server you may see some delay, because the backup has to wait for the next checkpoint. Try running pg_start_backup on your server (exclusive, non-concurrent) and watch how many minutes pass before it returns. It's the same thing. You may try taking your first backup with the --immediate-checkpoint option, but if there is load on the server this will have a negative impact on your users, performance-wise.


good luck!

Mariel Cherkassky

Jul 15, 2018, 5:56:22 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,
First of all, thanks for the fast response!! Regarding my questions:

1) If I use both archive_command (rsync) and the streaming option, I will have duplicate WALs on the barman side and will back up every WAL twice, won't I?
2) Yes, I understand that barman cron will do the job for me, but why did you say that the streaming directory is the one that will contain the WALs? I checked the incoming directory in my configuration and this is the result:
barman show-server ptktl-psgsqldb1 | grep incoming
        incoming_wals_directory: /PostgreSQL/barman//ptktl-psgsqldb1/incoming
        max_incoming_wals_queue: None

I thought that the WALs would be copied to the incoming directory. If that isn't true, why do I have that directory?

3) I deleted the contents of the directory as you suggested.

4) I didn't have a load problem. I found out that the cause was that I hadn't assigned any value to archive_command, and because of that pg_stop_backup could not finish. After setting archive_command to /bin/true it worked like magic.

5) Now it seems that I have problems with the slot:
2018-07-15 12:36:39,512 [9043] barman.server ERROR: Check 'receive-wal running' failed for server 'ptktl-psgsqldb1'
2018-07-15 12:42:20,175 [11270] barman.server ERROR: Check 'replication slot' failed for server 'ptktl-psgsqldb1'
2018-07-15 12:42:20,178 [11270] barman.server ERROR: Check 'failed backups' failed for server 'ptktl-psgsqldb1'
2018-07-15 12:42:20,183 [11270] barman.server ERROR: Check 'receive-wal running' failed for server 'ptktl-psgsqldb1'

barman check ptktl-psgsqldb1
...

        replication slot: FAILED (slot 'barman' not active: is 'receive-wal' running?)
        receive-wal running: FAILED (See the Barman log file for more details)
When I check the slot's status in pg_replication_slots, I see that its active column is f. Is this the reason behind the errors? If so, how can I activate the slot? Or does it just mean that the slot is not in use right now?

Thanks , Mariel.
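
When several checks fail at once, it can help to pull just the failing ones out of the barman check output. The following is a minimal sketch, not part of Barman: the helper names are hypothetical, and the line format ("name: STATUS (detail)") is inferred only from the output quoted in this thread.

```python
import re

# Hypothetical helper, not part of Barman. The pattern assumes each check is
# printed as "<name>: OK" or "<name>: FAILED (<detail>)", as quoted above.
CHECK_LINE = re.compile(
    r"^\s*(?P<name>[^:]+):\s+(?P<status>OK|FAILED)\b\s*(?:\((?P<detail>.*)\))?\s*$"
)


def parse_check_output(text):
    """Map each check name to a (status, detail) tuple; detail may be None."""
    results = {}
    for line in text.splitlines():
        match = CHECK_LINE.match(line)
        if match:
            name = match.group("name").strip()
            results[name] = (match.group("status"), match.group("detail"))
    return results


def failed_checks(text):
    """Return the names of the checks reported as FAILED, in output order."""
    parsed = parse_check_output(text)
    return [name for name, (status, _) in parsed.items() if status == "FAILED"]


sample = """
        replication slot: FAILED (slot 'barman' not active: is 'receive-wal' running?)
        receive-wal running: FAILED (See the Barman log file for more details)
"""
print(failed_checks(sample))  # ['replication slot', 'receive-wal running']
```

Both failures here point at the same thing: nothing is connected to the slot, which is what an inactive slot in pg_replication_slots indicates.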

On Sunday, July 15, 2018 at 11:51:48 UTC+3, Mariel Cherkassky wrote:

Achilleas Mantzios

Jul 15, 2018, 7:20:09 AM
to pgba...@googlegroups.com

---------- Forwarded message ----------
From: Achilleas Mantzios <mantzio...@gmail.com>
Date: Sun, Jul 15, 2018 at 1:04 PM
Subject: Re: Private message regarding: [pgbarman] configure new barman server - streaming backups are stuck
To: Mariel Cherkassky <mariel.c...@gmail.com>


1) barman checks for files with the same name but a different checksum, so don't worry about it. However, if barman does discover such a pair, this is considered a grave error and should be dealt with immediately.
2)
files sent by archive_command go to /incoming
pg_receivewal files go to /streaming
processed files get compressed and placed inside /wals

4) It should suffice to comment out "archiver = on", or to set it to off, in your barman conf. There is no need to tell barman that you have WAL archiving in place and then fool it into thinking that this works (with archive_command='/bin/true').
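
The relationship between the two settings and the two landing directories can be written down as a small sketch. This is an illustration under assumptions, not Barman code: the section name and values in EXAMPLE_CONF are placeholders, the function is hypothetical, and a missing option is treated as off here, which may not match the default of your Barman version.

```python
import configparser

# Placeholder server section; the name and values are assumptions.
EXAMPLE_CONF = """
[example-server]
archiver = off
streaming_archiver = on
"""


def wal_landing_dirs(conf_text, server):
    """Return the directories that should receive WAL files for `server`.

    archiver           -> files pushed by archive_command land in incoming
    streaming_archiver -> files written by pg_receivewal land in streaming
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser[server]
    dirs = []
    # Assumption: an option that is not set is treated as off.
    if section.getboolean("archiver", fallback=False):
        dirs.append("incoming")
    if section.getboolean("streaming_archiver", fallback=False):
        dirs.append("streaming")
    return dirs


print(wal_landing_dirs(EXAMPLE_CONF, "example-server"))  # ['streaming']
```

With archiver off, nothing is expected in incoming, which is consistent with the check quoted earlier complaining that the incoming directory must be empty.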

On Sun, Jul 15, 2018 at 12:45 PM, Mariel Cherkassky <mariel.c...@gmail.com> wrote:
Hi,
first of all thank you for the quick response !!
Regarding my questions : 

1) If I configure both archive_command and streaming, I would have duplicate WALs on the barman server, wouldn't I? I would back up each WAL twice, which means wasted space. Am I wrong?

2) Yes, I read more and understood that I will need to use cron. However, why do I then see WALs being created/copied into the incoming directory, while the streaming directory contains only files with the .partial suffix?
3) I deleted the files as you suggested, but I checked and incoming_wals_directory is set to the incoming directory:
 
[postgres@ptkpl-barman log]$ barman show-server ptktl-psgsqldb1 | grep incoming
        incoming_wals_directory: /PostgreSQL/barman//ptktl-psgsqldb1/incoming
Why then would the WALs be in the streaming directory?

4) The reason was that I hadn't put anything in archive_command.


5) Now when I check the server via barman check I get the following errors:
    replication slot: FAILED (slot 'barman' not active: is 'receive-wal' running?)
        receive-wal running: FAILED (See the Barman log file for more details)

From the barman log I get the following output:

2018-07-15 12:36:39,512 [9043] barman.server ERROR: Check 'receive-wal running' failed for server 'ptktl-psgsqldb1'
2018-07-15 12:42:20,175 [11270] barman.server ERROR: Check 'replication slot' failed for server 'ptktl-psgsqldb1'
2018-07-15 12:42:20,178 [11270] barman.server ERROR: Check 'failed backups' failed for server 'ptktl-psgsqldb1'
2018-07-15 12:42:20,183 [11270] barman.server ERROR: Check 'receive-wal running' failed for server 'ptktl-psgsqldb1'

I queried pg_replication_slots and saw that the barman replication slot isn't active (active=f). How can I change that? Or does this value become active only when the slot is in use?

Thanks !

On Sunday, July 15, 2018 at 12:24:56 UTC+3, Achilleas Mantzios wrote:

Achilleas Mantzios

Jul 15, 2018, 7:25:33 AM
to Mariel Cherkassky, pgba...@googlegroups.com
First, read the docs carefully. And decide which configuration suits you. Then try to configure for it. If you have any questions bring em on.

On Sun, Jul 15, 2018 at 2:21 PM, Mariel Cherkassky <mariel.c...@gmail.com> wrote:
Hi,
1) What do you mean by "However if barman discovers this, then this is considered grave error and should be dealt immediately."? I didn't understand.
2) But I should set archive_command to copy the WALs to the incoming directory, right? So the files in /incoming and /streaming are moved into the /wals directory only after they are processed?
3) So if I use both archive_command (rsync) and streaming, is it better to set archiver to on?

Mariel Cherkassky

Jul 15, 2018, 7:27:35 AM
to Achilleas Mantzios, pgba...@googlegroups.com
To be honest, I was reading the documentation during our conversation. I saw that many more parameters need to be configured. I changed archive_command to rsync the WALs to my barman server and I set archiver=on in the barman config file. However, I'm still getting the same error. Any hints?

thanks , Mariel.

Achilleas Mantzios

Jul 15, 2018, 7:41:38 AM
to Mariel Cherkassky, pgba...@googlegroups.com
Hmm, I remember that, with other work going on, it took me about one full day to fully master the barman docs...
Anyway, which topology did you decide on?

Which error are you referring to?

If your archive_command is set to rsync to barman@barmanserver:..../incoming, then you must set archiver to on, so that barman correctly processes those files and also clears the /incoming dir. In the vast majority of cases, /incoming and /streaming will receive identical files (hopefully). Having both is for your redundancy and safety. BUT there must be NO differences between two files with identical names in /incoming and /streaming. If that happens (same names, different checksums), barman will write those files to the /errors dir. The barman check command will report the error, and you are required to take action if and when that happens. (I speculate that if you stream and archive from the same master, nothing wrong will ever happen.)
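
The "same names, different checksums" condition can be illustrated with a short sketch. This is not how Barman implements it internally: the function names are hypothetical, SHA-256 is simply a convenient choice here, and the two directory arguments stand for a server's incoming and streaming directories.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so a 16 MiB segment is not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def conflicting_wals(incoming_dir, streaming_dir):
    """Return names present in both directories whose contents differ."""
    incoming = {p.name: p for p in Path(incoming_dir).iterdir() if p.is_file()}
    conflicts = []
    for path in sorted(Path(streaming_dir).iterdir()):
        # pg_receivewal keeps the segment still being written as *.partial,
        # so it has no finished counterpart to compare against yet.
        if not path.is_file() or path.name.endswith(".partial"):
            continue
        other = incoming.get(path.name)
        if other is not None and sha256_of(other) != sha256_of(path):
            conflicts.append(path.name)
    return conflicts
```

An empty result means every segment that arrived by both routes is byte-for-byte identical, which is the expected case when archiving and streaming from the same master.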

Achilleas Mantzios

Jul 15, 2018, 7:42:59 AM
to Mariel Cherkassky, pgba...@googlegroups.com
Forgot to say explicitly: files from /incoming and /streaming end up in /wals (and, in bad situations, in /errors).

Mariel Cherkassky

Jul 15, 2018, 7:46:17 AM
to Achilleas Mantzios, pgba...@googlegroups.com
I didn't master the barman docs; I read them to get the idea behind it, and I'm learning the rest hands-on during my tests. As you recommended, I decided to use both streaming and archive_command (rsync). Therefore, I configured both in this server's barman conf:
streaming_archiver=on
archiver=on

my archive_command : 
archive_command = 'rsync -a %p postgres@ptkpl-pgwatch2:/PostgreSQL/barman//ptktl-psgsqldb1/incoming/%f'

When I try to take a backup I get the following error:
barman backup ptktl-psgsqldb1
ERROR: Impossible to start the backup. Check the log for more details, or run 'barman check ptktl-psgsqldb1'
2018-07-15 14:45:51,107 [16771] barman.server ERROR: Check 'replication slot' failed for server 'ptktl-psgsqldb1'
2018-07-15 14:45:51,110 [16771] barman.server INFO: Ignoring failed check 'failed backups' for server 'ptktl-psgsqldb1'
2018-07-15 14:45:51,115 [16771] barman.server ERROR: Check 'receive-wal running' failed for server 'ptktl-psgsqldb1'
2018-07-15 14:45:51,119 [16771] barman.server ERROR: Impossible to start the backup. Check the log for more details, or run 'barman check ptktl-psgsqldb1'

Achilleas Mantzios

Jul 15, 2018, 7:51:48 AM
to Mariel Cherkassky, pgba...@googlegroups.com
You should create the slot by running :
barman receive-wal --create-slot server_name

what does
barman check <your_server>
say?

Achilleas Mantzios

Jul 15, 2018, 7:53:29 AM
to Mariel Cherkassky, pgba...@googlegroups.com
also do :
ps aux | grep barman
and verify that the barman receive-wal (spawned by the barman cron in /etc/cron.d/barman ) is running .

Mariel Cherkassky

Jul 15, 2018, 8:00:11 AM
to Achilleas Mantzios, pgba...@googlegroups.com
It seems that I had to start the cron. I thought that the cron was necessary only if we want to back up the WALs, and not if we only take a physical backup. Thank you very much!!!!



Achilleas Mantzios

Jul 15, 2018, 9:24:10 AM
to Mariel Cherkassky, pgba...@googlegroups.com
Hey, glad you made it work! barman cron also manages WALs/backups according to the retention policy. And, to my understanding, barman cron is what reads the files in /incoming and /streaming and puts them into /wals.

Mariel Cherkassky

Jul 15, 2018, 9:44:34 AM
to Achilleas Mantzios, pgba...@googlegroups.com
Yes I got it, thank you very much !

Mariel Cherkassky

Jul 18, 2018, 8:18:31 AM
to Achilleas Mantzios, pgba...@googlegroups.com
Hi Achilleas Mantzios,
I configured barman and the pg server that I back up. However, I don't understand why the WALs aren't moving to the wals directory. You said that they are moved from streaming/incoming to the wals directory once they are processed. I can see the WALs arriving from archive_command and from replication streaming in the incoming and streaming directories, but I don't see them being copied into the wals directory.

For example : 

postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 1F2/8004350
(1 row)

postgres=# select * from pg_walfile_name_offset('1F2/8004350');
        file_name         | file_offset 
--------------------------+-------------
 00000003000001F200000008 |       17232
(1 row)


[ptkpl-pgbarman incoming]$ ls -ltr
total 49152
-rw------- 1 postgres postgres 16777216 Jul 18 14:56 00000003000001F200000006
-rw------- 1 postgres postgres 16777216 Jul 18 15:02 00000003000001F200000007
-rw------- 1 postgres postgres 16777216 Jul 18 15:16 00000003000001F200000008

[ptkpl-pgbarman streaming]$ ls -ltr
total 65536
-rw------- 1 postgres postgres 16777216 Jul 18 14:56 00000003000001F200000006
-rw------- 1 postgres postgres 16777216 Jul 18 15:02 00000003000001F200000007
-rw------- 1 postgres postgres 16777216 Jul 18 15:16 00000003000001F200000008
-rw------- 1 postgres postgres 16777216 Jul 18 15:16 00000003000001F200000009.partial

[ptkpl-pgbarman wals]$ ls -ltr
total 8
-rw------- 1 postgres postgres  60 Jul 15 10:53 00000003.history
-rw-rw-r-- 1 postgres postgres 387 Jul 17 13:37 xlog.db
drwxrwxr-x 2 postgres postgres 214 Jul 17 13:37 00000003000001F2
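
For reference, the mapping that pg_walfile_name_offset performs in the query above can be reproduced by hand. The sketch below assumes the default 16 MiB WAL segment size, which matches the 16777216-byte files in the listings; the function name is chosen for this example only.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MiB segments


def walfile_name_offset(lsn, timeline, segment_size=WAL_SEGMENT_SIZE):
    """Convert an LSN such as '1F2/8004350' to (WAL file name, byte offset)."""
    high_hex, low_hex = lsn.split("/")
    position = (int(high_hex, 16) << 32) | int(low_hex, 16)
    segment_number = position // segment_size
    # Number of segments per 4 GiB "log id" (256 for 16 MiB segments).
    segments_per_logid = 0x100000000 // segment_size
    name = "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_logid,
        segment_number % segments_per_logid,
    )
    return name, position % segment_size


print(walfile_name_offset("1F2/8004350", 3))
# ('00000003000001F200000008', 17232)
```

The result matches the psql output: the switch happened inside segment 00000003000001F200000008, which is also the newest complete file in both the incoming and streaming listings.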

Achilleas Mantzios

Jul 18, 2018, 8:24:10 AM
to Mariel Cherkassky, pgba...@googlegroups.com
Look into 00000003000001F2. This is a directory containing all WAL files with the 00000003000001F2 prefix.
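
The layout can be sketched as a small path helper. This is inferred only from the listings in this thread (history files at the top level, segments grouped under their first 16 characters) and is not taken from Barman's source; the function name and the example wals path are hypothetical.

```python
from pathlib import PurePosixPath


def archived_wal_path(wals_dir, file_name):
    """Guess where a processed WAL file ends up under the wals directory."""
    # Timeline history files such as 00000003.history sit directly in wals_dir.
    if file_name.endswith(".history"):
        return PurePosixPath(wals_dir) / file_name
    # Segments (and .backup labels) are grouped by timeline + log id,
    # i.e. the first 16 hex characters of the file name.
    return PurePosixPath(wals_dir) / file_name[:16] / file_name


# The wals_dir value below is a placeholder path.
print(archived_wal_path("/srv/barman/example/wals", "00000003000001F200000005"))
# /srv/barman/example/wals/00000003000001F2/00000003000001F200000005
```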

Mariel Cherkassky

Jul 18, 2018, 8:25:54 AM
to Achilleas Mantzios, pgba...@googlegroups.com
I didn't realize that it's a directory :(.

However, it doesn't include all the WALs:
[postgres@ptkpl-pgbarman 00000003000001F2]$ ls -ltr
total 4316
-rw------- 1 postgres postgres   84484 Jul 15 14:48 00000003000001F200000001
-rw------- 1 postgres postgres   33493 Jul 15 14:49 00000003000001F200000002
-rw------- 1 postgres postgres     198 Jul 15 14:49 00000003000001F200000002.00000028.backup
-rw------- 1 postgres postgres   27302 Jul 15 14:49 00000003000001F200000003
-rw------- 1 postgres postgres   76668 Jul 15 15:06 00000003000001F200000004
-rw------- 1 postgres postgres 4184802 Jul 17 13:19 00000003000001F200000005

Achilleas Mantzios

Jul 18, 2018, 8:28:42 AM
to Mariel Cherkassky, pgba...@googlegroups.com
What are the contents of /incoming, /streaming and /wals now? All files from /incoming and /streaming should end up in /wals. Is your cron running?

Mariel Cherkassky

Jul 18, 2018, 8:37:18 AM
to Achilleas Mantzios, pgba...@googlegroups.com
I guess it takes a few minutes to move the files from incoming/streaming to the wals directory (20 minutes in my case). The WALs have now indeed been moved to the wals directory.

Achilleas Mantzios

Jul 18, 2018, 9:41:03 AM
to Mariel Cherkassky, pgba...@googlegroups.com
So it seems to work. How often have you set up "barman cron" to run? Every minute, as is the default from the package?