archiver errors: FAILED (duplicates: 1)

qg...@appannie.com

Nov 23, 2016, 1:08:47 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi Guys,

We just upgraded Barman from 1.6 to 2.0, and we are still using the old ssh-rsync as our backup method.

After the upgrade we cannot get rid of a check error. All other functions seem to work well. Could you explain what this error means, and how we can get rid of it?

barman check all
Server u-master:
PostgreSQL: OK
superuser: OK
wal_level: OK
directories: OK
retention policy settings: OK
backup maximum age: OK (interval provided: 7 days, latest backup age: 19 hours, 56 minutes)
compression settings: OK
failed backups: OK (there are 0 failed backups)
minimum redundancy requirements: OK (have 8 backups, expected at least 1)
ssh: OK (PostgreSQL server)
pgespresso extension: OK
archive_mode: OK
archive_command: OK
continuous archiving: OK
archiver errors: FAILED (duplicates: 1)

Giulio Calacoci

Nov 28, 2016, 9:57:56 AM
to pgbarman
Hi,

The error means that, while the archiver was running, a duplicate
WAL file was found.
In other words, for some reason Barman received a WAL file with the
same name twice.

This happens when, for whatever reason, two servers send WAL files
with the same name to the same Barman incoming directory.
The first one Barman receives is archived; the second one is stored
in the errors directory under your <barman_home>/<server_name> folder.

You should investigate why Barman received two WAL files with the
same name, and check whether the correct one was archived.
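
One way to check whether the archived file and the rejected one are the same is to compare their checksums. The following is only a minimal sketch, not part of Barman. It assumes the archived segment lives under `wals/<first 16 characters of the WAL name>/` inside the server directory, and that it may be gzip-compressed if compression is configured; both are assumptions to verify against your own installation. The function names are hypothetical.

```python
import gzip
import hashlib
from pathlib import Path


def wal_name_from_duplicate(duplicate: Path) -> str:
    """Strip the '.<timestamp>.duplicate' suffix seen in the errors directory."""
    return duplicate.name.split(".", 1)[0]


def archived_path(server_dir: Path, wal_name: str) -> Path:
    """Guess where the archived segment is stored (ASSUMED layout, verify locally)."""
    return server_dir / "wals" / wal_name[:16] / wal_name


def sha256_of(path: Path, gzipped: bool = False) -> str:
    """Hash the (optionally gunzipped) contents of a file in 1 MiB chunks."""
    opener = gzip.open if gzipped else open
    digest = hashlib.sha256()
    with opener(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(duplicate: Path, archived: Path, gzipped: bool = False) -> bool:
    """True when the rejected file and the archived file hold identical bytes."""
    return sha256_of(duplicate) == sha256_of(archived, gzipped)
```

If `same_content` returns True, the same segment was simply delivered twice. If it returns False, two different files claimed the same WAL name, which is the case that needs a closer look before anything is deleted.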

We should probably add a section to the barman documentation about
archiving errors.

Regards
Giulio


On 23 November 2016 at 07:08, qguan via Barman, Backup and Recovery
> This email may contain or reference confidential information and is intended
> only for the individual to whom it is addressed. Please refrain from
> distributing, disclosing or copying this email and the information contained
> within unless you are the intended recipient. If you received this email in
> error, please notify us at le...@appannie.com immediately and remove it from
> your system.
>
> --
> --
> You received this message because you are subscribed to the "Barman for
> PostgreSQL" group.
> To post to this group, send email to pgba...@googlegroups.com
> To unsubscribe from this group, send email to
> pgbarman+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/pgbarman?hl=en?hl=en-GB
>
> ---
> You received this message because you are subscribed to the Google Groups
> "Barman, Backup and Recovery Manager for PostgreSQL" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pgbarman+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
Giulio Calacoci - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
giulio....@2ndQuadrant.it | www.2ndQuadrant.it

Diego Briceno

Oct 12, 2017, 5:40:55 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hello Everyone

re: duplicates 

I am at that point now: a Barman backup of a PostgreSQL 9.4 server had been working fine for months, and suddenly we are getting *.duplicate files.

- This is one master + 2 slaves, with no issues in PostgreSQL replication.

From barman :
 barman replication-status server1
Status of streaming clients for server 'server1':
  Current xlog location on master: C/1D40CEA0
  Number of streaming clients: 2

  1. Async standby
     Application name: walreceiver
     Sync stage      : 5/5 Hot standby (max)
     Communication   : TCP/IP
     IP Address      : xx.xx.xx.20 / Port: 58014 / Host: -
     User name       : replicator
     Current state   : streaming (async)
     WAL sender PID  : 28393
     Started at      : 2017-09-30 08:49:21.059010+00:00
     Sent location   : C/1D40CEA0 (diff: 0 B)
     Write location  : C/1D40CEA0 (diff: 0 B)
     Flush location  : C/1D40CEA0 (diff: 0 B)
     Replay location : C/1D40CEA0 (diff: 0 B)

  2. Async standby
     Application name: walreceiver
     Sync stage      : 5/5 Hot standby (max)
     Communication   : TCP/IP
     IP Address      : xx.xx.xx.120 / Port: 34744 / Host: -
     User name       : replicator
     Current state   : streaming (async)
     WAL sender PID  : 20906
     Started at      : 2017-09-30 07:54:38.565905+00:00
     Sent location   : C/1D40CEA0 (diff: 0 B)
     Write location  : C/1D40CEA0 (diff: 0 B)
     Flush location  : C/1D40CEA0 (diff: 0 B)
     Replay location : C/1D40CEA0 (diff: 0 B)


- The slaves have the `archive_mode = on` and `archive_command` lines commented out (#) in postgresql.conf, so archiving is not running on them.

- I have checked the Barman logs to find out why we are getting duplicates:

2017-10-12 07:45:03,740 [3044] barman.wal_archiver INFO: Archiving server1/000000010000000C0000001C
2017-10-12 07:45:10,452 [3044] barman.wal_archiver INFO:        Error: 000000010000000C0000001C is already present in server server1. File moved to errors directory.

- I have checked for duplicate PostgreSQL processes/databases: nothing found.
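
To see which segments were rejected and when, the "already present" messages can be pulled out of the Barman log. The sketch below is written only against the two log lines quoted above; the regular expression and the function name are assumptions and may need adjusting for other Barman versions or log formats.

```python
import re

# Matches lines such as:
# 2017-10-12 07:45:10,452 [3044] barman.wal_archiver INFO:   Error: <WAL> is already present in server <name>. ...
DUPLICATE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ \[\d+\] "
    r"barman\.wal_archiver INFO:\s+Error: (?P<wal>[0-9A-F]{24}) "
    r"is already present in server (?P<server>\S+)\."
)


def find_duplicates(lines):
    """Return (timestamp, server, wal_name) for every duplicate reported in the log."""
    hits = []
    for line in lines:
        match = DUPLICATE_RE.match(line)
        if match:
            hits.append((match["ts"], match["server"], match["wal"]))
    return hits
```

Feeding it the lines of the Barman log file (for example via `open(path)`) gives a list of timestamps that can then be lined up against the archive_command activity on the PostgreSQL side.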

--------
How can I find the `source` of the duplicates? 

Really appreciate your help 

Diego/

Diego Briceno

Oct 12, 2017, 7:39:44 AM
to Barman, Backup and Recovery Manager for PostgreSQL
As recommended, I've checked the duplicate files and compared them against the ssh auth log in search of the source:

barman@barman:~/server1/errors$ ls -altrh
total 33M
drwxrwxr-x 8 barman barman 4.0K May 31 14:47 ..
-rw------- 1 barman barman  16M Oct 12 07:43 000000010000000C0000001B.20171012T074402Z.duplicate
-rw------- 1 barman barman  16M Oct 12 07:44 000000010000000C0000001C.20171012T074503Z.duplicate
drwxrwxr-x 2 barman barman 4.0K Oct 12 07:45 .
barman@barman:~/server1/errors$

----------------

Oct 12 07:42:35 barman sshd[2650]: Connection closed by 127.0.0.1 [preauth]
Oct 12 07:43:01 barman CRON[2654]: pam_unix(cron:session): session opened for user barman by (uid=0)
Oct 12 07:43:01 barman CRON[2654]: pam_unix(cron:session): session closed for user barman
Oct 12 07:43:03 barman sshd[2662]: Set /proc/self/oom_score_adj to 0
Oct 12 07:43:03 barman sshd[2662]: Connection from xx.xx.11.20 port 48113 on 172.31.7.130 port 22
Oct 12 07:43:03 barman sshd[2662]: Postponed publickey for barman from xx.xx.11.20 port 48113 ssh2 [preauth]
Oct 12 07:43:03 barman sshd[2662]: Accepted publickey for barman from xx.xx.11.20 port 48113 ssh2: RSA d1:4d:c5:20:f0:03:cb:e7:64:bc:90:b4:94:c7:f9:0f
Oct 12 07:43:03 barman sshd[2662]: pam_unix(sshd:session): session opened for user barman by (uid=0)
Oct 12 07:43:04 barman sshd[2662]: User child is on pid 2722
Oct 12 07:43:04 barman sshd[2722]: Starting session: command for barman from xx.xx.11.20 port 48113
Oct 12 07:43:04 barman sshd[2722]: Received disconnect from xx.xx.11.20: 11: disconnected by user
Oct 12 07:43:04 barman sshd[2662]: pam_unix(sshd:session): session closed for user barman
Oct 12 07:43:37 barman sshd[2744]: Set /proc/self/oom_score_adj to 0
Oct 12 07:43:40 barman sshd[2744]: Connection from 127.0.0.1 port 44802 on 127.0.0.1 port 22
Oct 12 07:43:40 barman sshd[2744]: Connection closed by 127.0.0.1 [preauth]
Oct 12 07:44:01 barman CRON[2757]: pam_unix(cron:session): session opened for user barman by (uid=0)
Oct 12 07:44:01 barman CRON[2757]: pam_unix(cron:session): session closed for user barman
Oct 12 07:44:15 barman sshd[2797]: Set /proc/self/oom_score_adj to 0
Oct 12 07:44:15 barman sshd[2797]: Connection from xx.xx.11.20 port 48120 on 172.31.7.130 port 22
Oct 12 07:44:15 barman sshd[2797]: Postponed publickey for barman from xx.xx.11.20 port 48120 ssh2 [preauth]
Oct 12 07:44:15 barman sshd[2797]: Accepted publickey for barman from xx.xx.11.20 port 48120 ssh2: RSA d1:4d:c5:20:f0:03:cb:e7:64:bc:90:b4:94:c7:f9:0f
Oct 12 07:44:15 barman sshd[2797]: pam_unix(sshd:session): session opened for user barman by (uid=0)
Oct 12 07:44:15 barman sshd[2797]: User child is on pid 2852
Oct 12 07:44:15 barman sshd[2852]: Starting session: command for barman from xx.xx.11.20 port 48120
Oct 12 07:44:15 barman sshd[2852]: Received disconnect from xx.xx.11.20: 11: disconnected by user
Oct 12 07:44:15 barman sshd[2797]: pam_unix(sshd:session): session closed for user barman
Oct 12 07:44:15 barman sshd[2855]: Set /proc/self/oom_score_adj to 0
Oct 12 07:44:15 barman sshd[2855]: Connection from xx.xx.11.20 port 48121 on 172.31.7.130 port 22
Oct 12 07:44:16 barman sshd[2855]: Postponed publickey for barman from xx.xx.11.20 port 48121 ssh2 [preauth]
Oct 12 07:44:16 barman sshd[2855]: Accepted publickey for barman from xx.xx.11.20 port 48121 ssh2: RSA d1:4d:c5:20:f0:03:cb:e7:64:bc:90:b4:94:c7:f9:0f
Oct 12 07:44:16 barman sshd[2855]: pam_unix(sshd:session): session opened for user barman by (uid=0)
Oct 12 07:44:16 barman sshd[2855]: User child is on pid 2910
Oct 12 07:44:16 barman sshd[2910]: Starting session: command for barman from 10.120.11.20 port 48121
Oct 12 07:44:16 barman sshd[2910]: Received disconnect from xx.xx.11.20: 11: disconnected by user
Oct 12 07:44:16 barman sshd[2855]: pam_unix(sshd:session): session closed for user barman
Oct 12 07:44:42 barman sshd[2943]: Set /proc/self/oom_score_adj to 0
Oct 12 07:44:42 barman sshd[2943]: Connection from 127.0.0.1 port 44829 on 127.0.0.1 port 22
Oct 12 07:44:42 barman sshd[2943]: Connection closed by 127.0.0.1 [preauth]
Oct 12 07:44:45 barman sshd[2988]: Set /proc/self/oom_score_adj to 0



The above tells me the two files come from my server1 (*.*.11.20); barman is `172.31.7.130`.

But if the Barman config and process have not been changed or restarted lately, why is server1 sending the same WAL file more than once, creating duplicates?
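
The sshd lines above can be grouped by source address and minute to spot moments where more than one session was accepted. This is a rough sketch that assumes the syslog format shown in this post; the regular expression and the function name are hypothetical. Note that an accepted ssh session only shows that something connected, not which WAL file (if any) it delivered.

```python
import re
from collections import defaultdict

# Matches lines such as:
# Oct 12 07:44:15 barman sshd[2797]: Accepted publickey for barman from <addr> port 48120 ssh2: RSA ...
ACCEPTED_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+) (?P<minute>\d{2}:\d{2}):\d{2} \S+ sshd\[\d+\]: "
    r"Accepted publickey for (?P<user>\S+) from (?P<addr>\S+) port (?P<port>\d+)"
)


def sessions_per_minute(lines):
    """Map (source address, 'Mon DD HH:MM') to the list of client ports accepted."""
    buckets = defaultdict(list)
    for line in lines:
        match = ACCEPTED_RE.match(line)
        if match:
            when = f'{match["month"]} {match["day"]} {match["minute"]}'
            buckets[(match["addr"], when)].append(int(match["port"]))
    return dict(buckets)
```

Applied to the log in this post, the 07:43 minute has one accepted session from server1 and the 07:44 minute has two (ports 48120 and 48121). Minutes with more than one session are a reasonable place to start when looking at what archive_command did on the sending host at that time.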

Diego/

mohammad sherafat

Jul 6, 2020, 2:41:06 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,
Check the creation date of the two files in Barman's errors and wals folders, replace the newest, then restart the replica.

--

Ayad Mohamed

Jul 6, 2020, 4:18:10 AM
to pgba...@googlegroups.com
Delete the duplicates from the errors folder and you are good to go.
