failed backups using rsync


Angelo Onofri

Nov 29, 2023, 6:30:11 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hello all,

Sorry for the long email, but I think it is needed.
I have some issues with Barman configured for rsync backups.

Patroni cluster
I have a Patroni cluster with 2 servers and a proxy.
I am using barman-wal-archive as the archive_command.
I tested it with the option -t DUMMY and it works fine.

The DB is not very big: about 14GB.

Barman
I used a configuration very similar to the template that ships with the Barman installation.
I tested the backup before adding data and it was working fine.
The configuration is below:

[ptest141v2]
description =  "Example of PostgreSQL Database (via SSH)"
;ssh_command = ssh postgres@pg
ssh_command = ssh -p 5021 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no post...@10.101.9.66 -q
;conninfo = host=pg user=barman dbname=postgres
conninfo = host=10.101.9.66 port=5000 user=barman dbname=postgres
backup_method = rsync
; AS IN THE TEMPLATE CONFIGURATION
; Incremental backup support: possible values are None (default), link or copy
;reuse_backup = link
; Identify the standard behavior for backup operations: possible values are
; exclusive_backup, concurrent_backup (default)
; concurrent_backup is the preferred method
backup_options = concurrent_backup

; Number of parallel workers to perform file copy during backup and recover
;parallel_jobs = 1

; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Continuous WAL archiving (via 'archive_command')
; ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
archiver = on
;archiver_batch_size = 50
path_prefix = "/usr/lib/postgresql/14/bin"

I checked with barman check
barman check ptest141v2
Server ptest141v2:
        PostgreSQL: OK
        superuser or standard user with backup privileges: OK
        wal_level: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        backup minimum size: OK (0 B)
        wal maximum age: OK (no last_wal_maximum_age provided)
        wal size: OK (0 B)
        compression settings: OK
        failed backups: FAILED (there are 2 failed backups)
        minimum redundancy requirements: OK (have 0 backups, expected at least 0)
        ssh: OK (PostgreSQL server)
        systemid coherence: OK
        archive_mode: OK
        archive_command: OK
        continuous archiving: OK
        archiver errors: OK

I also tested switch-wal, and it was able to copy the WAL to Barman.

But I get the error below at the end of the backup (by the way, it copies everything as it should):

Copy step 5 of 5: [global] finished (duration: less than one second) copy remote pg_control file: /DBData/ptest141/global/pg_control
2023-11-29 11:31:36,270 [243318] barman.copy_controller INFO: Copy finished (safe before None)
2023-11-29 11:31:36,276 [243318] barman.backup_executor INFO: Copy done (time: 2 minutes, 15 seconds)
2023-11-29 11:31:36,278 [243318] barman.backup_executor INFO: This is the first backup for server ptest141v2
2023-11-29 11:31:36,297 [243318] barman.backup_executor INFO:   00000013000000030000009A from server ptest141v2 has been removed
2023-11-29 11:31:36,297 [243318] barman.backup_executor INFO: Asking PostgreSQL server to finalize the backup.
2023-11-29 11:31:36,299 [243318] barman.backup ERROR: Backup failed issuing stop backup command (native concurrent).
DETAILS: Connection lost, reconnection not allowed
 
Any idea of what I am doing wrong?

Below is the output of barman diagnose:

{
    "global": {
        "config": {
            "barman_home": "/var/lib/barman",
            "barman_user": "barman",
            "configuration_files_directory": "/etc/barman.d",
            "errors_list": [],
            "log_file": "/var/log/barman/barman.log",
            "log_level": "DEBUG"
        },
        "system_info": {
            "barman_ver": "3.4.0",
            "kernel_ver": "xxx 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux",
            "python_ver": "",
            "release": "Distributor ID:\tUbuntu\nDescription:\tUbuntu 22.04.3 LTS\nRelease:\t22.04\nCodename:\tjammy",
            "rsync_ver": "rsync  version 3.2.7  protocol version 31",
            "ssh_ver": "",
            "timestamp": "2023-11-29T12:05:52.939249+01:00"
        }
    },
    "servers": {
        "ptest141v2": {
            "backups": {
                "20231129T111950": {
                    "backup_id": "20231129T111950",
                    "backup_label": null,
                    "begin_offset": 40,
                    "begin_time": "2023-11-29T11:19:50.829148+01:00",
                    "begin_wal": "000000130000000300000099",
                    "begin_xlog": "3/99000028",
                    "compression": null,
                    "config_file": "/DBData/ptest141/postgresql.conf",
                    "copy_stats": {
                        "analysis_time": 1.135665,
                        "analysis_time_per_item": {
                            "pgdata": 1.135665
                        },
                        "copy_time": 182.070253,
                        "copy_time_per_item": {
                            "pg_control": 0.393027,
                            "pgdata": 181.672892
                        },
                        "number_of_workers": 1,
                        "serialized_copy_time": 182.030243,
                        "serialized_copy_time_per_item": {
                            "pg_control": 0.393027,
                            "pgdata": 181.637216
                        },
                        "total_time": 183.258903
                    },
                    "deduplicated_size": null,
                    "end_offset": null,
                    "end_time": null,
                    "end_wal": null,
                    "end_xlog": null,
                    "error": "failure issuing stop backup command (native concurrent) (Connection lost, reconnection not allowed)",
                    "hba_file": "/DBData/ptest141/pg_hba.conf",
                    "ident_file": "/DBData/ptest141/pg_ident.conf",
                    "included_files": [
                        "/DBData/ptest141/postgresql.base.conf"
                    ],
                    "mode": "rsync-concurrent",
                    "pgdata": "/DBData/ptest141",
                    "server_name": "ptest141v2",
                    "size": null,
                    "status": "FAILED",
                    "systemid": "7297665707889587131",
                    "tablespaces": null,
                    "timeline": 19,
                    "version": 140009,
                    "xlog_segment_size": 16777216
                },
                "20231129T112918": {
                    "backup_id": "20231129T112918",
                    "backup_label": null,
                    "begin_offset": 40,
                    "begin_time": "2023-11-29T11:29:18.705449+01:00",
                    "begin_wal": "00000013000000030000009B",
                    "begin_xlog": "3/9B000028",
                    "compression": null,
                    "config_file": "/DBData/ptest141/postgresql.conf",
                    "copy_stats": {
                        "analysis_time": 1.686852,
                        "analysis_time_per_item": {
                            "pgdata": 1.686852
                        },
                        "copy_time": 135.502667,
                        "copy_time_per_item": {
                            "pg_control": 0.319637,
                            "pgdata": 135.178352
                        },
                        "number_of_workers": 1,
                        "serialized_copy_time": 135.469481,
                        "serialized_copy_time_per_item": {
                            "pg_control": 0.319637,
                            "pgdata": 135.149844
                        },
                        "total_time": 137.250439
                    },
                    "deduplicated_size": null,
                    "end_offset": null,
                    "end_time": null,
                    "end_wal": null,
                    "end_xlog": null,
                    "error": "failure issuing stop backup command (native concurrent) (Connection lost, reconnection not allowed)",
                    "hba_file": "/DBData/ptest141/pg_hba.conf",
                    "ident_file": "/DBData/ptest141/pg_ident.conf",
                    "included_files": [
                        "/DBData/ptest141/postgresql.base.conf"
                    ],
                    "mode": "rsync-concurrent",
                    "pgdata": "/DBData/ptest141",
                    "server_name": "ptest141v2",
                    "size": null,
                    "status": "FAILED",
                    "systemid": "7297665707889587131",
                    "tablespaces": null,
                    "timeline": 19,
                    "version": 140009,
                    "xlog_segment_size": 16777216
                }
            },
            "config": {
                "active": true,
                "archiver": true,
                "archiver_batch_size": 0,
                "backup_compression": null,
                "backup_compression_format": null,
                "backup_compression_level": null,
                "backup_compression_location": null,
                "backup_compression_workers": null,
                "backup_directory": "/var/lib/barman/ptest141v2",
                "backup_method": "rsync",
                "backup_options": "concurrent_backup",
                "bandwidth_limit": null,
                "barman_home": "/var/lib/barman",
                "barman_lock_directory": "/var/lib/barman",
                "basebackup_retry_sleep": 30,
                "basebackup_retry_times": 0,
                "basebackups_directory": "/var/lib/barman/ptest141v2/base",
                "check_timeout": 30,
                "compression": null,
                "conninfo": "host=10.101.9.66 port=5000 user=barman dbname=postgres",
                "create_slot": "manual",
                "custom_compression_filter": null,
                "custom_compression_magic": null,
                "custom_decompression_filter": null,
                "description": "Example of PostgreSQL Database (via SSH)",
                "disabled": false,
                "errors_directory": "/var/lib/barman/ptest141v2/errors",
                "forward_config_path": false,
                "immediate_checkpoint": false,
                "incoming_wals_directory": "/var/lib/barman/ptest141v2/incoming",
                "last_backup_maximum_age": null,
                "last_backup_minimum_size": null,
                "last_wal_maximum_age": null,
                "max_incoming_wals_queue": null,
                "minimum_redundancy": 0,
                "msg_list": [],
                "name": "ptest141v2",
                "network_compression": false,
                "parallel_jobs": 1,
                "path_prefix": "/usr/lib/postgresql/14/bin",
                "post_archive_retry_script": null,
                "post_archive_script": null,
                "post_backup_retry_script": null,
                "post_backup_script": null,
                "post_delete_retry_script": null,
                "post_delete_script": null,
                "post_recovery_retry_script": null,
                "post_recovery_script": null,
                "post_wal_delete_retry_script": null,
                "post_wal_delete_script": null,
                "pre_archive_retry_script": null,
                "pre_archive_script": null,
                "pre_backup_retry_script": null,
                "pre_backup_script": null,
                "pre_delete_retry_script": null,
                "pre_delete_script": null,
                "pre_recovery_retry_script": null,
                "pre_recovery_script": null,
                "pre_wal_delete_retry_script": null,
                "pre_wal_delete_script": null,
                "primary_conninfo": null,
                "primary_ssh_command": null,
                "recovery_options": "",
                "recovery_staging_path": null,
                "retention_policy": null,
                "retention_policy_mode": "auto",
                "reuse_backup": null,
                "slot_name": null,
                "snapshot_disks": null,
                "snapshot_gcp_project": null,
                "snapshot_instance": null,
                "snapshot_provider": null,
                "snapshot_zone": null,
                "ssh_command": "ssh -p 5021 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no post...@10.101.9.66 -q",
                "streaming_archiver": false,
                "streaming_archiver_batch_size": 0,
                "streaming_archiver_name": "barman_receive_wal",
                "streaming_backup_name": "barman_streaming_backup",
                "streaming_conninfo": "host=10.101.9.66 port=5000 user=barman dbname=postgres",
                "streaming_wals_directory": "/var/lib/barman/ptest141v2/streaming",
                "tablespace_bandwidth_limit": null,
                "wal_retention_policy": "main",
                "wals_directory": "/var/lib/barman/ptest141v2/wals"
            },
            "status": {
                "archive_command": "  barman-wal-archive 10.101.9.66 ptest141v2 %p ",
                "archive_mode": "on",
                "archive_timeout": 0,
                "archived_count": 42,
                "checkpoint_timeout": 300,
                "config_file": "/DBData/ptest141/postgresql.conf",
                "current_archived_wals_per_second": 9.483588293059674e-05,
                "current_lsn": "3/9D000148",
                "current_size": 8383936097.0,
                "current_xlog": "00000013000000030000009D",
                "data_checksums": "on",
                "data_directory": "/DBData/ptest141",
                "failed_count": 339,
                "has_backup_privileges": true,
                "hba_file": "/DBData/ptest141/pg_hba.conf",
                "hot_standby": "on",
                "ident_file": "/DBData/ptest141/pg_ident.conf",
                "included_files": [
                    "/DBData/ptest141/postgresql.base.conf"
                ],
                "is_archiving": true,
                "is_in_recovery": false,
                "is_superuser": true,
                "last_archived_time": "2023-11-29T10:55:05.309974+00:00",
                "last_archived_wal": "00000013000000030000009C",
                "last_failed_time": "2023-11-28T22:51:00.668595+00:00",
                "last_failed_wal": "00000013000000030000008F",
                "max_replication_slots": "10",
                "max_wal_senders": "10",
                "postgres_systemid": "7297665707889587131",
                "replication_slot": null,
                "replication_slot_support": true,
                "server_txt_version": "14.9",
                "stats_reset": "2023-11-24T08:04:45.866678+00:00",
                "synchronous_standby_names": [
                    "patronitest-n2"
                ],
                "version_supported": true,
                "wal_compression": "off",
                "wal_keep_size": "128MB",
                "wal_level": "replica",
                "xlog_segment_size": 16777216
            },
            "system_info": {
                "kernel_ver": "xxx 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux",
                "python_ver": "",
                "release": "Distributor ID:\tUbuntu\nDescription:\tUbuntu 22.04.3 LTS\nRelease:\t22.04\nCodename:\tjammy",
                "rsync_ver": "rsync  version 3.2.7  protocol version 31",
                "ssh_ver": ""
            },
            "wals": {
                "last_archived_wal_per_timeline": {
                    "00000013": {
                        "compression": null,
                        "name": "00000013000000030000009C",
                        "size": 16777216,
                        "time": 1701255303.3355255
                    }
                }
            }
        }
    }
}

Mike Wallace

Nov 29, 2023, 7:14:10 AM
to pgba...@googlegroups.com
Hi Angelo,

The error "Backup failed issuing stop backup command (native concurrent)." usually occurs because the PostgreSQL connection used by Barman to coordinate the backup was lost at some point during the backup.

This connection is used by Barman to tell PostgreSQL about the start and stop of the backup copy - the PostgreSQL low-level backup API requires that the backup is stopped using the same connection which was used to start the backup, so if the connection is lost while the backup is ongoing then the backup cannot be completed. This is described in more detail in issue #665 [1].

Given you are using Barman 3.4.0 it is possible that the connection is being killed due to exceeding the value of `idle_session_timeout` - this issue was resolved in Barman 3.5.0 [2] so upgrading to a more recent version of Barman may help here.

If this doesn't help then you can try reducing the value of tcp_keepalives_idle in PostgreSQL which will cause TCP keepalives to be sent more frequently on the idle connection held open by Barman - this should reduce the chances of devices on the network killing the connection due to inactivity.
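
For reference, the same idea can also be applied from the client side. This is not something proposed in the thread: the sketch below assumes Barman hands its conninfo to libpq unchanged, so the standard libpq keepalive parameters (keepalives, keepalives_idle, keepalives_interval, keepalives_count) can be appended to it. The helper name with_keepalives and the values 60/10/3 are hypothetical, and whether keepalive probes prevent a particular proxy or firewall from dropping the idle connection depends on that device.

```python
def with_keepalives(conninfo: str, idle: int = 60, interval: int = 10, count: int = 3) -> str:
    """Append libpq client-side TCP keepalive settings to a conninfo string.

    Hypothetical helper: it only handles simple key=value pairs separated by
    whitespace (no quoted values containing spaces).
    """
    extra = {
        "keepalives": 1,                  # enable client-side TCP keepalives
        "keepalives_idle": idle,          # seconds of inactivity before the first probe
        "keepalives_interval": interval,  # seconds between unanswered probes
        "keepalives_count": count,        # unanswered probes before the connection is dropped
    }
    # Keys already present in the string are left alone so existing choices win.
    present = {part.split("=", 1)[0] for part in conninfo.split() if "=" in part}
    additions = [f"{key}={value}" for key, value in extra.items() if key not in present]
    return " ".join([conninfo.strip(), *additions])


# conninfo taken from the configuration posted earlier in this thread
print(with_keepalives("host=10.101.9.66 port=5000 user=barman dbname=postgres"))
```

The resulting string would replace the conninfo value in the server section of the Barman configuration.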

Hope this helps,

Mike


--
--
You received this message because you are subscribed to the "Barman for PostgreSQL" group.
To post to this group, send email to pgba...@googlegroups.com
To unsubscribe from this group, send email to
pgbarman+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/pgbarman?hl=en?hl=en-GB

---
You received this message because you are subscribed to the Google Groups "Barman, Backup and Recovery Manager for PostgreSQL" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pgbarman+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/pgbarman/c720292a-2c55-4306-9a2a-869c1c915a65n%40googlegroups.com.

Angelo Onofri

Nov 29, 2023, 10:02:10 AM
to pgba...@googlegroups.com
Hello all,

Unfortunately, I get the same error even with version 3.9.
Is it possible that Ubuntu 22 is not supported?

I am happy to test other options. Thank you for your help!

Angelo Onofri

Angelo Onofri

Nov 29, 2023, 11:02:58 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hello,

How can I modify tcp_keepalives_idle on Patroni? It seems different from the configuration on a single node.

Thanks,

Angelo Onofri

Nov 30, 2023, 11:27:00 AM
to pgba...@googlegroups.com
Hello all,

A quick update.
I deleted some DBs on the instance and after that the backup was fine.

I discovered that there is nothing wrong with the Barman configuration; the problem is related to the haproxy configuration.
After adjusting some haproxy parameters, I am now able to back up without errors.

I still need to understand why.
Can anyone recommend a good book or website about it? :-)

Thank you again for your help

ciao,
Angelo







Martin Marques

Dec 3, 2023, 12:55:20 PM
to pgba...@googlegroups.com
Hi Angelo,

> A quick update.
> I deleted some DBs on the instance and after that the backup was fine.

Did deleting those DBs shrink the instance PGDATA size significantly?
Maybe Barman was able to rsync the whole instance before there was any
need for keepalives to be sent on the idle connection.

> I discovered that there is nothing wrong with the configuration in barman but it is more related to the one in haproxy.

Could you expand a bit on why you believe this to be related to
haproxy? Are you routing the conninfo through haproxy? Said
differently, what is running on host=10.101.9.66 port=5000?

This might be in part responsible for your disconnections.

In any case the recommendations Michael gave you should help with the
issue. Regarding that.....

> how can I modify tcp_keepalives_idle on Patroni? it seems different from the configuration on a single node.

You modify it the same way you would modify any other postgres
configuration parameter via `patronictl edit-config` [1].

[1]: https://patroni.readthedocs.io/en/latest/dynamic_configuration.html

Cheers, Martín

Angelo Onofri

Dec 4, 2023, 8:59:16 AM
to pgba...@googlegroups.com
Hello Martin,

> Did deleting those DBs shrink the instance PGDATA size significantly?
> Maybe Barman was able to rsync the whole instance before there was any
> need for keepalives to be sent on the idle connection.

A: Yes, from about 10GB to 1GB.
At the beginning, when the cluster was empty (only the postgres database), I could run the backups without issues.

> Could you expand a bit on why you believe this to be related to
> haproxy? Are you routing the conninfo through haproxy? Said
> differently, what is running on host=10.101.9.66 port=5000?

A: This is a test I prepared to set up the rsync configuration, so I have Barman and the proxy on the same VM.
I resolved the error by setting a timeout of 30m (30 minutes) in the defaults section of haproxy.cfg.
haproxy.cfg is below:
..
defaults
        log     global
        mode    tcp
#       option  httplog
        retries 2
        option  dontlognull
        timeout connect 4s
        timeout client  30m
        timeout server  30m
        timeout check   5s
..
frontend ft_postgresql
        bind *:5000 interface ens18
        option clitcpka
        acl client_allowed src -f /etc/haproxy/whitelist
        tcp-request connection reject if !client_allowed
        default_backend bk_db
backend bk_db
        option  httpchk
        option  srvtcpka
        server patronitest-n1 10.101.9.64:5432 maxconn 800 check port 8008
        server patronitest-n2 10.101.9.65:5432 maxconn 800 check port 8008
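
A rough way to reason about this fix is to compare the proxy's inactivity timeouts with how long the copy runs, assuming the PostgreSQL connection that Barman holds through haproxy carries no traffic while rsync is copying. The sketch below converts haproxy time values to seconds (haproxy reads a bare number in a timeout as milliseconds) and checks them against the copy_time that barman diagnose reported for the second failed backup. The function names are hypothetical, and the "1m" value is only an illustrative guess: the thread does not say what the timeouts were before the change.

```python
import re

# Seconds per haproxy time unit; a value with no suffix is in milliseconds.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def haproxy_time_to_seconds(value: str) -> float:
    """Convert a single haproxy time value such as '30m', '4s' or '5000' to seconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"unsupported haproxy time value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit or "ms"]


def idle_connection_survives(copy_seconds: float, timeout_client: str, timeout_server: str) -> bool:
    """Return True if a connection idle for copy_seconds stays under both timeouts."""
    limit = min(haproxy_time_to_seconds(timeout_client), haproxy_time_to_seconds(timeout_server))
    return copy_seconds < limit


copy_time = 135.502667  # copy_time of backup 20231129T112918 in the diagnose output

print(idle_connection_survives(copy_time, "30m", "30m"))  # True: the values now in haproxy.cfg
print(idle_connection_survives(copy_time, "1m", "1m"))    # False: hypothetical shorter timeout
```

Under that assumption, the timeouts need to stay above the longest expected copy time, so they may need raising again as the database grows.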

I have seen on the Internet that some people use listen instead of separate frontend and backend sections, but I haven't tested that yet.
I don't want to go off topic here, but I have seen recent pages like the one below using "listen" sections:

global
..
default
..
listen postgres_write
    bind *:5432
    mode            tcp
    option httpchk
    http-check expect status 200
    default-server inter 10s fall 3 rise 3 on-marked-down shutdown-sessions
    server postgresql_1 postgresql_1_ip:5432 check port 8008
    server postgresql_2 postgresql_2_ip:5432 check port 8008

listen postgres_read
    bind *:5433
    mode            tcp
    balance leastconn
    option pgsql-check user admin
    default-server inter 10s fall 3 rise 3 on-marked-down shutdown-sessions
    server postgresql_1 postgresql_1_ip:5432
    server postgresql_2 postgresql_2_ip:5432



A: OK, I will also look at edit-config and add tcp_keepalives_idle to the PostgreSQL parameters.
I still have a lot to learn about Patroni.

