Backup via rsync stalls indefinitely


Brandan Lennox

Aug 7, 2017, 5:51:21 PM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,

I have two servers, A and B, configured with barman 2.2. "backup_method = postgres" works for both servers, and "backup_method = rsync" with "reuse_backup = link" works for server A but not for server B. On server B the backup stalls until rsync times out. I see this message in barman.log:

2017-07-24 16:16:30,052 [5951] barman.copy_controller WARNING: Failure executing rsync on remote PGDATA directory: /var/lib/pgsql/9.6/data/ (attempt 0)

2017-07-24 16:16:30,053 [5951] barman.copy_controller WARNING: Retrying in 30 seconds


I don't see any errors on server B itself in pg_log. I see rsync processes on the barman server and "rsync sender" processes on server B, but they never consume CPU or RAM or IO according to top.

Servers A and B are identical installations of Postgres 9.6.2 on CentOS 7.0. WAL archiving seems to be working on server B (the one that's failing to back up):

        last_archived_time: 2017-08-07 20:47:54.728929+00:00

        last_archived_wal: 00000001000000C4000000B1

        last_backup_maximum_age: 3 days (latest backup: 18 hours, 29 minutes, 49 seconds )

        last_failed_time: 2017-07-19 19:32:41.399935+00:00

        last_failed_wal: 00000001000000B200000093


Configuration is the same for both servers:

archiver = on

backup_method = rsync

reuse_backup = link


Server A is about 40 GB. Server B is about 80 GB.

Please let me know anything else that will help diagnose the problem.

Thanks!

Brandan Lennox

Aug 9, 2017, 11:50:32 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Eventually I ran the raw rsync command that Barman was running (found by looking at top and /proc during a backup) with "-vvv -n" added, and I noticed it still hung even when it wasn't trying to send the files themselves. This thread suggested that a similar problem was caused by jumbo frames, so I set my MTU to 1500 on both machines, and the files now transfer correctly.
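In case it helps anyone repeat that first step, here is a rough sketch of pulling a running process's command line out of /proc. The helper name show_cmdline is made up for illustration, and it assumes a Linux host with /proc mounted.

```shell
# Print the full command line of a running process, one argument per line.
# /proc/<pid>/cmdline separates arguments with NUL bytes, so translate them
# to newlines to make each argument readable.
show_cmdline() {
    tr '\0' '\n' < "/proc/$1/cmdline"
}

# Example (run on the Barman host while the backup is stalled):
#   for pid in $(pgrep -x rsync); do show_cmdline "$pid"; echo; done
```

The recovered arguments can then be re-run by hand with "-vvv -n" added (verbose output, dry run) to see where rsync stops without copying any data.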

For reference, the MTU change was something like:

# ifconfig eth0 mtu 1500

on both servers to test, plus setting up permanent config in /etc/dhcp (the location depends on your OS).
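A quick way to check whether jumbo frames are the culprit before changing anything is to ping with the "don't fragment" bit set. This is only a sketch: the helper names, the host name server-b, and the 9000-byte jumbo MTU are assumptions, and "-M do" is the Linux iputils ping option.

```shell
# ICMP payload that exactly fills one IPv4 packet of the given MTU:
# the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings of that size with fragmentation prohibited.
# If this fails at the jumbo size but succeeds at 1500, something on
# the path between the two hosts cannot carry jumbo frames.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 3 -s "$(payload_for_mtu "$mtu")" "$host"
}

# Example (server-b and 9000 are placeholders):
#   probe_mtu server-b 9000
#   probe_mtu server-b 1500
```

On systems without ifconfig, "ip link set dev eth0 mtu 1500" should make the same temporary change.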