Hello Team,
This is regarding our Barman backup host, which seems like hung
for 5 days with no backup, streaming, cron etc happening.
We have checkmk monitoring that notifies us if any one of the
parameters of Barman check fail, but we havent hot any events from
checkmk as barman check was showing all OK and no FAILED.
What do you explain this situation in Barman.
To provide further details about the issue:
- We user Data Domain as barman backup home for storage.
- On 5th November data domain endpoints is not connect but the
mount was still up on the host. We still dont know from the data
domain side what caused this issue.
- Barman log has errors : [2800226] barman.cli ERROR: [Errno
107] Transport endpoint is not connected:
'/nfs/DDbackup/pgsql-****-****/.active-model.auto'
- If the end point was missing barman should fail completely and
the checks should fail as well, Which in turn would have helped
us to identify the issue immediately.
- The only way we figured out that there was an issue, on
November 11th one of barman backup maximum age parameter failed
saying that weeks backup dint happen.
- Upon checking why the backups failed etc, I went to barman
logs and found out that all this time until the endpoint issue
resolved barman was not doing the barman checks or barman check
was hung at a state
Below are the configuration details:
- Barman version : 3.10.0 Barman by EnterpriseDB
(www.enterprisedb.com)
- Postgresql version: (PostgreSQL) 16.9
- Number of hosts configured: 26
- Red Hat Enterprise Linux release 8.10
Thanks,
Sri