> To resolve this issue we stop the standby cluster, remove the barman replication slot from PGDATA/pg_replslot, restart the standby and wait for a restartpoint to clear the WAL files from the standby. Is this the expected behaviour or have we missed a step?
That's the expected behavior in this case.
Based on your description, you are streaming WALs from the primary to the Barman server using a replication slot.
While Barman is able to create the replication slot on the node it's currently configured to stream WALs from, Barman is not a cluster-aware tool.
The configuration models feature allows you to integrate Barman with HA tools, but it doesn't do all the work for you, because Barman is aware of only a single node at any given point in time.
So, as part of your automation you should have something like this:
- Run barman config-switch to point the Barman configuration to the new primary.
- Run pg_drop_replication_slot on the old primary to drop the slot, which is no longer used. This is preferred over the approach you described because:
  - It is executed against a running Postgres instance, so no restart is needed.
  - Postgres removes the slot itself, in the way it expects that to be done, so it is less error-prone than deleting files from pg_replslot by hand.
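As a rough sketch of what that automation could look like, the two steps can be wrapped in a small Python helper. The server name, model name, host, and database below are placeholders I made up for illustration, and I'm assuming the slot has the default name barman and that psql can connect to the old primary without a password prompt. Note that Postgres will refuse to drop a slot that is still active, so the drop has to happen after Barman has stopped streaming from the old node.

```python
import shlex
import subprocess


def quote_literal(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def failover_commands(server, model, old_primary_host, slot="barman"):
    """Return the two commands as argv lists, in the order they should run.

    All argument values are supplied by the caller; nothing here is
    discovered from Barman or Postgres.
    """
    # Step 1: apply a configuration model so Barman points at the new primary.
    switch = ["barman", "config-switch", server, model]

    # Step 2: drop the now-unused slot on the old primary through SQL.
    # The "postgres" database is an assumption; any database you can reach works.
    sql = "SELECT pg_drop_replication_slot({});".format(quote_literal(slot))
    drop = ["psql", "-X", "-h", old_primary_host, "-d", "postgres", "-c", sql]

    return [switch, drop]


def run(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)
```

For example, run(failover_commands("pg", "new-primary-model", "old-primary.example.com")) only prints the two commands; pass dry_run=False once the output looks right for your environment.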
Best regards,
Israel.