pg_receivewal error

Dan N

Jul 26, 2021, 7:40:13 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi,

So I had to reconfigure some parameters in PostgreSQL and restart the database.
After that I immediately got an alert that Barman had failed:

replication slot: FAILED (slot 'barman' not initialised: is 'receive-wal' running?)

I tried dropping the slot and creating a new one, and now
`barman receive-wal pgsql` gives me this error:
Barman:~# barman receive-wal pgsql
Starting receive-wal for server pgsql
pgsql: could not change directory to "/root": Permission denied
pgsql: pg_receivewal: starting log streaming at 4AF/6000000 (timeline 37)
pgsql: pg_receivewal: error: unexpected termination of replication stream: ERROR:  requested WAL segment 00000025000004AF00000006 has already been removed
pgsql: pg_receivewal: error: disconnected
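The LSN in the log line (4AF/6000000 on timeline 37) and the segment in the error (00000025000004AF00000006) describe the same position: a WAL file name is the timeline, the high 32 bits of the LSN, and the segment number within that 4 GB range, each written as 8 hex digits. So pg_receivewal asked for exactly the segment the server had already removed. Below is a minimal Python sketch of the mapping. It assumes the default 16 MB wal_segment_size, and the function and constant names are hypothetical, not part of Barman or PostgreSQL. On the server itself, pg_walfile_name() performs this conversion for the server's current timeline.

```python
# Hypothetical helper: map an LSN such as "4AF/6000000" to a WAL file name.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: PostgreSQL's default 16 MB segments


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains `lsn` ("HI/LO" in hex)."""
    hi_str, lo_str = lsn.split("/")
    pos = (int(hi_str, 16) << 32) | int(lo_str, 16)  # 64-bit byte position in WAL
    segno = pos // seg_size                           # absolute segment number
    segs_per_xlogid = 0x100000000 // seg_size         # segments per 4 GB "log id"
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


print(wal_file_name(37, "4AF/6000000"))  # 00000025000004AF00000006
print(wal_file_name(1, "0/66000000"))    # 000000010000000000000066
```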

Luca Ferrari

Jul 26, 2021, 8:49:46 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Mon, Jul 26, 2021 at 1:40 PM Dan N <anonyopse...@gmail.com> wrote:
> I tried dropping the slot and creating a new one, and
> barman receive-wal pgsql gives me this error:
> Barman:~# barman receive-wal pgsql
> Starting receive-wal for server pgsql
> pgsql: could not change directory to "/root": Permission denied

This first error happens because you are running the command as root
rather than as the dedicated barman user.
In my opinion, though, it is not the important one.

> pgsql: pg_receivewal: starting log streaming at 4AF/6000000 (timeline 37)
> pgsql: pg_receivewal: error: unexpected termination of replication stream: ERROR: requested WAL segment 00000025000004AF00000006 has already been removed
> pgsql: pg_receivewal: error: disconnected

It seems that when you deleted the slot and created a new one, the
server recycled a WAL segment before the backup machine could catch
up.
In short, I suspect your backup is screwed and needs to be redone.

Someone from the barman team could comment better on this situation.

Luca

Dan N

Jul 26, 2021, 8:55:28 AM
to Barman, Backup and Recovery Manager for PostgreSQL
I got that error message before I deleted and re-created the slot, if that helps.
Let's see if someone from the Barman team can help me out.

Thanks for reply!

Abhijit Menon-Sen

Jul 26, 2021, 11:40:50 AM
to pgba...@googlegroups.com
On Mon, Jul 26, 2021 at 6:25 PM Dan N <anonyopse...@gmail.com> wrote:
>
> I got that error message before I deleted and re-created an slot, if that helps.
> Let's see if someone from barman team will help me out.

If there's WAL missing, there's unfortunately not much the barman team
or anyone else can do to help.

When you saw the «replication slot: FAILED (slot 'barman' not
initialised: is 'receive-wal' running?)», you should probably have
started receive-wal rather than dropping and recreating the slot
(which could indeed cause WAL to be lost).
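The reason dropping the slot can lose WAL can be shown with a deliberately simplified toy model. It is hypothetical and not PostgreSQL code: real retention also depends on max_wal_size, wal_keep_size/wal_keep_segments, archiving and checkpoint timing. While a slot exists, a checkpoint keeps every segment from the slot's restart point onwards. Once the slot is dropped, only recent segments survive, and a re-created slot protects WAL only from the position where it was created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyWalServer:
    """Hypothetical, much-simplified model of WAL retention (not PostgreSQL code).

    Segments are plain integers. At a checkpoint, everything more than `keep`
    segments behind the current one is removed, unless a replication slot
    still needs it.
    """
    keep: int                           # toy stand-in for max_wal_size / wal_keep_size
    current: int = 0                    # segment currently being written
    oldest: int = 0                     # oldest segment still on disk
    slot_restart: Optional[int] = None  # None means "no replication slot"

    def switch_wal(self, n: int = 1) -> None:
        self.current += n

    def checkpoint(self) -> None:
        floor = max(0, self.current - self.keep)
        if self.slot_restart is not None:
            floor = min(floor, self.slot_restart)  # the slot pins older WAL
        self.oldest = max(self.oldest, floor)      # removed segments never come back

    def can_stream_from(self, segment: int) -> bool:
        return self.oldest <= segment <= self.current


def resume_after_restart(drop_slot: bool) -> bool:
    srv = ToyWalServer(keep=4)
    srv.switch_wal(6)
    srv.slot_restart = 6        # receive-wal stopped while still needing segment 6
    if drop_slot:
        srv.slot_restart = None
    srv.switch_wal(10)          # the server keeps writing WAL meanwhile
    srv.checkpoint()
    if drop_slot:
        srv.slot_restart = srv.current  # a re-created slot pins WAL only from here on
    return srv.can_stream_from(6)


print(resume_after_restart(drop_slot=False))  # True
print(resume_after_restart(drop_slot=True))   # False
```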

Now I think you'll need to start over with `barman receive-wal
--reset` and take a new backup.

-- Abhijit

Luca Ferrari

Jul 26, 2021, 11:55:24 AM
to Barman, Backup and Recovery Manager for PostgreSQL
On Mon, Jul 26, 2021 at 2:55 PM Dan N <anonyopse...@gmail.com> wrote:
>
> I got that error message before I deleted and re-created an slot, if that helps.

As far as I remember, PostgreSQL double-checks for existing slots to
prevent you from accidentally destroying them (e.g., by setting
max_replication_slots to 0). Therefore, I suspect you manually called
pg_drop_replication_slot(), then the server recycled one or more WAL
segments, and boom.
I think you either waited too long to re-create the slot, or you
dropped it close to a checkpoint that recycled the WAL.

I don't see how that message could have appeared before you deleted
the slot, unless you changed the slot name in the Barman
configuration before creating the slot on the PostgreSQL side. Could
you please provide more information about what you did?

By the way, as Abhijit already confirmed, you need to take a new
backup from scratch.

Luca

Dan N

Jul 27, 2021, 2:17:09 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Thanks.
Running `barman receive-wal pgsql --reset` gives:
Nothing to do. Position of receive-wal is aligned.
Does this mean it will start working fine now, or not?

Dan N

Jul 27, 2021, 2:50:34 AM
to Barman, Backup and Recovery Manager for PostgreSQL
What I did was change max_locks_per_transaction from 64 to 128 and restart the PostgreSQL cluster from Patroni.
Then I got the alert that Barman was not receiving WAL.
I tried everything, from starting WAL streaming again (barman receive-wal pgsql)
to a couple of other suggestions I found online, and nothing helped.
As I replied to Abhijit, `barman receive-wal --reset` gives me "Nothing to do. Position of receive-wal is aligned."

D

Abhijit Menon-Sen

Jul 30, 2021, 12:05:11 AM
to pgba...@googlegroups.com
On Tue, Jul 27, 2021 at 12:20 PM Dan N <anonyopse...@gmail.com> wrote:
>
> What I did is changed
> max_locks_per_transaction from 64 to 128 and restarted the PostreSQL cluster from patroni.
> And then I got the alert that Barman is not receiving wal.
> I tried everything from starting the wal again aka barman receive-wal pgsql
> and couple other suggestion on found on net and nothing helped.

This doesn't really explain the problem fully. Restarting the server
after changing max_locks_per_transaction should not have caused any
more of a disruption than just having to restart `barman receive-wal`,
which `barman cron` would have done automatically for you anyway.

> As I replied to Abhijit; `barman receive-wal --reset` is giving me "Nothing to do. Position of receive-wal is aligned."

I tried to reproduce your situation while waiting for breakfast to
finish cooking:

barman@unarmed:~$ barman receive-wal --stop uptight
Stopped process receive-wal(6196)
barman@unarmed:~$ barman receive-wal --drop-slot uptight
Dropping physical replication slot 'backup_unarmed' on server 'uptight'
Replication slot 'backup_unarmed' dropped

(Here I went to Postgres and set max_wal_size to a lower value and
executed some checkpoint/pg_switch_wal to advance the WAL position and
recycle old segments.)

barman@unarmed:~$ barman receive-wal --create-slot uptight
Creating physical replication slot 'backup_unarmed' on server 'uptight'
Replication slot 'backup_unarmed' created
barman@unarmed:~$ barman receive-wal uptight
Starting receive-wal for server uptight
uptight: pg_receivewal.orig: starting log streaming at 0/66000000 (timeline 1)
uptight: pg_receivewal.orig: unexpected termination of replication stream: ERROR: requested WAL segment 000000010000000000000066 has already been removed
uptight: pg_receivewal.orig: disconnected
ERROR: ArchiverFailure:pg_receivexlog terminated with error code: 1

Right, this is what I expected: I dropped the slot and recycled WAL,
so barman receive-wal can't start at the same place.

barman@unarmed:~$ barman receive-wal --reset uptight
Resetting receive-wal directory status
Creating status file /var/lib/barman/uptight/streaming/000000010000000000000081.partial
barman@unarmed:~$ barman receive-wal uptight
Starting receive-wal for server uptight
uptight: pg_receivewal.orig: starting log streaming at 0/81000000 (timeline 1)

…and we're back to normal. So there's definitely something more that
must have happened on your server, but I have no idea what.
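Going the other way, the name of the status file created by `--reset` shows where streaming will resume. 000000010000000000000081.partial starts at 0/81000000 on timeline 1, which is exactly where pg_receivewal started in the transcript above. Below is a small Python sketch, again assuming the default 16 MB segment size. The helper name is hypothetical and not part of Barman.

```python
from typing import Tuple

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: PostgreSQL's default 16 MB segments


def segment_start_lsn(file_name: str, seg_size: int = WAL_SEGMENT_SIZE) -> Tuple[int, str]:
    """Hypothetical helper: (timeline, start LSN) of a WAL file, optionally '.partial'."""
    name = file_name.split("/")[-1]          # accept a full path
    if name.endswith(".partial"):
        name = name[: -len(".partial")]
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)             # high 32 bits of the LSN
    seg = int(name[16:24], 16)               # segment within that 4 GB range
    segs_per_xlogid = 0x100000000 // seg_size
    pos = (log_id * segs_per_xlogid + seg) * seg_size
    return timeline, "%X/%X" % (pos >> 32, pos & 0xFFFFFFFF)


print(segment_start_lsn("/var/lib/barman/uptight/streaming/000000010000000000000081.partial"))
# (1, '0/81000000')
```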

You could try removing whatever is in your streaming/ directory and
running --reset again to see if that helps restart receive-wal.

-- Abhijit