Basic Backup/Recover works, not able to get PITR working.. and just generally confused.

230 views
Skip to first unread message

Light O'Matic

unread,
Jun 21, 2017, 5:53:48 PM6/21/17
to Barman, Backup and Recovery Manager for PostgreSQL


Hi, so I've set up barman on my backup machine (vm), and set up an instance of postgres 9.5 on another machine as a test "production" server..  And I've also got postgres 9.5 set up on the backup machine as an instance into which I can restore things (also to provide the binaries that barman requires for WAL streaming).

And I've set everything up for a streaming backup, I've also set things as synchronous just to see how well that would work out.. though, if it slows performance I might go back to async.  I also set up an ssh connection because some commands didn't work without it, also I realized I couldn't restore back to the original server without it.

But anyway, it all works.. I can run backups,  and then after I run backups, I see the wal count go up on those backups as I write to the database.  If I do a recover, it recovers my backup.  Eg:

barman list-backup pg1
pg1 20170621T140218 - Wed Jun 21 14:02:18 2017 - Size: 2.4 GiB - WAL Size: 8.2 GiB
pg1 20170621T124442 - Wed Jun 21 12:44:42 2017 - Size: 2.4 GiB - WAL Size: 32.0 MiB

barman show-backup pg1 20170621T140218
Backup 20170621T140218:
  Server Name            : pg1
  Status                 : DONE
  PostgreSQL Version     : 90507
  PGDATA directory       : /var/lib/postgresql/9.5/main

  Base backup information:
    Disk usage           : 2.4 GiB (2.4 GiB with WALs)
    Incremental size     : 2.4 GiB (-0.00%)
    Timeline             : 1
    Begin WAL            : 000000010000000000000043
    End WAL              : 000000010000000000000043
    WAL number           : 1
    Begin time           : 2017-06-21 14:02:20-04:00
    End time             : 2017-06-21 14:02:18.258312-04:00
    Begin Offset         : 40
    End Offset           : 0
    Begin XLOG           : 0/43000028
    End XLOG             : 0/44000000

  WAL information:
    No of files          : 524
    Disk usage           : 8.2 GiB
    WAL rate             : 544.24/hour
    Last available       : 00000001000000020000004F

  Catalog information:
    Retention Policy     : VALID
    Previous Backup      : 20170621T124442
    Next Backup          : - (this is the latest base backup)

Starting at 3PM, I started a cron job that, once per minute, inserts a row containing the current time, so I would know what time was restored.  So, running:

barman recover --target-time "2017-06-21 15:20:00" pg1 20170621T140218 ~barman/recover

Actually, it is a bit more complicated than that.. since I am not restoring on the original machine, but to a postgres instance on the barman machine, as root I have to do the following:

root@bareos-salt:/var/lib/barman# service postgresql stop
root@bareos-salt:/var/lib/barman# chown -R barman recover

Now I run the barman recover command as user barman, and set the postgresql.conf data_directory to point to ~barman/recover.  Once that finishes, I start up the server.


root@bareos-salt:/var/lib/barman# chown -R postgres recover
root@bareos-salt:/var/lib/barman# service postgresql start

It takes quite a few minutes for postgres to come up.. but it does.  And I would expect the table to show all the rows from 3  to 3:20PM.  But it doesn't.. my newest record in the recover is 3PM.

So, my question is, what's the correct procedure for PITR with the streaming setup?
And, am I going about it the wrong way with the permissions and such?

Should I even bother with this streaming setup?  It seemed a more reliable way of doing things, and I was enticed by the immediacy of the backup (we don't have huge write volume usually, so the WAL turnover is fairly slow).  But if rsync works better I'll use it. 

Same question with sync vs async streaming?

Also, I tried recovering back to the same server I was backing up.. and it worked (but not with PITR.. when I tried that it broke the postgres instance).  But I realized after I had done that, that it was probably a bad idea because now I'd effectively put the postgres instance back in the past, while barman was in the now.. therefore the wal numbers would be all screwed up...  So, can someone explain to me what the correct procedure would be to resume backups after a recover?

Thanks!
Ben

PS:  In case this helps:

barman@bareos-salt:~$ barman check pg1
Server pg1:
        PostgreSQL: OK
        superuser: OK
        PostgreSQL streaming: OK
        wal_level: OK
        replication slot: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (interval provided: 2 days, latest backup age: 3 hours, 49 minutes, 4 seconds)
        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 2 backups, expected at least 1)
        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: OK
        archiver errors: OK

barman@bareos-salt:~$ barman show-server pg1
Server pg1:
        active: True
        archiver: False
        archiver_batch_size: 0
        backup_directory: /vtl/backup/postgres/pg1/backup
        backup_method: postgres
        backup_options: BackupOptions(['concurrent_backup'])
        bandwidth_limit: None
        barman_home: /var/lib/barman
        barman_lock_directory: /var/lib/barman
        basebackup_retry_sleep: 30
        basebackup_retry_times: 0
        basebackups_directory: /vtl/backup/postgres/pg1/basebackups
        check_timeout: 30
        compression: None
        config_file: /etc/postgresql/9.5/main/postgresql.conf
        connection_error: None
        conninfo: host=pg1.sweetsams.com port=5432 dbname=postgres user=streaming_barman password=haveanother1
        current_size: 4703029932
        current_xlog: 000000010000000200000050
        custom_compression_filter: None
        custom_decompression_filter: None
        data_directory: /var/lib/postgresql/9.5/main
        description: pg1.sweetsams.com
        disabled: False
        errors_directory: /vtl/backup/postgres/pg1/backup/errors
        hba_file: /etc/postgresql/9.5/main/pg_hba.conf
        ident_file: /etc/postgresql/9.5/main/pg_ident.conf
        immediate_checkpoint: False
        incoming_wals_directory: /vtl/backup/postgres/pg1/backup/incoming
        is_superuser: True
        last_backup_maximum_age: 2 days (latest backup: 3 hours, 49 minutes, 39 seconds )
        minimum_redundancy: 1
        msg_list: []
        name: pg1
        network_compression: False
        path_prefix: None
        pg_basebackup_bwlimit: True
        pg_basebackup_compatible: True
        pg_basebackup_installed: True
        pg_basebackup_path: /usr/bin/pg_basebackup
        pg_basebackup_tbls_mapping: True
        pg_basebackup_version: 9.5.7
        pg_receivexlog_compatible: True
        pg_receivexlog_installed: True
        pg_receivexlog_path: /usr/bin/pg_receivexlog
        pg_receivexlog_supports_slots: True
        pg_receivexlog_synchronous: True
        pg_receivexlog_version: 9.5.7
        pgespresso_installed: False
        post_archive_retry_script: None
        post_archive_script: None
        post_backup_retry_script: None
        post_backup_script: None
        pre_archive_retry_script: None
        pre_archive_script: None
        pre_backup_retry_script: None
        pre_backup_script: None
        recovery_options: RecoveryOptions([])
        replication_slot: Record(slot_name='barman', active=True, restart_lsn='2/50E66E88')
        replication_slot_support: True
        retention_policy: RECOVERY WINDOW OF 36 MONTHS
        retention_policy_mode: auto
        reuse_backup: None
        server_txt_version: 9.5.7
        slot_name: barman
        ssh_command: ssh postgres@pg1
        streaming: True
        streaming_archiver: True
        streaming_archiver_batch_size: 0
        streaming_archiver_name: barman_receive_wal
        streaming_backup_name: pg1_streaming_backup
        streaming_conninfo: host=pg1.sweetsams.com port=5432 dbname=postgres user=streaming_barman password=haveanother1
        streaming_supported: True
        streaming_wals_directory: /vtl/backup/postgres/pg1/streaming_wals
        synchronous_standby_names: ['barman_receive_wal']
        systemid: 6431646314206615116
        tablespace_bandwidth_limit: None
        timeline: 1
        wal_level: hot_standby
        wal_retention_policy: MAIN
        wals_directory: /vtl/backup/postgres/pg1/wals
        xlogpos: 2/50E66E88








Light O'Matic

unread,
Jun 22, 2017, 5:59:02 PM6/22/17
to Barman, Backup and Recovery Manager for PostgreSQL

Hi again, hope someone's listening...
So PITR is working for me now.  Actually I'm not sure why it wasn't working before, I think I must have done something wrong with the settings earlier and it affected my backup.   For example, at one point I tried to recover to a running postgres instance into the data directory that was in use.. I now understand that I can't do that.  I have to recover into an empty directory, with postgres not running.. change permissions on the files (if not via ssh), and start postgres.. then wait for postgres to recover.  Is that right?

I still have a few questions:

1. If I have a problem with my server, let's say either the server disk crashed.. or maybe I accidentally modified data and need to go back.  Ok, so I recover my data from an earlier time and restart the server.  But now, the numbering of the WAL files will overlap existing backup WAL's, won't it?  What if I restore from a time before my last full backup?? (For example, let's say I make a bad mistake at 6PM, go home, backup runs at midnight, and at 8AM I realize I need to recover to before 6PM)...  Then, is there anything I need to do with barman to get things synchronized and have my backups make sense again?

2.  WAL files take up a lot of space, so we don't necessarily want to keep them all.  Let's say that I want to keep all WAL's for 3 months.. with a full backup every week.  Then after the 3 months, I don't care about the WAL's anymore I just want the full backups.  I know that barman can do this but not totally confident of the right settings.  Will barman delete the unwanted WAL's with its cron job.. while keeping the WAL's that were generated during the base backup that have to (I think) be kept with the base backup?  And then, once that happens, how do you tell barman to restore only the base backup ID and not to try to play back to the present time?

Anyway it's a cool program, thanks.  We program in python here, maybe we can make some contributions?

One thing I would like is to have better recover automation, for example I have postgres installed on the barman machine and can use this for temporary restores.  But I have to run several commands as root: to stop postgres, prepare the data directory (delete old data, make it writeable by barman).  Then as barman run the barman recover command,  then back to root, change data permissions to postgres, restart postgres....  It would be great if this could be simplified to one command.

One idea I have for this would be to allow barman server configurations not just for backups but for recovery too.  The conf already has the hostname, data directory, ssh command, connect information for the server.. so by giving the recover command just the name of the server you backed up, the time you want to restore (if any) and the name of the server you want to restore to, barman can do everything else via ssh and bring the server up for you from nothing or replacing (even checking to make sure there's enough space).

If recovering to a local server, barman could use ssh anyway, or prompt for the posgres password to take care of things.

And if you choose to recover to a server that's currently configured for backup, it would handle that also.. doing whatever is necessary.. stop the server, finish any remaining backup operations, recover the server, resync restart backups....

Anyway, thanks again for your work on this.  Hope you can answer my questions...

thanks!
Ben

Angelo Onofri

unread,
Jun 29, 2022, 7:02:17 AM6/29/22
to Barman, Backup and Recovery Manager for PostgreSQL
Hello There,

I hope you are well.

Thank you for you post. This is something I have being looking for for a while now.
Have you managed to store the wals for 3 months? 

I do backups with barman using dayly period but I am not sure if the wals in barman's folder are enought to restore PITR within the 24 hours as the database is quite big.
For what i understood, barman store the wals necessary from the begin to the end of the base backup of barman

How can i add more wals so I can restore a wider interval PITR?


Please let me know.

Thanks,
Angelo

Luca Ferrari

unread,
Jun 29, 2022, 7:09:24 AM6/29/22
to Barman, Backup and Recovery Manager for PostgreSQL
On Wed, Jun 29, 2022 at 1:02 PM Angelo Onofri <angelo...@gmail.com> wrote:
>
> Hello There,
>
> I hope you are well.
>
> Thank you for you post. This is something I have being looking for for a while now.
> Have you managed to store the wals for 3 months?

When you execute a `barman backup` a base backup is taken. Meanwhile,
barman (or better, the machine) is receiving WALs via either archiving
or streaming.
Therefore you can apply any PITR from the backup going further.
Ideally, you would survive with a single backup and all WALs
automatically archived, but the problem is that the far in the past
the backup is, the more WALs need to be replayed to get you on a
"recent" period.
In any case, if archiving/streaming is working, you have all the
pieces for rebuilding your database from the last backup.
Otherwise, please explain better the problem you are facing.

Luca

Angelo Onofri

unread,
Jun 29, 2022, 8:30:03 AM6/29/22
to Barman, Backup and Recovery Manager for PostgreSQL
Hello,

thank you for your quick reply.
Our issue is that we are trying to store backup with retention one week without having 7 base backup weekly.


barman@barmanelaastic:~$ barman check elaastic-v12
Server elaastic-v12:
        PostgreSQL: OK
        superuser or standard user with backup privileges: OK

        PostgreSQL streaming: OK
        wal_level: OK
        replication slot: OK
        directories: OK
        retention policy settings: OK
        backup maximum age: OK (no last_backup_maximum_age provided)
        backup minimum size: OK (428.8 GiB)
        wal maximum age: OK (no last_wal_maximum_age provided)
        wal size: OK (90.0 KiB)

        compression settings: OK
        failed backups: OK (there are 0 failed backups)
        minimum redundancy requirements: OK (have 2 backups, expected at least 0)

        pg_basebackup: OK
        pg_basebackup compatible: OK
        pg_basebackup supports tablespaces mapping: OK
        systemid coherence: OK

        pg_receivexlog: OK
        pg_receivexlog compatible: OK
        receive-wal running: OK
        archiver errors: OK

barman list-backup all
elaastic-v12 20220628T180002 - Tue Jun 28 18:20:39 2022 - Size: 428.8 GiB - WAL Size: 90.0 KiB
elaastic-v12 20220627T180001 - Mon Jun 27 18:20:17 2022 - Size: 428.8 GiB - WAL Size: 405.7 KiB

It is not clear for me what is happening if I choose the option "retention_policy= 2" if I have the parameter max_wal_size = 1gB. Whould it be enough to cover PITR of 24 hours?

Tonight I am going to test the retention_policy = recovery window of 7 days
I post below the configuration  in barman for this cluster:

[elaastic-v12]
description = "Cluster Elaastic-v12"
backup_method = postgres
backup_options = concurrent_backup
conninfo = host=10.101.72.x port=5012 user=barman dbname=postgres
streaming_conninfo = host=10.101.72.x port=5012 user=stream_barman dbname=postgres
streaming_archiver = on
slot_name = barman
create_slot = auto
#retention_policy = REDUNDANCY 2
retention_policy = RECOVERY WINDOW OF 7 DAYS

Thanks,
Angelo 

Luca Ferrari

unread,
Jun 29, 2022, 9:28:10 AM6/29/22
to Barman, Backup and Recovery Manager for PostgreSQL
On Wed, Jun 29, 2022 at 2:30 PM Angelo Onofri <angelo...@gmail.com> wrote:
> It is not clear for me what is happening if I choose the option "retention_policy= 2" if I have the parameter max_wal_size = 1gB. Whould it be enough to cover PITR of 24 hours?
>

max_wal_size is on the PostgreSQL side, and tells PostgreSQL to try, I
mean "try", to not store more than 1GB of wals if possible. This means
that if the archiving/streaming is done, and PostgreSQL does not need
anymore the exceeding wals, it will recycle (or delete) them because
_it is safe_ to do.
The retention policy, as far as I remember (it is a while I don't use
barman) forces barman to not nuke backups when running cron if those
backup are still in the recovery window. It means, barman will not
purge backups if they will be newer than your recovery window.
Therefore, I suspect if you don't schedule at least one backup per
week, you will end up with a deleted backup situation.
I suggest also to set the minimum_redundancy setting to a good value
(at least 1 in your case), and if you for any reason are not going to
do a scheduled backup, set the retention policy to redundancy.

Luca
Reply all
Reply to author
Forward
0 new messages