Restore from Azure Blob Storage

374 views
Skip to first unread message

Raja Rajenderan

unread,
Sep 8, 2021, 9:38:29 AM9/8/21
to Postgres Operator, Sriram Sr
Hi,

I'm taking backup only to Azure Blob Storage and not using PV.

- name: repo2
azure:
container: pgo-test
schedules:
full: "0 4 * * *"
incremental: "0 */2 * * *"

Backup works fine. Both Full and Incremental. But when I try to restore, I see the following exception in the "elephant-pgbackrest-restore" pod log. This is the pod that gets invoked when I try to run the PITR manifest.

WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg13' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
WARN: automatic backup set selection cannot be performed with provided time '2021-09-08 12:10:00 UTC', latest backup set will be used
      HINT: time format must be YYYY-MM-DD HH:MM:SS with optional msec and optional timezone (+/- HH or HHMM or HH:MM) - if timezone is omitted, local time is assumed (for UTC use +00)
waiting for server to start....2021-09-08 12:26:52.247 GMT [15] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-09-08 12:26:52.248 GMT [15] LOG:  listening on IPv6 address "::1", port 5432
2021-09-08 12:26:52.248 GMT [15] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2021-09-08 12:26:52.271 GMT [15] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-09-08 12:26:52.276 GMT [15] LOG:  hostssl record cannot match because SSL is disabled
2021-09-08 12:26:52.276 GMT [15] HINT:  Set ssl = on in postgresql.conf.
2021-09-08 12:26:52.276 GMT [15] CONTEXT:  line 4 of configuration file "/pgdata/pg13/pg_hba.conf"
2021-09-08 12:26:52.276 GMT [15] LOG:  hostssl record cannot match because SSL is disabled
2021-09-08 12:26:52.276 GMT [15] HINT:  Set ssl = on in postgresql.conf.
2021-09-08 12:26:52.276 GMT [15] CONTEXT:  line 5 of configuration file "/pgdata/pg13/pg_hba.conf"
2021-09-08 12:26:52.276 GMT [15] LOG:  hostssl record cannot match because SSL is disabled
2021-09-08 12:26:52.276 GMT [15] HINT:  Set ssl = on in postgresql.conf.
2021-09-08 12:26:52.276 GMT [15] CONTEXT:  line 7 of configuration file "/pgdata/pg13/pg_hba.conf"
2021-09-08 12:26:52.282 GMT [16] LOG:  database system was interrupted; last known up at 2021-09-08 12:00:04 GMT
2021-09-08 12:26:52.623 GMT [16] LOG:  starting point-in-time recovery to 2021-09-08 12:10:00+00
2021-09-08 12:26:53.046 GMT [16] LOG:  restored log file "00000001000000000000002A" from archive
2021-09-08 12:26:53.150 GMT [16] LOG:  redo starts at 0/2A000028
2021-09-08 12:26:53.155 GMT [16] LOG:  consistent recovery state reached at 0/2A000100
2021-09-08 12:26:53.155 GMT [15] LOG:  database system is ready to accept read only connections
 done
server started
2021-09-08 12:26:53.601 GMT [16] LOG:  redo done at 0/2A000100
2021-09-08 12:26:53.601 GMT [16] FATAL:  recovery ended before configured recovery target was reached
2021-09-08 12:26:53.603 GMT [15] LOG:  startup process (PID 16) exited with exit code 1
2021-09-08 12:26:53.603 GMT [15] LOG:  terminating any other active server processes
2021-09-08 12:26:53.606 GMT [15] LOG:  database system is shut down
psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Regards,
Raja

Jonathan S. Katz

unread,
Sep 9, 2021, 12:31:14 PM9/9/21
to Raja Rajenderan, Postgres Operator, Sriram Sr
- What version of PGO are you running? Can you provide the container tag?
- What platform of Kubernetes / version are you running on?
- Do you have the full manifest for how you created the Postgres cluster? Please do not include the Secrets.
- How did you initilaize th restore?
- Do you have any additional info from the PostgreSQL logs that indicate why the server is not starting? These are located in the "pg_log" directory, and would indicate why psql is erroring.

Jonathan S. Katz
VP Platform Engineering

Crunchy Data
Enterprise PostgreSQL 


Raja Rajenderan

unread,
Sep 9, 2021, 1:44:28 PM9/9/21
to Jonathan S. Katz, Postgres Operator, Sriram Sr
- PGO Version: 5.0.2
- Azure Kubernetes Service v1.21.2
- Attached (Filename: postgres.yaml)
- Point In Time Recovery (Attached filename: pitr.yaml)
- Attached logs (kubectl-logs and pg_log)

In the pitr.yaml we do not have any reference to Azure Blob Storage which is where the backup is stored.

Regards,
Raja
--
Regards,
Raja Rajenderan
pitr.yaml
postgres.yaml
pg_log
kubectl-logs

Raja Rajenderan

unread,
Sep 14, 2021, 2:59:30 PM9/14/21
to Postgres Operator, Raja Rajenderan, Postgres Operator, Sriram Rajaram, jonath...@crunchydata.com
Any suggestion on this?

Raja Rajenderan

unread,
Oct 5, 2021, 10:16:38 AM10/5/21
to Postgres Operator, Raja Rajenderan, Postgres Operator, Sriram Rajaram, jonath...@crunchydata.com
Deleted everything and tried again. Here are the logs in sequence.

1. Log for fetching data from the repo
---------------
/tmp/db-restore.log
2021-10-05 11:03:56.208 P01   INFO: restore file /pgdata/pg13/base/1/13296 (0B, 100%)
2021-10-05 11:03:56.212 P01   INFO: restore file /pgdata/pg13/base/1/13291 (0B, 100%)
2021-10-05 11:03:56.416 P01   INFO: restore file /pgdata/pg13/base/1/13286 (0B, 100%)
2021-10-05 11:03:56.417 P00   INFO: write updated /pgdata/pg13/postgresql.auto.conf
2021-10-05 11:03:56.432 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2021-10-05 11:03:56.536 P00   INFO: restore command end: completed successfully (73706ms)

cat postgres.restore.conf
unix_socket_directories = '/tmp'
archive_command = 'false'
archive_mode = 'on'

cat /pgdata/pg13/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.

# Recovery settings generated by pgBackRest restore on 2021-10-05 11:03:56
restore_command = 'pgbackrest --pg1-path=/pgdata/pg13 --repo=2 --stanza=db archive-get %f "%p"'
recovery_target_time = '2021-10-04 11:00:00'
recovery_target_action = 'promote'

cat /pgdata/pg13/log/postgresql-Tue.log
2021-10-05 07:49:09.188 UTC [84] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-10-05 07:49:09.189 UTC [84] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-10-05 07:49:09.189 UTC [84] LOG:  listening on IPv6 address "::", port 5432
2021-10-05 07:49:09.201 UTC [84] LOG:  listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"
2021-10-05 07:49:09.212 UTC [87] LOG:  database system was shut down at 2021-10-05 07:49:08 UTC
2021-10-05 07:49:09.227 UTC [84] LOG:  database system is ready to accept connections
2021-10-05 07:54:48.378 UTC [652] FATAL:  password authentication failed for user "postgres"
2021-10-05 07:54:48.378 UTC [652] DETAIL:  Connection matched pg_hba.conf line 7: "hostssl all all all md5"

2. kubectl log from the backup pod
---------------------
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg13' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
WARN: unable to find backup set with stop time less than '2021-10-04 11:00:00', repo2: latest backup set will be used
waiting for server to start....2021-10-05 11:03:56.580 GMT [21] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-10-05 11:03:56.581 GMT [21] LOG:  listening on IPv6 address "::1", port 5432
2021-10-05 11:03:56.581 GMT [21] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2021-10-05 11:03:56.598 GMT [21] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-10-05 11:03:56.604 GMT [21] LOG:  hostssl record cannot match because SSL is disabled
2021-10-05 11:03:56.604 GMT [21] HINT:  Set ssl = on in postgresql.conf.
2021-10-05 11:03:56.604 GMT [21] CONTEXT:  line 4 of configuration file "/pgdata/pg13/pg_hba.conf"
2021-10-05 11:03:56.604 GMT [21] LOG:  hostssl record cannot match because SSL is disabled
2021-10-05 11:03:56.604 GMT [21] HINT:  Set ssl = on in postgresql.conf.
2021-10-05 11:03:56.604 GMT [21] CONTEXT:  line 5 of configuration file "/pgdata/pg13/pg_hba.conf"
2021-10-05 11:03:56.604 GMT [21] LOG:  hostssl record cannot match because SSL is disabled
2021-10-05 11:03:56.604 GMT [21] HINT:  Set ssl = on in postgresql.conf.
2021-10-05 11:03:56.604 GMT [21] CONTEXT:  line 7 of configuration file "/pgdata/pg13/pg_hba.conf"
2021-10-05 11:03:56.610 GMT [22] LOG:  database system was interrupted; last known up at 2021-10-05 11:00:04 GMT
2021-10-05 11:03:57.028 GMT [22] LOG:  starting point-in-time recovery to 2021-10-04 11:00:00+00
2021-10-05 11:03:57.413 GMT [22] LOG:  restored log file "00000001000000000000000B" from archive
2021-10-05 11:03:57.513 GMT [22] LOG:  redo starts at 0/B000028
2021-10-05 11:03:57.518 GMT [22] LOG:  consistent recovery state reached at 0/B000100
2021-10-05 11:03:57.519 GMT [21] LOG:  database system is ready to accept read only connections
 done
server started
2021-10-05 11:03:57.878 GMT [22] LOG:  redo done at 0/B000100
2021-10-05 11:03:57.878 GMT [22] FATAL:  recovery ended before configured recovery target was reached
2021-10-05 11:03:57.880 GMT [21] LOG:  startup process (PID 22) exited with exit code 1
2021-10-05 11:03:57.880 GMT [21] LOG:  terminating any other active server processes
2021-10-05 11:03:57.883 GMT [21] LOG:  database system is shut down
psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
psql: error: could not connect to server: No such file or directory


Does this information help? 

Regards,
Raja

Raja Rajenderan

unread,
Oct 5, 2021, 10:16:41 AM10/5/21
to Postgres Operator, Raja Rajenderan, Postgres Operator, Sriram Rajaram, jonath...@crunchydata.com
I tried this once again and see the same error in pg log.

-rw------- 1 postgres postgres 1398 Oct  4 16:42 postgresql-Mon.log
bash-4.4$ more postgresql-Mon.log
2021-10-04 16:33:14.569 UTC [84] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-10-04 16:33:14.570 UTC [84] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-10-04 16:33:14.570 UTC [84] LOG:  listening on IPv6 address "::", port 5432
2021-10-04 16:33:14.582 UTC [84] LOG:  listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"
2021-10-04 16:33:14.591 UTC [87] LOG:  database system was shut down at 2021-10-04 16:33:13 UTC
2021-10-04 16:33:14.605 UTC [84] LOG:  database system is ready to accept connections
ERROR: [103]: unable to find a valid repository:
       repo2: [LockAcquireError] raised from remote-0 protocol on 'hippo-repo-host-0.hippo-pods.postgres-operator.svc.cluster.local.': unable to acquire lock on file '
/tmp/pgbackrest/db-archive.lock': Resource temporarily unavailable
       HINT: is another pgBackRest process running?
2021-10-04 16:33:22.230 UTC [92] LOG:  archive command failed with exit code 103
2021-10-04 16:33:22.230 UTC [92] DETAIL:  The failed archive command was: pgbackrest --stanza=db archive-push "pg_wal/000000010000000000000001"
2021-10-04 16:42:57.081 UTC [954] FATAL:  password authentication failed for user "amn-qa-testdb"
2021-10-04 16:42:57.081 UTC [954] DETAIL:  Role "amn-qa-testdb" does not exist.
Connection matched pg_hba.conf line 7: "hostssl all all all md5"

Jonathan S. Katz

unread,
Oct 6, 2021, 1:32:35 PM10/6/21
to Raja Rajenderan, Postgres Operator, Sriram Rajaram
What do the Postgres logs say?

From the original log you posted, I saw this:

==
ERROR: [103]: unable to find a valid repository:
       repo2: [FileMissingError] unable to load info file '/pgbackrest/postgres-operator/hippo/repo2/archive/db/archive.info' or '/pgbackrest/postgres-operator/hippo/r
epo2/archive/db/archive.info.copy':
       FileMissingError: raised from remote-0 protocol on 'hippo-repo-host-0.hippo-pods.postgres-operator.svc.cluster.local.': unable to open missing file '/pgbackrest
/postgres-operator/hippo/repo2/archive/db/archive.info' for read
       FileMissingError: raised from remote-0 protocol on 'hippo-repo-host-0.hippo-pods.postgres-operator.svc.cluster.local.': unable to open missing file '/pgbackrest
/postgres-operator/hippo/repo2/archive/db/archive.info.copy' for read
==

which seems to indicate that you are trying to restore from a repository that does not have any backup data in it.

Did you confirm that the backups were stored to azure?

Additionally, it seems that the PITR target time is invalid. This is indicated by:

WARN: unable to find backup set with stop time less than '2021-10-04 11:00:00', repo2: latest backup set will be used

Followed by:

2021-10-05 11:03:57.878 GMT [22] FATAL:  recovery ended before configured recovery target was reached
2021-10-05 11:03:57.880 GMT [21] LOG:  startup process (PID 22) exited with exit code 1
2021-10-05 11:03:57.880 GMT [21] LOG:  terminating any other active server processes
2021-10-05 11:03:57.883 GMT [21] LOG:  database system is shut down



Jonathan S. Katz
VP Platform Engineering

Crunchy Data
Enterprise PostgreSQL 


jonath...@crunchydata.com

unread,
Oct 22, 2021, 5:15:55 PM10/22/21
to Postgres Operator, jonath...@crunchydata.com, Postgres Operator, Sriram Rajaram, Raja Rajenderan
The error can be traced to here:

      options:
      - --type=time
      - --target="2021-09-08 12:10:00 UTC"

specifically, the use of the named timezone in the PITR target.

This is likely due to a documentation error, which was corrected in this PR:


Thanks,

Jonathan

Reply all
Reply to author
Forward
0 new messages