DB in constant recovery mode

339 views
Skip to first unread message

Bartosz Styrczewski

unread,
Jan 18, 2022, 9:07:07 AM1/18/22
to Postgres Operator
Hi all,

I am running pgo in version 4.6.0 
I have set up my cluster with fallowing commands

pgo create cluster --metrics my-cluster --database my-db --username my-user
pgo -n pgo scale my-cluster
pgo test --all
pgo restart my-cluster
pgo create schedule my-cluster --schedule="15 3 * * 3" --schedule-type=pgbackrest  --pgbackrest-backup-type=full
pgo create schedule my-cluster --schedule="18 4 * * *" --schedule-type=pgbackrest  --pgbackrest-backup-type=incr

After that I have edited the config map and added following lines 

apiVersion: v1
data:
  db-dcs-config: |
    postgresql:
      parameters:
        max_connections: 500
        shared_buffers: 2048MB
        temp_buffers: 16MB
        work_mem: 8MB
        max       log_min_duration_sample: 6000
        wal_segmen_wal_size: 20480

 t_size: 18777216
        wal_keep_size: 1280MB


Basically DB since created is constantly in recovery mode. 
 query SELECT pg_is_in_recovery() is returning true

And my liquibase setup is throwing me errors like
liquibase.exception.DatabaseException: ERROR: cannot execute CREATE TABLE in a read-only transaction

Any help would be appreciated.

BR
Bartek Styrczewski

Jonathan S. Katz

unread,
Jan 18, 2022, 9:31:25 AM1/18/22
to Bartosz Styrczewski, Postgres Operator
Once you have stabilized, I would first recommend running the latest version of 4.6 as that has several bug fixes.

That being said, while the general repro steps help, the logs (both kubectl logs or /pgdata/<clustername>/pg_logs) would give more information about your current operational state.

Thanks,

Jonathan

Bartosz Styrczewski

unread,
Jan 27, 2022, 3:01:01 PM1/27/22
to Postgres Operator, Jonathan S. Katz, Postgres Operator, Bartosz Styrczewski
Hi  Jonathan,

kubectl logs and logs from pg_log directory. Both from primary and replica. What I did was restart the cluster and try to run liquibase scripts again.
Hope that that the logs will be helpful.

kubect replica:
2022-01-27 08:15:39,510 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-6f4cd5bb7d-gxkwn
2022-01-27 08:15:39,510 INFO: does not have lock
2022-01-27 08:15:39,512 INFO: no action.  i am a secondary and i am following a leader
2022-01-27 08:15:48,325 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-6f4cd5bb7d-gxkwn
2022-01-27 08:15:48,381 INFO: Waiting for master to release us from synchronous standby
2022-01-27 08:15:49,514 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-6f4cd5bb7d-gxkwn
2022-01-27 08:15:49,515 INFO: restart in progress
2022-01-27 08:15:50,439 INFO: closed patroni connection to the postgresql cluster
2022-01-27 08:15:50.685 UTC [5225] LOG:  pgaudit extension initialized
2022-01-27 08:15:50,688 INFO: postmaster pid=5225
/tmp:5432 - no response
2022-01-27 08:15:50.753 UTC [5225] LOG:  redirecting log output to logging collector process
2022-01-27 08:15:50.753 UTC [5225] HINT:  Future log output will appear in directory "pg_log".
/tmp:5432 - rejecting connections
/tmp:5432 - rejecting connections
/tmp:5432 - accepting connections
2022-01-27 08:15:52,751 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-6f4cd5bb7d-gxkwn
2022-01-27 08:15:52,751 INFO: does not have lock
2022-01-27 08:15:52,752 INFO: establishing a new patroni connection to the postgres cluster
2022-01-27 08:15:52,814 INFO: no action.  i am a secondary and i am following a leader
2022-01-27 08:15:55,419 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-6f4cd5bb7d-gxkwn
2022-01-27 08:15:55,419 INFO: does not have lock
2022-01-27 08:15:55,473 INFO: no action.  i am a secondary and i am following a leader

kubect primary:
2022-01-27 08:15:19,515 INFO: no action.  i am the leader with the lock
2022-01-27 08:15:29,463 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:15:29,513 INFO: no action.  i am the leader with the lock
2022-01-27 08:15:39,463 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:15:39,512 INFO: no action.  i am the leader with the lock
2022-01-27 08:15:49,463 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:15:49,515 INFO: Updating synchronous privilege temporarily from ['my-app-db-6f4cd5bb7d-gxkwn'] to []
2022-01-27 08:15:49,562 INFO: Assigning synchronous standby status to []
server signaled
2022-01-27 08:15:49,572 INFO: no action.  i am the leader with the lock
2022-01-27 08:15:53,411 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:15:54,067 INFO: closed patroni connection to the postgresql cluster
2022-01-27 08:15:54.296 UTC [78400] LOG:  pgaudit extension initialized
2022-01-27 08:15:54,306 INFO: postmaster pid=78400
/tmp:5432 - no response
2022-01-27 08:15:54.374 UTC [78400] LOG:  redirecting log output to logging collector process
2022-01-27 08:15:54.374 UTC [78400] HINT:  Future log output will appear in directory "pg_log".
/tmp:5432 - accepting connections
/tmp:5432 - accepting connections
2022-01-27 08:15:55,364 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:15:55,364 INFO: establishing a new patroni connection to the postgres cluster
2022-01-27 08:15:55,483 INFO: no action.  i am the leader with the lock
2022-01-27 08:16:05,863 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:16:05,913 INFO: Assigning synchronous standby status to ['my-app-db-6f4cd5bb7d-gxkwn']
server signaled
2022-01-27 08:16:07,975 INFO: Synchronous standby status assigned to ['my-app-db-6f4cd5bb7d-gxkwn']
2022-01-27 08:16:07,976 INFO: no action.  i am the leader with the lock
2022-01-27 08:16:15,863 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:16:15,966 INFO: no action.  i am the leader with the lock
2022-01-27 08:16:25,863 INFO: Lock owner: my-app-db-cnvd-85784499bf-8x89k; I am my-app-db-cnvd-85784499bf-8x89k
2022-01-27 08:16:25,913 INFO: no action.  i am the leader with the lock

pg_logs primary:
2022-01-27 08:15:50.387 UTC [80696] LOG:  received fast shutdown request
2022-01-27 08:15:50.388 UTC [80696] LOG:  aborting any active transactions
2022-01-27 08:15:50.388 UTC [80714] FATAL:  terminating connection due to administrator command
2022-01-27 08:15:50.388 UTC [80710] FATAL:  terminating walreceiver process due to administrator command
2022-01-27 08:15:50.392 UTC [80707] LOG:  shutting down
2022-01-27 08:15:50.422 UTC [80696] LOG:  database system is shut down
2022-01-27 08:15:50.753 UTC [5225] LOG:  starting PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), 64-bit
2022-01-27 08:15:50.753 UTC [5225] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-01-27 08:15:50.762 UTC [5225] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-01-27 08:15:50.780 UTC [5228] LOG:  database system was shut down in recovery at 2022-01-27 08:15:50 UTC
2022-01-27 08:15:51.088 UTC [5228] LOG:  entering standby mode
2022-01-27 08:15:51.393 UTC [5228] LOG:  restored log file "00000004.history" from archive
2022-01-27 08:15:51.538 UTC [5257] FATAL:  the database system is starting up
2022-01-27 08:15:51.716 UTC [5259] FATAL:  the database system is starting up
2022-01-27 08:15:51.729 UTC [5261] FATAL:  the database system is starting up
2022-01-27 08:15:51.774 UTC [5228] LOG:  restored log file "00000004000000040000008A" from archive
2022-01-27 08:15:51.827 UTC [5228] LOG:  redo starts at 4/8A0000D8
2022-01-27 08:15:52.136 UTC [5228] LOG:  consistent recovery state reached at 4/8B000060
2022-01-27 08:15:52.136 UTC [5228] LOG:  record with incorrect prev-link 4/37000028 at 4/8B000060
2022-01-27 08:15:52.137 UTC [5225] LOG:  database system is ready to accept read only connections
2022-01-27 08:15:52.147 UTC [5267] LOG:  started streaming WAL from primary at 4/8B000000 on timeline 4
2022-01-27 08:15:53.684 UTC [5267] LOG:  replication terminated by primary server
2022-01-27 08:15:53.684 UTC [5267] DETAIL:  End of WAL reached on timeline 4 at 4/8C0000A0.
2022-01-27 08:15:53.684 UTC [5267] FATAL:  could not send end-of-streaming message to primary: no COPY in progress
2022-01-27 08:15:54.298 UTC [5228] LOG:  invalid resource manager ID 88 at 4/8C0000A0
2022-01-27 08:15:54.302 UTC [5297] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "10.244.1.193" and accepting
                TCP/IP connections on port 5432?
2022-01-27 08:15:59.325 UTC [5302] LOG:  started streaming WAL from primary at 4/8C000000 on timeline 4
2022-01-27 08:16:10.077 UTC [5362] ERROR:  relation "my-app.databasechangeloglock" does not exist at character 22
2022-01-27 08:16:10.077 UTC [5362] STATEMENT:  SELECT COUNT(*) FROM "my-app".databasechangeloglock
2022-01-27 08:16:10.317 UTC [5362] ERROR:  cannot execute CREATE TABLE in a read-only transaction
2022-01-27 08:16:10.317 UTC [5362] STATEMENT:  CREATE TABLE "my-app".databasechangeloglock (ID INTEGER NOT NULL, LOCKED BOOLEAN NOT NULL, LOCKGRANTED TIMESTAMP WITHOUT TIME ZONE, LOCKEDBY VARCHAR(255), CONSTRAINT DATABASECHANGELOGLOCK_PKEY PRIMARY KEY (ID))
2022-01-27 08:16:11.234 UTC [5382] LOG:  could not receive data from client: Connection reset by peer
2022-01-27 08:16:11.256 UTC [5381] LOG:  could not receive data from client: Connection reset by peer

pg_logs replica:
2022-01-27 08:15:49.571 UTC [72172] LOG:  received SIGHUP, reloading configuration files
2022-01-27 08:15:49.572 UTC [72172] LOG:  parameter "synchronous_standby_names" removed from configuration file, reset to default
2022-01-27 08:15:53.454 UTC [72172] LOG:  received fast shutdown request
2022-01-27 08:15:53.461 UTC [72172] LOG:  aborting any active transactions
2022-01-27 08:15:53.461 UTC [72198] FATAL:  terminating connection due to administrator command
2022-01-27 08:15:53.464 UTC [72172] LOG:  background worker "logical replication launcher" (PID 72182) exited with exit code 1
2022-01-27 08:15:53.465 UTC [72176] LOG:  shutting down
2022-01-27 08:15:54.056 UTC [72172] LOG:  database system is shut down
2022-01-27 08:15:54.374 UTC [78400] LOG:  starting PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), 64-bit
2022-01-27 08:15:54.374 UTC [78400] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-01-27 08:15:54.397 UTC [78400] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-01-27 08:15:54.432 UTC [78403] LOG:  database system was shut down at 2022-01-27 08:15:53 UTC
2022-01-27 08:15:54.448 UTC [78400] LOG:  database system is ready to accept connections
2022-01-27 08:16:05.922 UTC [78400] LOG:  received SIGHUP, reloading configuration files
2022-01-27 08:16:05.923 UTC [78400] LOG:  parameter "synchronous_standby_names" changed to ""my-app-db-6f4cd5bb7d-gxkwn""
2022-01-27 08:16:09.329 UTC [78431] LOG:  standby "my-app-db-6f4cd5bb7d-gxkwn" is now a synchronous standby with priority 1

BR,
Bartek

Andrew L'Ecuyer

unread,
Jan 27, 2022, 9:12:09 PM1/27/22
to Postgres Operator, Bartosz Styrczewski, Postgres Operator
Is synchronous mode enabled for the cluster?  Messages such as the following indicate as much:

2022-01-27 08:15:49,515 INFO: Updating synchronous privilege temporarily from ['my-app-db-6f4cd5bb7d-gxkwn'] to []
2022-01-27 08:15:49,562 INFO: Assigning synchronous standby status to []
server signaled


And while I do not see the "--sync-replication" flag referenced in the in the "pgo create cluster" command you provided, it is also possible to enable synchronous replication globally by setting "Cluster.SyncReplication" to "true" in the pgo.yaml configuration file.

Can you therefore provide a copy of your pgo.yaml configuration to ensure we have a proper view of your full configuration?

Thanks,
Andrew

Bartosz Styrczewski

unread,
Jan 28, 2022, 9:25:30 AM1/28/22
to Andrew L'Ecuyer, Postgres Operator, Krzysztof Krajewski, Rafał Turkiewicz, Dawid Kaszuba
Hi Andrew, 

you can find my pgo.yaml file in the attachment.

BR,
Bartosz Styrczewski
Pretius | Senior Java Developer
mobile: +48 692 647 634
site: www.pretius.com


Wolny od wirusów. www.avg.com
pgo.yaml
Reply all
Reply to author
Forward
0 new messages