Database instance not starting


Yuvraj Vansure

Aug 17, 2023, 8:21:01 AM
to Postgres Operator
A Postgres cluster created from a PostgresCluster CR became unhealthy after the worker nodes were drained and restarted.
We are seeing the following logs in the database instance Pod:

2023-07-27 17:01:58,052 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-07-27 17:01:58,072 WARNING: Postgresql is not running.
2023-07-27 17:01:58,073 INFO: Lock owner: None; I am instrumentationdb-instance1-zttf-0
2023-07-27 17:01:58,077 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7209530190925107282
  Database cluster state: shut down in recovery
  pg_control last modified: Thu Jul 27 16:31:44 2023
  Latest checkpoint location: 15/2A000028
  Latest checkpoint's REDO location: 15/2A000028
  Latest checkpoint's REDO WAL file: 0000007D000000150000002A
  Latest checkpoint's TimeLineID: 125
  Latest checkpoint's PrevTimeLineID: 125
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:45422
  Latest checkpoint's NextOID: 188416
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 478
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Thu Jul 13 19:18:39 2023
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 15/2A0000A0
  Min recovery ending loc's timeline: 125
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: afec4ac2d2d78c649caa0234cf9eaa6be0c85273f288b81884d878cbf295f8d8

2023-07-27 17:01:58,092 INFO: Lock owner: None; I am instrumentationdb-instance1-zttf-0
2023-07-27 17:01:58,252 INFO: starting as a secondary
2023-07-27 17:01:58,481 INFO: postmaster pid=102
/tmp/postgres:5432 - no response
2023-07-27 17:01:58.495 UTC [102] LOG:  pgaudit extension initialized
2023-07-27 17:01:58.516 UTC [102] LOG:  redirecting log output to logging collector process
2023-07-27 17:01:58.516 UTC [102] HINT:  Future log output will appear in directory "log".
/tmp/postgres:5432 - accepting connections
/tmp/postgres:5432 - accepting connections
2023-07-27 17:01:59,589 INFO: establishing a new patroni connection to the postgres cluster
2023-07-27 17:01:59,667 INFO: My wal position exceeds maximum replication lag
2023-07-27 17:01:59,788 INFO: following a different leader because i am not the healthiest node
2023-07-27 17:02:10,090 INFO: My wal position exceeds maximum replication lag
2023-07-27 17:02:10,099 INFO: following a different leader because i am not the healthiest node
2023-07-27 17:02:20,090 INFO: My wal position exceeds maximum replication lag
2023-07-27 17:02:20,100 INFO: following a different leader because i am not the healthiest node
2023-07-27 17:02:30,090 INFO: My wal position exceeds maximum replication lag
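
For anyone reading the pg_controldata values above, here is a minimal sketch of how the LSN fields relate to each other and to the REDO WAL file name. It assumes the usual PostgreSQL LSN notation (two hexadecimal halves separated by a slash) and the 16 MiB segment size reported in the output; the helper names `parse_lsn` and `wal_file_name` are made up for illustration and are not part of Patroni or the operator.

```python
# Segment size taken from "Bytes per WAL segment" in the pg_controldata output.
WAL_SEGMENT_BYTES = 16777216


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '15/2A000028' to an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: int,
                  segment_bytes: int = WAL_SEGMENT_BYTES) -> str:
    """Build the 24-hex-digit name of the WAL segment that contains the LSN."""
    segment_number = lsn // segment_bytes
    # Number of segments that fit in one 4 GiB "log id" range.
    segments_per_id = 0x100000000 // segment_bytes
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_id,
        segment_number % segments_per_id,
    )


checkpoint = parse_lsn("15/2A000028")        # Latest checkpoint location
min_recovery_end = parse_lsn("15/2A0000A0")  # Minimum recovery ending location

# Timeline 125 is 0x7D, so this reproduces the REDO WAL file name in the log.
print(wal_file_name(125, checkpoint))
# Distance in bytes between the checkpoint and the minimum recovery end.
print(min_recovery_end - checkpoint)
```

The first value matches the "REDO WAL file" line, which confirms the instance was last shut down on timeline 125. The second value only shows how far past the last checkpoint recovery must replay on this instance; it says nothing about how far behind the last known leader position the instance is, which is what the "exceeds maximum replication lag" messages refer to.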

PostgresCluster CR
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: instrumentationdb
  namespace: ibm-common-services
  finalizers:
    - postgres-operator.crunchydata.com/finalizer
spec:
  backups:
    pgbackrest:
      image: >-
        registry.connect.redhat.com/crunchydata/crunchy-pgbackrest@sha256:efe775d3208befb2b7f026ef5fee3b03b306a9ba773709ec5c4c3391880ee60b
      repoHost: {}
      repos:
        - name: repo1
          schedules:
            incremental: '@daily'
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 10G
              storageClassName: ocs-storagecluster-ceph-rbd
      restore:
        enabled: true
        options:
          - '--type=time'
          - '--target="2023-08-10 04:59:02"'
        repoName: repo1
  image: >-
    registry.connect.redhat.com/crunchydata/crunchy-postgres@sha256:6b570ee2922281eedc5c267c50ad30a895fbb4e8a132c3e2c3a38e29fe3d6f6a
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10G
        storageClassName: ocs-storagecluster-ceph-rbd
      name: instance1
      replicas: 1
  openshift: true
  port: 5432
  postgresVersion: 13
  users:
    - databases:
        - instrumentationdb
      name: udsdbuser
    - name: postgres
status:
  observedGeneration: 21677
  usersRevision: '858744695'
  monitoring:
    exporterConfiguration: 559c4c97d6
  proxy:
    pgBouncer:
      postgresRevision: 5c9966f6bc
  pgbackrest:
    repoHost:
      apiVersion: apps/v1
      kind: StatefulSet
      ready: true
    repos:
      - bound: true
        name: repo1
        replicaCreateBackupComplete: true
        stanzaCreated: true
        volume: pvc-ddf1d026-6f33-4d57-bb29-c4ee07cce5aa
    scheduledBackups:
      - completionTime: '2023-07-11T00:00:11Z'
        cronJobName: instrumentationdb-repo1-incr
        repo: repo1
        startTime: '2023-07-11T00:00:00Z'
        succeeded: 1
        type: incr
      - completionTime: '2023-07-12T00:00:10Z'
        cronJobName: instrumentationdb-repo1-incr
        repo: repo1
        startTime: '2023-07-12T00:00:00Z'
        succeeded: 1
        type: incr
      - completionTime: '2023-07-13T00:00:10Z'
        cronJobName: instrumentationdb-repo1-incr
        repo: repo1
        startTime: '2023-07-13T00:00:00Z'
        succeeded: 1
        type: incr
      - cronJobName: instrumentationdb-repo1-incr
        failed: 7
        repo: repo1
        startTime: '2023-08-10T00:00:00Z'
        type: incr
  databaseRevision: 55c88b8fdb
  conditions:
    - lastTransitionTime: '2023-08-01T08:34:05Z'
      message: pgBackRest dedicated repository host is ready
      observedGeneration: 21677
      reason: RepoHostReady
      status: 'True'
      type: PGBackRestRepoHostReady
    - lastTransitionTime: '2023-03-12T05:39:26Z'
      message: pgBackRest replica create repo is ready for backups
      observedGeneration: 21677
      reason: StanzaCreated
      status: 'True'
      type: PGBackRestReplicaRepoReady
    - lastTransitionTime: '2023-03-12T05:40:40Z'
      message: pgBackRest replica creation is now possible
      observedGeneration: 21677
      reason: RepoBackupComplete
      status: 'True'
      type: PGBackRestReplicaCreate
  patroni:
    systemIdentifier: '7209530190925107282'
  instances:
    - name: instance1
      readyReplicas: 1
      replicas: 1
      updatedReplicas: 1

Environment

  • Platform: OpenShift
  • Platform Version: 4.12
  • Postgres Version: 13
  • Operator Version: postgresoperator.v5.4.1
  • Storage: ocs-storagecluster-ceph-rbd