pg13_wal directory filling up quickly | Please give me some suggestions to avoid filling this directory


Santosh Yerram

Apr 12, 2023, 6:30:17 AM4/12/23
to Postgres Operator
Hi All,

I am using Crunchy Postgres for Kubernetes V5.2.0 in RedHat OpenShift Container V4.8.25.
I am facing an issue where the mount point (/pgdata) fills up quickly and runs out of disk space within a day or two; I have allocated 10 GB for it. The pg13_wal directory consumes 95% of the disk space and keeps filling up.
I can see only the messages below in the logs.

I cannot attach the complete log file due to restrictions in my organisation.

Error Message:

2023-04-12 00:06:05.779 UTC [542] DETAIL:  Role "ccp_monitoring" does not exist.
Connection matched pg_hba.conf line 8: "host all "ccp_monitoring" "::1/128" md5"
ERROR: [103]: unable to find a valid repository:
       repo1: [ArchiveMismatchError] PostgreSQL version 13, system-id 7220947104560214105 do not match repo1 stanza version 13, system-id 7220570976637997146
       HINT: are you archiving to the correct stanza?
2023-04-12 00:06:07.511 UTC [117] LOG:  archive command failed with exit code 103
2023-04-12 00:06:07.511 UTC [117] DETAIL:  The failed archive command was: pgbackrest --stanza=db archive-push "pg_wal/000000010000000000000001"
2023-04-12 00:06:07.526 UTC [546] LOG:  connection received: host=100.103.0.149 port=35960


ERROR: [104]: archive-push command encountered error(s):
       repo1: [FileReadError] timeout after 60000ms waiting for read from 's3.openshift-storage.svc:443'
2023-04-12 00:16:48.919 UTC [117] LOG:  archive command failed with exit code 104
2023-04-12 00:16:48.919 UTC [117] DETAIL:  The failed archive command was: pgbackrest --stanza=db archive-push "pg_wal/000000010000000000000007"
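For readers hitting the same symptom: a minimal, hypothetical sketch (not Crunchy tooling) of pulling the archive-command failures out of log lines like the ones above. The regex assumes the default-style timestamp prefix shown in these messages; adjust it to your actual `log_line_prefix`.

```python
import re

# Sample lines in the format shown above (assumed format; the sample text is
# copied from the thread's log excerpt).
LOG = """\
2023-04-12 00:06:07.511 UTC [117] LOG:  archive command failed with exit code 103
2023-04-12 00:06:07.511 UTC [117] DETAIL:  The failed archive command was: pgbackrest --stanza=db archive-push "pg_wal/000000010000000000000001"
2023-04-12 00:06:07.526 UTC [546] LOG:  connection received: host=100.103.0.149 port=35960
2023-04-12 00:16:48.919 UTC [117] LOG:  archive command failed with exit code 104
"""

FAILURE = re.compile(r"archive command failed with exit code (\d+)")

def archive_failures(log_text):
    """Return (timestamp, exit_code) pairs for failed archive attempts."""
    out = []
    for line in log_text.splitlines():
        m = FAILURE.search(line)
        if m:
            # First two whitespace-separated fields are date and time.
            timestamp = " ".join(line.split()[:2])
            out.append((timestamp, int(m.group(1))))
    return out

if __name__ == "__main__":
    for ts, code in archive_failures(LOG):
        print(ts, "exit", code)
```

Any nonzero exit code here means PostgreSQL will retry that same WAL segment and keep everything after it on disk, which is why pg13_wal grows until archiving succeeds.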

Thanks,
Santosh.

Maksym Babenko

Apr 13, 2023, 9:15:24 AM4/13/23
to Postgres Operator, Santosh Yerram
I think your backup repository (e.g. S3 or GCS) is not configured correctly, so pgBackRest can't upload the archives from WAL.
Also, you can use a separate volume for WAL
instead of keeping it with pgdata.
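For the separate-WAL-volume suggestion, a hypothetical PostgresCluster excerpt (cluster and instance names are examples): PGO v5 lets each instance place WAL on its own PVC via `walVolumeClaimSpec`, separate from `dataVolumeClaimSpec`.

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo          # example name
spec:
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
      walVolumeClaimSpec:   # puts pg13_wal on its own volume
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 15Gi
```

Note this only isolates the blast radius: if archiving keeps failing, the WAL volume will still eventually fill.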

Santosh Yerram

Apr 13, 2023, 8:30:46 PM4/13/23
to Postgres Operator, Maksym Babenko, Santosh Yerram
I can allocate a separate volume for WAL, but even then it will fill up quickly. Currently I allocate 15 GB, of which 14.5 GB is consumed by the pg13_wal directory. I would like to know why it is filling up so quickly. Is there any configuration setting to avoid this?
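The reason it fills up regardless of size: while `archive_command` keeps failing, PostgreSQL retains every WAL segment since the last successful archive. A back-of-envelope sketch with PostgreSQL's default 16 MB segment size:

```python
# Each WAL segment is 16 MB by default (initdb --wal-segsize can change this).
SEGMENT_MB = 16

def segments_retained(consumed_gb):
    """Roughly how many WAL segments account for the space consumed."""
    return int(consumed_gb * 1024 // SEGMENT_MB)

# 14.5 GB of pg13_wal, as reported above:
print(segments_retained(14.5))  # -> 928 segments pinned by failed archiving
```

So roughly 900+ segments are waiting to be archived; a larger volume only delays the outage, it does not fix it.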

Maksym Babenko

Apr 14, 2023, 6:55:52 AM4/14/23
to Postgres Operator, Santosh Yerram, Maksym Babenko
repo1: [FileReadError] timeout after 60000ms waiting for read from 's3.openshift-storage.svc:443'
2023-04-12 00:16:48.919 UTC [117] LOG:  archive command failed with exit code 104
As you can see, there is a configuration issue with your backup repo.
I guess it's firewall rules (check the network policies for your cluster).
You may also need to set some pgBackRest parameters for repo1 related to S3 storage: https://pgbackrest.org/configuration.html
So the problem is that archiving is not working: WAL is not being archived and uploaded from your storage to the S3-compatible storage.
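As an illustration of the repo1/S3 parameters mentioned above, a hypothetical pgBackRest configuration fragment (bucket, region, and TLS settings are example values, not taken from the thread; in PGO v5 the equivalent options normally go under `spec.backups.pgbackrest.global` and `spec.backups.pgbackrest.repos`):

```ini
[global]
repo1-type=s3
repo1-s3-endpoint=s3.openshift-storage.svc
repo1-s3-bucket=pg-backups
repo1-s3-region=us-east-1
repo1-s3-uri-style=path
# OpenShift Data Foundation's internal S3 often uses a self-signed cert:
repo1-storage-verify-tls=n
```

If the endpoint times out even with correct options, test reachability from inside the database pod before changing anything else.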

drew.s...@crunchydata.com

Apr 19, 2023, 7:32:09 PM4/19/23
to Postgres Operator, Maksym Babenko, Santosh Yerram
Santosh,

I agree with Maksym's assessment: WAL isn't archiving, which is causing your WAL directory to fill up. The attempt to reach your S3 storage is timing out. Were you able to get this issue resolved?

Regards,
Drew

Santosh Yerram

Apr 21, 2023, 10:23:56 AM4/21/23
to Postgres Operator, drew.s...@crunchydata.com, Maksym Babenko, Santosh Yerram
Thanks, Drew and Maksym, for your help on this. We could not find the root cause of the S3 storage failure; instead, we opted for a filesystem backup for now, and it looks fine so far.