Postgres DB disk usage growing during initial snapshot caused an incident

SK biswal

Sep 19, 2021, 12:39:39 PM
to debezium
Hi folks,
We need some help here.
- We are using the Debezium connector with our Postgres DB, which is fairly big (around 2 TB).
- Debezium connector version: 1.5.2.Final
- Postgres 11
During the initial snapshot, DB disk usage grew from 50% to around 80% in two days. We had to stop the connector and drop the replication slot to avoid downtime.
We also noticed that while the snapshot is in progress, the replication slot is not active. We are not sure whether the two issues are related.

select * from pg_replication_slots;
shows

active    | f
temporary | f

Here is our Debezium connector configuration:
{
  "name": "{{ GROUP_ID }}",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "plugin.name": "pgoutput",
    "slot.name": "dbzm_slot_1",
    "publication.name": "dbzm_publication_1",
    "snapshot.mode": "initial",
    "database.hostname": "{{ DB_HOST }}",
    "database.port": "{{ DB_PORT }}",
    "database.user": "{{ SECRET_USERNAME }}",
    "database.password": "{{ SECRET_PASSWORD }}",
    "database.dbname": "{{ DB_DBNAME }}",
    "schema.include.list": "my_schema",
    "table.exclude.list": ""
  }
}

Please help if anyone has seen this kind of issue, and apologies if this has already been discussed.

Chris Cranford

Sep 20, 2021, 10:55:43 AM
to debe...@googlegroups.com, SK biswal
Hi -

Both of these behaviors are expected.

When the connector first starts, it creates a replication slot whose LSN reference points at the current change position.  During the snapshot, the replication slot causes PostgreSQL to retain all WAL entries written after that LSN.  If the snapshot takes a long time, there is a high volume of changes during the snapshot, or both, then the retained WAL keeps growing until the snapshot concludes and the connector can begin to stream changes.  This is also why you don't see the replication slot as active: we're not actually streaming any real-time changes during the snapshot phase; we've only secured the starting LSN reference and nothing more.
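
For anyone wanting to watch this while a snapshot runs, here is a minimal sketch of a query that reports how much WAL each slot is holding back. It assumes PostgreSQL 10 or later (where the functions are named pg_current_wal_lsn and pg_wal_lsn_diff) and that it is run on the primary; the retained_wal column alias is only illustrative.

```sql
-- Approximate WAL retained on behalf of each replication slot:
-- the distance between the current write position and the slot's restart_lsn.
SELECT slot_name,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

A retained_wal value that climbs steadily while active stays f matches the behavior described above: the slot pins the starting LSN, but nothing consumes from it until streaming begins.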

As an alternative, you could consider upgrading to Debezium 1.6 and giving Incremental Snapshots a try instead.  They let streaming and snapshotting happen concurrently, so they shouldn't cause the kind of high WAL retention growth you experienced.
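
As a rough sketch of what triggering an incremental snapshot looks like: Debezium 1.6 uses a signaling table that the connector watches, configured through the signal.data.collection connector property. The table name my_schema.debezium_signal, the id value 'ad-hoc-1', and my_schema.my_table below are hypothetical placeholders; please check the Debezium documentation for your version before relying on the exact column layout.

```sql
-- Signaling table (hypothetical name). The connector config would point at it with
--   "signal.data.collection": "my_schema.debezium_signal"
CREATE TABLE my_schema.debezium_signal (
    id   VARCHAR(42) PRIMARY KEY,   -- arbitrary unique id for each signal
    type VARCHAR(32) NOT NULL,      -- signal type, e.g. execute-snapshot
    data VARCHAR(2048) NULL         -- JSON payload for the signal
);

-- Ask the running connector to incrementally snapshot one table (placeholder name).
INSERT INTO my_schema.debezium_signal (id, type, data)
VALUES ('ad-hoc-1',
        'execute-snapshot',
        '{"data-collections": ["my_schema.my_table"], "type": "incremental"}');
```

The signaling table has to be captured by the connector (so it must be part of the publication and not excluded), and in 1.6 the tables being snapshotted need a primary key.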

HTH,
CC

SK biswal

Sep 20, 2021, 11:46:02 AM
to debezium
Thank you. Otherwise, we can temporarily increase the disk size to account for the initial snapshot. We will certainly explore the incremental snapshot option.