Debezium PostgreSQL: multiple connectors to the same database


Ruben Dejaegere

unread,
Feb 27, 2022, 12:22:26 PM2/27/22
to debezium
Hello,

Currently I'm looking at setting up Debezium to capture changes in a PostgreSQL database on GCP Cloud SQL. We are mainly using Debezium to stream newly made changes.
The challenge I'm facing is that we require multiple connectors on the same database, as we would like to have one connector per schema. This gives us a bit more redundancy when deploying to production. I've tried several configurations (including setting the publication name and the schema include list), but I keep running into the issue that the replication slots are not consumed correctly, causing the WAL files to pile up until either the storage expands massively or the server crashes.

This is the latest iteration (it still needs to be tested, but I wanted to check whether I was on the right track before implementing it). The idea I'm currently thinking of is to consume all changes and use the filter SMT to only pass on the messages belonging to the connector's schema. (The comments are annotations only; they have to be removed for the JSON to be valid.)

{
  "name": "postgresql-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "HOST_HOLDER",
    "database.port": "5432",
    "database.user": "REPLICATION_USER_PLACEHOLDER",
    "database.password": "PLACEHOLDER",
    "database.dbname": "DB",
    "database.server.name": "DB",
    "plugin.name": "pgoutput",
    "snapshot.mode": "never",  // no snapshots needed; we only want to stream new changes
    "event.processing.failure.handling.mode": "warn",  // warn logs the offset of the problematic event, skips it, and continues processing
    "slot.name": "dbz_{schema}",
    "slot.drop.on.stop": "true",
    "transforms": "filter",
    "transforms.filter.type": "io.debezium.transforms.Filter",
    "transforms.filter.language": "jsr223.groovy",
    "transforms.filter.condition": "topic.startsWith('DB.{schema}')"
  }
}

Chris Cranford

unread,
Mar 1, 2022, 9:43:52 AM3/1/22
to debe...@googlegroups.com
Hi Ruben -

So the most important thing here is that you likely need to supply both the "slot.name" and "publication.name" configuration properties. Since you likely want PostgreSQL to do some pre-filtering of the events before they are sent over the replication stream, having separate publication names allows that to happen; otherwise you'll need a single publication that does no filtering, i.e. uses ALL TABLES. But if we assume that you have unique values set for "database.server.name", "slot.name", and "publication.name": are you seeing any errors in either the database logs or the connector logs?
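For reference, a per-schema publication could be created roughly like this (schema and table names here are placeholders; note that publishing a whole schema at once with FOR TABLES IN SCHEMA only arrived in PostgreSQL 15, so on older versions the tables have to be listed explicitly):

```sql
-- One publication per schema, consumed by one connector each.
-- "schema1", "orders", and "customers" are placeholder names.
CREATE PUBLICATION dbz_schema1_publication
    FOR TABLE schema1.orders, schema1.customers;

-- On PostgreSQL 15+ a whole schema can be published at once:
-- CREATE PUBLICATION dbz_schema1_publication FOR TABLES IN SCHEMA schema1;
```

The publication name then goes into the connector's "publication.name" property so each connector only receives its own schema's changes.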

Thanks,
CC

Ruben Dejaegere

unread,
Mar 1, 2022, 10:32:37 AM3/1/22
to debezium
No, the only thing I noticed was that the replication slots were showing "extended" as the wal_status, and restarting the replication slots seemed to remove the additional WAL files.
You are indeed correct that I was using unique slot names and a unique database server name. For the publication name, I was indeed working with one publication before, but I will adjust this according to your feedback.
In the meantime, I realized that I had missed configuring the heartbeat, so below is the latest configuration I'm testing, with the heartbeat interval set to 100 ms, which seems to be working better.
One thing I am noticing now, however, is that the CDC seems to be lagging behind on the changes, which causes the replication slot responsible for tracking the changes in the updated tables to again show the "extended" wal_status.
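To quantify that lag, one option (assuming PostgreSQL 13+, where the wal_status column exists) is to compare each slot's restart_lsn to the current WAL position:

```sql
-- How much WAL each replication slot is forcing the server to retain.
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```

A slot whose retained_wal keeps growing while active is true suggests the consumer is not confirming LSNs fast enough.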

Current configuration:
{
    "name": "{schema}-postgresql-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "127.0.0.1",
        "database.port": "5432",
        "database.user": "REPLICATION_USER_PLACEHOLDER",
        "database.password": "PASSWORD_PLACEHOLDER",
        "database.dbname": "DB",
        "database.server.name": "DB",
        "plugin.name": "pgoutput",
        "snapshot.mode": "never",
        "event.processing.failure.handling.mode": "warn",
        "publication.autocreate.mode": "disabled",
        "slot.name": "SLOT_NAME",
        "schema.include.list": "{schema}.*",
        "publication.name": "dbz_{schema}_publication",
        "slot.drop.on.stop": "true",
        "heartbeat.interval.ms": "100",
        "name": "{schema}-postgresql-connector"
    }
}

Chris Cranford

unread,
Mar 1, 2022, 10:49:44 AM3/1/22
to debe...@googlegroups.com
Hi Ruben -

I believe the more replication slots you have, the more strain they put on the WAL sender processes, although I'm no PG expert. You mentioned that you're deploying a connector per schema; how many schemas exactly?
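As a sanity check on that point, the server-side ceilings for slots and WAL senders can be inspected directly; each connector consumes one of each (these are standard PostgreSQL settings, not Debezium ones):

```sql
-- Each Debezium connector needs one replication slot and one WAL sender.
SHOW max_replication_slots;
SHOW max_wal_senders;

-- WAL senders currently in use:
SELECT count(*) FROM pg_stat_replication;
```

If the connector count approaches either limit, the settings have to be raised and the server restarted.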

Ruben Dejaegere

unread,
Mar 1, 2022, 11:20:32 AM3/1/22
to debezium
Currently I'm testing it with 3. I took some screenshots to show what happens while a script runs that inserts/updates 250,000 records at a rate of around 100 records/second:

[Screenshot Selection_004.png: output of pg_replication_slots]

[Screenshot Selection_006.png: evolution of the WAL size]

Ruben Dejaegere

unread,
Mar 1, 2022, 11:21:02 AM3/1/22
to debezium
Also, it is possible that we'll end up with even more connectors than the 3 I'm currently working with.