Connector getting Inactive due to replication slot PID already exists ERROR


Pavan Manda

Feb 14, 2022, 12:24:01 PM
to debezium
Hi Team,
I'm getting the error below even with one Kubernetes pod and tasks.max=1. A few days back, when we first hit this issue, we scaled the pods down to 1 and restarted the connector, but we are still seeing the same issue today even though the number of pods is 1 and tasks.max=1 in the connector configuration. Could you please suggest what might be wrong?

Caused by: io.debezium.DebeziumException: Failed to start replication stream at LSN{B9/DD22FBF0};
when setting up multiple connectors for the same database host,
please make sure to use a distinct replication slot name for each.
at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startStreaming(PostgresReplicationConnection.java:309)
at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:165)

Caused by: org.postgresql.util.PSQLException:
ERROR: replication slot \"indv_pref_atr_connector_slot\" is active for PID 36604

Pavan Manda

Feb 15, 2022, 12:34:45 AM
to debezium
Hi Team,
Any clue on the request below? Please note that each of our connectors is deployed with its own unique replication slot, and no other connector is using this slot. We are still seeing this error. Could you please suggest what might be wrong?

Sohaib Omar

Jul 20, 2022, 4:26:27 PM
to debezium
Hi,
Any clue what caused the above? Our connector task failed suddenly and reported the same error you mentioned: "ERROR: replication slot "debezium" is active for PID 19530". We have not been able to identify the cause so far.

debezium-connect version: 1.8
connector configs: 
'{
  "name": "resource-svc-outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "plugin.name": "pgoutput",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "1234",
    "database.dbname": "resource_db",
    "database.server.name": "resource-outbox-server",
    "tombstones.on.delete": "false",
    "table.whitelist": "public.resources_outbox",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.table.expand.json.payload": "true",
    "value.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "transforms.outbox.route.by.field": "resource_type",
    "transforms.outbox.route.topic.replacement": "dev-agent-event",
    "transforms.outbox.table.field.event.key": "event_type",
    "transforms.outbox.table.fields.additional.placement": "tracingid:header"
  }
}'

logs at the time when this happened:
[2022-07-20 08:59:27,300] INFO WorkerSourceTask{id=resource-svc-outbox-connector-0} Finished commitOffsets successfully in 1 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:574)
[2022-07-20 08:59:44,029] ERROR Producer failure (io.debezium.pipeline.ErrorHandler:31)
org.postgresql.util.PSQLException: Database connection failed when writing to copy
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1091)
    at org.postgresql.core.v3.CopyDualImpl.flushCopy(CopyDualImpl.java:30)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:195)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.timeUpdateStatus(V3PGReplicationStream.java:186)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:128)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:82)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:473)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.processMessages(PostgresStreamingChangeEventSource.java:205)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:167)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:40)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:166)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:127)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketException: Connection or outbound has closed
    at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1190)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
    at org.postgresql.core.PGStream.flush(PGStream.java:665)
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1089)
    ... 16 more
[2022-07-20 08:59:44,031] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 08:59:44,031] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 08:59:44,031] INFO Finished streaming (io.debezium.pipeline.ChangeEventSourceCoordinator:167)
[2022-07-20 08:59:44,031] INFO Connected metrics set to 'false' (io.debezium.pipeline.metrics.StreamingChangeEventSourceMetrics:70)
[2022-07-20 08:59:44,184] WARN Going to restart connector after 10 sec. after a retriable exception (io.debezium.connector.common.BaseSourceTask:238)
[2022-07-20 08:59:44,184] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 08:59:44,185] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 08:59:44,185] WARN WorkerSourceTask{id=resource-svc-outbox-connector-0} failed to poll records from SourceTask. Will retry operation. (org.apache.kafka.connect.runtime.WorkerSourceTask:291)
org.apache.kafka.connect.errors.RetriableException: An exception occurred in the change event producer. This connector will be restarted.
    at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:38)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:170)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:40)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:166)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:127)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.postgresql.util.PSQLException: Database connection failed when writing to copy
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1091)
    at org.postgresql.core.v3.CopyDualImpl.flushCopy(CopyDualImpl.java:30)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:195)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.timeUpdateStatus(V3PGReplicationStream.java:186)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:128)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:82)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:473)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.processMessages(PostgresStreamingChangeEventSource.java:205)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:167)
    ... 8 more
Caused by: java.net.SocketException: Connection or outbound has closed
    at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1190)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
    at org.postgresql.core.PGStream.flush(PGStream.java:665)
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1089)
    ... 16 more
[2022-07-20 08:59:44,185] INFO Awaiting end of restart backoff period after a retriable error (io.debezium.connector.common.BaseSourceTask:214)
[2022-07-20 08:59:46,185] INFO Awaiting end of restart backoff period after a retriable error (io.debezium.connector.common.BaseSourceTask:214)
[2022-07-20 08:59:48,186] INFO Awaiting end of restart backoff period after a retriable error (io.debezium.connector.common.BaseSourceTask:214)
[2022-07-20 08:59:50,186] INFO Awaiting end of restart backoff period after a retriable error (io.debezium.connector.common.BaseSourceTask:214)
[2022-07-20 08:59:52,186] INFO Awaiting end of restart backoff period after a retriable error (io.debezium.connector.common.BaseSourceTask:214)
[2022-07-20 08:59:54,186] WARN Using configuration property "table.whitelist" is deprecated and will be removed in future versions. Please use "table.include.list" instead. (io.debezium.config.Configuration:2164)

[2022-07-20 08:59:54,187] INFO Starting PostgresConnectorTask with configuration: (io.debezium.connector.common.BaseSourceTask:127)

[2022-07-20 09:00:32,185] INFO First LSN 'LSN{112/842173C8}' received (io.debezium.connector.postgresql.connection.WalPositionLocator:60)
[2022-07-20 09:00:32,193] INFO Received COMMIT LSN 'LSN{112/84217750}' larger than than last stored commit LSN 'LSN{112/84201208}' (io.debezium.connector.postgresql.connection.WalPositionLocator:90)
[2022-07-20 09:00:32,193] INFO Will restart from LSN 'LSN{112/842173C8}' that is start of the first unprocessed transaction (io.debezium.connector.postgresql.connection.WalPositionLocator:102)
[2022-07-20 09:00:32,193] INFO WAL resume position 'LSN{112/842173C8}' discovered (io.debezium.connector.postgresql.PostgresStreamingChangeEventSource:313)
[2022-07-20 09:00:32,194] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 09:00:32,194] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 09:00:32,237] INFO Initializing PgOutput logical decoder publication (io.debezium.connector.postgresql.connection.PostgresReplicationConnection:133)
[2022-07-20 09:00:32,240] WARN Failed to start replication stream at LSN{112/84201208}, waiting for 10000 ms and retrying, attempt number 1 over 6 (io.debezium.connector.postgresql.connection.PostgresReplicationConnection:315)
[2022-07-20 09:00:42,242] WARN Failed to start replication stream at LSN{112/84201208}, waiting for 10000 ms and retrying, attempt number 2 over 6 (io.debezium.connector.postgresql.connection.PostgresReplicationConnection:315)
[2022-07-20 09:00:52,243] WARN Failed to start replication stream at LSN{112/84201208}, waiting for 10000 ms and retrying, attempt number 3 over 6 (io.debezium.connector.postgresql.connection.PostgresReplicationConnection:315)
[2022-07-20 09:01:02,245] WARN Failed to start replication stream at LSN{112/84201208}, waiting for 10000 ms and retrying, attempt number 4 over 6 (io.debezium.connector.postgresql.connection.PostgresReplicationConnection:315)
[2022-07-20 09:01:12,247] WARN Failed to start replication stream at LSN{112/84201208}, waiting for 10000 ms and retrying, attempt number 5 over 6 (io.debezium.connector.postgresql.connection.PostgresReplicationConnection:315)
[2022-07-20 09:01:22,249] WARN Failed to start replication stream at LSN{112/84201208}, waiting for 10000 ms and retrying, attempt number 6 over 6 (io.debezium.connector.postgresql.connection.PostgresReplicationConnection:315)
[2022-07-20 09:01:27,302] INFO WorkerSourceTask{id=resource-svc-outbox-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
[2022-07-20 09:01:27,302] INFO WorkerSourceTask{id=resource-svc-outbox-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
[2022-07-20 09:01:27,302] ERROR WorkerSourceTask{id=resource-svc-outbox-connector-0} Exception thrown while calling task.commit() (org.apache.kafka.connect.runtime.WorkerSourceTask:586)
org.apache.kafka.connect.errors.ConnectException: org.postgresql.util.PSQLException: Database connection failed when writing to copy
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.commitOffset(PostgresStreamingChangeEventSource.java:386)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.commitOffset(ChangeEventSourceCoordinator.java:172)
    at io.debezium.connector.common.BaseSourceTask.commit(BaseSourceTask.java:284)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.commitSourceTask(WorkerSourceTask.java:584)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.commitOffsets(WorkerSourceTask.java:528)
    at org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter.commit(SourceTaskOffsetCommitter.java:113)
    at org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter.access$000(SourceTaskOffsetCommitter.java:47)
    at org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter$1.run(SourceTaskOffsetCommitter.java:86)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.postgresql.util.PSQLException: Database connection failed when writing to copy
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1091)
    at org.postgresql.core.v3.CopyDualImpl.flushCopy(CopyDualImpl.java:30)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.updateStatusInternal(V3PGReplicationStream.java:195)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.forceUpdateStatus(V3PGReplicationStream.java:113)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.doFlushLsn(PostgresReplicationConnection.java:511)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.flushLsn(PostgresReplicationConnection.java:504)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.commitOffset(PostgresStreamingChangeEventSource.java:379)
    ... 13 more
Caused by: java.io.IOException: Stream closed
    at java.base/sun.nio.cs.StreamEncoder.ensureOpen(StreamEncoder.java:45)
    at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
    at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:254)
    at org.postgresql.core.PGStream.flush(PGStream.java:663)
    at org.postgresql.core.v3.QueryExecutorImpl.flushCopy(QueryExecutorImpl.java:1089)
    ... 19 more
[2022-07-20 09:01:32,251] ERROR Producer failure (io.debezium.pipeline.ErrorHandler:31)
io.debezium.DebeziumException: Failed to start replication stream at LSN{112/84201208}; when setting up multiple connectors for the same database host, please make sure to use a distinct replication slot name for each.
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startStreaming(PostgresReplicationConnection.java:312)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:163)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:40)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:166)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:127)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.postgresql.util.PSQLException: ERROR: replication slot "debezium" is active for PID 19530
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2552)
    at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1211)
    at org.postgresql.core.v3.QueryExecutorImpl.startCopy(QueryExecutorImpl.java:893)
    at org.postgresql.core.v3.replication.V3ReplicationProtocol.initializeReplication(V3ReplicationProtocol.java:60)
    at org.postgresql.core.v3.replication.V3ReplicationProtocol.startLogical(V3ReplicationProtocol.java:44)
    at org.postgresql.replication.fluent.ReplicationStreamBuilder$1.start(ReplicationStreamBuilder.java:38)
    at org.postgresql.replication.fluent.logical.LogicalStreamBuilder.start(LogicalStreamBuilder.java:41)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startPgReplicationStream(PostgresReplicationConnection.java:582)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationStream(PostgresReplicationConnection.java:417)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startStreaming(PostgresReplicationConnection.java:304)
    ... 9 more
[2022-07-20 09:01:32,252] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 09:01:32,252] INFO Finished streaming (io.debezium.pipeline.ChangeEventSourceCoordinator:167)
[2022-07-20 09:01:32,252] INFO Connected metrics set to 'false' (io.debezium.pipeline.metrics.StreamingChangeEventSourceMetrics:70)
[2022-07-20 09:01:32,398] INFO WorkerSourceTask{id=resource-svc-outbox-connector-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
[2022-07-20 09:01:32,398] INFO WorkerSourceTask{id=resource-svc-outbox-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
[2022-07-20 09:01:32,398] ERROR WorkerSourceTask{id=resource-svc-outbox-connector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:187)
org.apache.kafka.connect.errors.ConnectException: An exception occurred in the change event producer. This connector will be stopped.
    at io.debezium.pipeline.ErrorHandler.setProducerThrowable(ErrorHandler.java:42)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:170)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:40)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.streamEvents(ChangeEventSourceCoordinator.java:166)
    at io.debezium.pipeline.ChangeEventSourceCoordinator.lambda$start$0(ChangeEventSourceCoordinator.java:127)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.debezium.DebeziumException: Failed to start replication stream at LSN{112/84201208}; when setting up multiple connectors for the same database host, please make sure to use a distinct replication slot name for each.
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startStreaming(PostgresReplicationConnection.java:312)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:163)
    ... 8 more
Caused by: org.postgresql.util.PSQLException: ERROR: replication slot "debezium" is active for PID 19530
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2552)
    at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1211)
    at org.postgresql.core.v3.QueryExecutorImpl.startCopy(QueryExecutorImpl.java:893)
    at org.postgresql.core.v3.replication.V3ReplicationProtocol.initializeReplication(V3ReplicationProtocol.java:60)
    at org.postgresql.core.v3.replication.V3ReplicationProtocol.startLogical(V3ReplicationProtocol.java:44)
    at org.postgresql.replication.fluent.ReplicationStreamBuilder$1.start(ReplicationStreamBuilder.java:38)
    at org.postgresql.replication.fluent.logical.LogicalStreamBuilder.start(LogicalStreamBuilder.java:41)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startPgReplicationStream(PostgresReplicationConnection.java:582)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationStream(PostgresReplicationConnection.java:417)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.startStreaming(PostgresReplicationConnection.java:304)
    ... 9 more
[2022-07-20 09:01:32,398] ERROR WorkerSourceTask{id=resource-svc-outbox-connector-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:188)
[2022-07-20 09:01:32,398] INFO Stopping down connector (io.debezium.connector.common.BaseSourceTask:241)
[2022-07-20 09:01:32,399] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 09:01:32,400] INFO Connection gracefully closed (io.debezium.jdbc.JdbcConnection:965)
[2022-07-20 09:01:32,400] INFO [Producer clientId=connector-producer-resource-svc-outbox-connector-0] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer:1189)
[2022-07-20 09:01:32,403] DEBUG [Producer clientId=connector-producer-resource-svc-outbox-connector-0] Kafka producer has been closed (org.apache.kafka.clients.producer.KafkaProducer:1241)

Chris Cranford

Jul 20, 2022, 11:59:31 PM
to debe...@googlegroups.com, Sohaib Omar
Hi Sohaib -

There are a host of reasons why this can happen.

One common culprit is AWS's Gateway Load Balancer. It has an internal timeout, and when it terminates the connection between the external source and the database, there is a high probability that it does not properly inform the server and the client. Both sides then think the connection still exists although it doesn't, which gives the appearance of a stuck connector. Another possibility is that the wal_receiver_timeout or wal_sender_timeout settings are too low: if the work on the WAL sender side takes too long, the connection to the client is dropped, but the sender process takes time to terminate. When Kafka Connect then attempts to restart the connector too quickly, the database sees the slot as still active even though the client has disconnected.
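
To see which backend still holds the slot, the PID from the error message can be looked up in PostgreSQL's `pg_replication_slots` and `pg_stat_activity` views. The following is a minimal Python sketch, assuming the error text has the form quoted in this thread. The helper names `parse_slot_error` and `diagnostic_queries` are made up for illustration, and the generated SQL is meant to be run manually (for example in psql) by a user with sufficient privileges.

```python
import re

# Matches the PostgreSQL error text quoted in this thread, with or without
# backslash-escaped quotes around the slot name.
SLOT_ERROR = re.compile(
    r'replication slot \\?"(?P<slot>[^"\\]+)\\?" is active for PID (?P<pid>\d+)'
)


def parse_slot_error(message):
    """Return (slot_name, pid) from the error text, or None if it does not match."""
    match = SLOT_ERROR.search(message)
    if match is None:
        return None
    return match.group("slot"), int(match.group("pid"))


def diagnostic_queries(slot, pid):
    """Build SQL statements to inspect the slot and the backend holding it."""
    # Replication slot names may only contain lowercase letters, digits and
    # underscores, so anything else is rejected before it is put into SQL.
    if not re.fullmatch(r"[a-z0-9_]+", slot):
        raise ValueError(f"unexpected slot name: {slot!r}")
    return [
        # Is the slot still marked active, and by which backend?
        "SELECT slot_name, active, active_pid FROM pg_replication_slots "
        f"WHERE slot_name = '{slot}';",
        # What is that backend doing, and where did it connect from?
        "SELECT pid, state, backend_start, client_addr FROM pg_stat_activity "
        f"WHERE pid = {pid};",
        # Last resort: terminate the stale walsender so the slot is released.
        f"SELECT pg_terminate_backend({pid});",
    ]


if __name__ == "__main__":
    parsed = parse_slot_error(
        'ERROR: replication slot "debezium" is active for PID 19530'
    )
    if parsed is not None:
        for statement in diagnostic_queries(*parsed):
            print(statement)
```

If the first query shows the slot as active while the connector is stopped, the old walsender has most likely not exited yet. Waiting for it to time out is usually safer than terminating it.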

Hope that gives you some ideas.
Chris

Sohaib Omar

Jul 28, 2022, 4:14:55 AM
to Chris Cranford, debe...@googlegroups.com
Hi Chris,
Thanks for getting back to me. I looked into the RDS Postgres logs and found the entry below:

2022-07-20 07:02:16 UTC:172.20.2.92(42840):rdsrepladmin@[unknown]:[1106]:LOG: terminating walsender process due to replication timeout

You were right about the wal_receiver_timeout or wal_sender_timeout settings being too low. I have increased the timeout values to 30-40 seconds, and the connector task seems to be working fine now.
Furthermore, I am planning to write a cron job to auto-restart the connector task when it ends up in a state that requires a manual restart.
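
Such a restart job could look roughly like this. It is a minimal Python sketch built on the Kafka Connect REST API (`GET /connectors/{name}/status` and `POST /connectors/{name}/tasks/{id}/restart`). The script name, the base URL passed on the command line and the helper names are assumptions for illustration, not part of my actual setup.

```python
import json
import sys
import urllib.request


def failed_task_ids(status):
    """Return the IDs of tasks reported as FAILED in a connector status document."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def fetch_status(base_url, connector):
    """Fetch the connector status document from the Kafka Connect REST API."""
    url = f"{base_url}/connectors/{connector}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def restart_task(base_url, connector, task_id):
    """Ask Kafka Connect to restart one task and return the HTTP status code."""
    url = f"{base_url}/connectors/{connector}/tasks/{task_id}/restart"
    req = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def main(base_url, connector):
    """Restart every FAILED task of the given connector."""
    for task_id in failed_task_ids(fetch_status(base_url, connector)):
        code = restart_task(base_url, connector, task_id)
        print(f"restarted task {task_id} of {connector}: HTTP {code}")


if __name__ == "__main__":
    # Hypothetical usage:
    #   python restart_failed_tasks.py http://localhost:8083 resource-svc-outbox-connector
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: restart_failed_tasks.py <connect-base-url> <connector-name>")
```

Running this from cron every few minutes should leave enough time for a stale walsender to exit between attempts. A restart fired right after the failure can hit the same "slot is active" error again.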

Thanks

Chris Cranford

Jul 28, 2022, 8:55:47 AM
to Sohaib Omar, debe...@googlegroups.com
Hi Sohaib -

Thanks for replying and sharing that information. It's great to know that the PG log records this, so going forward we have another resource to check when determining whether that is indeed the problem. Great detective work, and I'm happy it's working.

Chris