SQL Server: Ghost data download by connectors

123 views
Skip to first unread message

linga...@gmail.com

unread,
Sep 30, 2021, 4:28:59 AM9/30/21
to debezium
Hi Folks,

We have debezium running in production. Our main use case is to replicate data on Snowflake warehouse.

In order to reduce the load on the source database we have set up the connectors in such a way that the polling interval is 5 seconds when the connector is registered. Once the initial snapshot is completed we update the connector configuration and increase the polling interval to 60 seconds. This way the initial snapshot doesn't take too long and we reduce the load on the database once the snapshot is completed.

We didn't notice any issues at first. Then we saw that for some of the tables(very few) the topic count was twice as much as the initial snapshot rowcount. At least for a few tables the snapshot was re-executed. This was not a big deal as the data replication job at the destination is idempotent and having duplicate events does not affect the final data. 

The main issue we are facing now is that when the connect worker restarts a lot of data seems to be getting pulled from the source database. There is no evidence this happening in the logs, however the connect worker container's NET I/O increases at a steady rate until(30-40 minutes) it reaches the max heap size(16GB) then the connect worker becomes unresponsive. We restart the connect worker and the same process repeats.

Using tcptrack and nethogs I was able to confirm that connect worker is pulling data from our source database at ~15MB/s. This is happening even when the capture job on the source database is stopped, that means no new change events are processed from the transaction log. Only the debezium source connectors connect to our production database so no other service could be responsible for this traffic. 

We are at a loss for what to do next as the reliability of our entire pipeline depends on debezium and kafka connect. I would highly appreciate any advise on finding a root cause for this issue. Please let me if there any particular logs/metrics I can gather to further troubleshoot this.

Regards,
Shiva Lingala

Gunnar Morling

unread,
Sep 30, 2021, 7:53:57 AM9/30/21
to debezium
Hi,

Which connector is this about? Have you tried to increase the log level of the Debezium connector (see https://debezium.io/documentation/reference/1.6/operations/logging.html) so to see what it acually is doing? JMX monitoring may also give you insight about the current state, in particular whether it still is taking a snapshot (in which case it will restart with the snapshot if it's not completed when stopping the connector), or whether it's streaming from the log.

--Gunnar

Shiva

unread,
Sep 30, 2021, 3:41:09 PM9/30/21
to debe...@googlegroups.com
Hi Gunnar,

Thanks for your response. We are using SQL Server connectors. I increased the log level to 'TRACE' but there was nothing in the logs that could explain the NET I/O figures on the kafka-connect container. This is what the container stats look like when kafka-connect is restarted. 

CONTAINER ID   NAME         CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O   PIDS
8636a7d8b852   connect-01   0.56%     1.961GiB / 62.82GiB   3.12%     21.7MB / 1.13MB   0B / 41kB   290


In 30-40 minutes the NET I/O on the container increases to ~17 GB (if max heap size is 16GB) or ~8.5GB (if max heap size is 8GB). At this point the kafka-connect REST API is no longer responsive and only then we start seeing some exceptions in the logs. When this happens the CPU usage across all 16 cores on the machine reaches 90-100%.

CONTAINER ID   NAME         CPU %      MEM USAGE / LIMIT     MEM %     NET I/O          BLOCK I/O   PIDS
8636a7d8b852   connect-01   1252.65%   17.35GiB / 62.82GiB   27.62%    17.1GB / 140MB   0B / 41kB   296


Meanwhile the logs look like this.

[2021-09-30 19:17:11,165] INFO WorkerSourceTask{id=Source_SourceDB_PG_0-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,165] INFO WorkerSourceTask{id=Source_SourceDB_PG_5-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,165] INFO WorkerSourceTask{id=Source_SourceDB_PG_30-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,169] INFO WorkerSourceTask{id=Source_SourceDB_PG_6-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,175] INFO WorkerSourceTask{id=Source_SourceDB_PG_1-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,180] INFO WorkerSourceTask{id=Source_SourceDB_PG_4-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,191] INFO WorkerSourceTask{id=Source_SourceDB_PG_2-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,191] INFO WorkerSourceTask{id=Source_SourceDB_PG_3-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,204] INFO WorkerSourceTask{id=Source_SourceDB_PG_10-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,214] INFO WorkerSourceTask{id=Source_SourceDB_PG_29-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,266] INFO WorkerSourceTask{id=Source_SourceDB_PG_16-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,266] INFO WorkerSourceTask{id=Source_SourceDB_PG_11-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,287] INFO WorkerSourceTask{id=Source_SourceDB_PG_12-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,287] INFO WorkerSourceTask{id=Source_SourceDB_PG_14-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,287] INFO WorkerSourceTask{id=Source_SourceDB_PG_13-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,290] INFO WorkerSourceTask{id=Source_SourceDB_PG_15-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,315] INFO WorkerSourceTask{id=Source_SourceDB_PG_17-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,426] INFO WorkerSourceTask{id=Source_SourceDB_PG_26-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,433] INFO WorkerSourceTask{id=Source_SourceDB_PG_8-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,442] INFO WorkerSourceTask{id=Source_SourceDB_PG_27-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,447] INFO WorkerSourceTask{id=Source_SourceDB_PG_23-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,447] INFO WorkerSourceTask{id=Source_SourceDB_PG_22-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,450] INFO WorkerSourceTask{id=Source_SourceDB_PG_20-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,458] INFO WorkerSourceTask{id=Source_SourceDB_PG_9-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,471] INFO WorkerSourceTask{id=Source_SourceDB_PG_25-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,473] INFO WorkerSourceTask{id=Source_SourceDB_PG_21-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,500] INFO WorkerSourceTask{id=Source_SourceDB_PG_19-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:11,500] INFO WorkerSourceTask{id=Source_SourceDB_PG_24-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2021-09-30 19:17:15,958] TRACE Max transaction lsn is 00001f5c:0000dde0:0003 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:15,959] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,481] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,481] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,498] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,498] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,515] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,515] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,530] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,530] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,531] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,531] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,543] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,543] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,543] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,543] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,547] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,547] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,547] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,547] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,547] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,547] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,559] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,559] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,564] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,564] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,564] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,564] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,565] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,565] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,565] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,565] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,565] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,565] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,567] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,567] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,591] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,591] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,595] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,595] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,596] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,596] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,596] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,596] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,596] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,596] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,596] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,596] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,596] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,596] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,596] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,596] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)
[2021-09-30 19:17:18,912] TRACE Max transaction lsn is 000d8dcf:000216c0:0045 (io.debezium.connector.sqlserver.SqlServerConnection)
[2021-09-30 19:17:18,912] DEBUG No change in the database (io.debezium.connector.sqlserver.SqlServerStreamingChangeEventSource)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | PRDB02_SourceDB-dbhistory"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "kafka-producer-network-thread | PRDB02_SourceDB-dbhistory"


I do not have JMX monitoring setup, I will do that now and report back.

Regards,
Shiva Lingala

--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/20785fe9-fe67-4b41-9323-da6910cee99dn%40googlegroups.com.

jiri.p...@gmail.com

unread,
Oct 8, 2021, 1:43:27 AM10/8/21
to debezium
Hi,

what version are you on? This sounds like you hit the issue https://issues.redhat.com/browse/DBZ-3486

You can solve it via `max.iteration.transactions` config option (docs is missing, I'm adding a Jira for that).

J.

Shiva

unread,
Oct 8, 2021, 2:15:56 AM10/8/21
to debe...@googlegroups.com
Hi Jiri,

That's great news and it does explain what's happening in our case. To reduce load on the source database we pause the capture job during business hours and resume it afterwards. Naturally, there are a lot of pending cdc events coming in at once when the capture job starts processing transaction log.

I think it would help a lot if the connectors log these actions(like fetching events, buffering, writing to topic) when using log level DEBUG or TRACE.

What would you suggest is a good default for 'max.iteration.transactions' config?

Thank you!
-Shiva

Shiva

unread,
Oct 11, 2021, 8:24:40 AM10/11/21
to debe...@googlegroups.com
Hi Jiri,

We are on version 1.6.0.Final. Going by the information included in https://issues.redhat.com/browse/DBZ-3486 it appears 1.6.0 should have the fix for this issue. 

I increased the max.queue.size to 32764 and that have helped. The connect-worker hasn't crashed in two weeks, although now it seems the queue size may not be related to this issue at all. I would appreciate some more information on setting 'max.iteration.transactions'.

Regards,
-Shiva
Reply all
Reply to author
Forward
0 new messages