Performance Issues Debezium SQL Server Connector


Niels Berglund

Apr 30, 2024, 8:21:44 AM
to debe...@googlegroups.com
Hi all!

We have just taken a system into production testing. We have three Kafka Connect nodes that run 9 Debezium SQL Server connectors, each targeting an individual SQL Server. 

One of the connectors targets a very busy database, where the table DBZ reads from sees 1k - 1.5k events/sec. This connector cannot keep up, and we are seeing a big lag (up to 15 - 20 minutes) between an event hitting the CDC table and the same event arriving in the Kafka topic. Due to the lag, we also see dropped events.

So, I wonder what "knobs" we can turn to make the performance better? We started with a config:

"max.iteration.transactions": "500",
"max.poll.records": "50"

This was then changed to:

"max.iteration.transactions": "5000",
"max.poll.records": "500"

The changes had no impact. I believe we are running with a 5s polling interval, as we need as low latency as possible.
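To put the numbers above in perspective, here is a minimal back-of-the-envelope sketch in Python. It only does arithmetic on the figures quoted in this post (1k - 1.5k events/sec, a 5s polling interval); the function names and the drain rate of 1,000 events/sec are hypothetical and chosen purely for illustration, not measured values.

```python
def events_per_poll(events_per_sec, poll_interval_s):
    """Events that accumulate in the CDC table between two consecutive polls."""
    return events_per_sec * poll_interval_s


def backlog_after(events_per_sec, drained_per_sec, seconds):
    """Backlog (in events) after `seconds` when draining is slower than arrival."""
    return max(0, (events_per_sec - drained_per_sec) * seconds)


def lag_seconds(backlog, drained_per_sec):
    """Time needed to work through `backlog` at the given drain rate."""
    return backlog / drained_per_sec


if __name__ == "__main__":
    # Figures from the post: 1k - 1.5k events/sec, 5s polling interval.
    print(events_per_poll(1000, 5), events_per_poll(1500, 5))
    # Hypothetical: the pipeline drains only 1,000 events/sec against 1,250 arriving.
    backlog = backlog_after(1250, 1000, 3600)
    print(lag_seconds(backlog, 1000) / 60, "minutes of lag after one hour")
```

With a 5s interval, each poll has to pick up roughly 5,000 to 7,500 events. Under the hypothetical drain rate, a shortfall of only 250 events/sec already builds up 15 minutes of lag within an hour, which is the order of magnitude described above.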

Grateful for any ideas/suggestions!

Niels

Chris Cranford

Apr 30, 2024, 8:45:49 AM
to debe...@googlegroups.com
Hi Niels -

Have you checked the QueueRemainingCapacity metric to see if it's reaching 0? If so, you may want to increase the queue/batch size accordingly. Additionally, you could reduce poll.interval.ms to increase the poll frequency.
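As a rough sketch of what such a change could look like, the Python snippet below builds the relevant part of a connector configuration. The property names `max.batch.size`, `max.queue.size`, and `poll.interval.ms` are Debezium connector properties; the helper name `tuning_overrides` and the concrete values are assumptions for illustration only, not recommendations. The check reflects the documented guidance that the queue size should be larger than the batch size.

```python
import json


def tuning_overrides(max_batch_size, max_queue_size, poll_interval_ms):
    """Return queue/batch/poll settings as a connector config fragment.

    Hypothetical helper: it only validates and formats the values.
    """
    if max_queue_size <= max_batch_size:
        raise ValueError("max.queue.size should be larger than max.batch.size")
    if poll_interval_ms <= 0:
        raise ValueError("poll.interval.ms must be a positive number")
    # Kafka Connect connector configs are string-valued JSON maps.
    return {
        "max.batch.size": str(max_batch_size),
        "max.queue.size": str(max_queue_size),
        "poll.interval.ms": str(poll_interval_ms),
    }


if __name__ == "__main__":
    # Illustrative values only; tune against the QueueRemainingCapacity metric.
    print(json.dumps(tuning_overrides(4096, 16384, 500), indent=2))
```

The resulting fragment would be merged into the existing connector configuration before it is resubmitted to Kafka Connect.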

Chris

Niels Berglund

May 1, 2024, 9:50:07 AM
to debe...@googlegroups.com
Hi Chris!

I made some changes to batch and queue size, but no luck.

Quick question: when we deploy a DBZ configuration to a Kafka Connect cluster, we deploy it on one node. Will that particular configuration run only on that node, or is it distributed over multiple nodes? If it runs on multiple nodes, would adding more nodes help?

Thanks!

Niels 

Chris Cranford

May 1, 2024, 2:08:31 PM
to debe...@googlegroups.com
Hi Niels -

If `database.names` has multiple values, then yes, the connector will attempt to spread the work for each tenant across multiple nodes. If you only have a single value in `database.names`, then no, it will perform all the task work on a single worker node.

Chris

Niels Berglund

May 1, 2024, 10:46:14 PM
to debe...@googlegroups.com
Thanks Chris!

So, if 'database.names' has multiple values, those would indicate different databases, yes? And if I have one DBZ config pointing to one database and one table, there is no way the load can be distributed across nodes?

Thanks again!

Niels


Chris Cranford

May 2, 2024, 9:17:56 AM
to debe...@googlegroups.com
Hi Niels -

From a distribution perspective, that's correct.  If you are streaming from a single table in a single tenant on SQL Server, then that config will only ever take advantage of a single task.  If you had multiple tenants configured, that's a completely different story as the configuration can be distributed across multiple tasks with Debezium for SQL Server 2.x+.
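The distribution rule described here can be sketched as a simplified model in Python. It assumes that each database listed in `database.names` is handled by at most one task and that Kafka Connect's `tasks.max` caps the number of tasks; the function name `tasks_used` is hypothetical, and the real connector's assignment logic may differ in detail.

```python
def tasks_used(database_names, tasks_max):
    """Estimate how many tasks a connector config can keep busy.

    Simplified model: one task per configured database, capped by tasks.max.
    `database_names` is the comma-separated value of `database.names`.
    """
    databases = [name.strip() for name in database_names.split(",") if name.strip()]
    return min(len(databases), tasks_max)


if __name__ == "__main__":
    # One database: a single task does all the work, however many nodes exist.
    print(tasks_used("orders", 3))
    # Several databases: the work can be spread over up to tasks.max tasks.
    print(tasks_used("tenant_a,tenant_b,tenant_c,tenant_d", 3))
```

Under this model, a config with a single database never uses more than one task, so adding worker nodes does not help that connector.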

Chris

Niels Berglund

May 5, 2024, 11:34:17 PM
to debe...@googlegroups.com
Thanks for that Chris.

Having dug a bit further into the issue we have with the Debezium SQL Server connector:
  • We have highly transactional databases where we want to expose certain tables as streams.
  • The tables and databases are CDC enabled.
  • We have Debezium connectors running in a 3-node Kafka Connect cluster sitting in the same VLAN as the Kafka brokers.
  • The individual DBZ connectors target one database/one table each.
  • We are seeing an issue for one of the connectors which targets a table with high volumes (1,000 - 1,500 events/sec).
  • CDC keeps up, and DBZ keeps up with reading from the table.
  • However, publishing to the Kafka topic does not keep up when we hit ~1,250 events/sec. At that point publishing starts to lag (we have apps that publish the same number of events to other topics without an issue), and eventually we see a lag of up to 30 minutes and also dropped events.
So, that's the problem. My assumption would be that DBZ would easily handle the volumes mentioned, so we must be doing something wrong. Oh, the DBZ version is 1.97.

Is anyone out there experiencing anything similar?

Thanks!

Niels


Chris Cranford

May 6, 2024, 10:09:46 AM
to debe...@googlegroups.com
Hi Niels -

Have you checked the QueueRemainingCapacity metric and verified it isn't reaching 0? If it is, then this tells us that the reading from SQL Server is being blocked because the delivery between Kafka Connect and the broker may need to be tuned to support higher throughput.  Perhaps there are inadequate partitions, etc. If that metric isn't reaching 0 during these peak periods, then we need to focus on the read side of the connector as it's between Debezium and SQL Server.
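The decision described here can be written down as a small Python sketch. It assumes you have already collected samples of the QueueRemainingCapacity metric during a peak period (for example via JMX); how the samples are gathered is outside this sketch, and the function name `likely_bottleneck` is hypothetical.

```python
from typing import Sequence


def likely_bottleneck(queue_remaining_capacity: Sequence[int]) -> str:
    """Classify the bottleneck from QueueRemainingCapacity samples.

    If the queue ever fills up (remaining capacity hits 0), reading from
    SQL Server is being blocked by slow delivery to Kafka; otherwise the
    read side between Debezium and SQL Server is the place to look.
    """
    if not queue_remaining_capacity:
        raise ValueError("need at least one metric sample")
    if min(queue_remaining_capacity) <= 0:
        return "write side: delivery between Kafka Connect and the broker"
    return "read side: between Debezium and SQL Server"


if __name__ == "__main__":
    # Hypothetical samples taken during a peak period.
    print(likely_bottleneck([8192, 5120, 730, 0, 0, 412]))
    print(likely_bottleneck([8192, 8001, 7900, 8100]))
```

This is only a first-pass triage; a queue that stays nearly empty of free capacity without quite reaching 0 may still deserve a look at the write side.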

Thanks,
Chris