Help with performance - Debezium with SQL Server


Bhaski

Mar 7, 2024, 2:02:58 PM
to debezium
Hi,

I am new to Debezium and doing a POC right now. I am using an old version (1.9) and plan to upgrade soon.

I am running Debezium with SQL Server on AWS MSK. I have around 140 tables to replicate; currently one connector handles around 20 of them. When I added the remaining 120 tables, I saw that performance became very slow. All tables combined are less than 40 GB, and the tables I added are less than 10 GB combined. Due to limitations with Glue, I am currently limited to JSON with schemas turned on, but given the size of my dataset I thought that shouldn't be a huge problem.

Metrics:
1. Processing is mostly one table at a time; at most I see about 3-4 topics in parallel.
2. Throughput appears very low: on average less than 4 MB per second received from SQL Server.
3. Incoming messages per second averages around 220.

I am sure I am missing things here that I should consider for better performance. I appreciate any feedback. Thanks!
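For context, my connector config looks roughly like this (hostnames, credentials, and table names below are placeholders, not my actual values):

```json
{
  "name": "sqlserver-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "mssql.example.internal",
    "database.port": "1433",
    "database.user": "debezium",
    "database.password": "********",
    "database.dbname": "mydb",
    "database.server.name": "myserver",
    "table.include.list": "dbo.table1,dbo.table2",
    "database.history.kafka.bootstrap.servers": "msk-broker:9092",
    "database.history.kafka.topic": "schema-changes.mydb",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter.schemas.enable": "true"
  }
}
```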

Regards,
Bhaski

Oren Elias

Mar 8, 2024, 4:04:42 PM
to debe...@googlegroups.com
Are you referring to the initial sync or to change streaming?
If you turn off the JSON schema (for comparison), does it make a big difference? The size of the schema can reach a few KB per message.
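For the comparison, you can flip the converter's schema flags on the connector (a sketch using the standard Kafka Connect JSON converter settings):

```
# Keep JSON but stop embedding the schema in every record
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
key.converter.schemas.enable=false
```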



Bhaski

Mar 9, 2024, 8:20:15 PM
to debezium
Hi Oren,

Thanks for your reply. The initial sync was done for the 20 tables. Then I tried adding the remaining tables using the signal table. To give an update: it was very slow when I sent an incremental-snapshot signal. I then stopped and sent a blocking-snapshot signal instead, which made it much better. To answer your question, yes, I intend to remove the schema and compare the results; for now I just want to wrap this up first.
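For anyone following along, the signal I inserted looked roughly like this (the signal table, id, and data-collection names are placeholders; the "type" field in the payload is what switches between incremental and blocking mode):

```sql
-- Ad-hoc snapshot signal; "type" in the data payload selects the mode
INSERT INTO dbo.debezium_signal (id, type, data)
VALUES (
  'adhoc-1',
  'execute-snapshot',
  '{"data-collections": ["dbo.my_table"], "type": "blocking"}'
);
```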

Still have questions on a few things:
1. What are the general parameters for controlling transfer rates?
2. I believe "snapshot.max.threads" controls how many tables can be processed in parallel during the initial snapshot. Does that setting also apply when I use the signal table, since technically that is not an initial snapshot?

Thanks!
Bhaski

Chris Cranford

Mar 10, 2024, 9:50:32 PM
to debe...@googlegroups.com
Hi Bhaski -

"snapshot.max.threads" is only used during the initial snapshot and with the ad-hoc blocking snapshot feature. The ad-hoc incremental snapshot works very differently and does not take advantage of the "snapshot.max.threads" configuration option.
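For example, to snapshot multiple tables in parallel during the initial or blocking snapshot, you would set something like (the value here is just an illustration):

```
# Number of tables snapshotted in parallel; applies to the initial
# snapshot and ad-hoc blocking snapshots, not incremental ones
snapshot.max.threads=4
```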

Thanks,
Chris

Mario Fiore Vitale

Mar 11, 2024, 4:19:18 AM
to debezium
Hi,

regarding the incremental snapshot, you can also try to fine-tune incremental.snapshot.chunk.size [1]
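For example (the value below is only an illustration; a larger chunk fetches more rows per query, trading memory for less per-chunk overhead):

```
# Rows fetched per incremental-snapshot chunk
incremental.snapshot.chunk.size=4096
```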

Bhaski

Mar 11, 2024, 12:56:11 PM
to debezium
Thanks all! Appreciate it!