Help with performance - Debezium with SQL Server


Bhaski

Mar 7, 2024, 2:02:58 PM
to debezium
Hi,

I am new to Debezium and doing a POC right now. I am using an old version (1.9) and plan to upgrade soon.

I am running Debezium with SQL Server on AWS MSK. I have around 140 tables to replicate; currently one connector handles around 20 of them. When I added the remaining 120 tables, I saw that performance became very slow. All tables combined are less than 40 GB, and the tables I added are less than 10 GB combined. Due to limitations with Glue, I am currently limited to JSON with schemas turned on, but given the size of my dataset I thought that shouldn't be a huge problem.

Metrics:
1. Processing is mostly one table at a time; at most I see about 3-4 topics in parallel.
2. Throughput appears very low: on average less than 4 MB per second received from SQL Server.
3. Incoming messages per second averages around 220.

I am sure I am missing things here that I should consider for better performance. I appreciate any feedback. Thanks!
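For context, my connector config looks roughly like this (hostnames, credentials, and table names below are placeholders, not my actual values):

```json
{
  "name": "sqlserver-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "mssql.example.internal",
    "database.port": "1433",
    "database.user": "debezium",
    "database.password": "********",
    "database.dbname": "mydb",
    "database.server.name": "myserver",
    "table.include.list": "dbo.table1,dbo.table2",
    "database.history.kafka.bootstrap.servers": "msk-broker:9092",
    "database.history.kafka.topic": "schema-changes.mydb",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter.schemas.enable": "true"
  }
}
```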

Regards,
Bhaski

Oren Elias

Mar 8, 2024, 4:04:42 PM
to debe...@googlegroups.com
Are you referring to the initial sync or to change streaming?
If you turn off the JSON schema (for comparison), does it make a big difference? The size of the schema can reach a few KB per message.
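For the comparison, you can flip the converter's schema flags on the connector (a sketch using the standard Kafka Connect JSON converter settings):

```
# Keep JSON but stop embedding the schema in every record
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
key.converter.schemas.enable=false
```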



Bhaski

Mar 9, 2024, 8:20:15 PM
to debezium
Hi Oren,

Thanks for your reply. The initial sync was done for the 20 tables. Then I tried adding the remaining tables using the signal table. To give an update: it was very slow when I sent an incremental-snapshot signal. I then stopped and sent a blocking-snapshot signal instead, which made it much better. To answer your question, yes, I intend to remove the schema and compare the results; for now I just want to wrap this up first.
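For anyone following along, the signal I inserted looked roughly like this (the signal table, id, and data-collection names are placeholders; the "type" field in the payload is what switches between incremental and blocking mode):

```sql
-- Ad-hoc snapshot signal; "type" in the data payload selects the mode
INSERT INTO dbo.debezium_signal (id, type, data)
VALUES (
  'adhoc-1',
  'execute-snapshot',
  '{"data-collections": ["dbo.my_table"], "type": "blocking"}'
);
```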

Still have questions on a few things:
1. What are the general parameters for controlling transfer rates?
2. I believe "snapshot.max.threads" controls how many tables can be processed in parallel during the initial snapshot. Does that setting also apply when I use the signal table, since technically that is not an initial snapshot?

Thanks!
Bhaski

Chris Cranford

Mar 10, 2024, 9:50:32 PM
to debe...@googlegroups.com
Hi Bhaski -

"snapshot.max.threads" is only used during the initial snapshot and with the ad-hoc blocking snapshot feature. The ad-hoc incremental snapshot works very differently and does not take advantage of the "snapshot.max.threads" configuration option.
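For example, to snapshot multiple tables in parallel during the initial or blocking snapshot, you would set something like (the value here is just an illustration):

```
# Number of tables snapshotted in parallel; applies to the initial
# snapshot and ad-hoc blocking snapshots, not incremental ones
snapshot.max.threads=4
```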

Thanks,
Chris

Mario Fiore Vitale

Mar 11, 2024, 4:19:18 AM
to debezium
Hi,

regarding the incremental snapshot, you can also try to fine-tune incremental.snapshot.chunk.size [1]
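For example (the value below is only an illustration; a larger chunk fetches more rows per query, trading memory for less per-chunk overhead):

```
# Rows fetched per incremental-snapshot chunk
incremental.snapshot.chunk.size=4096
```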

Bhaski

Mar 11, 2024, 12:56:11 PM
to debezium
Thanks all! Appreciate it!