
Debezium server performance


Daniel Cardinha

Jul 4, 2022, 3:26:37 AM
to debezium
Hi all,

I am evaluating Debezium Server but cannot find a way to scale it up. Is that possible?

If it is not possible, is there any documentation on the maximum performance we can expect from a single Debezium Server instance?

Thank you

Daniel

Jailton Silva

Jul 4, 2022, 9:07:02 AM
to debe...@googlegroups.com

Hi Daniel,
Here are two options:

Kafka/Debezium in the cloud
Kafka/Debezium in OpenShift (containers)



Jeff Chao

Jul 4, 2022, 2:22:57 PM
to debezium
Hi Daniel, we haven't used Debezium Server in particular, but I can offer some guidance on Debezium based on Kafka Connect, which should generally be equivalent.

Performance depends heavily on many factors, including but not limited to:
  1. Event size
  2. Intra- vs. inter-region network traversal
  3. How many SMTs you're using
  4. Serialization format
Generally, what we've seen is that performance is serialization-bound, with the bottleneck on CPU. Serialization here means taking the raw bytes from the upstream datastore and performing semantic/literal conversions into the Debezium format with Kafka Connect types.

Given this, you can only scale up a single instance so far, because the clock speed of a single CPU core will be the limiting factor for a task.

Other than that, there are two options:
  1. Tune Kafka Connect producer/consumer configs such as linger.ms, and tune your specific Debezium connector configs such as queue or batch size.
    1. Trade-off: These settings go hand in hand with the system resources you have and are use-case dependent, so you may end up tuning over and over again.
  2. Depending on your connector, you may be able to bypass the Debezium serialization/deserialization and pass through only the raw bytes, avoiding the serialization hit. Then, downstream of Debezium, you would use a stream processing framework such as Flink, which can run massively parallel transformations, to do the work of the SMTs or the enveloping.
    1. Trade-off: You lose the nice declarative SMT features of Debezium and a unified envelope, which makes interoperability with other systems harder. There are also more things to build and maintain, but you'll no longer be serialization-bound on a single task.
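
As a rough illustration of option 1, the Python sketch below layers a few throughput-oriented overrides on top of an existing Kafka Connect connector config. The helper name `tuned_connector_config` and the numeric values are assumptions chosen for illustration, not recommendations. `max.batch.size` and `max.queue.size` are Debezium connector properties, and `producer.override.linger.ms` is a per-connector producer override that only takes effect if the Connect worker's `connector.client.config.override.policy` permits it.

```python
def tuned_connector_config(base, max_batch_size=4096, max_queue_size=16384, linger_ms=50):
    """Return a copy of `base` with throughput-oriented overrides applied.

    The default values here are illustrative assumptions, not recommendations;
    suitable numbers depend on event size and available memory/CPU.
    """
    # Debezium expects the internal queue to be larger than a single batch.
    if max_queue_size <= max_batch_size:
        raise ValueError("max.queue.size should be larger than max.batch.size")

    tuned = dict(base)  # do not mutate the caller's config
    tuned.update({
        # Debezium connector settings: events per batch and internal queue capacity.
        "max.batch.size": str(max_batch_size),
        "max.queue.size": str(max_queue_size),
        # Kafka producer setting, applied per connector. Requires the worker's
        # connector.client.config.override.policy to allow client overrides.
        "producer.override.linger.ms": str(linger_ms),
    })
    return tuned


if __name__ == "__main__":
    # "inventory-connector" is a hypothetical connector name.
    for key, value in tuned_connector_config({"name": "inventory-connector"}).items():
        print(f"{key}={value}")
```

Keeping the overrides in one small function makes it easier to re-run the same connector with different values while measuring throughput, which fits the "tune over and over again" trade-off noted above.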

Chris Cranford

Jul 5, 2022, 9:27:40 AM
to debe...@googlegroups.com, Daniel Cardinha
Hi Daniel -

What Jeff provided is a wonderful write-up, but I want to add to it from the perspective of Debezium Server.

It's worth keeping in mind that Debezium's relational connectors are inherently single-threaded (i.e. 1 task in Kafka Connect speak). This means that, from a Debezium Server perspective, if you want to parallelize the workload, you have to scale horizontally by starting multiple server instances, each streaming changes from a subset of tables, much like you would deploy multiple connectors with similar configurations in a Kafka Connect world. But keep in mind that while you can parallelize the workload in this horizontal way, it's equally important to think about the load this may put on the source database. Multiple connectors affect the source database differently depending on the database platform you're streaming changes from.
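
As a minimal sketch of that horizontal split, the Python below divides a table list round-robin across a chosen number of Debezium Server instances and prints the per-instance lines that would differ in each `application.properties`. The helper names, table names, and offset file paths are hypothetical. `debezium.source.table.include.list` and `debezium.source.offset.storage.file.filename` are Debezium Server source properties. Other settings must also be unique per instance depending on the database (for example the replication slot name on PostgreSQL or the server id on MySQL) and are left out here.

```python
def split_tables(tables, instances):
    """Assign tables round-robin to `instances` groups (one group per server)."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    groups = [[] for _ in range(instances)]
    # Sort first so the assignment is stable across runs.
    for i, table in enumerate(sorted(tables)):
        groups[i % instances].append(table)
    return groups


def render_properties(index, tables):
    """Render the properties that differ for one Debezium Server instance."""
    return "\n".join([
        # Each instance captures only its own subset of tables.
        f"debezium.source.table.include.list={','.join(tables)}",
        # Each instance needs its own offset store (hypothetical path).
        f"debezium.source.offset.storage.file.filename=data/offsets-{index}.dat",
    ])


if __name__ == "__main__":
    # Hypothetical table names for illustration only.
    all_tables = ["inventory.orders", "inventory.customers", "inventory.products"]
    for idx, group in enumerate(split_tables(all_tables, 2)):
        print(f"# instance {idx}")
        print(render_properties(idx, group))
```

Round-robin is only the simplest choice. In practice you would probably group tables by change volume so that no single instance ends up with all the busiest tables.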

Hope that helps.
Chris