Hi Daniel, we haven't used Debezium Server in particular, but I can offer some guidance based on running Debezium on Kafka Connect, which should be broadly equivalent.
Performance is going to be highly dependent on many factors, including but not limited to:
- Event size
- Intra vs inter-region network traversal
- How many SMTs (single message transforms) you're using
- Serialization format
Generally what we've seen is that performance is serialization-bound, i.e. bottlenecked on CPU. The serialization here is taking the raw bytes from the upstream datastore and performing semantic/literal conversions into the Debezium format with Kafka Connect types.
Given this, you can only scale up a single instance so far, since the clock speed of a single core is the limiting factor for a task.
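To make the "Kafka Connect types" part concrete, here's a rough, illustrative Java sketch of the kind of per-event Schema/Struct construction that conversion involves. The schema and field names are made up and this isn't Debezium's actual code; it just shows where the per-event CPU cost comes from.

```java
// Illustrative only: roughly the kind of per-event object construction that
// happens when upstream bytes are converted into Kafka Connect's typed
// Schema/Struct representation.
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

public class EnvelopeSketch {

    // Hypothetical row schema; in practice Debezium derives this from the table.
    private static final Schema ROW_SCHEMA = SchemaBuilder.struct()
            .name("server.db.table.Value")
            .field("id", Schema.INT64_SCHEMA)
            .field("name", Schema.OPTIONAL_STRING_SCHEMA)
            .build();

    // Every change event pays for parsing, type conversion, and allocation here,
    // which is why a single task ends up CPU-bound.
    static Struct toConnectStruct(long id, String name) {
        return new Struct(ROW_SCHEMA)
                .put("id", id)
                .put("name", name);
    }

    public static void main(String[] args) {
        System.out.println(toConnectStruct(42L, "example"));
    }
}
```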
Other than that, there are 2 options:
- Tune the Kafka Connect producer/consumer configs, such as linger.ms and batch.size, and your specific Debezium connector configs, such as max.queue.size and max.batch.size (see the first sketch after this list).
- Trade-off: these settings go hand in hand with the system resources you have available and are use-case dependent, so you may end up re-tuning them over and over.
- Depending on your connector, you may be able to bypass Debezium's serialization/deserialization and pass through only the raw bytes, avoiding the serialization hit entirely. Downstream of Debezium, you'd then use a stream-processing framework like Flink, where the transformations that the SMTs or enveloping would have handled can be massively parallelized (see the second sketch after this list).
- Trade-off: you lose Debezium's nice declarative SMTs and its unified envelope, which makes interoperability with other systems harder. There's also more to build and maintain, but you'll no longer be serialization-bound for a single task.
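For the first option, here's a minimal sketch of the kinds of knobs involved, expressed as Java Properties (on Kafka Connect these would live in the connector/worker config rather than in code). The values are placeholders to tune against your own workload, and the producer.override.* form only works if the worker's connector.client.config.override.policy permits it.

```java
import java.util.Properties;

public class TuningExample {
    public static void main(String[] args) {
        // Illustrative values only -- the right numbers depend on event size,
        // available CPU/memory, and latency targets, so expect to iterate.
        Properties config = new Properties();

        // Debezium connector-level buffering.
        config.put("max.batch.size", "4096");   // events per batch (default 2048)
        config.put("max.queue.size", "16384");  // internal event queue (default 8192)
        config.put("max.queue.size.in.bytes", String.valueOf(256L * 1024 * 1024));

        // Kafka producer settings. On Kafka Connect these are set on the worker
        // with a "producer." prefix, or per connector via "producer.override."
        // when the override policy allows it.
        config.put("producer.override.linger.ms", "50");
        config.put("producer.override.batch.size", String.valueOf(512 * 1024));
        config.put("producer.override.compression.type", "lz4");

        config.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```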
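For the second option, here's a rough Flink DataStream sketch in Java of what the downstream transformation stage could look like, assuming the raw change bytes land on a hypothetical raw.changes topic. The topic name, group id, parallelism, and the transform() body are all placeholders for your own enveloping/SMT-equivalent logic.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class RawChangeTransformJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read the raw bytes as-is; no typed deserialization happens here.
        KafkaSource<byte[]> source = KafkaSource.<byte[]>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("raw.changes")
                .setGroupId("change-transformer")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setDeserializer(KafkaRecordDeserializationSchema.valueOnly(ByteArrayDeserializer.class))
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "raw-changes")
                // The expensive enveloping/transformation work happens here and
                // scales with the operator parallelism rather than with the
                // single core backing a Debezium task.
                .map(new MapFunction<byte[], byte[]>() {
                    @Override
                    public byte[] map(byte[] rawEvent) {
                        return transform(rawEvent);
                    }
                })
                .setParallelism(16)
                .print(); // replace with your real sink

        env.execute("raw-change-transform");
    }

    // Placeholder for whatever enveloping/format conversion you need.
    private static byte[] transform(byte[] rawEvent) {
        return rawEvent;
    }
}
```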