Why is Kafka Connect recommended for Debezium over Debezium Server?

Richard Fussenegger

Nov 3, 2021, 5:07:08 AM

to debezium

Hello 😊

The documentation page for Debezium Server states the following:

“For streaming change events to Apache Kafka, it is recommended to deploy the Debezium connectors via Kafka Connect.”

However, no reason is given for this recommendation.

The background to my question is that we are thinking about replacing our Debezium connectors with Debezium Server to improve the engineering experience. The dependency on the Kafka Connect cluster means that our engineers cannot run and test their Debezium setup locally without setting up a Kafka Connect cluster. While this can be simplified with some tooling, it also puts serious load on their machines (on top of the already running Kafka cluster, and what not).

Considering that Debezium Server is a standard, stand-alone Java application, it sounds to us as if all these issues would go away. It would also allow our engineers to simply deploy Debezium Server as a standard Kubernetes deployment, controlling fail-over and what not with concepts they are already very familiar with. In fact, it even allows for fancy setups with standby instances and fast fail-over.

Is there anything I'm missing here that makes Kafka Connect better than Debezium Server?

Chris Cranford

Nov 3, 2021, 11:25:11 AM

to debe...@googlegroups.com, Richard Fussenegger

Hi Richard -

So Debezium Server's initial goal was to enable Debezium connectors to send change events to a growing number of other messaging platforms like Kinesis, Pub/Sub, Event Hubs, etc. Of course, Debezium Server has grown over the last 18 months and now supports a Kafka sink as well. Whether you should use Debezium Server instead of Kafka Connect depends entirely on your needs and on the features your pipeline requires.

While you can use Debezium Server and the Kafka sink as a replacement for Kafka Connect, its feature set is only a subset of what Kafka Connect offers. We recommend you examine your organization's needs; if you rely on Apache Kafka and want to take advantage of all the enterprise-like features of Kafka Connect, you should continue to rely on that stack.

In terms of feature differences, a few come to mind. First, Debezium Server doesn't offer multi-task support. We want to look at how that could work in our implementation at some point (perhaps multiple instances on K8s orchestrated by an operator), but it's definitely something to consider if you need multi-task support for your pipelines. Second, Debezium Server does not support advanced features like topic creation settings or rate limiting (Kafka Connect (KC) discussed adding rate limiting, but we're not sure yet whether it has made it into KC proper). There are likely other small nuances that may or may not be important to your pipeline, and you should consider them carefully before making the final decision.
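To make "multi-task support" a bit more concrete: in Kafka Connect, a source connector is handed `tasks.max` and returns a list of task configurations, one per task the framework should start (this is the role of `Connector.taskConfigs(int maxTasks)` in the Connect API). The following is a minimal, self-contained Java sketch of that idea without the Connect dependency. The class, the method names and the `sources` key are hypothetical, and `table.include.list` is only used as an illustrative Debezium property. A connector that reads one ordered log returns a single config whatever `tasks.max` says, while a connector with independent sources (roughly what the MongoDB connector does with replica sets) can spread them across tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of how tasks.max becomes task configurations.
 * Modeled loosely on Kafka Connect's Connector.taskConfigs(int maxTasks);
 * this is not Debezium's actual code.
 */
public class TaskSplitSketch {

    /**
     * Single-log connector (e.g. MySQL binlog, Postgres replication slot):
     * the log must be consumed in order by one reader, so maxTasks is ignored.
     */
    static List<Map<String, String>> singleLogTaskConfigs(int maxTasks, String tableIncludeList) {
        return List.of(Map.of("table.include.list", tableIncludeList));
    }

    /**
     * Connector with independent sources (e.g. several MongoDB replica sets):
     * sources are spread round-robin over at most maxTasks tasks.
     */
    static List<Map<String, String>> multiSourceTaskConfigs(int maxTasks, List<String> sources) {
        if (maxTasks < 1) {
            throw new IllegalArgumentException("tasks.max must be at least 1");
        }
        int taskCount = Math.min(maxTasks, sources.size());
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < sources.size(); i++) {
            groups.get(i % taskCount).add(sources.get(i));
        }
        List<Map<String, String>> configs = new ArrayList<>();
        for (List<String> group : groups) {
            // "sources" is a hypothetical key, used only for this sketch
            configs.add(Map.of("sources", String.join(",", group)));
        }
        return configs;
    }

    public static void main(String[] args) {
        // tasks.max=8, but a single log still yields one task
        System.out.println(singleLogTaskConfigs(8, "inventory.orders,inventory.customers").size());
        // tasks.max=2 with three replica sets yields two tasks
        System.out.println(multiSourceTaskConfigs(2, List.of("rs0", "rs1", "rs2")));
    }
}
```

The design point is that the number of useful tasks is bounded by how many independent streams the source offers, not by `tasks.max` alone.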

If you don't see any impactful differences, you're welcome to use Debezium Server. We simply want to make sure that you are informed about the architectural and nuanced differences between Kafka Connect and Debezium Server, so that you don't think it's a complete 1-to-1 drop-in replacement with the same full feature set.

Now, with all that said, we are actively working on Debezium Server. We want to build out and expand its feature set over time, so if you have any feedback on how we can improve it, or if you identify any key missing features that would be a nice addition, please let us know.

Thanks,
CC

Richard Fussenegger

Nov 3, 2021, 4:36:29 PM

to debezium

Hi Chris,

Many thanks for the insights. 😊 We will continue using Kafka Connect, but a microservice architecture comes with many data stores and outboxes to capture via CDC. This translates into serious money with the various managed Kafka Connect offerings, plus, of course, the engineering-experience issues I mentioned initially, which are even more important. I have a few additional questions based on what you wrote, if you don't mind.

You mention multi-task support; this is something we were actually wondering about. Some teams have started deploying one Debezium connector per table instead of per database. While I can see how this might be useful if the processing step in Debezium is the actual bottleneck (which may well be the case if the built-in transformation functionality is used), it does not seem to offer any performance gains for standard stores like MySQL and PostgreSQL, simply because of their architecture. The evidence I have collected so far includes this discussion, where Gunnar stated exactly that for MySQL, as well as the MySQL and PostgreSQL connector documentation, which states that the tasks.max property is ignored and forced to 1. Am I correct in concluding that this functionality is of no use to us?

I have read the blog article on automatic topic creation in Kafka Connect with auto.create.topics.enable=false (written by my ex-trivago colleague René 👋), but I assume it would be quite trivial to add this particular functionality to Debezium Server (even for us from the outside, if need be, until your patch hits mainline), simply because it already has kafka-clients, and thus Admin, among its dependencies.

Rate limiting on the client side is sadly not trustworthy, so we want to start using Kafka quotas to control our engineers' clients.

Anything else that comes to mind in terms of features? My search-fu was sadly not very helpful so far, that's what led me here. 😜

Gunnar Morling

Nov 4, 2021, 9:05:16 AM

to debezium

Hey Richard,

richard.f...@gmail.com wrote on Wednesday, November 3, 2021 at 21:36:29 UTC+1:

> Hi Chris,

> Many thanks for the insights. 😊 We will continue using Kafka Connect, but a microservice architecture comes with many data stores and outboxes to capture via CDC. This translates into serious money with the various managed Kafka Connect offerings, plus, of course, the engineering-experience issues I mentioned initially, which are even more important. I have a few additional questions based on what you wrote, if you don't mind.

> You mention multi-task support; this is something we were actually wondering about. Some teams have started deploying one Debezium connector per table instead of per database. While I can see how this might be useful if the processing step in Debezium is the actual bottleneck (which may well be the case if the built-in transformation functionality is used), it does not seem to offer any performance gains for standard stores like MySQL and PostgreSQL,

In fact, having multiple connectors for different tables of one database may even create more overhead, as it means the log needs to be read/streamed multiple times.
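A small worked illustration of that overhead: a log-based connector has to read every event in the database log and can only afterwards discard the events for tables it does not capture. The Java sketch below counts the events read by one connector covering all tables versus one connector per table. The log contents and table names are made-up sample data, and the model deliberately ignores decoding costs and other real-world details.

```java
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch: each log-based connector reads the whole log and
 * filters afterwards, so N per-table connectors read the log N times.
 */
public class LogReadAmplification {

    record Counts(long read, long emitted) {}

    /** Streams the whole log once, emitting only events for captured tables. */
    static Counts stream(List<String> log, Set<String> capturedTables) {
        long read = 0;
        long emitted = 0;
        for (String table : log) {
            read++; // the event is fetched before it can be filtered
            if (capturedTables.contains(table)) {
                emitted++; // only captured tables produce change events
            }
        }
        return new Counts(read, emitted);
    }

    /** Total events read by all connectors for the given deployment style. */
    static long totalRead(List<String> log, Set<String> tables, boolean connectorPerTable) {
        if (!connectorPerTable) {
            return stream(log, tables).read();
        }
        long total = 0;
        for (String table : tables) {
            total += stream(log, Set.of(table)).read();
        }
        return total;
    }

    public static void main(String[] args) {
        // Each entry is the table that a logged change belongs to.
        List<String> log = List.of("orders", "customers", "orders", "products");
        Set<String> tables = Set.of("orders", "customers", "products");
        System.out.println("one connector:       " + totalRead(log, tables, false));
        System.out.println("connector per table: " + totalRead(log, tables, true));
    }
}
```

With three tables and a four-event log, this prints 4 for the single connector and 12 for the per-table setup: the same changes are emitted either way, but the log is read three times.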

> simply because of their architecture. The evidence I have collected so far includes this discussion, where Gunnar stated exactly that for MySQL, as well as the MySQL and PostgreSQL connector documentation, which states that the tasks.max property is ignored and forced to 1. Am I correct in concluding that this functionality is of no use to us?

Yes, the Postgres and MySQL connectors currently won't use more than one task, i.e. in this respect there is no difference compared to Debezium Server. This may change in the future, though. The MongoDB connector already supports multiple tasks, and there is currently work happening in the community to do the same for the SQL Server connector. We'll have to evaluate how to make this work with Debezium Server. Our current thinking is to have an external coordinator (say, a K8s operator) which distributes these tasks between multiple Debezium Server instances.
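The external-coordinator idea could look roughly like the Java sketch below. It is purely hypothetical: no such Debezium component or API is implied, and the task and instance names are made up. Given the task IDs a connector wants and the Debezium Server instances that are currently alive, it assigns tasks round-robin and simply recomputes the whole assignment when an instance disappears.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of an external coordinator (e.g. a K8s operator) that
 * spreads connector tasks over Debezium Server instances.
 */
public class TaskCoordinatorSketch {

    /** Assigns task ids to live instances round-robin, recomputing from scratch on every call. */
    static Map<String, List<String>> assign(List<String> taskIds, List<String> liveInstances) {
        if (liveInstances.isEmpty()) {
            throw new IllegalStateException("no live Debezium Server instances");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String instance : liveInstances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int i = 0; i < taskIds.size(); i++) {
            String instance = liveInstances.get(i % liveInstances.size());
            assignment.get(instance).add(taskIds.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> tasks = List.of("task-0", "task-1", "task-2");
        System.out.println(assign(tasks, List.of("server-a", "server-b")));
        // server-b fails: the coordinator recomputes the assignment on the remaining instance
        System.out.println(assign(tasks, List.of("server-a")));
    }
}
```

A real coordinator would also need each task's offsets to be stored somewhere every instance can reach; otherwise a reassigned task could not resume where it left off. The sketch ignores that, as well as rebalancing strategies that minimize task movement.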

> I have read the blog article on automatic topic creation in Kafka Connect with auto.create.topics.enable=false (written by my ex-trivago colleague René 👋), but I assume it would be quite trivial to add this particular functionality to Debezium Server (even for us from the outside, if need be, until your patch hits mainline), simply because it already has kafka-clients, and thus Admin, among its dependencies.

Yes, this shouldn't be too hard. Contributions always welcome!
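As a rough illustration of what such a contribution would need to do: with `auto.create.topics.enable=false`, the change-event topics must exist before the connector writes to them. The Java sketch below only derives the topic names a PostgreSQL connector would use under Debezium's default naming scheme (`<logical server name>.<schema>.<table>`) and checks them against Kafka's topic-name rules. `TopicSpec`, the partition count and the replication factor are hypothetical. In an actual implementation, each spec would presumably become a `NewTopic(name, numPartitions, replicationFactor)` passed to kafka-clients' `Admin.createTopics(...)`, and other topics (heartbeat, transaction metadata, or MySQL's schema history topic) would need the same treatment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Sketch: derive the change-event topics a Debezium Postgres connector would
 * write to (<server name>.<schema>.<table>) so they can be created up front.
 * TopicSpec and the partition/replication values are hypothetical.
 */
public class TopicPlanSketch {

    // Kafka topic names: ASCII letters, digits, '.', '_' and '-', at most 249 characters
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]{1,249}");

    record TopicSpec(String name, int partitions, short replicationFactor) {}

    static List<TopicSpec> plan(String serverName, List<String> qualifiedTables,
                                int partitions, short replicationFactor) {
        List<TopicSpec> specs = new ArrayList<>();
        for (String table : qualifiedTables) { // e.g. "public.orders"
            String topic = serverName + "." + table;
            if (!LEGAL.matcher(topic).matches()) {
                throw new IllegalArgumentException("illegal topic name: " + topic);
            }
            specs.add(new TopicSpec(topic, partitions, replicationFactor));
        }
        return specs;
    }

    public static void main(String[] args) {
        for (TopicSpec spec : plan("shop", List.of("public.orders", "public.customers"), 1, (short) 3)) {
            System.out.println(spec);
        }
    }
}
```

Keeping the planning step separate from the Admin call makes it easy to check the expected topic list in a unit test before anything touches a real cluster.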

> Rate limiting on the client side is sadly not trustworthy, so we want to start using Kafka quotas to control our engineers' clients.

> Anything else that comes to mind in terms of features? My search-fu was sadly not very helpful so far, that's what led me here. 😜

I think that's pretty much it. I would suggest simply giving it a try if Debezium Server fits your workflow better, and reporting back if you run into any issues. Compared to the other Debezium Server sinks, the Kafka one is relatively new (simply because demand for it was lower than for the others), so there may be some rough edges here and there.

Hth,

--Gunnar
