Hi Chris,
Many thanks for the insights. We will continue using Kafka Connect, but a microservice architecture comes with many stores to CDC and many outboxes, which translates into serious money with the various managed Kafka Connect offerings. On top of that, obviously, come the engineering-experience points I mentioned initially, which are even more important. I have a few additional questions based on what you wrote, if you don't mind.
You mention multi-task, which is something we were actually wondering about. Some have started deploying a Debezium connector per table instead of per database. While I can see how this might prove useful if the processing step in Debezium is the actual overhead (which can very well be the case if the built-in transform functionality is being used), it does not seem to offer any performance gains for standard stores like MySQL and PostgreSQL, simply because of their architecture. The evidence I have collected so far includes this discussion, where Gunnar stated exactly that for MySQL, and the MySQL as well as PostgreSQL documentation, where it is stated that the tasks.max property is ignored and forced to 1. Am I correct in deriving from this that this functionality is of no use to us?
I have read the blog article on automatic topic creation in Debezium Connect (written by my ex-trivago colleague René) with auto.create.topics.enable=false, but I am assuming that it would be trivial to add this particular functionality to Debezium Server (even for us from the outside, if need be, until your patch hits mainline, simply because it already has kafka-clients, and thus Admin, in its dependencies).
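To illustrate what I mean, here is a minimal sketch of explicit topic creation through the kafka-clients Admin API. It is not taken from Debezium Server; the class name TopicBootstrap, the topic name, the partition count, the replication factor, the cleanup policy, and the bootstrap address are all placeholders I made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.errors.TopicExistsException;

public class TopicBootstrap {

    // Hypothetical defaults; real values would come from configuration.
    static NewTopic topicSpec(String name) {
        return new NewTopic(name, 3, (short) 3)
                .configs(Map.of("cleanup.policy", "delete"));
    }

    // Creates the topic and treats "already exists" as success,
    // so the call is safe to repeat on every start.
    static void ensureTopic(Admin admin, NewTopic topic)
            throws InterruptedException, ExecutionException {
        try {
            admin.createTopics(List.of(topic)).all().get();
        } catch (ExecutionException e) {
            if (!(e.getCause() instanceof TopicExistsException)) {
                throw e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed broker address, for illustration only.
        Map<String, Object> config =
                Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(config)) {
            ensureTopic(admin, topicSpec("inventory.public.orders"));
        }
    }
}
```

Something along these lines would have to run before the first record for a new topic is produced, since the broker will not create the topic on its own when auto.create.topics.enable=false.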
Rate limiting on the client side is sadly not trustworthy, so we want to start using quotas to control our engineers' clients.
Anything else that comes to mind in terms of features? My search-fu was sadly not very helpful so far, which is what led me here.