Debezium Engine with Kafka

Clemens Diebold

Jan 22, 2021, 4:06:36 AM

to debezium

Hey everyone :)

I have some questions regarding the Debezium Engine:

I saw a few blog posts about using the Debezium Engine in combination with Spring Boot. Do you have any experience with whether this works well?

On a platform like Cloud Foundry, you typically have multiple instances of a microservice running. I know there are solutions, e.g. to restrict the execution of a job to one instance. The same approach should work to run the engine in only one of the instances and avoid problems. However, are there also built-in capabilities that solve this, or is it perhaps possible to parallelize the engine execution across multiple instances?

I got to know Debezium coming from Kafka and would like to use it to implement the outbox pattern. Unfortunately I have no Kafka Connect cluster available, so I am looking for other options to run the Debezium connector. I know the Debezium Engine was not designed with a focus on sending events to Kafka, but is it still possible? If yes, can I use the same transaction of the KafkaOffsetBackingStore to send my custom events, so as to reach the guarantees of the outbox pattern?

Are there other options to run the Debezium connector without a Connect cluster? I read that Debezium Server is a standalone solution, but that it is only recommended for Kinesis and similar systems, not for Kafka.

Thanks a lot!

Clemens

Gunnar Morling

Jan 22, 2021, 4:34:58 AM

to debezium

Hi Clemens,

Answers inline.

Best,

--Gunnar

deve...@clemensdiebold.de wrote on Friday, January 22, 2021 at 10:06:36 UTC+1:

Hey everyone :)

I have some questions regarding the Debezium Engine:

I saw a few blog posts about using the Debezium Engine in combination with Spring Boot. Do you have any experience with whether this works well?

I personally don't, but perhaps others here do. I would pretty much expect things to "just work (TM)". If you wanted to, you could e.g. set up a bean definition (or whatever the right term in Spring is), so as to inject the embedded engine into other beans.

On a platform like Cloud Foundry, you typically have multiple instances of a microservice running. I know there are solutions, e.g. to restrict the execution of a job to one instance. The same approach should work to run the engine in only one of the instances and avoid problems. However, are there also built-in capabilities that solve this, or is it perhaps possible to parallelize the engine execution across multiple instances?

Ah, that's a very interesting point. Debezium doesn't have any built-in primitives for making sure that only one of a set of application instances runs the embedded engine. And I'd argue Debezium itself isn't the right place to implement that, as it is independent of any application stack or platform. So you should keep using what your stack or platform provides. That's the beauty of using Kafka: you'd get this "for free" by means of setting up a consumer group.

In terms of parallelization, you can set up distinct filter configurations, so one connector instance captures tables A and B, and another instance C and D. Just keep in mind you're tailing the log multiple times, so there's a bit more work for the source database.
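As a rough illustration of the split described above, here is a minimal sketch that builds one set of properties per engine instance. It assumes the `table.include.list` option used by recent Debezium versions (older releases call it `table.whitelist`); the engine names and the `inventory.*` table names are made up for the example. A real configuration would also need the connector class, the connection settings, and a separate offset store for each instance.

```java
import java.util.Properties;

final class SplitCaptureConfigs {

    // Builds the properties for one engine instance that captures only the given tables.
    static Properties forTables(String engineName, String tables) {
        Properties props = new Properties();
        // Each engine instance gets its own name (hypothetical values in main below)
        props.setProperty("name", engineName);
        // Comma-separated list of fully qualified table names to capture;
        // assumes the "table.include.list" option of recent Debezium versions
        props.setProperty("table.include.list", tables);
        return props;
    }

    public static void main(String[] args) {
        // One instance for tables A and B, another for C and D (made-up names)
        Properties first = forTables("engine-ab", "inventory.a,inventory.b");
        Properties second = forTables("engine-cd", "inventory.c,inventory.d");
        System.out.println(first);
        System.out.println(second);
    }
}
```

Each `Properties` object would then be passed to its own engine instance, so every instance tails the log independently and forwards only its share of the tables.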

I got to know Debezium coming from Kafka and would like to use it to implement the outbox pattern. Unfortunately I have no Kafka Connect cluster available, so I am looking for other options to run the Debezium connector. I know the Debezium Engine was not designed with a focus on sending events to Kafka, but is it still possible?

Yes, you have full flexibility in terms of what you do in the event handler method, so you could also send a message to Kafka.

If yes, can I use the same transaction of the KafkaOffsetBackingStore to send my custom events, so as to reach the guarantees of the outbox pattern?

Same transaction as what? Debezium reads from the TX log, there's no regular database TX involved here. Or do you mean transactions when writing to Kafka? If so, that's not something Debezium takes advantage of atm. Guarantees are "at least once" typically, i.e. offsets are committed in intervals, so after an unclean connector restart you may consume a few messages a second time.
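Since delivery is "at least once", whatever consumes the events should tolerate seeing the same one twice. The sketch below shows one way a consumer might do that, assuming every event carries a unique ID (as outbox events usually do). The class name `IdempotentConsumer` and the in-memory set are placeholders for illustration; a real consumer would probably keep the processed IDs in durable storage.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

final class IdempotentConsumer {

    // IDs of events that were already processed (in-memory placeholder only)
    private final Set<String> processedIds = new HashSet<>();
    // The actual business logic to run once per event
    private final Consumer<String> delegate;

    IdempotentConsumer(Consumer<String> delegate) {
        this.delegate = delegate;
    }

    // Processes the payload unless the event ID was handled before.
    // Returns true if the event was processed, false if it was a duplicate.
    boolean onEvent(String eventId, String payload) {
        if (processedIds.contains(eventId)) {
            return false;
        }
        delegate.accept(payload);
        // Mark the ID only after the delegate succeeded, so a failed event can be retried
        processedIds.add(eventId);
        return true;
    }
}
```

With this in place, an event that is redelivered after an unclean restart is recognized by its ID and skipped instead of being applied a second time.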

Are there other options to run the Debezium connector without a Connect cluster? I read that Debezium Server is a standalone solution, but that it is only recommended for Kinesis and similar systems, not for Kafka.

That would have been my next proposal: implementing an outbound adaptor for Debezium Server, akin to the ones for Kinesis, EventHubs etc. If you use MongoDB, keep in mind though that you wouldn't be able to benefit from distributing tasks across multiple nodes, as is the case with Kafka Connect.