Producing Transaction Aggregations with Debezium Transaction Markers

157 views
Skip to first unread message

Matteo Cimini

unread,
Nov 10, 2021, 6:26:54 AM11/10/21
to debe...@googlegroups.com
Dear All,
I have recently found very interesting the article named "Building Audit Logs with Change Data Capture and Stream Processing".

I wonder if it exists an article (or another kind of resource) which tackles the issue of developing a stream processing application which is required to buffer all the change events originating from one specific transaction. Only once the application has received all the events of that transaction, it may produce the final aggregate view and publish it to downstream consumers, avoiding the need of exposing intermediary aggregate views
As you can understand I am referring to the Transaction Markers introduced with Debezium 1.1.0.Beta1. 

I think I could use the same strategy of writing a Custom Transformer to buffer all incoming messages in a local state-store (in an aggregating fashion). Once the END transaction event has arrived and the count of messages in the aggregation is equal to the total number of change events originating from this transaction (we have this information in the Transaction Metadata Topic), I could emit the agrregation in a downstream topic. As suggested in the previous article I could save transactions information in a Global-state-store. 
However, I cannot figure out how to get the total number of events emitted PER PARTITION. This is a must in a distributed environment in which I have multiple replicas of my kafka-stream application. 

Do you have some resources you can share? Any kind of suggestions?

Thank you in advanced,
Matteo
---

Matteo Cimini
Innovation Leader
Quantyca - Data at Core


Quantyca s.r.l.
Corso Milano, 45
20900 Monza

Gunnar Morling

unread,
Nov 10, 2021, 4:21:25 PM11/10/21
to debezium
Hey Matteo,

Thanks for bringing this up; It's a very valid question, and I think we lack an answer atm. Can you log a Jira issue for capturing this need? We'd probably have to emit the number of events per key in the END event, as at the connector level itself, we don't know anything about partitions. 

--Gunnar
Reply all
Reply to author
Forward
Message has been deleted
0 new messages