Dear All,
I wonder whether there is an article (or another kind of resource) that tackles the issue of developing a stream processing application that needs to buffer all the change events originating from one specific transaction. Only once the application has received all the events of that transaction may it produce the final aggregate view and publish it to downstream consumers, avoiding the need to expose intermediate aggregate views.
As you may have guessed, I am referring to the Transaction Markers introduced in Debezium 1.1.0.Beta1.
I think I could use the same strategy of writing a custom transformer that buffers all incoming messages in a local state store (in an aggregating fashion). Once the END transaction event has arrived and the number of messages in the aggregation equals the total number of change events originating from that transaction (this information is available in the transaction metadata topic), I could emit the aggregation to a downstream topic. As suggested in the previous article, I could save the transaction information in a global state store.
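To make the idea concrete, here is a rough sketch of the per-transaction buffering logic I have in mind, stripped of all Kafka Streams wiring (the class and method names are placeholders of mine, the maps stand in for the local state store, and the expected count would come from the transaction metadata topic):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: buffer change events per transaction id, and release the
// aggregate once the event count announced by the END marker is reached.
public class TransactionBuffer {

    // txId -> events buffered so far (stand-in for the local state store)
    private final Map<String, List<String>> buffers = new HashMap<>();

    // txId -> total event count carried by the END marker
    private final Map<String, Long> expectedCounts = new HashMap<>();

    // Called for every data change event carrying a transaction id.
    // Returns the complete aggregate once the transaction is complete, else null.
    public List<String> onChangeEvent(String txId, String event) {
        buffers.computeIfAbsent(txId, k -> new ArrayList<>()).add(event);
        return maybeEmit(txId);
    }

    // Called when the END marker for txId arrives, with the total event
    // count taken from the transaction metadata topic.
    public List<String> onEndMarker(String txId, long totalEventCount) {
        expectedCounts.put(txId, totalEventCount);
        return maybeEmit(txId);
    }

    private List<String> maybeEmit(String txId) {
        Long expected = expectedCounts.get(txId);
        List<String> buffered = buffers.get(txId);
        if (expected == null || buffered == null || buffered.size() < expected) {
            return null; // still waiting for events or for the END marker
        }
        expectedCounts.remove(txId);
        return buffers.remove(txId); // complete aggregate, ready to publish
    }
}
```

In a real transformer this state would of course live in a changelogged state store rather than in-memory maps, but it illustrates the point where my question arises: the expected count I compare against.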
However, I cannot figure out how to obtain the total number of events emitted PER PARTITION. This is a must in a distributed environment where I run multiple replicas of my Kafka Streams application.
Do you have any resources you can share, or any suggestions?
Thank you in advance,
Matteo
---
Matteo Cimini
Innovation Leader
Quantyca - Data at Core
Corso Milano, 45
20900 Monza