Kafka-Streams: KStream aggregation idempotence under at least once semantics
Richard Hundt
Jun 10, 2017, 3:52:00 AM
to Confluent Platform
Hi *
I've just started getting my feet wet with kafka-streams, so still trying to get my head around it. Got a puzzler now though.
I'm doing the following:
- I have a KTable of customers and a KStream of orders
- I do an inner join of orders onto customers and create a composite object of the two
- then I group the resulting stream by customerId so that I can collect the orders by customer into a list
I'm seeing the same order appear multiple times in the customer.orders list following restarts of my toy application.
So my question is: given that we have at-least-once semantics, should I expect aggregations on a KStream to produce sane output following a restart of the application?
If not, what's the idiomatic way of getting idempotence back for aggregations?
Thanks!
Richard Hundt
Matthias J. Sax
Jun 10, 2017, 4:13:27 AM
to confluent...@googlegroups.com
Hi Richard,
for a clean shutdown and restart you should not see any duplicates. Only
in case of failure, duplicates might occur. For this case, you would
need to de-duplicate within your application.
In your case, if orders have a unique ID, you can check whether the
order is already in the list before adding it (or use a
HashMap<OrderID,Order> instead of a list).
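A minimal sketch of the map-based variant suggested above, in plain Java with no Kafka dependency. The names `CustomerOrders`, `Order`, `orderId`, and `item` are hypothetical and not from the original post; the only assumption carried over is that every order has a unique ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical aggregate value: the orders collected for one customer.
class CustomerOrders {

    // Hypothetical order type; only the unique ID matters for de-duplication.
    static final class Order {
        final String orderId;
        final String item;

        Order(String orderId, String item) {
            this.orderId = orderId;
            this.item = item;
        }
    }

    // Keyed by order ID so a redelivered order cannot be stored twice.
    // LinkedHashMap keeps the order in which IDs were first seen.
    private final Map<String, Order> ordersById = new LinkedHashMap<>();

    // Idempotent add: applying the same order again leaves the state unchanged.
    CustomerOrders add(Order order) {
        ordersById.putIfAbsent(order.orderId, order);
        return this;
    }

    // Snapshot of the de-duplicated orders as a list.
    List<Order> orders() {
        return new ArrayList<>(ordersById.values());
    }
}
```

In a Kafka Streams topology, an object like this could serve as the aggregate value, with the aggregator calling `add` for each joined record after grouping by customerId. That wiring, and the serde the aggregate would need, are left out here.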
to Confluent Platform
Hi Matthias,
Thanks for a quick response.
On Saturday, June 10, 2017 at 10:13:27 AM UTC+2, Matthias J. Sax wrote:
Hi Richard,
for a clean shutdown and restart you should not see any duplicates. Only
in case of failure, duplicates might occur. For this case, you would
need to de-duplicate within your application.
Okay so no guarantees (meaning for robustness I need to de-duplicate anyway), but I probably shouldn't be seeing this as much as I am.
Probably because I'm doing a stream.cleanUp() without a global reset in between. I'll keep poking at it.
Thanks again.
Eno Thereska
Jun 14, 2017, 9:42:42 AM
to Confluent Platform
Worth pointing out that exactly-once support is coming with the upcoming release this month.
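For reference, in the release that followed (Kafka 0.11.0) exactly-once processing is switched on through the Streams setting `processing.guarantee`. A rough sketch using plain `java.util.Properties` and literal keys, so it needs no Kafka classes to compile. The application ID and broker address are placeholder assumptions, and the brokers must also be on a version that supports the feature.

```java
import java.util.Properties;

class ExactlyOnceConfig {

    // Builds the properties that would be passed to a Kafka Streams application.
    static Properties streamsProperties() {
        Properties props = new Properties();
        props.put("application.id", "orders-by-customer"); // placeholder name (assumption)
        props.put("bootstrap.servers", "localhost:9092");  // placeholder address (assumption)
        // The default is "at_least_once"; this value enables exactly-once processing.
        props.put("processing.guarantee", "exactly_once");
        return props;
    }
}
```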