Kafka-Streams: KStream aggregation idempotence under at-least-once semantics

Richard Hundt

Jun 10, 2017, 3:52:00 AM
to Confluent Platform
Hi *

I've just started getting my feet wet with Kafka Streams, so I'm still trying to get my head around it. I've hit a puzzler, though.

I'm doing the following:

- I have a KTable of customers and a KStream of orders
- I do an inner join of orders onto customers and create a composite object of the two
- then I group the resulting stream by customerId so that I can collect the orders by customer into a list

I'm seeing the same order appear multiple times in the customer.orders list following restarts of my toy application.

So my question is: given that we have at-least-once semantics, should I expect aggregations on a KStream to produce sane output after the application restarts?
If not, what's the idiomatic way of getting idempotence back for aggregations?

Thanks!
Richard Hundt
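A Kafka-free sketch of why the list-based aggregate in the question is sensitive to replays. Under at-least-once processing, if the application fails after the updated aggregate has reached the state store but before the input offset is committed, the same order is read and aggregated again. The class, the `CustomerOrder` record, and the IDs are hypothetical; `aggregate` only plays the role of the `Aggregator` you would pass to `KGroupedStream#aggregate`. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Kafka Streams code) of a list-appending aggregate.
public class ListAggregationDemo {

    // Hypothetical joined value: an order enriched with its customer.
    public record CustomerOrder(String orderId, String customerId) {}

    // Non-idempotent aggregator: every record is appended, even one seen before.
    public static List<CustomerOrder> aggregate(List<CustomerOrder> current, CustomerOrder next) {
        List<CustomerOrder> updated = new ArrayList<>(current);
        updated.add(next);
        return updated;
    }

    // Simulates at-least-once redelivery: the same order is processed twice.
    public static List<CustomerOrder> replay() {
        CustomerOrder order = new CustomerOrder("o-1", "c-42");
        List<CustomerOrder> orders = new ArrayList<>(); // the initializer's empty aggregate
        orders = aggregate(orders, order);
        orders = aggregate(orders, order); // redelivered record
        return orders;
    }

    public static void main(String[] args) {
        System.out.println(replay().size()); // prints 2: the order appears twice
    }
}
```

Because appending is not idempotent, the redelivered order shows up twice, which matches the symptom described in the question.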

Matthias J. Sax

Jun 10, 2017, 4:13:27 AM
to confluent...@googlegroups.com
Hi Richard,

for a clean shutdown and restart you should not see any duplicates.
Duplicates can only occur in case of a failure; for that case, you would
need to de-duplicate within your application.

For your case, if orders have a unique ID, you can check whether the
order is already contained in the list (or use a HashMap<OrderID,Order>
instead of a list).

Hope this helps.

-Matthias
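A sketch of the map-based alternative suggested above, again outside Kafka Streams and with hypothetical names (`MapAggregationDemo`, `Order`, the order IDs). Keying the aggregate by order ID makes the aggregator idempotent: re-applying an order overwrites its entry instead of adding a second one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration (not Kafka Streams code) of a de-duplicating aggregate.
public class MapAggregationDemo {

    public record Order(String orderId, String customerId, double amount) {}

    // Idempotent aggregator: putting an existing order ID replaces the entry,
    // so a redelivered record does not create a duplicate.
    public static Map<String, Order> aggregate(Map<String, Order> current, Order next) {
        Map<String, Order> updated = new LinkedHashMap<>(current);
        updated.put(next.orderId(), next);
        return updated;
    }

    public static Map<String, Order> replay() {
        Order first = new Order("o-1", "c-42", 10.0);
        Order second = new Order("o-2", "c-42", 5.0);
        Map<String, Order> orders = new LinkedHashMap<>(); // the initializer's empty aggregate
        orders = aggregate(orders, first);
        orders = aggregate(orders, second);
        orders = aggregate(orders, first); // redelivered record
        return orders;
    }

    public static void main(String[] args) {
        System.out.println(replay().keySet()); // prints [o-1, o-2]
    }
}
```

One consequence of this design is last-write-wins: if an order is re-sent with changed fields, the newer version replaces the old one. In a real topology the map-valued aggregate would also need a `Serde`, which is not shown here. Checking `list.contains(...)` before appending also works, at the cost of a linear scan per record.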

Richard Hundt

Jun 10, 2017, 4:42:33 AM
to Confluent Platform
Hi Matthias,

Thanks for a quick response.

On Saturday, June 10, 2017 at 10:13:27 AM UTC+2, Matthias J. Sax wrote:
> Hi Richard,
>
> for a clean shutdown and restart you should not see any duplicates. Only
> in case of failure, duplicates might occur. For this case, you would
> need to de-duplicate within your application.

Okay, so there are no guarantees (meaning that for robustness I need to de-duplicate anyway), but I probably shouldn't be seeing this as often as I am.

That's probably because I'm calling stream.cleanUp() without a global reset in between. I'll keep poking at it.
Thanks again.

Eno Thereska

Jun 14, 2017, 9:42:42 AM
to Confluent Platform
It's worth pointing out that exactly-once support is coming in the release due later this month.

Eno
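A minimal config sketch for the exactly-once support mentioned above, assuming Kafka Streams 0.11.0 or later: the `processing.guarantee` setting (default `at_least_once`) is switched to `exactly_once`. Raw property keys are used so the snippet runs without the kafka-streams jar; in an application you would normally use `StreamsConfig.PROCESSING_GUARANTEE_CONFIG` and `StreamsConfig.EXACTLY_ONCE`. The application id and broker address are placeholders. Note that from Kafka 3.0 on, `exactly_once` is deprecated in favor of `exactly_once_v2`.

```java
import java.util.Properties;

// Sketch of a Kafka Streams configuration with exactly-once processing enabled.
public class ExactlyOnceConfig {

    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "customer-orders-app"); // placeholder application id
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        // Default is "at_least_once"; "exactly_once" was added in Kafka 0.11.0.
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("processing.guarantee"));
    }
}
```

With this setting, state-store updates, output records, and input offset commits are written in a single transaction, so a crash should no longer cause an order to be appended twice. A de-duplicating aggregate can still be a sensible safeguard if the input topic itself may contain duplicate orders.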