Hi! In my organization we have been using Kafka Streams successfully for some time, and recently we decided to introduce some stateful computations on our streams: we need stream-stream joins and simple aggregations on top of them. According to the docs:
https://docs.confluent.io/current/streams/developer-guide.html#required-configuration-parameters it is recommended to change the application id with every release, and that is what we did. Unfortunately it has caused some problems for us, and I would be very grateful for advice on how to deal with them.
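For context, this is roughly how we build the configuration today (the application name, bootstrap server and the way we obtain the commit hash are just illustrative):

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class StreamsAppConfig {

    static Properties buildConfig(String commitHash) {
        Properties props = new Properties();
        // The commit-hash suffix makes the application id (and therefore the consumer
        // group id and all internal topic names) change with every release.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app-" + commitHash);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        return props;
    }
}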
1. First, a dynamic application id (we append the commit hash as a suffix, as in the sketch above) makes monitoring of our app somewhat problematic. For consumer lag monitoring we use
https://github.com/linkedin/Burrow, because we want to check lag with an external tool for extra reliability, and a consumer group id that changes with every deploy complicates that setup.
2. Second, every release creates a new consumer group with a new id, and that group starts reading from whatever auto.offset.reset dictates instead of picking up where the previous group left off. This can lead to missed (or reprocessed) data during deploys, or when the application crashes or behaves abnormally.
3. Another option is to leave the application id unchanged and instead change the names of the internal topics used for joins and aggregations by giving different names to the window/state stores (see the sketch after this list). But this is not an elegant solution either: it creates many unnecessary topics over time and is cumbersome to maintain and develop, because developers have to remember to give these stores proper names.
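To make option 3 concrete, here is a minimal sketch of how the store names could be pinned in a recent Kafka Streams DSL (topic names, store names and serdes are made up): the internal changelog topics are derived from the explicit store names instead of the generated KSTREAM-JOINTHIS-... names, and bumping a name (e.g. the "-v1" suffix) in a release creates fresh state and new internal topics without touching application.id.

import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.StreamJoined;
import org.apache.kafka.streams.state.KeyValueStore;

public class NamedStoresTopology {

    static void build(StreamsBuilder builder) {
        KStream<String, String> orders = builder.stream("orders");
        KStream<String, String> clicks = builder.stream("clicks");

        // Explicitly named join state stores: the changelog topic names follow the
        // store name, so we control when they change by renaming the store.
        KStream<String, String> joined = orders.join(
                clicks,
                (order, click) -> order + "|" + click,
                JoinWindows.of(Duration.ofMinutes(5)),
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String())
                        .withStoreName("orders-clicks-join-v1"));

        // Same idea for an aggregation: Materialized.as(...) fixes the store name
        // and therefore the changelog topic name.
        KTable<String, Long> counts = joined
                .groupByKey()
                .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("orders-clicks-count-v1"));

        counts.toStream().to("counts-output");
    }
}

With names pinned like this we could keep the application id fixed, but we would still have to remember to bump the suffixes whenever the stateful logic changes, which is exactly the maintenance burden described above.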
The main question is: how should we handle stateful Kafka Streams computations in situations like these while minimizing the downsides?