Hi! In my organization we have been using Kafka Streams successfully for some time, and recently we decided to introduce some stateful computations on our streams: we need stream-stream joins and simple aggregations on top of them. According to the docs:
https://docs.confluent.io/current/streams/developer-guide.html#required-configuration-parameters it is recommended to change the application id with every release, and that is what we did. Unfortunately it has caused some problems for us, and I would be very grateful for advice on how to deal with them.
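For context, this is roughly how we build the configuration today (the application name, bootstrap server and the way we obtain the commit hash are just illustrative):

import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class StreamsAppConfig {

    static Properties buildConfig(String commitHash) {
        Properties props = new Properties();
        // The commit-hash suffix makes the application id (and therefore the consumer
        // group id and all internal topic names) change with every release.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app-" + commitHash);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        return props;
    }
}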
1. First, a dynamic application id (we append the commit hash as a suffix, as in the sketch above) makes monitoring of our app somewhat problematic. For consumer lag monitoring we use
https://github.com/linkedin/Burrow, because we want to check lag with an external tool for extra reliability, and a consumer group id that changes with every deploy complicates that setup.
2. Second, every release creates a new consumer group with a new id, and that group starts reading from whatever auto.offset.reset dictates instead of picking up where the previous group left off. This can lead to missed (or reprocessed) data during deploys, or when the application crashes or behaves abnormally.
3. Another option is to leave the application id unchanged and instead change the names of the internal topics used for joins and aggregations by giving different names to the window/state stores (see the sketch after this list). But this is not an elegant solution either: it creates many unnecessary topics over time and is cumbersome to maintain and develop, because developers have to remember to give these stores proper names.
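To make option 3 concrete, here is a minimal sketch of how the store names could be pinned in a recent Kafka Streams DSL (topic names, store names and serdes are made up): the internal changelog topics are derived from the explicit store names instead of the generated KSTREAM-JOINTHIS-... names, and bumping a name (e.g. the "-v1" suffix) in a release creates fresh state and new internal topics without touching application.id.

import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.StreamJoined;
import org.apache.kafka.streams.state.KeyValueStore;

public class NamedStoresTopology {

    static void build(StreamsBuilder builder) {
        KStream<String, String> orders = builder.stream("orders");
        KStream<String, String> clicks = builder.stream("clicks");

        // Explicitly named join state stores: the changelog topic names follow the
        // store name, so we control when they change by renaming the store.
        KStream<String, String> joined = orders.join(
                clicks,
                (order, click) -> order + "|" + click,
                JoinWindows.of(Duration.ofMinutes(5)),
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String())
                        .withStoreName("orders-clicks-join-v1"));

        // Same idea for an aggregation: Materialized.as(...) fixes the store name
        // and therefore the changelog topic name.
        KTable<String, Long> counts = joined
                .groupByKey()
                .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("orders-clicks-count-v1"));

        counts.toStream().to("counts-output");
    }
}

With names pinned like this we could keep the application id fixed, but we would still have to remember to bump the suffixes whenever the stateful logic changes, which is exactly the maintenance burden described above.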
The main question is: how should we handle stateful Kafka Streams computations in situations like these while minimizing the downsides?