Thanks for the input.
I decreased the retention time to 1 day, which reduced the disk usage as expected. I had taken the app down for a bit more than 48 hours, and it caught up with all the queued messages in only a few minutes. Ultimately the app should never be down for more than a few minutes (for now it is more of a PoC). Same for the repartition topic, whose lag went to 0 in a few minutes.
I can definitely compact the output topic (it feeds a Kafka Connect JDBC sink for now; I'm also thinking about PoCing interactive queries).
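For reference, this is roughly how I would enable compaction on the existing topic with the `kafka-configs` tool (the topic name and ZooKeeper address here are placeholders, not my actual setup):

```shell
# Switch the (hypothetical) output topic to log compaction so that only
# the latest count per bucket key is retained after cleanup runs.
kafka-configs.sh --zookeeper localhost:2181 \
  --alter --entity-type topics --entity-name output-topic \
  --add-config cleanup.policy=compact
```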
In case you can come up with a better stream design, here is a short description of what the app does.
Basically the app builds time buckets (like "year 2017", "year 2017 month 1", or "year 2017 month 1 day 20") and counts the number of events that fall into each bucket. A simplified schematic version (essentially building a MultiMap and counting the elements of each entry, with each value counted under several entries) would be the following:
new KStreamBuilder()
    .stream(/*topics*/)
    .flatMap((key, value) -> Arrays.asList(
        yearBucket(value),          // new KeyValue keyed by this value's year plus its uid
        monthBucket(value),         // new KeyValue keyed by this value's month plus its uid
        dayBucket(value),           // and so on...
        hourBucket(value),
        allTimeBucket(value),
        globalYearBucket(value),    // new KeyValue keyed by the year only, to count all events of that year on the system
        globalMonthBucket(value),   // and so on...
        globalDayBucket(value),
        globalHourBucket(value),
        globalAllTimeBucket(value)))
    .groupByKey()
    .count("counters")
    .toStream()
    .map(toAvro)
    .to(avroKeySerde, avroValueSerde, outputTopic);
Thanks for the help.