Kafka Streams for processing historical data rather than real-time data


Adarsh Lal

May 17, 2017, 3:16:20 AM
to Confluent Platform
Hi,

While going through the Kafka Streams documentation, I noticed that the examples and use cases mostly address real-time streaming. Are there any documents or examples that describe streaming historical data?

Thanks and Regards,
Adarshlal S

Damian Guy

May 17, 2017, 5:07:33 AM
to Confluent Platform
Hi,

What is it you are trying to do? When a Kafka Streams app first starts, it processes all historical data, because it starts from the beginning offsets of all input topics. There is currently no way to tell it particular offsets to start and end at, though there is some discussion about adding this.

If you want to reprocess historical data, there is a streams reset tool that resets the offsets to the beginning so that you can process the data again.
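As a rough illustration of the "starts from the beginning" behaviour, here is a minimal sketch of the relevant settings. It uses plain string keys in a `java.util.Properties` object so that it runs without any Kafka jars; the class name, application id, and broker address are hypothetical. It assumes the app has no committed offsets for the given `application.id`, in which case the `auto.offset.reset` policy decides where reading starts.

```java
import java.util.Properties;

// Hypothetical helper class; not part of Kafka Streams.
class HistoricalReplayConfig {

    // Builds the settings as plain string keys so the sketch needs no Kafka jars.
    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // application.id is also used as the consumer group id, so an id with
        // no committed offsets has no earlier position to resume from.
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // When no committed offset exists, start at the earliest available record.
        // Kafka Streams already defaults to this; it is set here only to be explicit.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        // Both argument values are placeholders for illustration.
        Properties props = build("history-replay-v1", "localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application these properties would be passed to the `KafkaStreams` constructor together with the topology. Note that this setting only applies when there are no committed offsets; once the app has run, it resumes from where it left off unless the offsets are reset.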

Thanks,
Damian 


Adarsh Lal

May 17, 2017, 8:03:03 AM
to Confluent Platform
Hi,

Suppose we have a Cassandra database with a large amount of time series data.
Each request asks for the data in a specific time interval, and the result from Cassandra needs to be processed and returned. To do this with Kafka Streams, we would need to get the data from Cassandra into Kafka topics; that is, the whole table would have to be imported into Kafka topics. As we are already keeping the data in Cassandra, replicating it in Kafka seems to be an overhead.
Is there a way to retrieve data from Cassandra dynamically for each request, based on the request's arguments?

Thanks and Regards,
Adarshlal

Eno Thereska

May 18, 2017, 4:45:46 AM
to Confluent Platform
Hi Adarsh,

You could implement your logic using the low-level Processor API and, in that logic, make external calls to Cassandra if you wish. However, there is no magic that automatically converts a Cassandra table into a KTable or similar.
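To make the idea concrete, here is a minimal sketch of the pattern: only the requests flow through the stream, and each request triggers a lookup against the external store. To keep it runnable without Kafka or Cassandra, it does not use the real Kafka Streams types; `RangeRequestProcessor` only mirrors the shape of a processor's `process(key, value)` callback, and `TimeSeriesStore` and `InMemoryStore` are hypothetical stand-ins for a Cassandra query. The averaging step is just a placeholder for whatever processing is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical abstraction over "query Cassandra for a time interval".
interface TimeSeriesStore {
    List<Double> readRange(long fromMillis, long toMillis);
}

// In-memory fake so the sketch runs without a database.
class InMemoryStore implements TimeSeriesStore {
    private final NavigableMap<Long, Double> rows = new TreeMap<>();

    void put(long timestampMillis, double value) {
        rows.put(timestampMillis, value);
    }

    @Override
    public List<Double> readRange(long fromMillis, long toMillis) {
        // Both bounds inclusive; requires fromMillis <= toMillis.
        return new ArrayList<>(rows.subMap(fromMillis, true, toMillis, true).values());
    }
}

// Mirrors the shape of a processor's process(key, value) callback.
// In a real app this logic would live in a class implementing the
// Processor interface from the Kafka Streams Processor API.
class RangeRequestProcessor {
    private final TimeSeriesStore store;

    RangeRequestProcessor(TimeSeriesStore store) {
        this.store = store;
    }

    // key = request id, value = {fromMillis, toMillis}.
    // Returns the average reading in the interval, or NaN if there are none.
    double process(String requestId, long[] range) {
        List<Double> readings = store.readRange(range[0], range[1]);
        return readings.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }
}

class RangeRequestDemo {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put(1000L, 10.0);
        store.put(2000L, 20.0);
        store.put(3000L, 60.0);

        RangeRequestProcessor processor = new RangeRequestProcessor(store);
        System.out.println(processor.process("req-1", new long[] {1000L, 2000L}));
    }
}
```

With this arrangement the data stays in Cassandra and only the request records pass through Kafka. One thing to keep in mind is that a blocking external call inside `process` holds up the stream thread for the duration of the call.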

Eno