Camus is great for doing partitioned ETL on a Kafka topic into HDFS.
Question is...does anyone know of a tool or way to perform the opposite? That is, "replay" data from HDFS back to a Kafka topic?
We apply stream transformations to Kafka feeds that don't translate well to a Hadoop/MR-style job. A common case:
1. Data comes in
2. We put it on a topic that Camus then stores, partitioned by ingest date
3. We process that same feed in real time and output it to another topic.
We currently don't have a great solution for pumping data from HDFS back to Kafka (so that we can re-apply the transformation after we modify that process).
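For context, the naive fallback we've sketched is just streaming the stored files line-by-line into a producer. This is a rough sketch, assuming newline-delimited records; the `send` hook and the path layout are hypothetical, and in real use `send` would be something like kafka-python's `KafkaProducer(...).send(topic, msg)` and the files would be read from HDFS rather than the local filesystem:

```python
"""Naive HDFS-to-Kafka replay sketch.

Assumptions (not from any real tool): records are stored one per
line, and `send` is a caller-supplied hook standing in for a real
Kafka producer. Paths are local here for illustration only.
"""
import glob


def replay(path_glob, send):
    """Stream every non-blank line of every matching file to `send`.

    Files are processed in sorted order so replay roughly follows
    the original partition/ingest order. Returns the record count.
    """
    count = 0
    for path in sorted(glob.glob(path_glob)):
        with open(path, "r") as f:
            for line in f:
                record = line.rstrip("\n")
                if record:  # skip blank lines
                    send(record)
                    count += 1
    return count
```

The obvious gaps are why we'd rather not build this ourselves: you'd want a proper HDFS client instead of `open`, batching/throttling so the replay doesn't swamp downstream consumers, and some notion of checkpointing so a failed replay can resume.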
I wanted to see if a good solution for that existed, so we don't end up re-inventing the wheel.
- Sean