--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-users+unsubscribe@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/acd72d8e-8046-4f96-a4e4-18a822859c7f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Clemens,

Does ingesting the Kafka topic to HDFS followed by running Gobblin compaction (http://gobblin.readthedocs.io/en/latest/user-guide/Compaction/) solve your problem?

We generally store the CDC on HDFS (e.g. ingested into /data/dbchanges) separate from the compacted snapshots (compaction applied to /data/dbchanges and published to /data/dbsnapshots).

I'm not sure I follow why you would have to periodically re-consume the entire Kafka topic, versus just continually ingesting it into HDFS and compacting there.

Shirshanka
On Wed, Apr 5, 2017 at 9:59 AM, Clemens Valiente <csieb...@gmail.com> wrote:
Hi,

I have several Kafka topics that are basically CDC changelogs. I would like Gobblin to occasionally read such a topic and materialize it into a new snapshot of the underlying table.

For that I would need:

- A Kafka source that always starts at the beginning of the topic. I had a look and it seems this is not possible without a complete reimplementation of the KafkaSource, since all the methods determining the starting offset are private. Or am I missing something here?
- Deduplication of records before they are written. While the quality checker seemed like a good candidate, I don't think I can use it: the row-level quality checker only sees individual rows and not the overall context, and the task-level checker doesn't even do what the documentation says and just looks at the state of the task output.

I was thinking of storing the rows in a RocksDB and storing that in the State.properties, but the last part sounds ugly to me. Are there any other ways of doing this?
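
To make the deduplication requirement concrete, here is a minimal sketch of materializing a changelog into a snapshot by keeping only the newest change per primary key. It is plain Python, not Gobblin code and not how Gobblin compaction is implemented; the field names `id`, `ts` and `op` and the function `materialize_snapshot` are assumptions made up for the illustration.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def materialize_snapshot(
    changes: Iterable[Record],
    key_field: str = "id",    # hypothetical primary-key field
    order_field: str = "ts",  # hypothetical ordering field (timestamp or sequence number)
    op_field: str = "op",     # hypothetical operation type: insert / update / delete
) -> List[Record]:
    """Collapse a CDC changelog into the latest state of each row."""
    latest: Dict[Any, Record] = {}
    for rec in changes:
        key = rec[key_field]
        current = latest.get(key)
        # Keep the change with the highest ordering value; on a tie the
        # later-seen record wins, so exact duplicates collapse into one.
        if current is None or rec[order_field] >= current[order_field]:
            latest[key] = rec
    # Rows whose most recent change is a delete are not part of the snapshot.
    live = [r for r in latest.values() if r.get(op_field) != "delete"]
    return sorted(live, key=lambda r: r[key_field])


changelog = [
    {"id": 1, "ts": 1, "op": "insert", "name": "a"},
    {"id": 2, "ts": 2, "op": "insert", "name": "b"},
    {"id": 1, "ts": 3, "op": "update", "name": "a2"},
    {"id": 2, "ts": 4, "op": "delete", "name": "b"},
    {"id": 3, "ts": 5, "op": "insert", "name": "c"},
    {"id": 1, "ts": 3, "op": "update", "name": "a2"},  # duplicate delivery
]

snapshot = materialize_snapshot(changelog)
for row in snapshot:
    print(row)
```

The point of the sketch is that this kind of deduplication needs the whole key space in view at once, which is why a per-row check cannot do it.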