If I understand you correctly, your plan is to run the JDBC source in
"bulk" mode and use the poll.interval.ms config to control how
frequently you bulk load the data, and then have the HDFS connector
"drain" the topic before dumping another snapshot?
I have to admit that orchestrating connectors with these requirements
sounds a bit tricky...
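
To make the setup in question concrete, here is a minimal sketch of a JDBC source configuration in bulk mode, written as a Python dict of the kind you could submit to the Connect REST API. The config keys are the JDBC connector's own; the connection URL, topic prefix, and 24-hour interval are placeholder assumptions and not values from this thread.

```python
import json

# Assumed for illustration: reload the tables once a day.
ONE_DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical connection URL and topic prefix; replace with your own.
jdbc_bulk_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db.example.com:5432/sales",
    "mode": "bulk",                       # reload whole tables on every poll
    "poll.interval.ms": str(ONE_DAY_MS),  # time between bulk loads
    "topic.prefix": "snapshot-",          # topic name = prefix + table name
    "tasks.max": "1",
}

if __name__ == "__main__":
    # Pretty-print the config so it can be inspected or saved to a file.
    print(json.dumps(jdbc_bulk_config, indent=2))
```
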
A few things I'd do:
1. Make sure you have a significant interval between JDBC connector
polls (compared to the time the bulk load takes) to avoid a scenario
where a second bulk load starts before the first finishes.
2. I'd use the producer JMX metrics on the source connector tasks to
check when they stopped producing as an indicator of when the load
into Kafka finished.
3. HDFS Sink by default will just add the additional bulk loads into
the same HDFS dir and Hive table (named after the topic in use), so
depending on the exact requirements you may need to hack a bit to get
around that.
4. I might consider using an actual cron job that starts stand-alone
connectors with the appropriate configuration for every bulk load
(both JDBC and HDFS sides) and then stops the connectors (using REST
APIs) when the bulk load finishes (see point 1). I might even update the
configuration to write to a different topic every time, to avoid
mixing up snapshots. Then you'd want another cron job to clean up the
old topics...
As I said, a bit tricky.
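
A rough sketch of how the cron-driven approach in points 1 and 4 might be wired up, assuming a Connect worker with its REST interface on localhost:8083. The helper names (`plan_snapshot`, `create_request`, `delete_request`, `interval_is_safe`), the connection URLs, the `flush.size` value, and the safety factor of 3 are all hypothetical choices for illustration. Only `POST /connectors` and `DELETE /connectors/{name}` are actual Connect REST endpoints, and the config keys are those of the JDBC and HDFS connectors. The script only builds the requests; sending them is left commented out.

```python
import json
import urllib.request
from datetime import datetime, timezone

CONNECT_URL = "http://localhost:8083"  # assumed Connect REST address


def interval_is_safe(poll_interval_ms, load_duration_ms, safety_factor=3):
    """Point 1: the poll interval should comfortably exceed the load time."""
    return poll_interval_ms >= safety_factor * load_duration_ms


def create_request(name, config, base_url=CONNECT_URL):
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delete_request(name, base_url=CONNECT_URL):
    """Build (but do not send) a DELETE /connectors/{name} request."""
    return urllib.request.Request(f"{base_url}/connectors/{name}", method="DELETE")


def plan_snapshot(table, when):
    """Return connector configs that use a fresh topic for this snapshot."""
    suffix = when.strftime("%Y%m%dT%H%M%S")
    prefix = f"snapshot-{suffix}-"
    source = {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/sales",  # placeholder
        "mode": "bulk",
        "table.whitelist": table,
        "topic.prefix": prefix,  # JDBC source writes to prefix + table name
    }
    sink = {
        "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
        "topics": prefix + table,
        "hdfs.url": "hdfs://namenode.example.com:8020",  # placeholder
        "flush.size": "1000",  # arbitrary example value
    }
    return {f"jdbc-{suffix}": source, f"hdfs-{suffix}": sink}


if __name__ == "__main__":
    plan = plan_snapshot("orders", datetime.now(timezone.utc))
    for name, config in plan.items():
        req = create_request(name, config)
        print(req.get_method(), req.full_url, name)
        # urllib.request.urlopen(req)  # uncomment to actually submit
```

A second cron entry would issue the `delete_request` calls once the producer metrics from point 2 show the load has finished, and a third would remove old snapshot topics.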
We do have "batch mode" plans for Connect, and the "cron" configuration
looks like something that will fit in. We haven't specced the details
yet, though. Feel free to give it some thought and perhaps contribute a
few improvements to make your use case easier.
Gwen
--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap