Kafka Connect with Multi-Threading


bigdata....@gmail.com

May 8, 2017, 8:01:57 AM
to Confluent Platform

I am trying to load data from an RDBMS into a Kafka topic. I am using the Kafka JDBC connector provided by Confluent. My questions are:


1. If I have millions of records in the database, how can I use multi-threading?

2. Can we use multiple brokers with Kafka Connect?

3. How can we implement security in this offload from the RDBMS to the Kafka topic?

4. Let's say my server goes down during the data offload. How will it behave after the Kafka server restarts?


Please help. This is urgent.

Ewen Cheslack-Postava

May 13, 2017, 6:06:23 PM
to Confluent Platform
On Mon, May 8, 2017 at 5:01 AM, <bigdata....@gmail.com> wrote:


1. If I have millions of records in the database, how can I use multi-threading?

That connector only parallelizes by table, i.e. 1 task (and therefore 1 thread) per table. This is a fundamental restriction due to the limitations of accessing the data via queries -- there's no way to parallelize this, guarantee ordering, and manage offsets to track what data has already been copied if you try to use multiple threads.
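A minimal Python sketch of what "parallelizes by table" means in practice, assuming the connector splits its table list into at most `tasks.max` contiguous groups, one group per task (similar in spirit to Kafka's `ConnectorUtils.groupPartitions`). The function name `group_tables` and the table names are made up for illustration; this is not the connector's actual code.

```python
def group_tables(tables, max_tasks):
    """Split tables into at most max_tasks contiguous groups, one group per task.

    Illustrative sketch only: every table lands in exactly one group, so a
    single very large table is never split across tasks (threads).
    """
    if max_tasks <= 0:
        raise ValueError("max_tasks must be positive")
    # Never create more tasks than there are tables.
    num_groups = min(len(tables), max_tasks)
    if num_groups == 0:
        return []
    # The first `leftover` groups get one extra table.
    per_group, leftover = divmod(len(tables), num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = per_group + 1 if g < leftover else per_group
        groups.append(tables[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    print(group_tables(["orders", "customers", "items", "payments", "shipments"], 2))
    # [['orders', 'customers', 'items'], ['payments', 'shipments']]
    print(group_tables(["orders"], 8))
    # [['orders']]  -- one big table still means one task
```

The takeaway: raising `tasks.max` above the number of tables gives no extra threads, and millions of rows in one table are still read by a single task.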

2. Can we use multiple brokers with Kafka Connect?

Kafka Connect works with any Kafka cluster, including one with multiple brokers. Note that Kafka Connect itself does not run on the brokers -- you should run it on separate servers.
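For example, a worker only needs a list of brokers in `bootstrap.servers`, however many brokers the cluster has. The sketch below renders a minimal distributed-mode worker config from Python. The broker host names, the group id and the internal topic names are placeholders chosen for illustration, and `worker_properties` is a hypothetical helper.

```python
def worker_properties(brokers, group_id):
    """Render a minimal Kafka Connect distributed-worker config as .properties text."""
    props = {
        # Listing several brokers means startup does not depend on one broker.
        "bootstrap.servers": ",".join(brokers),
        # Workers that share the same group.id form one Connect cluster.
        "group.id": group_id,
        # Internal topics where distributed workers keep connector configs,
        # source offsets and status (the names here are placeholders).
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(worker_properties(["broker1:9092", "broker2:9092", "broker3:9092"], "connect-cluster"))
```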

If you're asking about multiple Kafka Connect workers, then yes, you can run multiple instances and have them coordinate. This is called "distributed mode": http://docs.confluent.io/current/connect/userguide.html#standalone-vs-distributed
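Roughly, the workers in a distributed Connect cluster share the connectors' tasks, and when a worker joins or leaves, the tasks are rebalanced across the remaining workers. The sketch below models that with plain round-robin assignment. It is a simplification (the real group protocol is more involved, and newer Kafka versions rebalance incrementally), and the worker and task names are hypothetical.

```python
def assign_round_robin(tasks, workers):
    """Assign task ids to workers round-robin (simplified model of a rebalance)."""
    if not workers:
        raise ValueError("need at least one worker")
    assignment = {worker: [] for worker in workers}
    ordered = sorted(workers)
    for i, task in enumerate(tasks):
        assignment[ordered[i % len(ordered)]].append(task)
    return assignment


if __name__ == "__main__":
    tasks = ["jdbc-source-0", "jdbc-source-1", "jdbc-source-2"]
    # Two workers share the three tasks.
    print(assign_round_robin(tasks, ["worker-a", "worker-b"]))
    # worker-b goes away: after the rebalance, worker-a runs everything.
    print(assign_round_robin(tasks, ["worker-a"]))
```

Note that adding workers spreads tasks around but does not create more tasks than the connector asks for, so with the JDBC connector the per-table limit still applies.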
 

3. How can we implement security in this offload from the RDBMS to the Kafka topic?

What type of security? For security between the connector and the database, you'd rely on whatever support the JDBC driver provides. For security between the connector and Kafka, the Kafka Connect framework supports any security modes supported in Kafka. See http://docs.confluent.io/current/connect/security.html for more details.
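On the Kafka side, a common pattern is to give the worker the same client security settings twice: unprefixed for the worker's own connections, and with a `producer.` prefix for the producers that source tasks use (sink tasks would need a `consumer.` prefix instead). A minimal sketch, assuming SSL; the truststore path and broker address are placeholders, and `with_client_security` is a hypothetical helper.

```python
def with_client_security(base, security):
    """Copy Kafka client security settings into a worker config twice.

    Unprefixed keys cover the worker's own connections (e.g. its internal
    topics); 'producer.'-prefixed keys cover producers used by source tasks.
    """
    props = dict(base)
    for key, value in security.items():
        props[key] = value
        props["producer." + key] = value
    return props


if __name__ == "__main__":
    props = with_client_security(
        {"bootstrap.servers": "broker1:9093"},  # placeholder TLS listener
        {
            "security.protocol": "SSL",
            "ssl.truststore.location": "/var/private/ssl/truststore.jks",  # placeholder path
        },
    )
    for key, value in props.items():
        print(f"{key}={value}")
```

On the database side, Connect itself adds nothing: TLS and authentication go into the JDBC URL and credentials (`connection.url`, `connection.user`, `connection.password`) in whatever form your driver expects.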
 

4. Let's say my server goes down during the data offload. How will it behave after the Kafka server restarts?

Kafka Connect works similarly to Kafka consumers. It commits offsets periodically, allowing it to recover after a failure (or clean shutdown) and pick up where it left off. In the event of a crash, you'll get at-least-once semantics, i.e. you may see a few duplicates, but you won't be restarting from scratch.
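A toy simulation of that at-least-once behavior, assuming an incrementing id column (as in the JDBC connector's incrementing mode) and an offset commit every few rows. It is not how Connect is implemented; `copy_rows` and the commit interval are illustrative.

```python
def copy_rows(rows, committed, commit_every, crash_after=None):
    """Copy rows whose id is above the committed offset.

    Commits the offset after every `commit_every` emitted rows. If crash_after
    is set, stops after that many rows; progress since the last commit is lost.
    Returns (emitted ids, last committed offset).
    """
    emitted = []
    for row_id in rows:
        if row_id <= committed:
            continue  # already copied before the last commit
        emitted.append(row_id)
        if len(emitted) % commit_every == 0:
            committed = row_id  # periodic offset commit
        if crash_after is not None and len(emitted) == crash_after:
            break  # simulated crash
    return emitted, committed


if __name__ == "__main__":
    rows = list(range(1, 11))
    first, offset = copy_rows(rows, 0, 4, crash_after=6)
    second, _ = copy_rows(rows, offset, 4)  # restart from the committed offset
    print(first)
    print(second)
    print(sorted(set(first) & set(second)))
```

Here rows 5 and 6 were sent before the crash but after the last commit, so they come out twice; nothing is skipped, and the restart does not go back to row 1.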

-Ewen
 

