Kafka Connect JDBC query parallelization

peter...@gmail.com

Aug 19, 2016, 3:55:56 PM
to Confluent Platform
Hello,

We're considering using Kafka Connect to ingest data from our RDBMSs.
According to the docs: 
"each Task must copy its subset of the data to or from Kafka
...
The data that a connector copies must be represented as a partitioned stream, similar to the model of a Kafka topic, 
where each partition is an ordered sequence of records with offsets.
...
"

How can we fetch data from a large table in parallel? Should we assign each task (deployed connector?) a custom
non-overlapping query, i.e. by setting ranges in the WHERE clause?
For incremental loads this might not be an issue, but for a full table load it can be.

BTW, it's also unclear where to deploy Kafka Connect. Should we create a separate Kafka Connect cluster that talks to the source system
and Kafka? Deploying it on the broker nodes would probably put a lot of stress on them.

Any help would be great.
Thank you: Peter

Shikhar Bhushan

Aug 19, 2016, 8:26:21 PM
to Confluent Platform
Hi Peter,

> Should we assign each task (deployed connector?) a custom non-overlapping query, i.e. by setting ranges in the WHERE clause?

That sounds like a good approach. However, the incremental modes currently don't support a query that has its own WHERE clause. A patch to address this limitation would be welcome, or please open a GitHub issue.

One way to handle this: if the query contains placeholder text such as "${incrementalClause}", we could substitute the timestamp/id column clause at that position rather than appending a WHERE clause.
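
To illustrate the placeholder idea, here is a minimal Python sketch. It is not the connector's code: the function names, the "orders" table and the "id > ?" clause are all made up for illustration, and only the "${incrementalClause}" marker comes from the suggestion above. It contrasts blindly appending a WHERE clause with substituting the clause where the user asked for it.

```python
# Hypothetical marker, taken from the suggestion in this thread.
PLACEHOLDER = "${incrementalClause}"


def append_clause(query: str, incremental_clause: str) -> str:
    """Naive approach: always append a new WHERE clause to the query."""
    return f"{query} WHERE {incremental_clause}"


def substitute_clause(query: str, incremental_clause: str) -> str:
    """Proposed approach: fill in the placeholder if present, else append."""
    if PLACEHOLDER in query:
        return query.replace(PLACEHOLDER, incremental_clause)
    return append_clause(query, incremental_clause)


# Illustrative inputs only; table and column names are assumptions.
clause = "id > ?"
broken = append_clause("SELECT * FROM orders WHERE region = 'EU'", clause)
fixed = substitute_clause(
    "SELECT * FROM orders WHERE region = 'EU' AND ${incrementalClause}", clause
)

if __name__ == "__main__":
    print(broken)  # contains two WHERE keywords, so it is invalid SQL
    print(fixed)   # the clause lands inside the user's own WHERE
```

With this approach the user decides where the incremental predicate goes, so a range filter and the timestamp/id filter could coexist in one query.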

Best,

Shikhar

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platf...@googlegroups.com.
To post to this group, send email to confluent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/483a36e7-4ef1-4f31-abe9-8c7819abae80%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

peter...@gmail.com

Aug 20, 2016, 4:41:00 PM
to Confluent Platform
Hi Shikhar,

Thanks for the reply.
So in incremental mode for DB ingestion, do you currently support only 1 task/thread for querying the data?
Do you mean that for full table extracts the user can specify a where clause but creating non-overlapping queries is a manual step?

Thanks: Peter

Shikhar Bhushan

Aug 20, 2016, 6:01:33 PM
to Confluent Platform
Hi Peter,

> So in incremental mode for DB ingestion, do you currently support only 1 task/thread for querying the data?

For both bulk and incremental modes, it is 1 task/thread per table or custom query. Currently, while the connector can be configured to ingest from multiple tables, only a single custom query may be specified per connector. So if you want to parallelize ingest for a large table with custom queries, you would need to configure a separate connector instance for each query.
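
To make the statement above concrete, here is a simplified Python model of it. This is only a sketch of the rule "one task per table or custom query", not the connector's actual assignment code; the function names and the round-robin distribution are assumptions made for illustration.

```python
from typing import List, Optional


def units_of_work(tables: List[str], query: Optional[str]) -> List[str]:
    """A custom query replaces the table list and counts as a single unit."""
    return [query] if query else list(tables)


def assign_to_tasks(units: List[str], tasks_max: int) -> List[List[str]]:
    """Spread units over at most tasks_max tasks; a unit is never split."""
    num_tasks = min(tasks_max, len(units))
    groups: List[List[str]] = [[] for _ in range(num_tasks)]
    for i, unit in enumerate(units):
        groups[i % num_tasks].append(unit)
    return groups


# Three tables can use up to three tasks, however high tasks_max is set.
by_table = assign_to_tasks(units_of_work(["a", "b", "c"], None), tasks_max=8)

# A single custom query is one unit, so it always ends up on one task.
by_query = assign_to_tasks(units_of_work([], "SELECT * FROM big_table"), tasks_max=8)
```

Under this model, raising the task limit does nothing for one large table or one custom query, which is why the workaround is several connector instances.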

> Do you mean that for full table extracts the user can specify a where clause but creating non-overlapping queries is a manual step?

Yes, for full table extracts (bulk mode) a WHERE clause is currently possible. For incremental modes it is currently not possible since the implementation tries to append its own "WHERE .." clause and that leads to invalid SQL.
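
As a rough sketch of that manual step, the Python below splits a numeric key range into non-overlapping slices and builds one bulk-mode connector definition per slice. The table name "orders", the column "id", the key bounds, the connection URL and the helper names are all assumptions for illustration. The config keys (connector.class, connection.url, mode, query, topic.prefix, tasks.max) are the JDBC source connector's documented settings, and the name/config wrapper is the shape the Connect REST API accepts when creating a connector.

```python
import json
from typing import Dict, List, Tuple


def split_range(lo: int, hi: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [lo, hi] into half-open [start, end) slices."""
    step, extra = divmod(hi - lo + 1, parts)
    bounds = []
    start = lo
    for i in range(parts):
        size = step + (1 if i < extra else 0)
        if size == 0:
            continue  # more parts than keys: skip empty slices
        bounds.append((start, start + size))
        start += size
    return bounds


def connector_configs(table: str, id_column: str, lo: int, hi: int,
                      parts: int, connection_url: str) -> List[Dict]:
    """Build one bulk-mode connector definition per non-overlapping slice."""
    configs = []
    for n, (start, end) in enumerate(split_range(lo, hi, parts)):
        query = (f"SELECT * FROM {table} "
                 f"WHERE {id_column} >= {start} AND {id_column} < {end}")
        configs.append({
            "name": f"jdbc-{table}-part-{n}",
            "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": connection_url,  # hypothetical URL
                "mode": "bulk",
                "query": query,
                "topic.prefix": f"{table}-bulk",  # with a custom query this is the topic name
                "tasks.max": "1",
            },
        })
    return configs


if __name__ == "__main__":
    for cfg in connector_configs("orders", "id", 1, 10, 3,
                                 "jdbc:postgresql://db.example.com/shop"):
        print(json.dumps(cfg))
```

The slice bounds have to come from somewhere, for example a one-off MIN/MAX query on the key column. Keep in mind that bulk mode re-runs its query on every poll, so each slice is reloaded on that schedule.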

Best,

Shikhar

peter...@gmail.com

Aug 20, 2016, 6:14:58 PM
to Confluent Platform
Thanks for your answer.
Just one more question: where should we install the connector instances? Can they run on the broker nodes, or is it advisable to move them to other machines
so as not to overwhelm the Kafka cluster? We would probably install multiple connectors on one machine.

Peter

Gwen Shapira

Aug 21, 2016, 1:14:44 AM
to confluent...@googlegroups.com
In Sqoop, we managed to customize the incremental queries as well, and
also added a thread-per-partition mode (at least for some DBs).
Perhaps it is worthwhile to file issues for those improvements, and
hopefully someone will be willing to contribute?

Gwen

--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap

peter...@gmail.com

Aug 21, 2016, 3:14:30 PM
to Confluent Platform
Thank you. First we'll investigate the single task mode and see whether we can improve that.

Peter

Shikhar Bhushan

Aug 22, 2016, 4:02:44 PM
to confluent...@googlegroups.com
Hi Peter,

If you have sufficient resources on the broker nodes, running Connect alongside the brokers is probably going to be fine.

Best,

Shikhar

Shikhar Bhushan

Aug 22, 2016, 4:12:55 PM
to confluent...@googlegroups.com