Custom Topic Partitioning with Kafka Connect JDBC connector


Sumit Arora

Mar 30, 2016, 9:20:36 PM
to Confluent Platform
Hello,

We are planning to use the Kafka Connect JDBC connector to extract data from our SQL Server database and publish it to Kafka topics. However, we want to use a custom partitioning strategy (for example, using the primary key of each table as the partitioning key so that all updates for a particular key land in a specific partition) rather than the default partitioner when publishing messages to Kafka topics. Is there a way to achieve this?

I was browsing through the Kafka Connect documentation and found that we can override producer configs like partitioner.class to define a custom partitioning strategy, but how can we pass keys to the producer?
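For reference, Connect lets you override producer settings from the worker configuration using a `producer.` prefix; a hypothetical override (the class name here is invented, you would have to write that class yourself) would look like this:

```properties
# Connect worker config: producer.* entries are passed through
# to the underlying Kafka producer.
# The partitioner class below is hypothetical.
producer.partitioner.class=com.example.MyCustomPartitioner
```

A custom partitioner alone doesn't answer the question, though: it can only act on whatever key the connector attaches to each record, which is exactly the open problem here.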

Thanks,
Sumit

Sumit Arora

Apr 1, 2016, 10:40:04 AM
to Confluent Platform
Good Morning,

Any ideas on this?

Thanks,
Sumit

Ewen Cheslack-Postava

Apr 1, 2016, 4:56:12 PM
to Confluent Platform
Sumit,

This is a feature we want to support in the connector (e.g. something like use.primary.key=true), but it isn't there yet. The information is available via DatabaseMetaData in JDBC, so it shouldn't be too hard to look up the primary key, extract its value, and provide it as the key in the SourceRecord.

-Ewen
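To illustrate the idea (this is not the connector's actual code): JDBC exposes primary keys through DatabaseMetaData.getPrimaryKeys(). The sketch below does the equivalent lookup in Python, using sqlite3's table metadata as a stand-in for JDBC, purely as an analogy.

```python
import sqlite3

def primary_key_columns(conn, table):
    """Return a table's primary-key column names, analogous to
    JDBC's DatabaseMetaData.getPrimaryKeys()."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 marks primary-key columns, in key order.
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    return [name for _, name, _, _, _, pk
            in sorted(rows, key=lambda r: r[5]) if pk > 0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
print(primary_key_columns(conn, "users"))  # ['id']
```

A connector doing this would then attach the looked-up column's value to each record as its key, so the default partitioner routes all updates for one key to one partition.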

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platf...@googlegroups.com.
To post to this group, send email to confluent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/da7eca9d-195f-4f3c-a332-50939b593171%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


Sumit Arora

Apr 5, 2016, 6:23:36 PM
to Confluent Platform
Thanks, Ewen!


David Kalosi

Jun 6, 2016, 8:25:35 AM
to Confluent Platform
Hi Ewen,

I have a similar question: I need to populate the key with a predefined value. In our case I have multiple DBs with the same structure, isolated per country. I would like to put, say, an ISO country code into the key so that I don't have to create a dedicated topic for each DB, but rather have one topic partitioned by the country code.

Is this somehow possible?

Thanks,
David


Ewen Cheslack-Postava

Jun 7, 2016, 1:33:54 AM
to confluent...@googlegroups.com
David,

There's not a way to do that with the JDBC connector today. That would be an unusual key, and it doesn't really have anything to do with JDBC specifically, so it wouldn't make much sense to add as a connector-specific feature. We are considering pluggable transformations on single messages as a lightweight way to modify data before publishing to Kafka: https://issues.apache.org/jira/browse/KAFKA-3209. That is not implemented yet, but it might be the right way to solve your use case.
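Conceptually, a single message transform is just a function from record to record applied before data leaves Connect. A hypothetical sketch in Python (the real feature would be Java; the record shape here is invented) for the fixed-country-code case:

```python
def set_static_key(record, key):
    """Hypothetical single-message transform: return a copy of the
    record with its key replaced by a fixed value, e.g. an ISO
    country code configured per Connect worker/DB."""
    out = dict(record)   # leave the original record untouched
    out["key"] = key
    return out

record = {"topic": "clients", "key": None,
          "value": {"mac": "aa:bb", "name": "x"}}
print(set_static_key(record, "SK"))  # key becomes "SK"; value unchanged
```

With every record from one DB carrying the same country-code key, a single topic partitioned by key replaces the per-country topics.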

By the way, regarding the original question about partitioning based on primary key, https://github.com/confluentinc/kafka-connect-jdbc/pull/67 is close to being merged and would add support for the single-column primary key case.

-Ewen


ago...@redborder.com

Jun 10, 2016, 5:32:58 AM
to Confluent Platform
Hi Ewen,

Currently, I am starting to use the JDBC source connector to move data from my Postgres database into a Kafka topic, which is then consumed by my Kafka Streams app. I use this data for enrichment with other data streams, so I need the data to be correctly partitioned. I don't have a single table in Postgres, so I use a query with joins between several tables. For example:

connection.url=jdbc:postgresql://postgres.example.com/test_db?user=bob&password=secret&ssl=true
query=SELECT clients.mac AS client_mac, clients.name AS client_name, locations.zone AS zone FROM clients JOIN locations ON (clients.mac = locations.mac)
mode=bulk
I also use bulk mode because after the join I don't have anything to track changes by (timestamp, primary key, or incrementing id). But I need partitioning based on the client_mac column so that my Kafka Streams app consumes the right client messages and can later join correctly with other streams. Do you think this use case makes sense? Could https://issues.apache.org/jira/browse/KAFKA-3209 solve my problem? I think the partitioning must be done by the connector, because I can have different queries where one uses an X key and another a Y key.
What do you think? If you think it is an interesting feature, I would be happy to help develop it.
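For what it's worth, a key on client_mac is all the partitioning you need: Kafka's default partitioner assigns each keyed record to hash(key) mod num_partitions, so every record with the same key lands in the same partition, which is the co-partitioning a Streams join relies on. A rough sketch of that behaviour (using CRC32 instead of Kafka's actual murmur2 hash, so the partition numbers won't match Kafka's):

```python
import zlib

def partition_for(key, num_partitions):
    # Deterministic stand-in for Kafka's default partitioner:
    # the same key always maps to the same partition.
    # (Kafka uses murmur2; CRC32 is only for illustration.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions

macs = ["aa:bb:cc:01", "aa:bb:cc:02", "aa:bb:cc:01"]
print([partition_for(m, 6) for m in macs])  # 1st and 3rd entries are equal
```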
Regards,
Andrés

Ewen Cheslack-Postava

Jun 11, 2016, 7:24:51 PM
to Confluent Platform
Andres,

Yeah, if you want to partition on something data-dependent that doesn't make sense as a standard key in a connector, then single message transforms will be the best way to do this, since they can easily be applied to any connector/data stream.

-Ewen


Andrés Gómez

Jun 13, 2016, 5:54:02 AM
to confluent...@googlegroups.com
OK, thanks! :)

--
Andrés