Kafka Connect Incremental Query Mode


Jag Thind

Dec 22, 2015, 6:19:40 AM
to Confluent Platform

Hi


Some newbie questions about incremental query modes:

- Can you specify different modes for different tables, or is the same mode applied to all tables?

- Where are the offset tracking values maintained? Are they exposed in a file anywhere?


Thanks

Jag

Gwen Shapira

Dec 22, 2015, 10:00:39 PM
to confluent...@googlegroups.com
Hi,

- Right now it is the same mode for all tables. If you need different incremental modes, you can create two separate JDBC connectors: two configuration files, each with a different name, a different list of tables, and a different incremental mode.
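For example, a sketch (the file, table, and column names here are hypothetical; the keys are the JDBC source connector's standard configuration properties):

# jdbc-incrementing.properties -- hypothetical connector using an auto-increment id
name=jdbc-orders-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://localhost:3306/mydb?user=kafka&password=secret
mode=incrementing
incrementing.column.name=id
table.whitelist=orders
topic.prefix=mysql-

# jdbc-timestamp.properties -- hypothetical connector using a last-modified column
name=jdbc-customers-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://localhost:3306/mydb?user=kafka&password=secret
mode=timestamp
timestamp.column.name=modified_at
table.whitelist=customers
topic.prefix=mysql-

You can then start both connectors from one standalone worker, e.g. bin/connect-standalone worker.properties jdbc-incrementing.properties jdbc-timestamp.properties.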

- If you are running Kafka Connect in standalone mode, the offsets are kept in a file under /tmp. In distributed mode they are stored in a Kafka topic.
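Concretely, the location is set in the worker configuration (the path and topic name shown are the usual defaults; adjust for your environment):

# standalone worker, e.g. connect-standalone.properties
offset.storage.file.filename=/tmp/connect.offsets

# distributed worker, e.g. connect-distributed.properties
offset.storage.topic=connect-offsets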

Hope this helps!

Gwen


Jag Thind

Dec 23, 2015, 2:35:16 AM
to Confluent Platform
Do you have any recommendations for dealing with deleted data?

Polling for inserted and updated data would not find deleted data.

Thanks for the clarification.

Gwen Shapira

Dec 23, 2015, 4:21:26 PM
to confluent...@googlegroups.com
Deleted data is always an issue for ETL.

Two recommendations:
1. Don't actually delete data, at least not immediately. Add a column marking the record as "deleted" and only really delete it a few days later (see the sketch after this list). Aside from the Kafka Connect use case, this also helps when someone accidentally deletes something and then regrets it, since getting the data back from backups is a pain. The fact that a record was deleted will then show up in Kafka, and you can decide how to handle it (this is really use-case dependent).

2. If you happen to use MySQL, there is a work-in-progress connector that tails the binlog, so it will include all DML, including deletes: https://github.com/wushujames/kafka-mysql-connector
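Here is a minimal sketch of option 1 with the JDBC connector (the table, columns, and connector name are made up; query and the timestamp settings are standard JDBC source connector properties). The idea is that a DELETE becomes an UPDATE that sets the flag and bumps the timestamp, so the deletion lands inside the incremental window:

# jdbc-soft-delete.properties -- hypothetical source that also captures soft deletes
name=jdbc-orders-with-deletes
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://localhost:3306/mydb?user=kafka&password=secret
mode=timestamp
timestamp.column.name=modified_at
# expose the is_deleted flag so downstream consumers see deletions as updates
query=SELECT id, payload, is_deleted, modified_at FROM orders
topic.prefix=mysql-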

Gwen


Jag Thind

Dec 24, 2015, 5:39:32 AM
to Confluent Platform
I think tailing the transaction log/binlog is the only reliable way to get all changes, including deletes. One can then decide what to do with each operation.

Thanks.


Andrew Stevenson

Jan 4, 2016, 12:53:38 AM
to Confluent Platform
Gwen,

Is Confluent favouring Maxwell over MyPipe? MyPipe integrates with Avro.

Thanks

Andrew

Gwen Shapira

Jan 4, 2016, 12:59:08 AM
to confluent...@googlegroups.com
As far as I know, Confluent didn't do any research comparing these technologies (Ewen will correct me if I'm wrong, he's been here longer).

Maxwell has the Kafka Connect integration (https://github.com/wushujames/kafka-mysql-connector), so it fits a bit better into our framework and best practices. But as long as your data ends up in Kafka in a format that works for your use case, it's all good :)

Gwen


James Cheng

Jan 5, 2016, 8:02:06 PM
to Confluent Platform
Hi, I'm the author of the Kafka Connect integration for Maxwell (https://github.com/wushujames/kafka-mysql-connector).

If you use Kafka Connect, you can configure it to write to Kafka in the format you want. For an example, see the "value.converter" property in https://github.com/wushujames/kafka-mysql-connector/blob/master/copycat-standalone.properties. Kafka Connect (from trunk) ships with a JsonConverter, but you can write your own converter to encode things to Avro if you want.

This is conceptually similar to how the stock Kafka producer allows you to plug in your own serializer.
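As a rough illustration, the converter lines in a worker config look like this (JsonConverter ships with Kafka Connect; the schemas.enable flags are optional and shown here only as an example):

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false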

I can't speak for Confluent but, just as they provided an Avro serializer for the stock Kafka producer as part of their Confluent Platform, I would expect them to provide a Kafka Connect converter that would use their platform to let Kafka Connect use Avro.

-James



Liquan Pei

Jan 5, 2016, 8:21:15 PM
to confluent...@googlegroups.com
Hi Andrew,

Kafka Connect can write Avro data. You can simply set key.converter and value.converter to use the Avro converter. An example of how to configure this is in the configuration file

etc/schema-registry/connect-avro-standalone.properties

of Confluent Platform 2.0. The Kafka Connect documentation also contains examples of how to work with Avro data.
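The relevant lines from that file look roughly like this (the Schema Registry URL is the usual local default; point it at your own deployment):

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081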

Thanks,
Liquan


--
Liquan Pei | Software Engineer | Confluent | +1 413.230.6855
Download Apache Kafka and Confluent Platform: www.confluent.io/download