Kafka Connect schema version - Debezium vs Schema Registry


David Szabo

Jul 24, 2017, 9:43:53 AM
to debezium
Hi Everyone,

I am working with Debezium MySQL Connector and Confluent's Schema Registry and AvroConverter. I'd like to use the schema version of Kafka Connect records in my custom Kafka Connect extensions (e.g. transformations, partitioners).

The version that Schema Registry provides would be perfect for this, and in theory I could access it from my code with connectRecord.valueSchema().version(). Unfortunately, what I see there is a constant value of 1, even after schema changes.
I've looked into the code of both Confluent and Debezium, and it seems that while the Schema Registry version would be available, the version that a source connector (in this case Debezium) provides takes precedence over it*. That makes sense in general. However, it seems that the version Debezium currently provides is a hardcoded value of 1.**
While I think it's okay if Debezium cannot propagate a schema version (sure, it would be great if it did :)), wouldn't it make more sense to leave the version empty rather than filling it with a hardcoded value, and let AvroConverter fall back to the Schema Registry version instead?
(According to the Kafka Connect javadoc, Schema's version attribute is optional, so it would be acceptable to leave it as null.***)
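
The precedence described above can be modelled in a few lines of plain Java. This is only a sketch of the behaviour as described in this post, not the AvroConverter source. The class and method names (`SchemaVersionSketch`, `effectiveVersion`) are hypothetical, and the example assumes the registry version is always known. The only Kafka Connect fact it relies on is that `Schema.version()` returns an `Integer` that may be null.

```java
import java.util.Optional;

// Hypothetical model; not part of Kafka Connect, Debezium, or Confluent code.
final class SchemaVersionSketch {

    // connectorVersion: what the source connector set on the Connect schema (may be null).
    // registryVersion: the version assigned by the schema registry (assumed known here).
    static int effectiveVersion(Integer connectorVersion, int registryVersion) {
        // A non-null connector version takes precedence; null falls back to the registry.
        return Optional.ofNullable(connectorVersion).orElse(registryVersion);
    }

    public static void main(String[] args) {
        // Connector hardcodes 1, so downstream code never sees the registry's 3.
        System.out.println(effectiveVersion(1, 3));
        // Connector leaves the version null, so the registry version shows through.
        System.out.println(effectiveVersion(null, 3));
    }
}
```

Under this model, a hardcoded 1 always hides the registry version, while a null version would let it through, which is the change proposed here.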

I am happy to submit a Jira issue and merge request with this change if this approach makes sense to you, but I'm also open to alternative suggestions for making the schema versions work.
(Please bear in mind that I'm pretty new to both Debezium and Kafka Connect, so I may not see the full picture.)

Thank you,
David


Randall Hauch

Jul 24, 2017, 10:21:53 AM
to debezium
Yeah, the MySQL connector should probably not set the version. Go ahead and file a JIRA issue, and if you're interested create a pull request with the change. Should be straightforward. In the meantime, you could use a custom SMT to remove the Schema's version field.
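
As a rough illustration of the workaround, here is a plain-Java model of what such an SMT would do. It deliberately does not use the Kafka Connect API: `SketchSchema`, `SketchRecord`, and `DROP_VALUE_SCHEMA_VERSION` are hypothetical stand-ins, and a real implementation would instead implement `org.apache.kafka.connect.transforms.Transformation` and rebuild the schema with `SchemaBuilder`.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for Connect's Schema and ConnectRecord; not the real API.
final class VersionStripSketch {

    static final class SketchSchema {
        final String name;
        final Integer version; // null means "no version set"

        SketchSchema(String name, Integer version) {
            this.name = name;
            this.version = version;
        }
    }

    static final class SketchRecord {
        final String topic;
        final SketchSchema valueSchema;
        final Object value;

        SketchRecord(String topic, SketchSchema valueSchema, Object value) {
            this.topic = topic;
            this.valueSchema = valueSchema;
            this.value = value;
        }
    }

    // Same shape as an SMT: takes a record and returns a (possibly new) record.
    static final UnaryOperator<SketchRecord> DROP_VALUE_SCHEMA_VERSION = rec -> {
        if (rec.valueSchema == null || rec.valueSchema.version == null) {
            return rec; // nothing to strip
        }
        // Schemas are treated as immutable, so build a copy without the version.
        SketchSchema stripped = new SketchSchema(rec.valueSchema.name, null);
        return new SketchRecord(rec.topic, stripped, rec.value);
    };
}
```

In a real transformation the value would most likely also have to be rebuilt against the new schema, since a Connect `Struct` carries a reference to the schema it was created with.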

BTW, you mention using the schema in your "custom Kafka Connect extensions (e.g. transformations, partitioners)." Single Message Transforms work with the SourceRecord that the source connector produces (or SinkRecord for sink connectors), and this is *before* Kafka Connect's converter (Avro converter in your case) is used to serialize the record into a binary form that is then written to Kafka. So, if you're using SMTs for transformations and partitioners, then your Transformation implementation will be independent of how the record is serialized.

Best regards,

Randall

--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+unsubscribe@googlegroups.com.
To post to this group, send email to debe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/bbf958fc-7768-46e7-a822-998c8416b983%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Szabo

Jul 25, 2017, 9:23:46 AM
to debezium
Thank you for your answer Randall.

I'll file a Jira issue and create a pull request sometime this week.
FYI: What I plan to do is remove the version from Envelope in debezium-core, so it won't affect just the MySQL connector. Let me know if you think this would have an undesirable effect on something else.
Another thing: I've observed the same hardcoded version on other Debezium data types (Bits, Enum, etc.). That doesn't really concern my case, but I can go ahead and remove the version there as well.

Regarding your comment on being independent from serialization: I'm not sure I get your meaning, but I am creating sink transformations and a partitioner that's also used on the sink side (as the Confluent S3 connector's partitioner). By the time records arrive there, they have already gone through Avro serialization and then deserialization, so they see the schema version AvroConverter put in.

David Szabo

Nov 15, 2017, 12:14:58 PM
to debezium
Hi again,

It took a "bit" longer than expected (I had issues with Jira access, then a vacation of several weeks, and then I completely forgot), but I have now submitted both the JIRA issue and the pull request.

Regards,
David