Headers Support for Kafka Avro Message


Mana

Jul 31, 2017, 10:26:14 PM
to Confluent Platform
Background:
I am working on a Kafka library for my project. When a user of the library passes in a record to write to Kafka, the library needs to attach additional information to the user's record.

For Avro-typed record values, my original plan was for the library to define its own Avro schema for the Kafka record value. That schema would have a "headers" field for the additional information and a second field that references the user's record value.

For example,

KafkaLibraryRecordValueRecord
    -> field -> headers
    -> field -> KafkaUserRecordValueRecord

A few things I should bring up:
- The library does not know in advance the Avro record types of the different users' feature models.
- I do not want to ask every user to add a "headers" field to the schemas they define for their features. That does not seem user-friendly.

Question:
I am very new to Avro and Schema Registry and am still reading up on them. From what I have read so far, there does not seem to be an easy way of doing this.
- First of all, Schema Registry does not support schema references.
- Maybe this could be done in reverse: through some customization of Avro code generation, a headers field could be inserted into each user's schema at build time? I would still have to use reflection in the library code to access the headers field on the user's Avro object?

Any thoughts on how this can be done?


I went through this discussion, which seems to cover the same issue, but it does not reach a concrete solution:
https://groups.google.com/forum/#!searchin/confluent-platform/schema$20header%7Csort:relevance/confluent-platform/8xPbjyUE_7E/dyuYxvJ7vC8J

Ewen Cheslack-Postava

Jul 31, 2017, 11:46:09 PM
to Confluent Platform
Mana,

Headers were just recently added natively to Kafka itself, via KIP-82 in 0.11.0.0: https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers

These look a bit different from the key and value fields, which are raw bytes in Kafka but have API support for layering serializers on top. Headers are instead a set of <String, byte[]> pairs. You could easily put Avro data into each of the byte[] header values, or, probably more commonly, use String values there as well.

So while you can still wrap the original value in an envelope that includes headers, that is no longer strictly necessary. Using native headers also addresses some of the issues mentioned in the KIP's motivation, e.g. the fact that wrapping null values in compacted topics results in non-null values that will never be cleaned up.
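
As a rough illustration of the <String, byte[]> shape described above, here is a minimal sketch of how a library might prepare String-valued metadata as header pairs. It uses only the JDK so it runs without a broker. The class name LibraryHeaders, its two methods, and the header names "trace-id" and "library-version" are hypothetical and not part of Kafka. The comment in main shows where the pairs would be handed to the kafka-clients API (0.11.0.0 or later).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Kafka: prepares library metadata as
// <String, byte[]> pairs, the shape that Kafka record headers use.
final class LibraryHeaders {

    // Encode String metadata as UTF-8 bytes, preserving insertion order.
    static Map<String, byte[]> encode(Map<String, String> metadata) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            out.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return out;
    }

    // Decode one header value back to a String; returns null if the key is absent.
    static String decode(Map<String, byte[]> headers, String key) {
        byte[] raw = headers.get(key);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("trace-id", "abc-123");        // assumed header name
        metadata.put("library-version", "1.0");     // assumed header name

        Map<String, byte[]> headers = encode(metadata);

        // With kafka-clients on the classpath, each pair would be attached to a
        // ProducerRecord via record.headers().add(key, value), so the user's
        // Avro value (and its schema) is passed through unchanged.
        System.out.println(decode(headers, "trace-id"));
    }
}
```

On the consuming side, the same decode step would apply to the byte[] returned for each header, so the library never needs to know or modify the user's Avro record type.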

-Ewen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/84e6bcf7-a351-4c65-8af4-b0ccc79746d2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mana

Aug 1, 2017, 12:45:46 AM
to Confluent Platform
Thanks Ewen. This is great and exactly what I am looking for!