Approach for passing reader schema while using KafkaAvroDeserializer


jaga...@helpshift.com

unread,
Aug 3, 2017, 10:30:26 AM8/3/17
to Confluent Platform

I have a question about passing an explicit reader schema in the consumer when using Confluent's KafkaAvroDeserializer class.

 

-- When writing a consumer, we just set the property "value.deserializer" to the KafkaAvroDeserializer class.

-- During the lifecycle of the Kafka consumer, it calls the method with signature `deserialize(String s, byte[] bytes)`, because KafkaConsumer only works with the Deserializer interface and therefore only knows that one signature of deserialize. So the overload `deserialize(String s, byte[] bytes, Schema readerSchema)` never comes into the picture at all.

-- If my consumer needs an explicit reader schema, there is no way to pass it to the deserializer, because calling the deserialize method is not under user-code control; it is called internally by KafkaConsumer (the Kafka library).

-- The only fix I can think of is to write another class whose constructor accepts a reader schema; if it is set, then in the `deserialize(String s, byte[] bytes)` method, which is called by KafkaConsumer, simply delegate to the `deserialize(String s, byte[] bytes, Schema readerSchema)` method.

-- Can someone confirm whether that is the only way of doing it? Or is there an alternative way of achieving this without having to write and maintain my own code?
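The wrapper class described above could be sketched roughly as follows. This is a minimal sketch, not a tested implementation: the class name `ReaderSchemaAvroDeserializer` is hypothetical, and it assumes the three-argument `deserialize(String, byte[], Schema)` overload on KafkaAvroDeserializer is public (it is in the Confluent versions I have looked at).

```java
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.kafka.common.serialization.Deserializer;

import io.confluent.kafka.serializers.KafkaAvroDeserializer;

// Hypothetical wrapper that pins a fixed reader schema by delegating
// to KafkaAvroDeserializer's three-argument overload.
public class ReaderSchemaAvroDeserializer implements Deserializer<Object> {

    private final KafkaAvroDeserializer inner = new KafkaAvroDeserializer();
    private final Schema readerSchema;

    public ReaderSchemaAvroDeserializer(Schema readerSchema) {
        this.readerSchema = readerSchema;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        inner.configure(configs, isKey);
    }

    @Override
    public Object deserialize(String topic, byte[] bytes) {
        // KafkaConsumer only ever calls this two-argument method;
        // delegate to the overload that accepts a reader schema.
        return inner.deserialize(topic, bytes, readerSchema);
    }

    @Override
    public void close() {
        inner.close();
    }
}
```

Note that because this class has no no-arg constructor, you cannot register it via the "value.deserializer" property (Kafka instantiates that class reflectively); instead, construct it yourself and pass the instance to the `KafkaConsumer(Properties, Deserializer, Deserializer)` constructor.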


 

I also want to understand the best practice, and the reasoning behind it, for maintaining an explicit schema at the consumer level. The questions below depend on the one above.


-- The use case I am trying to solve: as long as the producer changes its schema in a backward-compatible manner, the consumer team should not need to coordinate with the team maintaining the producer code.

-- Imagine the producer initially had a record schema with three string fields A, B, and C, all with default values set. The consumer first wrote code with getters for A, B, and C, relying only on the writer's schema without using any explicit reader's schema.

-- Now suppose the producer deletes field B. The new schema is still backward compatible (since default values were set), so if the producer sends data using this new schema, the consumer will still deserialize it correctly based on the writer's schema.

-- But the consumer code will fail, because when the getter for field B is called it will not get anything: the field no longer exists in the deserialized record.

-- This can be solved by maintaining an explicit reader's schema at the consumer end.

-- Is the above reasoning correct? If yes, does it mean that whenever you have this requirement, you must always use a reader's schema?
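To make the scenario concrete, here is one possible pair of schemas (the record name `Event` is made up for illustration). The producer's new writer schema after deleting B:

```
{"type": "record", "name": "Event", "fields": [
  {"name": "A", "type": "string", "default": ""},
  {"name": "C", "type": "string", "default": ""}
]}
```

The consumer's explicit reader schema, pinned to the original version that still contains B:

```
{"type": "record", "name": "Event", "fields": [
  {"name": "A", "type": "string", "default": ""},
  {"name": "B", "type": "string", "default": ""},
  {"name": "C", "type": "string", "default": ""}
]}
```

If my understanding of Avro schema resolution is right, decoding data written with the first schema against the second fills B with its default value "", so the consumer's getter for B keeps working even though the producer no longer sends that field.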

 

jaga...@helpshift.com

unread,
Aug 7, 2017, 12:51:46 AM8/7/17
to Confluent Platform
Bumping this just in case it was missed