Hi,
I'd really like to use Avro-encoded custom types (as opposed to primitives) for message keys, for several reasons:
- The client code is clear and self-documenting (e.g. CustomerKey and ProductKey instead of String are less likely to be mixed up by mistake).
- Key and value encodings are symmetric (both use SpecificAvroSerde), so there is no need to configure specific Serdes for each topic.
- Some Kafka Connect connectors allow defining keys this way.
However, there are several caveats associated with doing this, which leaves me wondering:
Is it considered bad practice to use KafkaAvroSerializer for message keys?
The main caveats I encountered thus far are caused by the fact that the schema subject is determined from the topic name (see AbstractKafkaAvroSerDe#getSubjectName).
This is quite problematic when you need to join different topics on the same key. For example, using selectKey on a KStream and sending it to a new topic will register the schema of the key with the schema registry under a new subject, thus returning a new schema id.
Therefore, the serialized byte stream of the keys from both topics will be different (since the schema id is embedded in the serialized format). This, in turn, will cause a join between the topics to fail (this is also true for looking up a value by key in a StateStore).
This behavior is discussed in KAFKA-5398, but no clear solution is described there.
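To make the problem concrete, here is a minimal sketch of the framing as I understand it: one magic byte (0), then the schema id as a 4-byte big-endian integer, then the Avro binary payload. The class name WireFormatDemo, the helper frame, and the schema ids 41 and 42 are made up for illustration. This does not call the real serializer; it only mimics the layout.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class WireFormatDemo {

    // Assumed layout: magic byte 0, 4-byte big-endian schema id, Avro binary payload.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(1 + 4 + avroPayload.length)
                .put((byte) 0)
                .putInt(schemaId)      // ByteBuffer is big-endian by default
                .put(avroPayload)
                .array();
    }

    public static void main(String[] args) {
        // Avro binary encoding of a record with a single string field "abc":
        // zigzag-encoded length 3 (= 0x06) followed by the UTF-8 bytes.
        byte[] payload = {0x06, 'a', 'b', 'c'};

        // Hypothetical ids: the same key schema registered under two subjects.
        byte[] keyInTopicA = frame(41, payload);
        byte[] keyInTopicB = frame(42, payload);

        // Same logical key, different bytes, so a byte-wise comparison fails.
        System.out.println(Arrays.equals(keyInTopicA, keyInTopicB)); // prints false
    }
}
```

Since joins and StateStore lookups compare the serialized key bytes, the two keys above are treated as different even though the Avro payload is identical.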
All of this can be avoided by using primitive types (e.g. String, Long) as keys, along with their associated Serdes, instead of using Avro.
To be able to join different topics with the same semantic key without having them considered as different keys, it would be necessary to use a custom version of KafkaAvroSerializer that would not generate the subject from the topic name but rather from the class name of the data type.
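Roughly, the difference between the two naming schemes would look like the sketch below. It is only an illustration of the idea, not the actual Confluent code: the class SubjectNaming, both helper methods, and the topic names are hypothetical, and the "-key"/"-value" suffixes reflect the default topic-based naming as I understand it.

```java
public class SubjectNaming {

    // Default behavior as I understand it: the subject is derived from the topic.
    static String topicBasedSubject(String topic, boolean isKey) {
        return topic + (isKey ? "-key" : "-value");
    }

    // Proposed alternative: the subject is derived from the key's type only.
    static String typeBasedSubject(Class<?> keyType) {
        return keyType.getName();
    }

    // Stand-in for an Avro-generated key class.
    static final class CustomerKey {}

    public static void main(String[] args) {
        // Two topics yield two subjects, hence potentially two schema ids.
        System.out.println(topicBasedSubject("customers", true));
        System.out.println(topicBasedSubject("customers-rekeyed", true));

        // The type-based subject does not depend on the topic at all,
        // so the same key type always maps to the same subject.
        System.out.println(typeBasedSubject(CustomerKey.class));
    }
}
```

With the type-based variant, the schema would be registered once per key type, so the embedded schema id (and therefore the serialized key) would be the same across topics.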
I hesitate to implement such a solution since it seems non-standard, and I'd like to ask how others are handling this:
- Do you use only primitive types for keys?
- Do you use a custom serializer?
- Do you have another, better solution?
Thanks,
Gavrie