Hi, Christos.
The Avro Converter does a bit more work than just serializing the records to Avro. It works with the Schema Registry to make sure that the proper Avro schema is actually registered in the registry, and it then includes the registry's ID of that Avro schema at the start of each serialized message (a magic byte followed by the 4-byte schema ID). The remaining content of the serialized message is indeed just the Avro serialization of the message.
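To make that framing concrete, here is a minimal sketch of the prefix, assuming the standard Confluent wire format (magic byte, then the 4-byte big-endian schema ID, then the Avro payload); the class and method names are only illustrative:

```java
import java.nio.ByteBuffer;

public class WireFormatSketch {
    private static final byte MAGIC_BYTE = 0x0;

    // Prepend the magic byte and the registry's schema ID to an Avro-encoded payload.
    public static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buffer = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buffer.put(MAGIC_BYTE);      // format marker
        buffer.putInt(schemaId);     // 4-byte schema ID assigned by the registry
        buffer.put(avroPayload);     // the actual Avro serialization of the message
        return buffer.array();
    }

    // Read back the schema ID from a framed message.
    public static int schemaIdOf(byte[] framed) {
        ByteBuffer buffer = ByteBuffer.wrap(framed);
        buffer.get();                // skip the magic byte
        return buffer.getInt();
    }
}
```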
Right now Kafka Connect has no way to declare that a `byte[]` value is already an Avro-encoded serialized representation. So the approach you outline above is the most straightforward, robust, and maintainable way I know of to do this, and it uses only public APIs. But, as you mention, it is not the most efficient approach. It might help a bit to cache the Kafka Connect `Schema` object so that work is amortized over many messages, but you would still have to deserialize every record into the Kafka Connect value representation.
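As a rough illustration of that caching idea, something like the following would translate each distinct Avro schema to a Connect `Schema` only once; the translation step itself is a hypothetical placeholder (in practice it might delegate to the Confluent converter's Avro-to-Connect schema translation):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;

public class ConnectSchemaCache {
    // Cache the translated Connect Schema keyed by the Avro schema's full name,
    // so the relatively expensive translation happens once per schema rather
    // than once per record.
    private final Map<String, Schema> cache = new HashMap<>();

    public Schema connectSchemaFor(org.apache.avro.Schema avroSchema) {
        return cache.computeIfAbsent(avroSchema.getFullName(),
                name -> translateToConnectSchema(avroSchema));
    }

    // Hypothetical translation step, shown only to mark where the real work goes.
    private Schema translateToConnectSchema(org.apache.avro.Schema avroSchema) {
        // ... build the equivalent org.apache.kafka.connect.data.Schema here ...
        throw new UnsupportedOperationException("illustrative placeholder");
    }
}
```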
The most efficient option would be to write a custom Converter that does the same interaction with the schema registry that the Avro Converter does to ensure the Avro schema is registered and to get its ID (using caching to avoid unnecessary work), writes out the schema ID, and then assumes that any `byte[]` value is already Avro-encoded and simply passes it through. Your source connector would have to supply the Avro-encoded value as a `byte[]`. This approach would work and would be more efficient, but it would require a fair amount of non-trivial code and adopting some conventions on your part. You're the only one who can decide whether the slight improvement in efficiency justifies this extra effort.
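A very rough sketch of the write path of such a pass-through converter is below. It assumes an older Confluent `CachedSchemaRegistryClient` where `register(subject, avroSchema)` takes an `org.apache.avro.Schema` (newer clients wrap the schema differently), the usual topic-based subject naming, and a made-up `pass.through.avro.schema` config property for supplying the writer schema; the `toConnectData` side and error handling are deliberately simplified.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.storage.Converter;

/**
 * Illustrative pass-through converter: it assumes the connector already produced
 * Avro-encoded byte[] values and only adds the registry framing.
 */
public class PassThroughAvroConverter implements Converter {

    private SchemaRegistryClient registryClient;
    // Cache schema IDs per subject so the registry is contacted once per schema, not per record.
    private final Map<String, Integer> schemaIds = new ConcurrentHashMap<>();
    private org.apache.avro.Schema avroSchema;   // the writer schema the connector uses
    private boolean isKey;

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        this.isKey = isKey;
        String url = (String) configs.get("schema.registry.url");
        this.registryClient = new CachedSchemaRegistryClient(url, 100);
        // How the converter learns the writer schema is a convention you'd have to pick;
        // parsing it from a (hypothetical) config property is one possibility.
        this.avroSchema = new org.apache.avro.Schema.Parser()
                .parse((String) configs.get("pass.through.avro.schema"));
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        if (!(value instanceof byte[])) {
            throw new DataException("Expected an Avro-encoded byte[] value");
        }
        byte[] avroPayload = (byte[]) value;
        String subject = topic + (isKey ? "-key" : "-value");
        int schemaId = schemaIds.computeIfAbsent(subject, s -> {
            try {
                return registryClient.register(s, avroSchema);
            } catch (IOException | RestClientException e) {
                throw new DataException("Failed to register schema for " + s, e);
            }
        });
        // Standard framing: magic byte, 4-byte schema ID, then the Avro payload as-is.
        ByteBuffer buffer = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buffer.put((byte) 0x0);
        buffer.putInt(schemaId);
        buffer.put(avroPayload);
        return buffer.array();
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        // Not needed for a source-only pipeline; a real implementation would strip
        // the framing and deserialize the Avro payload here.
        throw new UnsupportedOperationException("write-only sketch");
    }
}
```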
Anyone else have any better ideas?
Randall