If avro messages carry the schema with them, why do we need the schema registry?


dk.h...@gmail.com

Jun 21, 2016, 10:52:14 AM
to Confluent Platform
Does the message go in kafka without the schema to reduce the message size?

For each and every message consumed I have to hit the schema registry to fetch the schema?

Is it possible to configure some kind of cache?

Will a schema associated with the ID ever change, or if it does a new schema_id is created for the new schema?

Thanks!

-Derek

Michael Noll

Jun 21, 2016, 11:32:07 AM
to confluent...@googlegroups.com
> Does the message go in kafka without the schema to reduce the message size?

Yes and no.

Yes, if you use the Confluent Avro serializers (together with, e.g., the Confluent schema registry), then the message goes into Kafka with only a reference ID to the schema, which reduces the message size.  This approach is better than embedding the full Avro schema in every message, particularly when messages are small, i.e. when the embedded Avro schema would make up a significant portion of the total message size.

No, if your question is about Kafka messages in general.  Kafka doesn't force you to use Avro, nor does it force you to use either the reference-id-for-avro-schema approach or the embed-the-avro-schema approach.  This is totally up to you.  So I suppose your question was specifically about the case where you use Avro together with the Confluent Avro serializers and schema registry?
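
To make the size argument concrete, here is a minimal Python sketch of the reference-ID framing. It assumes the layout the Confluent serializers are documented to use (one zero "magic" byte, then the schema ID as a 4-byte big-endian integer, then the Avro-encoded payload). The schema, the ID 42, and the payload bytes are made up for illustration, and `frame`/`unframe` are hypothetical helper names, not part of any Confluent API.

```python
import json
import struct
from typing import Tuple

MAGIC_BYTE = 0  # assumed: Confluent framing starts with a zero byte


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix already-encoded Avro bytes with magic byte + 4-byte big-endian ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message back into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte")
    return schema_id, message[5:]


# Hypothetical schema and payload, used only to compare sizes.
schema_json = json.dumps({
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "url", "type": "string"},
    ],
})
payload = b"\x02\x0cabc.io"  # stand-in for 8 bytes of Avro-encoded data

with_id = frame(42, payload)                  # 5-byte header + payload
with_schema = schema_json.encode() + payload  # naive "embed the schema" variant

print(len(with_id), len(with_schema))
```

With a payload this small, the embedded schema dominates the message, while the reference-ID header adds a fixed 5 bytes regardless of how large the schema is.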


> For each and every message consumed I have to hit the schema registry to fetch the schema?

The Confluent Avro serializers/deserializers cache previously retrieved schemas, so you do not hit the schema registry for every message -- doing so would kill performance.  So if you use Confluent's Avro serdes, this is not a problem.
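
The caching idea can be sketched in a few lines of Python. This is not the Confluent implementation, just an illustration of the principle: `CachingSchemaLookup` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the HTTP call to the registry.

```python
from typing import Callable, Dict, List


class CachingSchemaLookup:
    """Fetch each schema ID at most once, then serve it from memory."""

    def __init__(self, fetch: Callable[[int], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[int, str] = {}

    def get(self, schema_id: int) -> str:
        if schema_id not in self._cache:
            # Only a cache miss reaches the (remote) registry.
            self._cache[schema_id] = self._fetch(schema_id)
        return self._cache[schema_id]


calls: List[int] = []


def fake_fetch(schema_id: int) -> str:
    """Stand-in for a registry request; records every call it receives."""
    calls.append(schema_id)
    return '{"type": "string"}'


lookup = CachingSchemaLookup(fake_fetch)
for _ in range(1000):
    lookup.get(7)  # 1000 messages that all carry schema ID 7

print(len(calls))  # prints 1
```

A cache without any expiry is safe here only because the schema behind a given ID never changes.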


> Will a schema associated with the ID ever change, or if it does a new schema_id
> is created for the new schema?

Once a schema is registered with the schema registry (which means it has an ID), the schema behind that ID will never change.  So whenever you need to update the schema to a newer version, you register a new schema, which gets its own ID.

Lastly, you might want to read through the docs (http://docs.confluent.io/3.0.0/schema-registry/docs/index.html) on how schemas are associated with Kafka topics (or "subjects", in the terminology of the schema registry).  For example, you could associate both the old schema and the new schema with a particular Kafka topic.
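
The two points above (an ID always maps to the same schema, and a subject can hold several schema versions) can be modelled with a toy in-memory registry. This is only a sketch of the behaviour, not the real service: `ToyRegistry` and the subject name `orders-value` are hypothetical, and the compatibility checks a real registry can enforce are left out.

```python
from typing import Dict, List


class ToyRegistry:
    """In-memory model: IDs are immutable, subjects list their versions."""

    def __init__(self) -> None:
        self._next_id = 1
        self._id_by_schema: Dict[str, int] = {}
        self._schema_by_id: Dict[int, str] = {}
        self._versions: Dict[str, List[int]] = {}

    def register(self, subject: str, schema: str) -> int:
        schema_id = self._id_by_schema.get(schema)
        if schema_id is None:
            # A schema not seen before gets a fresh ID; old IDs are never reused.
            schema_id = self._next_id
            self._next_id += 1
            self._id_by_schema[schema] = schema_id
            self._schema_by_id[schema_id] = schema
        ids = self._versions.setdefault(subject, [])
        if schema_id not in ids:
            ids.append(schema_id)  # becomes the next version of the subject
        return schema_id

    def schema(self, schema_id: int) -> str:
        return self._schema_by_id[schema_id]

    def versions(self, subject: str) -> List[int]:
        return list(self._versions.get(subject, []))


V1 = '{"type": "record", "name": "Order", "fields": [{"name": "id", "type": "long"}]}'
V2 = ('{"type": "record", "name": "Order", "fields": [{"name": "id", "type": "long"}, '
      '{"name": "note", "type": "string", "default": ""}]}')

registry = ToyRegistry()
id1 = registry.register("orders-value", V1)
id2 = registry.register("orders-value", V2)

print(id1, id2, registry.versions("orders-value"))
```

After both registrations, `id1` still resolves to `V1`, and the subject lists both IDs, so consumers can decode old and new messages on the same topic.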

Hope this helps,
Michael




--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platf...@googlegroups.com.
To post to this group, send email to confluent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/9fc2c89a-6378-4a3a-ba42-d61d22cdd69b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



dk.h...@gmail.com

Jun 21, 2016, 11:40:27 AM
to Confluent Platform
Thanks, Michael. I may be missing something here, but the schema registry sounds like overkill when you can just send the whole JSON schema with each and every message, even if you send 1 million messages with the exact same schema header. Who cares about message size and hard-drive storage these days? And the performance hit of having to read a couple more bytes for each and every message? Nanoseconds...

-Derek
