Schema compatibility models


Andy Chambers

Apr 4, 2017, 12:56:51 PM
to Confluent Platform
There's a little section in the docs describing the different compatibility models supported by avro.


Has anyone given much thought to the different kinds of applications supported by each model? Traditionally
we've tried to just make everything backwards compatible, but I recently noticed that the schema registry supports
per-topic compatibility models, so it seems like it might make sense to think carefully about what is right for each
topic rather than try to force a global standard.

I'm also a little confused about how using the readerSchema works. I haven't been able to find any docs on
this but I get the impression this is about "projecting" schemas. This seems like it might be useful to protect
a consumer from upstream changes, as alluded to in the link above. That description
implies that this is only applicable with forward compatibility. Does that mean you should only use a readerSchema
when the topic is marked as forward compatible?

I'm thinking of a use-case where, let's say, we define a topic "foo-v1", then later on decide that "foo" needs an
incompatible change (e.g. a new required field), so we define "foo-v2". Let's also assume that foo has more than
one consumer, one of which doesn't actually care about the whole foo. It can do its job on the subset of foo's
fields that remain "backwards compatible".

In this case, it seems like it might be possible to define a 3rd schema which includes only that subset of fields
and use this schema to deserialize messages from both topics. This consumer would not need to be updated
just because the upstream producer wanted to send along some additional required fields.

Cheers,
Andy

Tianxiang Xiong

Apr 6, 2017, 5:13:58 PM
to Confluent Platform
*crickets*

Ewen Cheslack-Postava

Apr 7, 2017, 1:28:24 AM
to Confluent Platform
On Thu, Apr 6, 2017 at 2:13 PM, 'Tianxiang Xiong' via Confluent Platform <confluent...@googlegroups.com> wrote:
*crickets*


On Tuesday, 4 April 2017 09:56:51 UTC-7, Andy Chambers wrote:
There's a little section in the docs describing the different compatibility models supported by avro.


Has anyone given much thought to the different kinds of applications supported by each model? Traditionally
we've tried to just make everything backwards compatible, but I recently noticed that the schema registry supports
per-topic compatibility models, so it seems like it might make sense to think carefully about what is right for each
topic rather than try to force a global standard.

Backwards compatibility means you can read older data and use it with a newer schema. This lets you:
1. Store data with an older schema and still process it even if you've evolved your schema since then.
2. Upgrade consumer applications to a newer schema before all (or any) of your producer applications have been updated; the upgraded consumers can still process data written in the old format.

Forward compatibility means you can read newer data and use it with an older schema. This lets you:
1. Store data with the new schema and still process it even if the code processing it is still using the old schema.
2. Update producer applications to a new schema before all downstream consumers have been upgraded to the new schema.

Full compatibility combines these two. It's more restrictive than either of the two individually, but it requires no real coordination between upstream producers and downstream consumers since none of the changes that are possible would break schema projection.
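
As a rough illustration of these three definitions, here is a minimal Python sketch. It is not Avro's actual schema-resolution algorithm: it assumes schemas are plain dicts shaped like Avro record schemas, and it only looks at field names, exact type equality, and defaults (no type promotion, aliases, or nested records). The names `can_read` and `compatibility` are made up for this example.

```python
def can_read(reader_schema, writer_schema):
    """Simplified check: can data written with writer_schema be read with reader_schema?"""
    writer_fields = {f["name"]: f for f in writer_schema["fields"]}
    for field in reader_schema["fields"]:
        written = writer_fields.get(field["name"])
        if written is None:
            # The reader wants a field the writer never wrote: only OK with a default.
            if "default" not in field:
                return False
        elif written["type"] != field["type"]:
            # Simplification: real Avro allows some promotions (e.g. int -> long).
            return False
    return True


def compatibility(new_schema, old_schema):
    """Classify a schema change as FULL, BACKWARD, FORWARD or NONE."""
    backward = can_read(new_schema, old_schema)  # new readers, old data
    forward = can_read(old_schema, new_schema)   # old readers, new data
    if backward and forward:
        return "FULL"
    if backward:
        return "BACKWARD"
    if forward:
        return "FORWARD"
    return "NONE"


# Hypothetical schemas for illustration only.
BASE_FIELDS = [{"name": "id", "type": "string"}, {"name": "name", "type": "string"}]
V1 = {"type": "record", "name": "Foo", "fields": BASE_FIELDS}
V2_OPTIONAL = {
    "type": "record",
    "name": "Foo",
    "fields": BASE_FIELDS + [{"name": "email", "type": ["null", "string"], "default": None}],
}
V2_REQUIRED = {
    "type": "record",
    "name": "Foo",
    "fields": BASE_FIELDS + [{"name": "email", "type": "string"}],
}

print(compatibility(V2_OPTIONAL, V1))
print(compatibility(V2_REQUIRED, V1))
```

Under these simplified rules, adding a field with a default is a fully compatible change, while adding a required field (no default) is only forward compatible: old readers ignore the new field, but new readers cannot fill it in when reading old data.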
 

I'm also a little confused about how using the readerSchema works. I haven't been able to find any docs on
this but I get the impression this is about "projecting" schemas. This seems like it might be useful to protect
a consumer from upstream changes, as alluded to in the link above. That description
implies that this is only applicable with forward compatibility. Does that mean you should only use a readerSchema
when the topic is marked as forward compatible?

You're right, this is about projecting schemas. The reader schema is simply the "target" schema for the application. You can use it in different compatibility modes -- you may be projecting either forwards or backwards with respect to the versions of schemas for the topic. If you're using backwards compatibility, the reader schema's version should be higher than or equal to the writer schema's. If you're using forwards compatibility, the reader schema's version should be lower than or equal to the writer schema's. If you're using full compatibility, the reader schema can be any version.
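
The version rule above can be written down as a small Python sketch. The function name `reader_version_ok` is hypothetical, and the sketch assumes the compatibility guarantee holds between any two versions of the topic's schema, not just adjacent ones; whether that is true depends on how the registry enforces the mode.

```python
def reader_version_ok(mode, reader_version, writer_version):
    """Return True if a reader at reader_version may project data written at writer_version."""
    if mode == "BACKWARD":
        # Newer (or same) schema reading older data.
        return reader_version >= writer_version
    if mode == "FORWARD":
        # Older (or same) schema reading newer data.
        return reader_version <= writer_version
    if mode == "FULL":
        # Any version can read any other version.
        return True
    raise ValueError(f"unsupported compatibility mode: {mode}")


# A reader pinned to version 2 of a schema, facing data written at version 3.
print(reader_version_ok("BACKWARD", 2, 3))
print(reader_version_ok("FORWARD", 2, 3))
```

So a consumer that pins an older reader schema while producers move ahead is the forward-compatible case, but a reader schema is just as usable in the other modes as long as its version sits on the right side of the writer's.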
 

I'm thinking of a use-case where, let's say, we define a topic "foo-v1", then later on decide that "foo" needs an
incompatible change (e.g. a new required field), so we define "foo-v2". Let's also assume that foo has more than
one consumer, one of which doesn't actually care about the whole foo. It can do its job on the subset of foo's
fields that remain "backwards compatible".

In this case, it seems like it might be possible to define a 3rd schema which includes only that subset of fields
and use this schema to deserialize messages from both topics. This consumer would not need to be updated
just because the upstream producer wanted to send along some additional required fields.

Yeah, for an application this is definitely possible to do. In Avro, you always deserialize using the original (writer's) schema and project, during the deserialization, to the target (reader's) schema. So if your reader schema only contains a subset of the fields, then you'll actually only care about compatibility for those fields.

That said, going down this route means that you're defining schemas for applications and never registering them in the schema registry, which means nobody else can reason about the schemas you're using and how changes might impact you.
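
To make the subset-reader-schema idea concrete, here is a minimal Python sketch. It is only an approximation: real Avro decodes the binary payload with the writer's schema and resolves it against the reader's schema in one step, whereas this sketch assumes the records are already decoded into dicts. The `project` function, the `SUBSET_SCHEMA`, and the sample records are all hypothetical.

```python
def project(record, reader_schema):
    """Keep only the fields the reader schema asks for, filling in defaults."""
    projected = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            projected[name] = record[name]
        elif "default" in field:
            projected[name] = field["default"]
        else:
            raise ValueError(f"writer did not provide required field {name!r}")
    return projected


# The consumer's own reader schema: just the fields it actually needs.
SUBSET_SCHEMA = {
    "type": "record",
    "name": "Foo",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"},
    ],
}

# Decoded messages as they might arrive from "foo-v1" and "foo-v2".
foo_v1_record = {"id": "42", "name": "widget"}
foo_v2_record = {"id": "42", "name": "widget", "owner": "andy"}  # new required field

print(project(foo_v1_record, SUBSET_SCHEMA))
print(project(foo_v2_record, SUBSET_SCHEMA))
```

Both messages project to the same record, so the consumer does not need to change when the producer adds the new required field. It would break only if one of the fields in its subset were removed or changed incompatibly.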

-Ewen
 

