Schema registry support for aliases


CJ Woolard

Jul 30, 2015, 10:57:02 PM
to Confluent Platform
Hello,

I was wondering whether the following is a bug in the schema registry or expected behavior. As a test of schema evolution, I am posting a schema (below), followed by an updated version in which I've renamed a field and given it an alias for the original field name (i.e., I've renamed field 'foo' to 'newFoo' and added 'foo' as an alias).

curl -X POST -i -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data '{"schema": "{\"type\":\"record\",\"name\":\"EvolutionTest\",\"namespace\":\"myNamespace\",\"fields\":[{\"name\":\"foo\",\"type\":\"string\"}]}"}' \

curl -X POST -i -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    --data '{"schema": "{\"type\":\"record\",\"name\":\"EvolutionTest\",\"namespace\":\"myNamespace\",\"fields\":[{\"name\":\"newFoo\",\"type\":\"string\",\"aliases\":[\"foo\"]}]}"}' \


These two requests return the following:

HTTP/1.1 200 OK
Content-Length: 8
Content-Type: application/vnd.schemaregistry.v1+json
Server: Jetty(8.1.16.v20140903)

{"id":1}

HTTP/1.1 409 Conflict
Content-Length: 93
Content-Type: application/vnd.schemaregistry.v1+json
Server: Jetty(8.1.16.v20140903)

{"error_code":409,"message":"Schema being registered is incompatible with the latest schema"}

Note that this is with the schema registry's default "backward" compatibility setting. However, when I tested the same two schemas with the native Avro SchemaCompatibility class (https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaCompatibility.html), it reported them as compatible. I was wondering whether this is expected behavior.
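
The difference between the two results can be illustrated with a toy model in Python. This is only a sketch, not the registry's or Avro's actual code: the function name and the `use_aliases` switch are hypothetical, and the check only asks whether each reader field can be matched to a writer field by name or alias, or filled from a default.

```python
import json

# The two schemas from the requests above, parsed as plain JSON.
OLD = json.loads(
    '{"type":"record","name":"EvolutionTest","namespace":"myNamespace",'
    '"fields":[{"name":"foo","type":"string"}]}'
)
NEW = json.loads(
    '{"type":"record","name":"EvolutionTest","namespace":"myNamespace",'
    '"fields":[{"name":"newFoo","type":"string","aliases":["foo"]}]}'
)


def unmatched_reader_fields(reader, writer, use_aliases):
    """Return names of reader fields that cannot be filled from writer data.

    Hypothetical helper: a simplified stand-in for a compatibility check.
    """
    writer_names = {f["name"] for f in writer["fields"]}
    missing = []
    for field in reader["fields"]:
        candidates = {field["name"]}
        if use_aliases:
            candidates |= set(field.get("aliases", []))
        # A field with a default can be filled even when nothing matches.
        if not (candidates & writer_names) and "default" not in field:
            missing.append(field["name"])
    return missing


# Backward compatibility: the NEW schema must be able to read OLD data.
print(unmatched_reader_fields(NEW, OLD, use_aliases=True))
print(unmatched_reader_fields(NEW, OLD, use_aliases=False))
```

In this model the renamed field resolves only when aliases are taken into account, which mirrors the "compatible" versus "409 Conflict" outcomes described above.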


Thanks in advance.
CJ

CJ Woolard

Aug 2, 2015, 9:25:26 PM
to Confluent Platform
As a quick follow-up, I read through the code for the schema registry and noticed that there's an explicit test case asserting that changing a field name with an alias is not considered a backward compatible change: https://github.com/confluentinc/schema-registry/blob/master/core/src/test/java/io/confluent/kafka/schemaregistry/avro/AvroCompatibilityTest.java#L73. This appears (at least to me) to be somewhat in conflict with https://avro.apache.org/docs/1.7.7/spec.html#Aliases. Could someone please comment on whether renaming a field (with an alias) is intended to be a backward compatible change? (We are currently evaluating Avro along with the Confluent schema registry, and support for schema evolution was one of the primary factors for considering this approach.)

Thanks again.
CJ

Geoffrey Anderson

Aug 3, 2015, 8:19:56 PM
to confluent...@googlegroups.com
Hi CJ,

Thanks for your question. I'll do my best to clarify why this isn't supported today.

First off, I should point out that the spec you linked to (https://avro.apache.org/docs/1.7.7/spec.html#Aliases) states that Avro implementations may **optionally** use aliases to map a writer's schema to a reader's schema, so this is not required behavior, and Apache Avro's own compatibility checks do not support this.

In fact, the compatibility checks within the schema registry directly use Apache Avro's compatibility checks under the hood (e.g. https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/avro/AvroCompatibilityChecker.java#L31).

One issue with supporting aliases is the question of whether all of the downstream consumers support them too. If we did add alias support with respect to compatibility, we would want to be sure that all potential downstream systems supported aliases as well, e.g., Hive. It would be bad if "backward compatible" schema evolution was backward compatible for only some subset of downstream consumers.

Another tricky issue is that aliases can potentially break 'transitive compatibility'. Let's say we register schema A, then B, and then C, with schemas that look roughly like this (pseudo-Avro):

A - {name: f1}
B - {name: f2, aliases: [f1]}
C - {name: f2}

Notice that if we support aliases, each incremental schema update would be backward compatible. However, we have the unfortunate situation that C can read B and B can read A, but C cannot be used to read data written with A! This is bad because we would expect backward compatibility guarantees to hold across schema evolution.
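
The A/B/C case can be sketched in a few lines of Python. This is a toy model under a stated assumption, namely that a reader field matches a writer field when the writer's field name equals the reader's name or one of its aliases; `can_read` is a hypothetical helper, not an Avro or schema registry API.

```python
def can_read(reader, writer):
    """True if every reader field matches a writer field by name or alias."""
    writer_names = {field["name"] for field in writer}
    return all(
        ({field["name"]} | set(field.get("aliases", []))) & writer_names
        for field in reader
    )


# The three schemas from the example, reduced to their field lists.
A = [{"name": "f1"}]
B = [{"name": "f2", "aliases": ["f1"]}]
C = [{"name": "f2"}]

print("B reads A:", can_read(B, A))  # the alias f1 bridges the rename
print("C reads B:", can_read(C, B))  # same field name, f2
print("C reads A:", can_read(C, A))  # nothing links f2 back to f1
```

Each step in the chain passes on its own, but the end-to-end check from C back to A fails once the alias has been dropped.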

So, in summary:

1. According to spec, Avro implementations may *optionally* support this
2. Apache Avro compatibility checks do not actually support this
3. We just use Apache Avro's compatibility checks under the hood
4. There are some other considerations to think about including other systems' support of this feature, as well as potential issues raised by 'transitivity of compatibility'.

Hope this helps!
Thanks,
Geoff


Félix GV

Aug 3, 2015, 9:02:25 PM
to confluent...@googlegroups.com
Hi Geoff,

It seems like the current schema registry code only ever validates a new schema against the current latest version, not against all historical schemas. So it seems like there is no guarantee that C can be used to read A, even without aliases in the picture.

Is that intentional or an oversight?

--
Felix GV
Senior Software Engineer
Data Infrastructure
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv




Geoffrey Anderson

Aug 4, 2015, 10:00:59 PM
to confluent...@googlegroups.com
Hi Félix,

That's a great question and one which we did think about. 

One question to ask is whether compatibility as defined in the Avro spec is transitive, i.e., if C reads B and B reads A, can C read A?
The answer is generally yes, with two edge cases that we're aware of:
- first with aliases, when an alias is removed (this doesn't come up in our implementation as discussed earlier in the thread)
- second, with default values, when a default value is removed (documented here: https://github.com/granders/avro-experiment/blob/master/src/main/java/geoff/AvroThingy.java)

I've added a tracker issue for the second case (https://github.com/confluentinc/schema-registry/issues/209)
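
The default-value case can be illustrated with a small Python model. It is a sketch under assumptions: the field names are made up, `can_read` is a hypothetical helper, and the rule it encodes is only that a reader field must either exist in the writer's schema or carry a default.

```python
def can_read(reader, writer):
    """True if each reader field is in the writer schema or has a default."""
    writer_names = {field["name"] for field in writer}
    return all(
        field["name"] in writer_names or "default" in field
        for field in reader
    )


A = [{"name": "id"}]                                   # original schema
B = [{"name": "id"}, {"name": "f", "default": ""}]     # adds f with a default
C = [{"name": "id"}, {"name": "f"}]                    # default removed

print("B reads A:", can_read(B, A))  # f is filled from its default
print("C reads B:", can_read(C, B))  # B always writes f
print("C reads A:", can_read(C, A))  # A has no f and C has no default
```

As with aliases, every adjacent pair passes, yet the newest schema cannot read data written with the oldest one.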

Ok, so there are edge cases... so why not just check down the whole line of schema versions and eliminate this possibility altogether? 
The use case we considered here is, what if for some reason a user wants to force a new incompatible schema?
- If compatibility checks are incremental, we'll just have introduced a new chain of compatible schemas starting at the force point.
- If compatibility checks go all the way down the line, you'd have to drop compatibility requirements altogether to continue with schema evolution after the force point.
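
The trade-off between the two options can be sketched as follows. This is a toy Python model, not the registry's implementation: `accepts` and its `check_all` flag are hypothetical, the schemas are invented, and the compatibility rule is the simplified "field exists in the writer or has a default" check.

```python
def can_read(reader, writer):
    """True if each reader field is in the writer schema or has a default."""
    writer_names = {field["name"] for field in writer}
    return all(
        field["name"] in writer_names or "default" in field
        for field in reader
    )


def accepts(history, candidate, check_all):
    """Would `candidate` pass a backward check against the stored versions?

    check_all=False compares against the latest version only (incremental);
    check_all=True compares against every version ever registered.
    """
    targets = history if check_all else history[-1:]
    return all(can_read(candidate, old) for old in targets)


v1 = [{"name": "a"}]
v2 = [{"name": "x"}]  # assumed to have been forced in; cannot read v1 data
v3 = [{"name": "x"}, {"name": "y", "default": 0}]  # normal evolution of v2

history = [v1, v2]
print("incremental:", accepts(history, v3, check_all=False))
print("full history:", accepts(history, v3, check_all=True))
```

With incremental checks, v3 simply continues the new chain that starts at v2. With full-history checks, v3 is rejected because of v1, so evolution after the force point could only continue with compatibility checking turned off.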

Thanks,
Geoff

