Avro Schema Repo Question

Jonathan Hodges

Mar 8, 2014, 1:18:20 PM
to camu...@googlegroups.com
Hi,

We just set up a REST schema repo (AVRO-1124) to store all of our Avro schemas for Camus to reference.  We are curious how people are managing referenced schemas.  For instance, say we have the following Date type.

{"name": "Date", 
  "namespace": "com.example.avro", 
  "type": "record", 
  "fields": [ { "name":"year", "type":"int" }, 
              { "name":"month", "type":"int" }, 
              { "name":"day", "type":"int" } ] } 

We might want to use this Date type in other schemas.  Reusing enum types is another example.  With IDL and the Maven plugin this is managed by specifying a directory for the schema files.  It would be nice if you could import a schema in JSON by specifying a URI.
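
To make this concrete, here is roughly what we would like to be able to write (the Event record and its fields are made up purely for illustration): a schema that refers to Date by name, which Avro can only resolve if the Date definition is already available in the same parsing context.

{"name": "Event", 
  "namespace": "com.example.avro", 
  "type": "record", 
  "fields": [ { "name":"id", "type":"string" }, 
              { "name":"created", "type":"com.example.avro.Date" } ] } 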

So are we missing something?  Should we just embed types in all the schemas with some pre-processing step?

Any help would be greatly appreciated!

Jonathan

Félix GV

Mar 8, 2014, 1:44:07 PM
to Jonathan Hodges, camu...@googlegroups.com
Hi Jon (:

This was discussed on the AVRO-1124 ticket.

The general consensus is that you should expand referenced schemas so that every schema in the repo is self-contained.
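
For example (reusing the Date type from your mail plus a hypothetical Event record, just as an illustration), the self-contained form inlines the full Date definition at its first use instead of referring to it by name:

{"name": "Event", 
  "namespace": "com.example.avro", 
  "type": "record", 
  "fields": [ { "name":"id", "type":"string" }, 
              { "name":"created", "type": 
                  {"name": "Date", 
                   "type": "record", 
                   "fields": [ { "name":"year", "type":"int" }, 
                               { "name":"month", "type":"int" }, 
                               { "name":"day", "type":"int" } ] } } ] } 

Within one self-contained schema a named type only needs to be defined once; any later field in the same schema can still refer back to com.example.avro.Date by name.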

When/if you update a referenced schema, you will also want to update the ID of every schema that includes it, so it seems a lot simpler to require self-contained schemas across the board.

Otherwise, if you wanted to leave nested schemas as just references, you would need to include the ID for the desired version of the referenced schema. This makes everything more complex and isn't supported at the moment by the AVRO-1124 repo. Furthermore, this hypothetical approach would also require you to update every schema that contains a reference to an updated schema, just to bump up its ID, so it doesn't really alleviate any overhead work...

-F

--
Félix

Jonathan Hodges

Mar 8, 2014, 1:55:44 PM
to Félix GV, camu...@googlegroups.com
Thanks for the quick reply, Felix.  Sorry we missed that explanation in AVRO-1124, but that makes sense.  As you say, you can't really get around the update problem across schemas.