Schema Registry with union types?

Andrew Otto

Apr 2, 2015, 12:59:48 PM
to confluent...@googlegroups.com
Do union types work with the schema registry and/or the rest proxy?

I’d like to use a schema that has optional values, using Avro’s union type with null. E.g.:

{
"type": "record",
"name": "ChangeEventSmall",
"fields" : [
{"name": "id", "type": ["null", "long"] }
]
}

The Schema Registry accepts this when I register it as a schema, but when I try to produce via the REST proxy using this schema and a single record of {"id": 1}, I get

{"error_code":42203,"message":"Conversion of JSON to Avro failed."}

However, if I use the exact same schema with a simple “long” type rather than the union, the same produce request succeeds.

I also tried the same test with the types ["double", "long"] and ["int", "long"] and got the same results (the Schema Registry accepts the schema, but I cannot produce with it).

If I produce via the REST proxy AND specify a value_schema (instead of an already registered value_schema_id) containing the union type, I get the same "Conversion of JSON to Avro failed." error.

Maybe the Kafka Rest Proxy doesn’t know how to convert JSON records to Avro schemas with union types?

Thanks,
-Ao

Andrew Otto

Apr 2, 2015, 1:14:14 PM
to confluent...@googlegroups.com
Hm, and also, what about nested records? The schema registry doesn’t seem to like those either? (Maybe I’m just doing them wrong).

Ewen Cheslack-Postava

Apr 2, 2015, 1:16:04 PM
to confluent...@googlegroups.com
For union types, Avro's JSON encoding requires specifying which branch type the value uses. See https://groups.google.com/d/msg/confluent-platform/Bigk_FeTXh0/B77_zCRhqUoJ for a previous discussion about this.

For nested records, can you give an example that isn't working?

-Ewen



--
Thanks,
Ewen

Andrew Otto

Apr 2, 2015, 1:36:40 PM
to confluent...@googlegroups.com
The full Avro schema I’m currently trying to use is pasted below.  However, I’m starting smaller, trying to work up to getting this big one working.  I likely don’t need everything to be optional, but I’m just starting with this as a test case.  I’ve tried the nested values without any of the unions, and get invalid schemas.  I’m probably just doing something wrong.  If nested schemas are supported, don’t worry about me yet, I will probably get it.  If I really get stuck on that I’ll ask again.

As for the unions: maybe I had been doing optional fields wrong. If I have "type": ["null", "string"], do I also need "default": "null" in order to not have to specify the field in the record? I am trying this now…

-Ao





{
    "namespace": "org.wikimedia.mediawiki",
    "type": "record",
    "name": "ChangeEvent",
    "fields" : [
        {"name": "id", "type": ["null", "long"]},
        {"name": "type", "type": ["null", "string"]},
        {"name": "namespace", "type": ["null", "string"]},
        {"name": "title", "type": ["null", "string"]},
        {"name": "comment", "type": ["null", "string"]},
        {"name": "timestamp", "type": "long" },
        {"name": "user", "type": ["null", "string"]},
        {"name": "bot", "type": ["null", "string"]},
        {"name": "server_url", "type": ["null", "string"]},
        {"name": "server_name", "type": ["null", "string"]},
        {"name": "server_script_path", "type": ["null", "string"]},
        {"name": "minor", "type": ["null", "boolean"]},
        {"name": "patrolled", "type": ["null", "boolean"]},
        {"name": "log_id", "type": ["null", "long"]},
        {"name": "log_type", "type": ["null", "string"]},
        {"name": "log_action", "type": ["null", "string"]},
        {"name": "log_action_commit", "type": ["null", "string"]},
        {"name": "log_params", "type": ["null", {
                "type": "map",
                "values": ["null", "string"]
            }]
        },
        {
            "name": "length",
            "type": ["null", {
                "name": "ChangeEventLengths",
                "type": "record",
                "fields": [
                    {"name": "old", "type": ["null", "long"] },
                    {"name": "new", "type": ["null", "long"] }
                ]
            }]
        },
        {
            "name": "revision",
            "type": ["null", {
                "name": "ChangeEventRevisions",
                "type": "record",
                "fields": [
                    {"name": "old", "type": ["null", "long"] },
                    {"name": "new", "type": ["null", "long"] }
                ]
            }]
        }
    ]
}




Roger Hoover

Apr 2, 2015, 1:51:46 PM
to confluent...@googlegroups.com
For the special case of a union of null and one other type, it would be really handy to wrap/augment the JSON parser to not require the type name.  This is a very common gotcha with Avro JSON that seems like it would be simple to solve.
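Roger's suggestion can be sketched as a small pre-processing step that rewrites plain JSON into Avro's JSON encoding before it is sent. This is a hypothetical helper (`wrap_nullable_unions` and `branch_name` are illustrative names, not part of Avro or the REST proxy). It assumes the schema is already parsed into Python dicts and lists, infers the branch only for two-way `["null", T]` unions, and names a record/enum/fixed branch from that schema's own `name` and `namespace` without resolving inherited namespaces or following named-type references.

```python
def branch_name(schema):
    """Name used as the JSON key for a union branch (simplified)."""
    if isinstance(schema, str):
        return schema  # primitive, e.g. "long" or "string"
    if "name" in schema:
        # Named type (record/enum/fixed). Assumption: the namespace is taken
        # only from this schema itself; inherited namespaces are not resolved.
        namespace = schema.get("namespace")
        return f"{namespace}.{schema['name']}" if namespace else schema["name"]
    return schema["type"]  # unnamed complex type: "map" or "array"


def wrap_nullable_unions(value, schema):
    """Rewrite plain JSON into Avro JSON encoding, inferring ["null", T] branches.

    Hypothetical sketch: supports records, maps, arrays, primitives and
    two-branch nullable unions. Any other union is rejected, because its
    branch cannot be inferred without ambiguity.
    """
    if isinstance(schema, list):  # a union
        others = [branch for branch in schema if branch != "null"]
        if len(schema) != 2 or len(others) != 1:
            raise ValueError(f"cannot infer a branch for union {schema!r}")
        if value is None:
            return None  # the null branch is written as plain JSON null
        inner = others[0]
        return {branch_name(inner): wrap_nullable_unions(value, inner)}
    if isinstance(schema, dict):
        kind = schema["type"]
        if kind == "record":
            # A missing field raises KeyError: no defaults are applied on write.
            return {
                field["name"]: wrap_nullable_unions(value[field["name"]], field["type"])
                for field in schema["fields"]
            }
        if kind == "map":
            return {k: wrap_nullable_unions(v, schema["values"]) for k, v in value.items()}
        if kind == "array":
            return [wrap_nullable_unions(item, schema["items"]) for item in value]
        return wrap_nullable_unions(value, kind)  # e.g. {"type": "long"}
    return value  # primitive values are encoded as-is
```

With the ChangeEventSmall schema from the first message, `wrap_nullable_unions({"id": 1}, schema)` returns `{"id": {"long": 1}}`, which is the shape the REST proxy expects. Unions such as `["double", "long"]` are deliberately left to the caller, since a JSON number could match either branch.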

Andrew Otto

Apr 2, 2015, 1:56:53 PM
to confluent...@googlegroups.com
Ok, even with “default”, none of this is working.


Let’s say I have a two-field schema: a mandatory “id” long field and an optional “title” string field. None of the following seem to work:


  {"name": "title", "type": ["null", "string"] }
  {"name": "title", "type": ["null", "string"], "default": "null" }
  {"name": "title", "type": "string", "default": "" }


That is, I’d like to be able to produce a record with only the “id” included and get a default filled in for the unspecified “title”. I should be able to produce to this schema with the record

  { "id": 1 }

What is the proper way to define the optional string title field?
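For reference, the Avro specification writes a field's default as a JSON value, so a null default is JSON null rather than the string "null", and the default of a union field must match the first branch of the union. The hypothetical `default_matches` helper below sketches that rule for primitive types only. A valid default alone would not make the REST proxy accept `{ "id": 1 }`, though: as the rest of the thread shows, defaults come into play only when reading.

```python
# Type checks for Avro primitive type names (bytes defaults are JSON strings).
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "bytes": lambda v: isinstance(v, str),
    "string": lambda v: isinstance(v, str),
}


def default_matches(field):
    """Check a field's default against its type; for unions, the first branch.

    Hypothetical sketch covering primitive types only.
    """
    if "default" not in field:
        return True  # nothing to check
    field_type = field["type"]
    # Per the Avro spec, a union default must match the union's first branch.
    first = field_type[0] if isinstance(field_type, list) else field_type
    if first not in PRIMITIVE_CHECKS:
        raise ValueError(f"sketch only handles primitive types, got {first!r}")
    return PRIMITIVE_CHECKS[first](field["default"])
```

In schema JSON, the declaration that passes this check is `{"name": "title", "type": ["null", "string"], "default": null}`; the second variant above fails it, because "null" is a string and the first branch is "null".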

-Ao





Andrew Otto

Apr 2, 2015, 2:16:05 PM
to confluent...@googlegroups.com
(BTW, nested records are working just fine, they just weren’t with the weird unions I was trying.)

Roger Hoover

Apr 2, 2015, 2:32:58 PM
to confluent...@googlegroups.com
Andrew,

I think that Avro only applies defaults on read, when the reader's schema is different from the writer's schema. Otherwise, it assumes that all fields in the schema were present on write.
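Roger's point can be illustrated with a much-simplified model of Avro's record resolution. The hypothetical `resolve_record` below assumes the data has already been decoded with the writer's schema into a dict. It drops fields the reader schema does not declare and fills reader fields that the writer did not have from the reader's defaults, failing when no default exists. Aliases, type promotion, and nested resolution are ignored.

```python
def resolve_record(written, reader_schema):
    """Project a decoded record onto a reader record schema (simplified).

    Hypothetical sketch of Avro's record-resolution rules: defaults are used
    only here, on read, never when the data was written.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]  # field present in the writer's data
        elif "default" in field:
            resolved[name] = field["default"]  # filled from the reader's schema
        else:
            raise ValueError(f"field {name!r} missing from data and has no default")
    # Fields present in `written` but unknown to the reader are simply dropped.
    return resolved
```

Reading an old record `{"id": 1}` with a newer reader schema that adds `{"name": "title", "type": ["null", "string"], "default": null}` yields `{"id": 1, "title": None}`. This is the schema-evolution case Andrew asks about later in the thread; the default lives in the reader's schema, not in the produced JSON.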


Cheers,

Roger

Ewen Cheslack-Postava

Apr 2, 2015, 4:01:15 PM
to confluent...@googlegroups.com
Andrew,

I think the problem you're having with union types is due to the encoding I mentioned earlier. If you have a schema with a field like this:


{"name": "title", "type": ["null", "string"] }

You'll need to encode the value to explicitly specify the type (except for null):

{ "title": { "string": "value" } }

This is because the REST proxy just uses Avro's JSON serialization (http://avro.apache.org/docs/current/spec.html#json_encoding). I don't really like this requirement, especially when the types can't be ambiguous (as in this case), but it keeps everything consistent in the cases where they can be.
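Putting this together, a produce request for the two-field id/title schema from earlier in the thread would carry each non-null union value wrapped in its branch name. The sketch below only builds the JSON body and sends nothing. It assumes the REST proxy's v1 Avro produce format, in which `value_schema` holds the schema as a JSON-encoded string and each entry in `records` has a `value` (a registered `value_schema_id` can be sent instead of `value_schema`). `build_produce_body` and the record name `ExampleEvent` are hypothetical.

```python
import json


def build_produce_body(value_schema, values):
    """Build the JSON body of an Avro produce request (assumed v1 format)."""
    return json.dumps({
        # The schema itself is sent as a string containing JSON.
        "value_schema": json.dumps(value_schema),
        "records": [{"value": v} for v in values],
    })


schema = {
    "type": "record",
    "name": "ExampleEvent",  # hypothetical name
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "title", "type": ["null", "string"]},
    ],
}

# Non-null union values name their branch; null is written as plain null.
body = build_produce_body(schema, [
    {"id": 1, "title": {"string": "value"}},
    {"id": 2, "title": None},
])
print(body)
```

As we understand the 2015-era API, this body would be POSTed to `/topics/<topic>` with `Content-Type: application/vnd.kafka.avro.v1+json`; it is worth checking against the REST proxy documentation for the version in use.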

-Ewen



Andrew Otto

Apr 2, 2015, 4:40:01 PM
to confluent...@googlegroups.com
I get what you are saying, but doesn’t that defeat the point of having an optional value?  How would this work with schema evolution?

Say I needed to add a new field to a schema but still wanted to use it to read old data whose records don’t have that field. Or say I wanted to add a field to a schema while producers running old code, which don’t produce records with that field, were still out there.

Shouldn’t I be able to produce records without fields that have default values defined in the schema?
Thanks!
-AO


Jun Rao

Apr 2, 2015, 6:48:38 PM
to confluent...@googlegroups.com
Andrew,

A producer needs to provide a schema (or schema id). The requirement is that the json avro data that the producer sends has to match the provided schema. If the producer changes the schema, it needs to change the json format as well. However, it's possible for two producers to produce json with different schemas at the same time.

Thanks,

Jun

Andrew Otto

Apr 3, 2015, 11:23:26 AM
to confluent...@googlegroups.com
Hm, interesting.  So, the error I’m getting isn’t because Avro wouldn’t allow what I’m doing, but because Kafka-Rest enforces JSON conversion to Avro more strictly than Avro would?  Or, is this an issue with the Avro-JSON encoding needing all fields even if some of those fields are technically optional?

That is, I could still use an Avro schema with an optional field to read Avro data that didn’t have that field defined, right?  

> However, it's possible for two producers to produce json with different schemas at the same time.
To the same subject, aye, as long as the schemas are compatible.  Hm, ok, I will try this.




Jun Rao

Apr 3, 2015, 12:49:01 PM
to confluent...@googlegroups.com
Andrew,

The JSON formatting is an Avro issue. Basically, for a given Avro schema, the JSON representation has to match the Avro spec.

You brought up another issue. What's the JSON representation in the consumer? Currently, the consumer simply outputs the JSON according to the Avro (writer) schema associated with the message. So, if you have two versions of schemas in a topic, the JSON output in the consumer will have mixed format. 

It's probably reasonable to allow the consumer to bind the output according to a particular version of the Avro schema. Then all the JSON output will have the same format. Of course, this requires that the schemas are evolved in a compatible way (see http://confluent.io/docs/current/avro.html for details). I filed the following issue to track this.


Thanks,

Jun
