Issue with confluent-3.0.1 Avro schema evolution


Danielmi

Nov 10, 2016, 10:51:58 AM
to Confluent Platform
Hi,

I am using confluent-3.0.1 and want to try Avro schema evolution, but there seems to be an issue when I test backward compatibility with Hive.

First I declared this schema, sent some messages conforming to it to Kafka, queried Hive, and got all the data back.

public static final String USER_SCHEMA = "{" +
        " \"namespace\": \"example.avro\",\n" +
        " \"type\": \"record\",\n" +
        " \"name\": \"user\",\n" +
        " \"fields\": [\n" +
        "     {\"name\": \"name\", \"type\": \"string\"},\n" +
        "     {\"name\": \"now\", \"type\": \"long\"},\n" +
        "     {\"name\": \"favorite_number\",  \"type\": \"int\"}" +
        " ]\n" +
        "}";

Then I upgraded the schema by adding a new optional field, "a", and sent some more messages to Kafka.
public static final String USER_SCHEMA = "{" +
        " \"namespace\": \"example.avro\",\n" +
        " \"type\": \"record\",\n" +
        " \"name\": \"user\",\n" +
        " \"fields\": [\n" +
        "     {\"name\": \"name\", \"type\": \"string\"},\n" +
        "     {\"name\": \"now\", \"type\": \"long\"},\n" +
        "     {\"name\": \"favorite_number\",  \"type\": \"int\"}" +
        "     ,\n{\"name\": \"a\", \"type\": [\"null\",\"string\"], \"default\": null}" +
        " ]\n" +
        "}";

But when I queried Hive, I got this exception:

" Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found example.avro.user, expecting example.avro.user, missing required field a"

It looks like it can't accept null as a default value. I think it's caused by the earlier rows that don't have the new field (in Hadoop everything looks fine).


It seems to be a problem only with null defaults: if I instead add the field below to the schema (registering the new schema after the first one), I can still query Hive after adding the new field, and the earlier rows written without this field really do get the default value I defined.

public static final String USER_SCHEMA = "{" +
        " \"namespace\": \"example.avro\",\n" +
        " \"type\": \"record\",\n" +
        " \"name\": \"user\",\n" +
        " \"fields\": [\n" +
        "     {\"name\": \"name\", \"type\": \"string\"},\n" +
        "     {\"name\": \"now\", \"type\": \"long\"},\n" +
        "     {\"name\": \"favorite_number\",  \"type\": \"int\"}" +
        "     ,\n{\"name\": \"favorite_color\", \"type\": \"string\", \"default\": \"green\"}" +
        " ]\n" +
        "}";

Do you have any idea what the problem could be?
I want to keep the option of not writing all the fields from the producer, that is, to use optional string fields.
Thanks!

Ewen Cheslack-Postava

Nov 12, 2016, 12:18:28 AM
to Confluent Platform
Are you using the HDFS connector or delivering data into HDFS & Hive via some other mechanism? It sounds like you might have a single file with multiple schemas since it looks like Hive is trying to deserialize some messages with the wrong schema.

If you're using the HDFS connector, files should get rolled automatically as needed when a schema change occurs. If you're not, make sure you close out a file and start a new one (and add relevant schema info to the Hive metastore) when the schema changes.
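The one-schema-per-file rule can be sketched as follows. This is a minimal, hypothetical model (plain Java, no HDFS or Hive calls, and not the HDFS connector's actual code) of a writer that treats the current file as closed and opens a new one whenever an incoming record carries a different schema version, so no single file mixes two schemas:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "roll the file when the schema changes"; names are illustrative only.
class SchemaRollingWriter {

    // One output file: the single schema version it holds and how many records it contains.
    static final class FileSegment {
        final int schemaVersion;
        int records;

        FileSegment(int schemaVersion) {
            this.schemaVersion = schemaVersion;
        }

        @Override
        public String toString() {
            return "v" + schemaVersion + ":" + records;
        }
    }

    private final List<FileSegment> files = new ArrayList<>();

    // Append one record; start a new file first if its schema differs from the open file's.
    void write(int schemaVersion, String record) {
        FileSegment current = files.isEmpty() ? null : files.get(files.size() - 1);
        if (current == null || current.schemaVersion != schemaVersion) {
            // First record or schema change: treat the previous file as closed and open a fresh one.
            // A real writer would also publish the new schema to the Hive metastore at this point.
            current = new FileSegment(schemaVersion);
            files.add(current);
        }
        current.records++; // a real writer would serialize `record` into the open file here
    }

    List<FileSegment> files() {
        return files;
    }

    public static void main(String[] args) {
        SchemaRollingWriter writer = new SchemaRollingWriter();
        writer.write(1, "row written with the original schema");
        writer.write(1, "another original-schema row");
        writer.write(2, "row with the new optional field a");
        System.out.println(writer.files()); // prints [v1:2, v2:1]
    }
}
```

Note that this naive version rolls on every change, so interleaved v1/v2/v1 records produce three files.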

-Ewen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/508ef6ca-d113-4d2b-bec1-3fd4c1cc5039%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Thanks,
Ewen

Danielmi

Nov 13, 2016, 5:42:30 AM
to Confluent Platform
Hi,

I am using the HDFS connector.
The problem seems to happen only when I use an optional field whose default is null,

for example {\"name\": \"a\", \"type\": [\"null\",\"string\"], \"default\": null}
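As an aside, this declaration itself is well-formed: the Avro specification says a union field's default must match the first branch of the union, so ["null","string"] with "default": null is valid, whereas ["string","null"] with a null default would not be. A toy check of that rule (hypothetical helper names, restricted to the null and string types used in this thread):

```java
import java.util.Arrays;
import java.util.List;

// Toy check of the Avro spec rule "a union's default must match the first branch".
// Only null and string defaults are modelled; the helper names are hypothetical.
class UnionDefaultCheck {

    // Map a default value to the Avro type name it would satisfy (toy subset).
    static String avroTypeOf(Object defaultValue) {
        if (defaultValue == null) return "null";
        if (defaultValue instanceof String) return "string";
        return "unsupported";
    }

    // True if the default matches the first branch of the union.
    static boolean isValidUnionDefault(List<String> branches, Object defaultValue) {
        return !branches.isEmpty() && branches.get(0).equals(avroTypeOf(defaultValue));
    }

    public static void main(String[] args) {
        // The field from this thread: ["null","string"] with default null.
        System.out.println(isValidUnionDefault(Arrays.asList("null", "string"), null)); // prints true
        // Same branches in the other order: a null default no longer matches the first branch.
        System.out.println(isValidUnionDefault(Arrays.asList("string", "null"), null)); // prints false
    }
}
```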




Danielmi

Nov 28, 2016, 4:03:06 AM
to Confluent Platform
Anyone? Please.

It looks like it happens only with fields that have null as the default.
It works perfectly with non-null default values.

Has anyone managed to make this work?

Thanks!

Ewen Cheslack-Postava

Nov 29, 2016, 4:18:24 PM
to Confluent Platform
It seems like this *could* be related to https://github.com/confluentinc/schema-registry/issues/267 but those issues were already fixed for 3.0.1. So my guess is that there's another code path that isn't translating the union schema properly. Another possibility is that it's an issue with the Hive metastore integration, although that just uses the same schema/data conversion code.

I think the next step would be to test round-trip AvroData schema conversions (Avro -> Connect -> Avro), because that is ultimately the path the schema takes, and it seems to be losing the union type along the way. It might also be helpful to check what schema is registered in Hive to see if there are any other unexpected differences.
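A minimal sketch of the comparison step of such a round-trip test, in plain Java: describe each field of the original Avro schema and of the schema that comes back from the Avro -> Connect -> Avro conversion, then report any field whose type or default changed. The real conversion would go through Confluent's AvroData converter (assuming its toConnectSchema/fromConnectSchema methods are the entry points in the version in use), which is not called here; the FieldInfo representation and the lossy "round-tripped" input are hypothetical and only illustrate the kind of loss suspected above:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical comparison step for an Avro -> Connect -> Avro round-trip test.
// Field descriptions are simplified strings; no Avro or Connect classes are used.
class RoundTripDiff {

    static final class FieldInfo {
        final String type;          // e.g. "union[null,string]"
        final boolean hasDefault;   // distinguishes "default null" from "no default"
        final Object defaultValue;

        FieldInfo(String type, boolean hasDefault, Object defaultValue) {
            this.type = type;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    // Report every original field whose type or default differs after the round trip.
    // Fields that appear only in the round-tripped schema are not checked.
    static List<String> diff(Map<String, FieldInfo> before, Map<String, FieldInfo> after) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, FieldInfo> e : before.entrySet()) {
            String name = e.getKey();
            FieldInfo b = e.getValue();
            FieldInfo a = after.get(name);
            if (a == null) {
                problems.add(name + ": missing after round trip");
                continue;
            }
            if (!b.type.equals(a.type)) {
                problems.add(name + ": type " + b.type + " -> " + a.type);
            }
            if (b.hasDefault != a.hasDefault || !Objects.equals(b.defaultValue, a.defaultValue)) {
                problems.add(name + ": default " + describe(b) + " -> " + describe(a));
            }
        }
        return problems;
    }

    static String describe(FieldInfo f) {
        return f.hasDefault ? String.valueOf(f.defaultValue) : "<none>";
    }

    public static void main(String[] args) {
        Map<String, FieldInfo> original = new LinkedHashMap<>();
        original.put("name", new FieldInfo("string", false, null));
        original.put("a", new FieldInfo("union[null,string]", true, null));

        // Hypothetical result in which the union and its null default were lost.
        Map<String, FieldInfo> roundTripped = new LinkedHashMap<>();
        roundTripped.put("name", new FieldInfo("string", false, null));
        roundTripped.put("a", new FieldInfo("string", false, null));

        diff(original, roundTripped).forEach(System.out::println);
        // prints:
        // a: type union[null,string] -> string
        // a: default null -> <none>
    }
}
```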

-Ewen
