Hi,
I have a weird error where reading from a REST proxy consumer gives me:
{"error_code":50002,"message":"Kafka error: java.io.CharConversionException: Invalid UTF-32 character 0x15020631(above 10ffff) at char #1, byte #7)"}
Now I am not sure what is happening, so here are the steps I am following. Note that I test this under Vagrant + Puppet, so every attempt runs on a brand-new, untainted box.
Some information:
- server is CentOS 7
- client is Debian
1) Uploading the schema:

curl \
  -X POST \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data @$AVSC.cf \
  http://$SERVER:8081/subjects/event-value/versions
where $AVSC.cf is a standard Avro schema wrapped in {"schema": "..."}, with the inner double quotes escaped (see the sketch after this step). Output is:
{"id":21}
2) Sending data:

curl \
  -X POST \
  -H "Content-Type: application/vnd.kafka.avro.v1+json" \
  --data "$data" \
  http://$SERVER:8082/topics/$TOPIC
where $data is a valid JSON record, wrapped as "{\"value_schema_id\": $SCHEMAID, \"records\": [{\"value\": ${data}}]}" (see the sketch after this step). Output is:
{"offsets":[{"partition":0,"offset":0,"error_code":null,"error":null}],"key_schema_id":null,"value_schema_id":21}
I do see the offset increase when I send more events, so it all looks fine.
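To make the envelope concrete, here is roughly how the request body gets assembled. A sketch only: the record and its fields are hypothetical, and $SCHEMAID is the id returned in step 1:

# Hypothetical record standing in for my real event JSON.
record='{"field1": "foo", "field2": 42}'
# Envelope expected by the v1 Avro produce endpoint; this is what curl posts.
data="{\"value_schema_id\": $SCHEMAID, \"records\": [{\"value\": ${record}}]}"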
3) Creating a consumer:

curl \
  -X POST \
  -H "Content-Type: application/vnd.kafka.v1+json" \
  --data "{\"name\": \"${CONSUMER}_instance\", \"format\": \"json\", \"auto.offset.reset\": \"smallest\"}" \
  http://$SERVER:8082/consumers/$CONSUMER
Nothing specific here, output is:
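For completeness: when I need to retry on the same box, the instance can be removed first with the v1 delete endpoint (a sketch):

# Remove the consumer instance (v1 API) before re-creating it.
curl -X DELETE http://$SERVER:8082/consumers/$CONSUMER/instances/${CONSUMER}_instance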
4) Using the consumer:
curl \
  -X GET \
  -H "Accept: application/vnd.kafka.json.v1+json" \
  http://$SERVER:8082/consumers/$CONSUMER/instances/${CONSUMER}_instance/topics/$TOPIC
This is where things go wrong. The output I get is:
{"error_code":50002,"message":"Kafka error: java.io.CharConversionException: Invalid UTF-32 character 0x15020631(above 10ffff) at char #1, byte #7)"}
If I take a look on the server, the full message is:
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: [2016-02-04 09:27:50,510] INFO 172.28.128.1 - - [04/Feb/2016:09:27:49 +0000] "GET /consumers/vreten/instances/vreten_instance/topics/events HTTP/1.1" 500 150 728 (io.confluent.rest-utils.requests:77)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: [2016-02-04 09:27:50,511] ERROR Unexpected exception in consumer read thread: (io.confluent.kafkarest.ConsumerReadTask:153)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: org.apache.kafka.common.errors.SerializationException: java.io.CharConversionException: Invalid UTF-32 character 0x15020631(above 10ffff) at char #1, byte #7)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: Caused by: java.io.CharConversionException: Invalid UTF-32 character 0x15020631(above 10ffff) at char #1, byte #7)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.core.io.UTF32Reader.reportInvalid(UTF32Reader.java:189)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.core.io.UTF32Reader.read(UTF32Reader.java:150)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.loadMore(ReaderBasedJsonParser.java:153)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:1854)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:571)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3604)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3549)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2673)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at io.confluent.kafkarest.JsonConsumerState.deserialize(JsonConsumerState.java:76)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at io.confluent.kafkarest.JsonConsumerState.createConsumerRecord(JsonConsumerState.java:66)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at io.confluent.kafkarest.ConsumerReadTask.doPartialRead(ConsumerReadTask.java:118)
Feb 04 09:27:50 confluent.wp.local kafka-rest-start[6927]: at io.confluent.kafkarest.ConsumerWorker.run(ConsumerWorker.java:90)
I tried different combinations of LC_ALL and LANG (namely the four combinations of C and en_US.UTF-8) on the Confluent server and on my client; it did not make any difference.
The schema and data file I use are completely fine when round-tripped with avro-tools:
java -jar ./avro-tools-1.7.7.jar fromjson --schema-file $AVSC $DATA > $DATA.avro
java -jar ./avro-tools-1.7.7.jar tojson $DATA.avro
The second command outputs the data back properly, no matter what LC_ALL is set to.
As far as I can tell, the data I send is correct, pure ASCII, so I do not understand what is going on. My searches did not turn up any help either.
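For what it's worth, this is how I check that the data file really is pure ASCII (a sketch, assuming GNU grep is available):

# Sketch, assuming GNU grep: list any byte outside the 7-bit ASCII range.
LC_ALL=C grep -nP '[^\x00-\x7F]' "$DATA" || echo "pure ASCII"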
Any hint would be appreciated!
Guillaume