Hi,
We have JSON data stored in protobuf format. I was trying to create an external table on top of the protobuf files in HDFS, following the "Reading Protocol Buffers" section of
https://github.com/kevinweil/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive
I added the following jars (and also included them in auxpath):
add jar /homes/xyz/elephant-bird-hive-3.0.5.jar;
add jar /homes/xyz/impression.jar;
add jar /homes/xyz/elephant-bird-core-3.0.5.jar;
add jar /homes/xyz/Impproto.jar;
CREATE EXTERNAL TABLE EXT_IMP_PROTO
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
with serdeproperties (
"serialization.class"="org.myorg.ImpProto.Data")
LOCATION '/user/xyz/data/';
ALTER TABLE EXT_IMP_PROTO ADD PARTITION (dt='2014032504') LOCATION '/user/xyz/data/2014032504/';
The table and partition got created.
describe EXT_IMP_PROTO;
shows the expected schema for the table, matching the proto class.
But when I try
select * from EXT_IMP_PROTO where dt='2014032504';
or
select userid from EXT_IMP_PROTO where dt='2014032504';
I get the following error:
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BytesWritable
Time taken: 1.656 seconds
I also tried altering the table to set the input format to com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat and the output format to org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, but I am still getting the ClassCastException.
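For reference, the alter I ran was along these lines (a sketch; the exact clause syntax is from the Hive DDL manual, class names as described above):

ALTER TABLE EXT_IMP_PROTO
SET FILEFORMAT
  INPUTFORMAT 'com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

One thing I am not sure about: a partition that already exists keeps the storage descriptor it was created with, so the dt='2014032504' partition may need the same SET FILEFORMAT applied with a PARTITION(dt='2014032504') clause as well.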
Is there something I am missing here? It would be great if someone could tell me why I am getting this exception.
Regards
Narayan